linux-toradex.git - Linux kernel for Apalis and Colibri modules

Age	Commit message	Author
2012-01-12	Merge commit 'v3.2.1' into rt-3.2.1-rt9 v3.2.1-rt9	Clark Williams

2012-01-12	cgroup: fix to allow mounting a hierarchy by name	Li Zefan
	commit 0d19ea866562e46989412a0676412fa0983c9ce7 upstream. If we mount a hierarchy with a specified name, the name is unique, and we can use it to mount the hierarchy without specifying its set of subsystem names. This feature is documented in Documentation/cgroups/cgroups.txt section 2.3. Here's an example: # mount -t cgroup -o cpuset,name=myhier xxx /cgroup1 # mount -t cgroup -o name=myhier xxx /cgroup2 But it was broken by commit 32a8cf235e2f192eb002755076994525cdbaa35a (cgroup: make the mount options parsing more accurate). This fixes the regression. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-05	Merge commit 'v3.2' into rt-3.2-rt9 v3.2-rt9	Clark Williams

2012-01-04	ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach	Oleg Nesterov
	This is the temporary simple fix for 3.2, we need more changes in this area. 1. do_signal_stop() assumes that the running untraced thread in the stopped thread group is not possible. This was our goal but it is not yet achieved: a stopped-but-resumed tracee can clone the running thread which can initiate another group-stop. Remove WARN_ON_ONCE(!current->ptrace). 2. A new thread always starts with ->jobctl = 0. If it is auto-attached and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING but the JOBCTL_STOP_SIGMASK part is zero, which triggers WARN_ON(!signr) in do_jobctl_trap() if another debugger attaches. Change __ptrace_unlink() to set the artificial SIGSTOP for report. Alternatively we could change ptrace_init_task() to copy signr from current, but this means we can copy it for no reason and hide the possible similar problems. Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> [3.1] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
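Item 2 above can be illustrated with a small user-space model. This is only a sketch: the bit positions are assumed to follow the 3.2-era layout (low 16 bits for the stop signal number, a separate pending bit), and `struct model_task`, `unlink_without_signr()`, `unlink_with_artificial_sigstop()` and `stop_signr()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Assumed layout: the low 16 bits carry the stop signal number. */
#define JOBCTL_STOP_SIGMASK  0xffffUL
#define JOBCTL_STOP_PENDING  (1UL << 17)

/* Hypothetical stand-in for the single task field that matters here. */
struct model_task {
    unsigned long jobctl;
};

/* Behaviour described as broken: only the pending bit is set. */
static void unlink_without_signr(struct model_task *t)
{
    assert(t != NULL);
    t->jobctl |= JOBCTL_STOP_PENDING;
}

/* Behaviour described as the fix: fall back to SIGSTOP if no signal is recorded. */
static void unlink_with_artificial_sigstop(struct model_task *t)
{
    assert(t != NULL);
    t->jobctl |= JOBCTL_STOP_PENDING;
    if (!(t->jobctl & JOBCTL_STOP_SIGMASK))
        t->jobctl |= SIGSTOP;
}

/* The signal number a later trap report would see; zero is the bad case. */
static int stop_signr(const struct model_task *t)
{
    return (int)(t->jobctl & JOBCTL_STOP_SIGMASK);
}
```

For a task that starts with `jobctl = 0`, the first helper leaves `stop_signr()` at zero, while the second yields `SIGSTOP`, which is the condition the warning checks for.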
2012-01-04	ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race	Oleg Nesterov
	Test-case: int main(void) { int pid, status; pid = fork(); if (!pid) { for (;;) { if (!fork()) return 0; if (waitpid(-1, &status, 0) < 0) { printf("ERR!! wait: %m\n"); return 0; } } } assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0); assert(waitpid(-1, NULL, 0) == pid); assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACEFORK) == 0); do { ptrace(PTRACE_CONT, pid, 0, 0); pid = waitpid(-1, NULL, 0); } while (pid > 0); return 1; } It fails because ->real_parent sees its child in EXIT_DEAD state while the tracer is going to change the state back to EXIT_ZOMBIE in wait_task_zombie(). The offending commit is 823b018e which moved the EXIT_DEAD check, but in fact we should not blame it. The original code was not correct as well because it didn't take ptrace_reparented() into account and because we can't really trust ->ptrace. This patch adds the additional check to close this particular race but it doesn't solve the whole problem. We simply can't rely on ->ptrace in this case, it can be cleared if the tracer is multithreaded by the exiting ->parent. I think we should kill EXIT_DEAD altogether, we should always remove the soon-to-be-reaped child from ->children or at least we should never do the DEAD->ZOMBIE transition. But this is too complex for 3.2. Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com> Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> [3.0+] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-03	hung_task: fix false positive during vfork	Mandeep Singh Baines
	vfork parent uninterruptibly and unkillably waits for its child to exec/exit. This wait is of unbounded length. Ignore such waits in the hung_task detector. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Reported-by: Sasha Levin <levinsasha928@gmail.com> LKML-Reference: <1325344394.28904.43.camel@lappy> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Kacur <jkacur@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-03	Merge commit 'v3.2-rc7' into rt-3.2-rc7-rt9	Clark Williams

2011-12-31	futex: Fix uninterruptible loop due to gate_area	Hugh Dickins
	It was found (by Sasha) that if you use a futex located in the gate area we get stuck in an uninterruptible infinite loop, much like the ZERO_PAGE issue. While looking at this problem, PeterZ realized you'll get into similar trouble when hitting any install_special_pages() mapping. And are there still drivers setting up their own special mmaps without page->mapping, and without special VM or pte flags to make get_user_pages fail? In most cases, if page->mapping is NULL, we do not need to retry at all: Linus points out that even /proc/sys/vm/drop_caches poses no problem, because it ends up using remove_mapping(), which takes care not to interfere when the page reference count is raised. But there is still one case which does need a retry: if memory pressure called shmem_writepage in between get_user_pages_fast dropping page table lock and our acquiring page lock, then the page gets switched from filecache to swapcache (and ->mapping set to NULL) whatever the refcount. Fault it back in to get the page->mapping needed for key->shared.inode. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-30	Revert "clockevents: Set noop handler in clockevents_exchange_device()"	Linus Torvalds
	This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c. It results in resume problems for various people. See for example http://thread.gmane.org/gmane.linux.kernel/1233033 http://thread.gmane.org/gmane.linux.kernel/1233389 http://thread.gmane.org/gmane.linux.kernel/1233159 http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877 and the fedora and ubuntu bug reports https://bugzilla.redhat.com/show_bug.cgi?id=767248 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569 which got bisected down to the stable version of this commit. Reported-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Phil Miller <mille121@illinois.edu> Reported-by: Philip Langdale <philipl@overt.org> Reported-by: Tim Gardner <tim.gardner@canonical.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <gregkh@suse.de> Cc: stable@kernel.org # for stable kernels that applied the original Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-28	kconfig-preempt-rt-full.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	wait-simple: Simple waitqueue implementation	Thomas Gleixner
	wait_queue is a swiss army knife and in most of the cases the complexity is not needed. For RT waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks. Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	sysrq: Allow immediate Magic SysRq output for PREEMPT_RT_FULL	Frank Rowand
	Add a CONFIG option to allow the output from Magic SysRq to be output immediately, even if this causes large latencies. If PREEMPT_RT_FULL, printk() will not try to acquire the console lock when interrupts or preemption are disabled. If the console lock is not acquired the printk() output will be buffered, but will not be output immediately. Some drivers call into the Magic SysRq code with interrupts or preemption disabled, so the output of Magic SysRq will be buffered instead of printing immediately if this option is not selected. Even with this option selected, Magic SysRq output will be delayed if the attempt to acquire the console lock fails. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Link: http://lkml.kernel.org/r/4E7CEF60.5020508@am.sony.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	add /sys/kernel/realtime entry	Clark Williams
	Add a /sys/kernel entry to indicate that the kernel is a realtime kernel. Clark says that he needs this for udev rules: udev needs to evaluate whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname output is too slow or so. Are there better solutions? Should it exist and return 0 on !-rt? Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
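A user-space check against this entry might look like the sketch below. It assumes the entry reads as `1` on a realtime kernel and is simply absent otherwise; the message leaves the non-rt behaviour as an open question. `kernel_is_realtime()` is a hypothetical helper that takes the path as a parameter so it can be tried on any file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the file at `path` starts with the character '1',
 * and 0 otherwise, including when the file cannot be opened.
 */
static int kernel_is_realtime(const char *path)
{
    FILE *f;
    int c;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return 0;               /* assumed: no entry means not realtime */
    c = fgetc(f);
    fclose(f);
    return c == '1';
}
```

A caller would pass `"/sys/kernel/realtime"`; a single open and one-byte read is much cheaper than spawning `uname` and parsing its output.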
2011-12-28	kgdb/serial: Short term workaround	Jason Wessel
	On 07/27/2011 04:37 PM, Thomas Gleixner wrote: > - KGDB (not yet disabled) is reportedly unusable on -rt right now due > to missing hacks in the console locking which I dropped on purpose. > To work around this in the short term you can use this patch, in addition to the clocksource watchdog patch that Thomas brewed up. Comments are welcome of course. Ultimately the right solution is to change separation between the console and the HW to have a polled mode + work queue so as not to introduce any kind of latency. Thanks, Jason.
2011-12-28	printk: Disable migration instead of preemption	Richard Weinberger
	There is no need to disable preemption in vprintk(), migrate_disable() is sufficient. This fixes the following bug in -rt: [ 14.759233] BUG: sleeping function called from invalid context at /home/rw/linux-rt/kernel/rtmutex.c:645 [ 14.759235] in_atomic(): 1, irqs_disabled(): 0, pid: 547, name: bash [ 14.759244] Pid: 547, comm: bash Not tainted 3.0.12-rt29+ #3 [ 14.759246] Call Trace: [ 14.759301] [<ffffffff8106fade>] __might_sleep+0xeb/0xf0 [ 14.759318] [<ffffffff810ad784>] rt_spin_lock_fastlock.constprop.9+0x21/0x43 [ 14.759336] [<ffffffff8161fef0>] rt_spin_lock+0xe/0x10 [ 14.759354] [<ffffffff81347ad1>] serial8250_console_write+0x81/0x121 [ 14.759366] [<ffffffff8107ecd3>] __call_console_drivers+0x7c/0x93 [ 14.759369] [<ffffffff8107ef31>] _call_console_drivers+0x5c/0x60 [ 14.759372] [<ffffffff8107f7e5>] console_unlock+0x147/0x1a2 [ 14.759374] [<ffffffff8107fd33>] vprintk+0x3ea/0x462 [ 14.759383] [<ffffffff816160e0>] printk+0x51/0x53 [ 14.759399] [<ffffffff811974e4>] ? proc_reg_poll+0x9a/0x9a [ 14.759403] [<ffffffff81335b42>] __handle_sysrq+0x50/0x14d [ 14.759406] [<ffffffff81335c8a>] write_sysrq_trigger+0x4b/0x53 [ 14.759408] [<ffffffff81335c3f>] ? __handle_sysrq+0x14d/0x14d [ 14.759410] [<ffffffff81197583>] proc_reg_write+0x9f/0xbe [ 14.759426] [<ffffffff811497ec>] vfs_write+0xac/0xf3 [ 14.759429] [<ffffffff8114a9b3>] ? fget_light+0x3a/0x9b [ 14.759431] [<ffffffff811499db>] sys_write+0x4a/0x6e [ 14.759438] [<ffffffff81625d52>] system_call_fastpath+0x16/0x1b Signed-off-by: Richard Weinberger <rw@linutronix.de> Link: http://lkml.kernel.org/r/1323696956-11445-1-git-send-email-rw@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	console-make-rt-friendly.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	x86-no-perf-irq-work-rt.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	hotplug-stuff.patch	Thomas Gleixner
	Do not take lock for non handled cases (might be atomic context) Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	workqueue: Use get_cpu_light() in flush_gcwq()	Yong Zhang
	BUG: sleeping function called from invalid context at kernel/rtmutex.c:645 in_atomic(): 1, irqs_disabled(): 0, pid: 1739, name: bash Pid: 1739, comm: bash Not tainted 3.0.6-rt17-00284-gb76d419 #3 Call Trace: [<c06e3b5d>] ? printk+0x1d/0x20 [<c01390b6>] __might_sleep+0xe6/0x110 [<c06e633c>] rt_spin_lock+0x1c/0x30 [<c01655a6>] flush_gcwq+0x236/0x320 [<c021c651>] ? kfree+0xe1/0x1a0 [<c05b7178>] ? __cpufreq_remove_dev+0xf8/0x260 [<c0183fad>] ? rt_down_write+0xd/0x10 [<c06cd91e>] workqueue_cpu_down_callback+0x26/0x2d [<c06e9d65>] notifier_call_chain+0x45/0x60 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30 [<c014c9b4>] __cpu_notify+0x24/0x40 [<c06cbc6f>] _cpu_down+0xdf/0x330 [<c06cbef0>] cpu_down+0x30/0x50 [<c06cd6b0>] store_online+0x50/0xa7 [<c06cd660>] ? acpi_os_map_memory+0xec/0xec [<c04f2faa>] sysdev_store+0x2a/0x40 [<c02887a4>] sysfs_write_file+0xa4/0x100 [<c0229ab2>] vfs_write+0xa2/0x170 [<c0288700>] ? sysfs_poll+0x90/0x90 [<c0229d92>] sys_write+0x42/0x70 [<c06ecedf>] sysenter_do_call+0x12/0x2d CPU 1 is now offline SMP alternatives: switching to UP code SMP alternatives: switching to SMP code Booting Node 0 Processor 1 APIC 0x1 smpboot cpu 1: start_ip = 9b000 Initializing CPU#1 BUG: sleeping function called from invalid context at kernel/rtmutex.c:645 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: kworker/0:0 Pid: 0, comm: kworker/0:0 Not tainted 3.0.6-rt17-00284-gb76d419 #3 Call Trace: [<c06e3b5d>] ? printk+0x1d/0x20 [<c01390b6>] __might_sleep+0xe6/0x110 [<c06e633c>] rt_spin_lock+0x1c/0x30 [<c06cd85b>] workqueue_cpu_up_callback+0x56/0xf3 [<c06e9d65>] notifier_call_chain+0x45/0x60 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30 [<c014c9b4>] __cpu_notify+0x24/0x40 [<c014c9ec>] cpu_notify+0x1c/0x20 [<c06e1d43>] notify_cpu_starting+0x1e/0x20 [<c06e0aad>] smp_callin+0xfb/0x10e [<c06e0ad9>] start_secondary+0x19/0xd7 NMI watchdog enabled, takes one hw-pmu counter. 
Switched to NOHz mode on CPU #1 Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Link: http://lkml.kernel.org/r/1318762607-2261-5-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	workqueue: Fix PF_THREAD_BOUND abuse	Peter Zijlstra
	PF_THREAD_BOUND is set by kthread_bind() and means the thread is bound to a particular cpu for correctness. The workqueue code abuses this flag and blindly sets it for all created threads, including those that are free to migrate. Restore the original semantics now that the worst abuse in the cpu-hotplug path is gone. The only icky bit is the rescue thread for per-cpu workqueues, this cannot use kthread_bind() but will use set_cpus_allowed_ptr() to migrate itself to the desired cpu. Set and clear PF_THREAD_BOUND manually here. XXX: I think worker_maybe_bind_and_lock()/worker_unbind_and_unlock() should also do a get_online_cpus(), this would likely allow us to remove the while loop. XXX: should probably repurpose GCWQ_DISASSOCIATED to warn on adding works after CPU_DOWN_PREPARE -- its dual use to mark unbound gcwqs is a tad annoying though. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	workqueue: Fix cpuhotplug trainwreck	Peter Zijlstra
	The current workqueue code does crazy stuff on cpu unplug, it relies on forced affine breakage, thereby violating per-cpu expectations. Worse, it tries to re-attach to a cpu if the thing comes up again before all previously queued works are finished. This breaks (admittedly bonkers) cpu-hotplug use that relies on a down-up cycle to push all usage away. Introduce a new WQ_NON_AFFINE flag that indicates a per-cpu workqueue will not respect cpu affinity and use this to migrate all its pending works to whatever cpu is doing cpu-down. This also adds a warning for queue_on_cpu() users which warns when it's used on WQ_NON_AFFINE workqueues, for the API implies you care about what cpu things are run on when such workqueues cannot guarantee this. For the rest, simply flush all per-cpu works and don't mess about. This also means that currently all workqueues that are manually flushing things on cpu-down in order to provide the per-cpu guarantee no longer need to do so. In short, we tell the WQ what we want it to do, provide validation for this and lose ~250 lines of code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	workqueue-use-get-cpu-light.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rt/rcutree: Move misplaced prototype	Ingo Molnar
	Fix this warning on x86 defconfig: kernel/rcutree.h:433:13: warning: ‘rcu_preempt_qs’ declared ‘static’ but never defined [-Wunused-function] The #ifdefs and prototypes here are a maze, move it closer to the usage site that needs it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rcu: Make ksoftirqd do RCU quiescent states	Paul E. McKenney
	Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable to network-based denial-of-service attacks. This patch therefore makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq() is running in ksoftirqd context. A wrapper layer is interposed so that other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying function __do_softirq_common() does the actual work. The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is that there might be a local_bh_enable() inside an RCU-preempt read-side critical section. This local_bh_enable() can invoke __do_softirq() directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be an illegal RCU-preempt quiescent state in the middle of an RCU-preempt read-side critical section. Therefore, quiescent states can only happen in cases where __do_softirq() is invoked directly from ksoftirqd. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
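The wrapper layering described above can be modelled in user space roughly as follows. This is a sketch of the control flow only: every `model_` name, the counters and the context flag are hypothetical, and the real functions of course do far more than bump a counter.

```c
#include <assert.h>
#include <stdbool.h>

static int  softirq_runs;        /* passes through the common body */
static int  rcu_bh_qs_calls;     /* quiescent states reported */
static bool model_in_ksoftirqd;  /* true only inside the ksoftirqd entry */

/* Stand-in for rcu_bh_qs(); it must never run outside ksoftirqd context. */
static void model_rcu_bh_qs(void)
{
    assert(model_in_ksoftirqd);
    rcu_bh_qs_calls++;
}

/* Common body: does the work, reports a quiescent state only when asked. */
static void model_do_softirq_common(bool need_rcu_bh_qs)
{
    softirq_runs++;
    if (need_rcu_bh_qs)
        model_rcu_bh_qs();
}

/* Entry for other callers, e.g. via local_bh_enable(): never reports. */
static void model_do_softirq(void)
{
    model_do_softirq_common(false);
}

/* Entry for ksoftirqd: the only path allowed to report. */
static void model_ksoftirqd_do_softirq(void)
{
    model_in_ksoftirqd = true;
    model_do_softirq_common(true);
    model_in_ksoftirqd = false;
}
```

Two calls to `model_do_softirq()` followed by one call to `model_ksoftirqd_do_softirq()` leave `softirq_runs` at 3 and `rcu_bh_qs_calls` at 1, so a softirq pass triggered from inside a read-side critical section never records a quiescent state.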
2011-12-28	rcu-more-fallout.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rcu: Merge RCU-bh into RCU-preempt	Thomas Gleixner
	The Linux kernel has long RCU-bh read-side critical sections that intolerably increase scheduling latency under mainline's RCU-bh rules, which include RCU-bh read-side critical sections being non-preemptible. This patch therefore arranges for RCU-bh to be implemented in terms of RCU-preempt for CONFIG_PREEMPT_RT_FULL=y. This has the downside of defeating the purpose of RCU-bh, namely, handling the case where the system is subjected to a network-based denial-of-service attack that keeps at least one CPU doing full-time softirq processing. This issue will be fixed by a later commit. The current commit will need some work to make it appropriate for mainline use, for example, it needs to be extended to cover Tiny RCU. [ paulmck: Added a useful changelog ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rcu: Frob softirq test	Peter Zijlstra
	With RT_FULL we get the below wreckage: [ 126.060484] ======================================================= [ 126.060486] [ INFO: possible circular locking dependency detected ] [ 126.060489] 3.0.1-rt10+ #30 [ 126.060490] ------------------------------------------------------- [ 126.060492] irq/24-eth0/1235 is trying to acquire lock: [ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060503] [ 126.060504] but task is already holding lock: [ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060511] [ 126.060511] which lock already depends on the new lock. [ 126.060513] [ 126.060514] [ 126.060514] the existing dependency chain (in reverse order) is: [ 126.060516] [ 126.060516] -> #1 (&p->pi_lock){-...-.}: [ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85 [ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f [ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a [ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f [ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde [ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b [ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060551] [ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}: [ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060580] [<ffffffff8107511c>] 
try_to_wake_up+0x175/0x429 [ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060611] [ 126.060612] other info that might help us debug this: [ 126.060614] [ 126.060615] Possible unsafe locking scenario: [ 126.060616] [ 126.060617] CPU0 CPU1 [ 126.060619] ---- ---- [ 126.060620] lock(&p->pi_lock); [ 126.060623] lock(&(lock)->wait_lock); [ 126.060625] lock(&p->pi_lock); [ 126.060627] lock(&(lock)->wait_lock); [ 126.060629] [ 126.060629] * DEADLOCK * [ 126.060630] [ 126.060632] 1 lock held by irq/24-eth0/1235: [ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060638] [ 126.060638] stack backtrace: [ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30 [ 126.060643] Call Trace: [ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a [ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99 [ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c [ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060680] [<ffffffff810d9ea3>] ? 
rcu_read_unlock_special+0x9b/0x1c4 [ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5 [ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90 [ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21 [ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d [ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f [ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f [ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59 [ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a [ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a [ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe [ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c [ 126.060774] [<ffffffff81509b10>] ? 
gs_change+0xb/0xb Because irq_exit() does: void irq_exit(void) { account_system_vtime(current); trace_hardirq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); if (!in_interrupt() && local_softirq_pending()) invoke_softirq(); ... } Which triggers a wakeup, which uses RCU, now if the interrupted task has t->rcu_read_unlock_special set, the rcu usage from the wakeup will end up in rcu_read_unlock_special(). rcu_read_unlock_special() will test for in_irq(), which will fail as we just decremented preempt_count with IRQ_EXIT_OFFSET, and in_serving_softirq(), which for PREEMPT_RT_FULL reads: int in_serving_softirq(void) { int res; preempt_disable(); res = __get_cpu_var(local_softirq_runner) == current; preempt_enable(); return res; } Which will thus also fail, resulting in the above wreckage. The 'somewhat' ugly solution is to open-code the preempt_count() test in rcu_read_unlock_special(). Also, we're not at all sure how ->rcu_read_unlock_special gets set here... so this is very likely a bandaid and more thought is required. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2011-12-28	timer-handle-idle-trylock-in-get-next-timer-irq.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rwlocks: Fix section mismatch	John Kacur
	This fixes the following build error for the preempt-rt kernel. make kernel/fork.o CC kernel/fork.o kernel/fork.c:90: error: section of ‘tasklist_lock’ conflicts with previous declaration make[2]: *** [kernel/fork.o] Error 1 make[1]: *** [kernel/fork.o] Error 2 The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default. The non-rt kernels explicitly cache align only the tasklist_lock in kernel/fork.c That can create a build conflict. This fixes the build problem by making the non-rt kernels cache align RWLOCKs by default. The side effect is that the other RWLOCKs are also cache aligned for non-rt. This is a short term solution for rt only. The longer term solution would be to push the cache aligned DEFINE_RWLOCK to mainline. If there are objections, then we could create a DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature. Comments? Objections? Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rt: Add the preempt-rt lock replacement APIs	Thomas Gleixner
	Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex based locking functions for preempt-rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rt-mutex-add-sleeping-spinlocks-support.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rtmutex-futex-prepare-rt.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	rtmutex-lock-killable.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	tasklet: Prevent tasklets from going into infinite spin in RT	Ingo Molnar
	When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads, and spinlocks turn into mutexes. But this can cause issues with tasks disabling tasklets. A tasklet runs under ksoftirqd, and if a tasklet is disabled with tasklet_disable(), the tasklet count is increased. When a tasklet runs, it checks this counter and if it is set, it adds itself back on the softirq queue and returns. The problem arises in RT because ksoftirqd will see that a softirq is ready to run (the tasklet softirq just re-armed itself), and will not sleep, but instead run the softirqs again. The tasklet softirq will still see that the count is non-zero and will not execute the tasklet and requeue itself on the softirq again, which will cause ksoftirqd to run it again and again and again. It gets worse because ksoftirqd runs as a real-time thread. If it preempted the task that disabled tasklets, and that task has migration disabled, or can't run for other reasons, the tasklet softirq will never run because the count will never be zero, and ksoftirqd will go into an infinite loop. As ksoftirqd is an RT task, this becomes a big problem. This is a hack solution to have tasklet_disable stop tasklets, and when a tasklet runs, instead of requeueing the tasklet softirq it delays it. When tasklet_enable() is called, and tasklets are waiting, then the tasklet_enable() will kick the tasklets to continue. This prevents the lock up from ksoftirqd going into an infinite loop. [ rostedt@goodmis.org: ported to 3.0-rt ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
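The difference between requeueing and delaying a disabled tasklet can be shown with a single-threaded model. This is a sketch under simplifying assumptions: one tasklet, no concurrency, and hypothetical names throughout (`struct model_tasklet`, `model_softirq_pass()` and so on); the `delay_when_disabled` switch selects between the old and the new behaviour.

```c
#include <assert.h>
#include <stdbool.h>

struct model_tasklet {
    int  disable_count;   /* raised by disable, lowered by enable */
    bool queued;          /* sitting on the softirq queue */
    bool delayed;         /* parked until enable kicks it */
    int  executions;      /* times the handler actually ran */
};

static void model_tasklet_disable(struct model_tasklet *t)
{
    t->disable_count++;
}

/* One softirq pass; returns true if the softirq is still pending afterwards. */
static bool model_softirq_pass(struct model_tasklet *t, bool delay_when_disabled)
{
    if (!t->queued)
        return false;
    t->queued = false;
    if (t->disable_count == 0) {
        t->executions++;        /* enabled: run the handler */
        return false;
    }
    if (delay_when_disabled)
        t->delayed = true;      /* park it: nothing is left pending */
    else
        t->queued = true;       /* requeue: the softirq re-arms itself */
    return t->queued;
}

/* Enable kicks a parked tasklet back onto the queue. */
static void model_tasklet_enable(struct model_tasklet *t)
{
    assert(t->disable_count > 0);
    t->disable_count--;
    if (t->disable_count == 0 && t->delayed) {
        t->delayed = false;
        t->queued = true;
    }
}
```

With requeueing, every pass over a disabled tasklet reports the softirq as still pending, which is the spin described above. With delaying, the first pass reports nothing pending, and the tasklet runs once after `model_tasklet_enable()`.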
2011-12-28	softirq-make-fifo.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	softirq: Fix unplug deadlock	Peter Zijlstra
	If ksoftirqd gets woken during hot-unplug, __thread_do_softirq() will call pin_current_cpu() which will block on the held cpu_hotplug.lock. Moving the offline check in __thread_do_softirq() before the pin_current_cpu() call doesn't work, since the wakeup can happen before we mark the cpu offline. So here we have the ksoftirq thread stuck until hotplug finishes, but then the ksoftirq CPU_DOWN notifier issues kthread_stop() which will wait for the ksoftirq thread to go away -- while holding the hotplug lock. Sort this by delaying the kthread_stop() until CPU_POST_DEAD, which is outside of the cpu_hotplug.lock, but still serialized by the cpu_add_remove_lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/1317391156.12973.3.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	softirq: Export in_serving_softirq()	John Kacur
	ERROR: "in_serving_softirq" [net/sched/cls_cgroup.ko] undefined! The above can be fixed by exporting in_serving_softirq Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/1321235083-21756-2-git-send-email-jkacur@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	softirq-local-lock.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	mutex-no-spin-on-rt.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	lockdep-rt.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	softirq: Sanitize softirq pending for NOHZ/RT	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	ring-buffer: Convert reader_lock from raw_spin_lock into spin_lock	Steven Rostedt
	The reader_lock is mostly taken in normal context with interrupts enabled. But because ftrace_dump() can happen anywhere, it is used as a spin lock and in some cases a check to in_nmi() is performed to determine if the ftrace_dump() was initiated from an NMI and if it is, the lock is not taken. But having the lock as a raw_spin_lock() causes issues with the real-time kernel as the lock is held during allocation and freeing of the buffer. As memory locks convert into mutexes, keeping the reader_lock as a spin_lock causes problems. Converting the reader_lock is not straightforward as we must still deal with the ftrace_dump() happening not only from an NMI but also from true interrupt context in PREEMPT_RT. Two wrapper functions are created to take and release the reader lock: int read_buffer_lock(cpu_buffer, unsigned long *flags) void read_buffer_unlock(cpu_buffer, unsigned long flags, int locked) The read_buffer_lock() returns 1 if it actually took the lock, disables interrupts and updates the flags. The only time it returns 0 is in the case of a ftrace_dump() happening in an unsafe context. The read_buffer_unlock() checks the return of locked and will simply unlock the spin lock if it was successfully taken. Instead of just having this in specific cases that the NMI might call into, all instances of the reader_lock are converted to the wrapper functions to make this a bit simpler to read and less error prone. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark@redhat.com> Link: http://lkml.kernel.org/r/1317146210.26514.33.camel@gandalf.stny.rr.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
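The calling convention of the two wrappers can be sketched in user space as below. Only the shape of the interface (a lock call that returns whether it locked, and an unlock call that takes that result) comes from the message; the `model_` names, the lock word and the `model_unsafe_dump_context` flag are hypothetical stand-ins, and no real interrupt state is saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer with a plain lock word instead of a real spinlock. */
struct model_cpu_buffer {
    int reader_locked;
};

/* Stands in for "ftrace_dump() is running in an unsafe context". */
static bool model_unsafe_dump_context;

/* Returns 1 if the lock was taken, 0 if the caller must proceed without it. */
static int model_read_buffer_lock(struct model_cpu_buffer *b, unsigned long *flags)
{
    if (model_unsafe_dump_context)
        return 0;
    *flags = 1;                 /* placeholder for the saved interrupt state */
    assert(!b->reader_locked);  /* single-threaded model: no contention */
    b->reader_locked = 1;
    return 1;
}

/* Unlocks only when the matching lock call reported success. */
static void model_read_buffer_unlock(struct model_cpu_buffer *b,
                                     unsigned long flags, int locked)
{
    (void)flags;                /* a real version would restore irq state */
    if (locked)
        b->reader_locked = 0;
}
```

Because the `locked` result travels from the lock call to the unlock call, each call site has the same shape whether or not the lock was actually taken, which is the readability argument the message makes.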
2011-12-28	ftrace-crap.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	sched-clear-pf-thread-bound-on-fallback-rq.patch	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	sched: Have migrate_disable ignore bounded threads	Peter Zijlstra
	Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/20110927124423.567944215@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	sched: Do not compare cpu masks in scheduler	Peter Zijlstra
	Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/20110927124423.128129033@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	sched: Postpone actual migration disable to schedule	Steven Rostedt
	The migrate_disable() can cause a bit of overhead in the RT kernel, as changing the affinity is expensive to do at every lock encountered. As a running task cannot migrate, the actual disabling of migration does not need to occur until the task is about to schedule out. In most cases, a task that disables migration will enable it before it schedules, so this change improves performance tremendously. [ Frank Rowand: UP compile fix ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Link: http://lkml.kernel.org/r/20110927124422.779693167@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
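The deferral described in this message can be illustrated with a minimal userspace model. All names here (`lazy_task_model`, `lazy_migrate_disable()`, `lazy_schedule_out()`, the `affinity_updates` counter) are hypothetical and exist only for this sketch; the real scheduler change is more involved.

```c
#include <assert.h>

/* Hypothetical model of a task; not the kernel's task_struct. */
struct lazy_task_model {
    int migrate_disable;    /* nesting depth requested by the task */
    int pinned;             /* 1 once the expensive pinning was done */
    int affinity_updates;   /* counts the expensive operations */
};

/* Cheap path: only record the request. */
static void lazy_migrate_disable(struct lazy_task_model *t)
{
    t->migrate_disable++;
}

/* Undo the pinning only if it was ever actually applied. */
static void lazy_migrate_enable(struct lazy_task_model *t)
{
    assert(t->migrate_disable > 0);
    if (--t->migrate_disable == 0 && t->pinned) {
        t->pinned = 0;
        t->affinity_updates++;
    }
}

/* A running task cannot migrate, so pin it only when it schedules out. */
static void lazy_schedule_out(struct lazy_task_model *t)
{
    if (t->migrate_disable > 0 && !t->pinned) {
        t->pinned = 1;
        t->affinity_updates++;
    }
}
```

In this model a task that disables and re-enables migration without scheduling in between never pays for an affinity update, which is the common case the message refers to.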
2011-12-28	sched: teach migrate_disable about atomic contexts	Peter Zijlstra
	<NMI> [<ffffffff812dafd8>] spin_bug+0x94/0xa8 [<ffffffff812db07f>] do_raw_spin_lock+0x43/0xea [<ffffffff814fa9be>] _raw_spin_lock_irqsave+0x6b/0x85 [<ffffffff8106ff9e>] ? migrate_disable+0x75/0x12d [<ffffffff81078aaf>] ? pin_current_cpu+0x36/0xb0 [<ffffffff8106ff9e>] migrate_disable+0x75/0x12d [<ffffffff81115b9d>] pagefault_disable+0xe/0x1f [<ffffffff81047027>] copy_from_user_nmi+0x74/0xe6 [<ffffffff810489d7>] perf_callchain_user+0xf3/0x135 Now clearly we can't go around taking locks from NMI context, cure this by short-circuiting migrate_disable() when we're in an atomic context already. Add some extra debugging to avoid things like: preempt_disable() migrate_disable(); preempt_enable(); migrate_enable(); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314967297.1301.14.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-wbot4vsmwhi8vmbf83hsclk6@git.kernel.org
2011-12-28	sched, rt: Fix migrate_enable() thinko	Mike Galbraith
	Assigning mask = tsk_cpus_allowed(p) after p->migrate_disable = 0 ensures that we won't see a mask change.. no push/pull, we stack tasks on one CPU. Also add a couple fields to sched_debug for the next guy. [ Build fix from Stratos Psomadakis <psomas@gentoo.org> ] Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1314108763.6689.4.camel@marge.simson.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-12-28	sched: Generic migrate_disable	Peter Zijlstra
	Make migrate_disable() be a preempt_disable() for !rt kernels. This allows generic code to use it but still enforces that these code sections stay relatively small. A preemptible migrate_disable() accessible for general use would let people grow arbitrary per-cpu crap instead of cleaning these things up. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-275i87sl8e1jcamtchmehonm@git.kernel.org
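The mapping this message describes can be sketched with a compile-time switch. The macro `MODEL_PREEMPT_RT_FULL` and every function name below are hypothetical stand-ins for this illustration, not the kernel's configuration symbols or headers.

```c
#include <assert.h>

static int model_preempt_count;     /* models the preemption counter */

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

#ifdef MODEL_PREEMPT_RT_FULL
/* RT model: a separate depth counter, the section stays preemptible. */
static int generic_migrate_disable_depth;
static void generic_migrate_disable(void) { generic_migrate_disable_depth++; }
static void generic_migrate_enable(void)  { generic_migrate_disable_depth--; }
#else
/* Non-RT model: migrate_disable() is simply preempt_disable(). */
#define generic_migrate_disable() model_preempt_disable()
#define generic_migrate_enable()  model_preempt_enable()
#endif
```

Built without `MODEL_PREEMPT_RT_FULL`, a migrate-disabled section is also a preempt-disabled one, so it cannot sleep and has to stay short, which is the constraint the message wants to keep on generic code.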