linux-toradex.git/kernel, branch v3.10.78

ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

2015-05-06T19:56:27+00:00

commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.

While debugging an issue with excessive softirq usage, I encountered the
following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
infrastructure"):

    [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

...but despite this note, the patch still calls RCU with IRQs disabled.

This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.

Moving the call to rcu_note_context_switch() after the cond_sched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.

Signed-off-by: Calvin Owens 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

ptrace: fix race between ptrace_resume() and wait_task_stopped()

2015-05-06T19:56:24+00:00

commit b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed upstream.

ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
tracee->exit_code and then wake_up_state() changes tracee->state.  If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.

Test-case:

	#include 
	#include 
	#include 
	#include 
	#include 
	#include 

	int pid;

	void *waiter(void *arg)
	{
		int stat;

		for (;;) {
			assert(pid == wait(&stat));
			assert(WIFSTOPPED(stat));
			if (WSTOPSIG(stat) == SIGHUP)
				continue;

			assert(WSTOPSIG(stat) == SIGCONT);
			printf("ERR! extra/wrong report:%x\n", stat);
		}
	}

	int main(void)
	{
		pthread_t thread;

		pid = fork();
		if (!pid) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			for (;;)
				kill(getpid(), SIGHUP);
		}

		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

		for (;;)
			ptrace(PTRACE_CONT, pid, 0, SIGCONT);

		return 0;
	}

Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov 
Reported-by: Pavel Labath 
Tested-by: Pavel Labath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ring-buffer: Replace this_cpu_() with __this_cpu_()

2015-05-06T19:56:21+00:00

commit 80a9b64e2c156b6523e7a01f2ba6e5d86e722814 upstream.

It has come to my attention that this_cpu_read/write are horrible on
architectures other than x86. Worse yet, they actually disable
preemption or interrupts! This caused some unexpected tracing results
on ARM.

   101.356868: preempt_count_add <-ring_buffer_lock_reserve
   101.356870: preempt_count_sub <-ring_buffer_lock_reserve

The ring_buffer_lock_reserve has recursion protection that requires
accessing a per cpu variable. But since preempt_disable() is traced, it
too got traced while accessing the variable that is suppose to prevent
recursion like this.

The generic version of this_cpu_read() and write() are:

 #define this_cpu_generic_read(pcp)					\
 ({	typeof(pcp) ret__;						\
	preempt_disable();						\
	ret__ = *this_cpu_ptr(&(pcp));					\
	preempt_enable();						\
	ret__;								\
 })

 #define this_cpu_generic_to_op(pcp, val, op)				\
 do {									\
	unsigned long flags;						\
	raw_local_irq_save(flags);					\
	*__this_cpu_ptr(&(pcp)) op val;					\
	raw_local_irq_restore(flags);					\
 } while (0)

Which is unacceptable for locations that know they are within preempt
disabled or interrupt disabled locations.

Paul McKenney stated that __this_cpu_() versions produce much better code on
other architectures than this_cpu_() does, if we know that the call is done in
a preempt disabled location.

I also changed the recursive_unlock() to use two local variables instead
of accessing the per_cpu variable twice.

Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home

Acked-by: Christoph Lameter 
Reported-by: Uwe Kleine-Koenig 
Tested-by: Uwe Kleine-Koenig 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

move d_rcu from overlapping d_child to overlapping d_alias

2015-04-29T08:34:00+00:00

commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro 
Cc: Ben Hutchings 
[hujianyang: Backported to 3.10 refer to the work of Ben Hutchings in 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: hujianyang 
Signed-off-by: Greg Kroah-Hartman

console: Fix console name size mismatch

2015-04-19T08:10:51+00:00

commit 30a22c215a0007603ffc08021f2e8b64018517dd upstream.

commit 6ae9200f2cab7 ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.

Signed-off-by: Peter Hurley 
Signed-off-by: Greg Kroah-Hartman

perf: Fix irq_work 'tail' recursion

2015-04-13T12:02:12+00:00

commit d525211f9d1be8b523ec7633f080f2116f5ea536 upstream.

Vince reported a watchdog lockup like:

	[] perf_tp_event+0xc4/0x210
	[] perf_trace_lock+0x12a/0x160
	[] lock_release+0x130/0x260
	[] _raw_spin_unlock_irqrestore+0x24/0x40
	[] do_send_sig_info+0x5d/0x80
	[] send_sigio_to_task+0x12f/0x1a0
	[] send_sigio+0xae/0x100
	[] kill_fasync+0x97/0xf0
	[] perf_event_wakeup+0xd4/0xf0
	[] perf_pending_event+0x33/0x60
	[] irq_work_run_list+0x4c/0x80
	[] irq_work_run+0x18/0x40
	[] smp_trace_irq_work_interrupt+0x3f/0xc0
	[] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver 
Tested-by: Vince Weaver 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE

2015-03-26T14:00:58+00:00

commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 upstream.

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo 
Reported-by: Rabin Vincent 
Cc: Tomeu Vizoso 
Tested-by: Jesper Nilsson 
Tested-by: Rabin Vincent 
Signed-off-by: Greg Kroah-Hartman

PM / QoS: remove duplicate call to pm_qos_update_target

2015-03-18T12:22:28+00:00

In 3.10.y backport patch 1dba303727f52ea062580b0a9b3f0c3b462769cf,
the logic to call pm_qos_update_target was moved to __pm_qos_update_request.
However, the original code was left in function pm_qos_update_request.

Currently, if pm_qos_update_request is called where new_value !=
req->node.prio then pm_qos_update_target will be called twice in a row.
Once in pm_qos_update_request and then again in the following call to
_pm_qos_update_request.

Removing the left over code from pm_qos_update_request stops this second
call to pm_qos_update_target where the work of removing / re-adding the
new_value in the constraints list would be duplicated.

Signed-off-by: Michael Scott 
Signed-off-by: Greg Kroah-Hartman

ntp: Fixup adjtimex freq validation on 32-bit systems

2015-03-06T22:40:52+00:00

commit 29183a70b0b828500816bd794b3fe192fce89f73 upstream.

Additional validation of adjtimex freq values to avoid
potential multiplication overflows were added in commit
5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values)

Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.

ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more effected by
the bug.

See bugs:

  https://bugzilla.kernel.org/show_bug.cgi?id=92481
  https://bugzilla.redhat.com/show_bug.cgi?id=1188074

This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger then the 32-bit long freq value, so multiplication
overflows aren't possible there.

Reported-by: Josh Boyer 
Reported-by: George Joseph 
Tested-by: George Joseph 
Signed-off-by: John Stultz 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Sasha Levin 
Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

kdb: fix incorrect counts in KDB summary command output