linux-toradex.git/kernel, branch v3.19.4

perf: Fix irq_work 'tail' recursion

2015-04-13T12:03:54+00:00

commit d525211f9d1be8b523ec7633f080f2116f5ea536 upstream.

Vince reported a watchdog lockup like:

	[] perf_tp_event+0xc4/0x210
	[] perf_trace_lock+0x12a/0x160
	[] lock_release+0x130/0x260
	[] _raw_spin_unlock_irqrestore+0x24/0x40
	[] do_send_sig_info+0x5d/0x80
	[] send_sigio_to_task+0x12f/0x1a0
	[] send_sigio+0xae/0x100
	[] kill_fasync+0x97/0xf0
	[] perf_event_wakeup+0xd4/0xf0
	[] perf_pending_event+0x33/0x60
	[] irq_work_run_list+0x4c/0x80
	[] irq_work_run+0x18/0x40
	[] smp_trace_irq_work_interrupt+0x3f/0xc0
	[] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver 
Tested-by: Vince Weaver 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

cpuset: Fix cpuset sched_relax_domain_level

2015-03-26T12:59:43+00:00

commit 283cb41f426b723a0255702b761b0fc5d1b53a81 upstream.

The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.

The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.

This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.

Fixes: fc560a26acce ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
Signed-off-by: Jason Low 
Signed-off-by: Zefan Li 
Signed-off-by: Tejun Heo 
Tested-by: Serge Hallyn 
Signed-off-by: Greg Kroah-Hartman

cpuset: fix a warning when clearing configured masks in old hierarchy

2015-03-26T12:59:42+00:00

commit 79063bffc81f82689bd90e16da1b49408f3bf095 upstream.

When we clear cpuset.cpus, cpuset.effective_cpus won't be cleared:

  # mount -t cgroup -o cpuset xxx /mnt
  # mkdir /mnt/tmp
  # echo 0 > /mnt/tmp/cpuset.cpus
  # echo > /mnt/tmp/cpuset.cpus
  # cat cpuset.cpus

  # cat cpuset.effective_cpus
  0-15

And a kernel warning in update_cpumasks_hier() is triggered:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4028 at kernel/cpuset.c:894 update_cpumasks_hier+0x471/0x650()

Signed-off-by: Zefan Li 
Signed-off-by: Tejun Heo 
Tested-by: Serge Hallyn 
Signed-off-by: Greg Kroah-Hartman

cpuset: initialize effective masks when clone_children is enabled

2015-03-26T12:59:42+00:00

commit 790317e1b266c776765a4bdcedefea706ff0fada upstream.

If clone_children is enabled, effective masks won't be initialized
due to the bug:

  # mount -t cgroup -o cpuset xxx /mnt
  # echo 1 > cgroup.clone_children
  # mkdir /mnt/tmp
  # cat /mnt/tmp/
  # cat cpuset.effective_cpus

  # cat cpuset.cpus
  0-15

And then this cpuset won't constrain the tasks in it.

Either the bug or the fix has no effect on unified hierarchy, as
there's no clone_chidren flag there any more.

Reported-by: Christian Brauner 
Reported-by: Serge Hallyn 
Signed-off-by: Zefan Li 
Signed-off-by: Tejun Heo 
Tested-by: Serge Hallyn 
Signed-off-by: Greg Kroah-Hartman

workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE

2015-03-26T12:59:42+00:00

commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 upstream.

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo 
Reported-by: Rabin Vincent 
Cc: Tomeu Vizoso 
Tested-by: Jesper Nilsson 
Tested-by: Rabin Vincent 
Signed-off-by: Greg Kroah-Hartman

ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled

2015-03-26T12:59:41+00:00

commit 524a38682573b2e15ab6317ccfe50280441514be upstream.

Some archs (specifically PowerPC), are sensitive with the ordering of
the enabling of the calls to function tracing and setting of the
function to use to be traced.

That is, update_ftrace_function() sets what function the ftrace_caller
trampoline should call. Some archs require this to be set before
calling ftrace_run_update_code().

Another bug was discovered, that ftrace_startup_sysctl() called
ftrace_run_update_code() directly. If the function the ftrace_caller
trampoline changes, then it will not be updated. Instead a call
to ftrace_startup_enable() should be called because it tests to see
if the callback changed since the code was disabled, and will
tell the arch to update appropriately. Most archs do not need this
notification, but PowerPC does.

The problem could be seen by the following commands:

 # echo 0 > /proc/sys/kernel/ftrace_enabled
 # echo function > /sys/kernel/debug/tracing/current_tracer
 # echo 1 > /proc/sys/kernel/ftrace_enabled
 # cat /sys/kernel/debug/tracing/trace

The trace will show that function tracing was not active.

Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl

2015-03-26T12:59:41+00:00

commit 1619dc3f8f555ee1cdd3c75db3885d5715442b12 upstream.

When ftrace is enabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
ftrace is disabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().

Consider the following situation.

# echo 0 > /proc/sys/kernel/ftrace_enabled

After this ftrace_enabled = 0.

# echo function_graph > /sys/kernel/debug/tracing/current_tracer

Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
called.

# echo 1 > /proc/sys/kernel/ftrace_enabled

Now ftrace_enabled will be set to true, but still
ftrace_enable_ftrace_graph_caller() will not be called, which is not
desired.

Further if we execute the following after this:
# echo nop > /sys/kernel/debug/tracing/current_tracer

Now since ftrace_enabled is set it will call
ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
the ARM platform.

On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
it checks whether the old instruction is a nop or not. If it's not a nop,
then it returns an error. If it is a nop then it replaces instruction at
that address with a branch to ftrace_graph_caller.
ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
then it will return an error, which will cause the generic ftrace code to
raise a warning.

Note, x86 does not have an issue with this because the architecture
specific code for ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() does not check the previous state,
and calling either of these functions twice in a row has no ill effect.

Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com

Signed-off-by: Pratyush Anand
[
removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
if CONFIG_FUNCTION_GRAPH_TRACER is not set.
]
Signed-off-by: Steven Rostedt
Signed-off-by: Greg Kroah-Hartman

ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl

2015-03-26T12:59:40+00:00

commit b24d443b8f17d9776f5fc1f6c780a0a21eb02913 upstream.

When /proc/sys/kernel/ftrace_enabled is set to zero, all function
tracing is disabled. But the records that represent the functions
still hold information about the ftrace_ops that are hooked to them.

ftrace_ops may request "REGS" (have a full set of pt_regs passed to
the callback), or "TRAMP" (the ops has its own trampoline to use).
When the record is updated to represent the state of the ops hooked
to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
points to the correct trampoline (REGS has its own trampoline).

When ftrace_enabled is set to zero, all ftrace locations are a nop,
so they do not point to any trampoline. But the _EN flags are still
set. This can cause the accounting to go wrong when ftrace_enabled
is cleared and an ops that has a trampoline is registered or unregistered.

For example, the following will cause ftrace to crash:

 # echo function_graph > /sys/kernel/debug/tracing/current_tracer
 # echo 0 > /proc/sys/kernel/ftrace_enabled
 # echo nop > /sys/kernel/debug/tracing/current_tracer
 # echo 1 > /proc/sys/kernel/ftrace_enabled
 # echo function_graph > /sys/kernel/debug/tracing/current_tracer

As function_graph uses a trampoline, when ftrace_enabled is set to zero
the updates to the record are not done. When enabling function_graph
again, the record will still have the TRAMP_EN flag set, and it will
look for an op that has a trampoline other than the function_graph
ops, and fail to find one.

Reported-by: Pratyush Anand 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

console: Fix console name size mismatch

2015-03-26T12:59:39+00:00

commit 30a22c215a0007603ffc08021f2e8b64018517dd upstream.

commit 6ae9200f2cab7 ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.

Signed-off-by: Peter Hurley 
Signed-off-by: Greg Kroah-Hartman

sched/autogroup: Fix failure to set cpu.rt_runtime_us

2015-03-18T13:11:12+00:00

commit 1fe89e1b6d270aa0d3452c60d38461ea589594e3 upstream.

Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.

In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.

Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.

Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.

Reported-by: Zefan Li 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Stefan Bader 
Fixes: 8323f26ce342 ("sched: Fix race in task_group()")
Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman