linux-toradex.git/kernel, branch v3.17.4

rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads

2014-11-21T17:23:15+00:00

commit 2aa792e6faf1a00f5accf1f69e87e11a390ba2cd upstream.

The rcu_gp_kthread_wake() function checks for three conditions before
waking up grace period kthreads:

*  Is the thread we are trying to wake up the current thread?
*  Are the gp_flags zero? (all threads wait on non-zero gp_flags condition)
*  Is there no thread created for this flavour, hence nothing to wake up?

If any one of these condition is true, we do not call wake_up().
It was found that there are quite a few avoidable wake ups both during
idle time and under stress induced by rcutorture.

Idle:

Total:66000, unnecessary:66000, case1:61827, case2:66000, case3:0
Total:68000, unnecessary:68000, case1:63696, case2:68000, case3:0

rcutorture:

Total:254000, unnecessary:254000, case1:199913, case2:254000, case3:0
Total:256000, unnecessary:256000, case1:201784, case2:256000, case3:0

Here case{1-3} are the cases listed above. We can avoid these wake
ups by using rcu_gp_kthread_wake() to conditionally wake up the grace
period kthreads.

There is a comment about an implied barrier supplied by the wake_up()
logic.  This barrier is necessary for the awakened thread to see the
updated ->gp_flags.  This flag is always being updated with the root node
lock held. Also, the awakened thread tries to acquire the root node lock
before reading ->gp_flags because of which there is proper ordering.

Hence this commit tries to avoid calling wake_up() whenever we can by
using rcu_gp_kthread_wake() function.

Signed-off-by: Pranith Kumar 
CC: Mathieu Desnoyers 
Signed-off-by: Paul E. McKenney 
Cc: Kamal Mostafa 
Signed-off-by: Greg Kroah-Hartman

tracing: Do not busy wait in buffer splice

2014-11-21T17:23:10+00:00

commit e30f53aad2202b5526c40c36d8eeac8bf290bde5 upstream.

On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
lockup:

 # trace-cmd record -e raw_syscalls:* -F false
 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
 ...
 Call Trace:
  [] ? __wake_up_common+0x90/0x90
  [] wait_on_pipe+0x35/0x40
  [] tracing_buffers_splice_read+0x2e3/0x3c0
  [] ? tracing_stats_read+0x2a0/0x2a0
  [] ? _raw_spin_unlock+0x2b/0x40
  [] ? do_read_fault+0x21b/0x290
  [] ? handle_mm_fault+0x2ba/0xbd0
  [] ? trace_event_buffer_lock_reserve+0x40/0x80
  [] ? trace_buffer_lock_reserve+0x22/0x60
  [] ? trace_event_buffer_lock_reserve+0x40/0x80
  [] do_splice_to+0x6d/0x90
  [] SyS_splice+0x7c1/0x800
  [] tracesys_phase2+0xd3/0xd8

The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers.  The buffers
are not empty so ring_buffer_wait() returns immediately.  But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page.  When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.

Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e.  until full-page reads will succeed.

Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in

Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function")
Signed-off-by: Rabin Vincent 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

audit: keep inode pinned

2014-11-21T17:23:10+00:00

commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream.

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi 
Signed-off-by: Paul Moore 
Signed-off-by: Greg Kroah-Hartman

audit: AUDIT_FEATURE_CHANGE message format missing delimiting space

2014-11-21T17:23:09+00:00

commit 897f1acbb6702ddaa953e8d8436eee3b12016c7e upstream.

Add a space between subj= and feature= fields to make them parsable.

Signed-off-by: Richard Guy Briggs 
Signed-off-by: Paul Moore 
Signed-off-by: Greg Kroah-Hartman

audit: correct AUDIT_GET_FEATURE return message type

2014-11-21T17:23:09+00:00

commit 9ef91514774a140e468f99d73d7593521e6d25dc upstream.

When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature.  The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.

This appears to have been a cut-and-paste-eo in commit b0fed40.

Reported-by: Steve Grubb 
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Greg Kroah-Hartman

sched: Use rq->rd in sched_setaffinity() under RCU read lock

2014-11-14T18:10:38+00:00

commit f1e3a0932f3a9554371792a7daaf1e0eb19f66d5 upstream.

Probability of use-after-free isn't zero in this place.

Signed-off-by: Kirill Tkhai 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Paul E. McKenney 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20140922183636.11015.83611.stgit@localhost
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

posix-timers: Fix stack info leak in timer_create()

2014-11-14T18:10:38+00:00

commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream.

If userland creates a timer without specifying a sigevent info, we'll
create one ourself, using a stack local variable. Particularly will we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.

Initialize sigev_value with 0 to plug the stack info leak.

Found in the PaX patch, written by the PaX Team.

Fixes: 5a9fa7307285 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause 
Cc: Oleg Nesterov 
Cc: Brad Spengler 
Cc: PaX Team 
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

PM / Sleep: fix recovery during resuming from hibernation

2014-11-14T18:10:37+00:00

commit 94fb823fcb4892614f57e59601bb9d4920f24711 upstream.

If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't rollback things correctly calling the thaw and complete
callbacks. This could leave some devices in a suspended state in case of
an error during resuming from hibernation.

Signed-off-by: Imre Deak 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

OOM, PM: OOM killed task shouldn't escape PM suspend

2014-11-14T18:10:32+00:00

commit 5695be142e203167e3cb515ef86a88424f3524eb upstream.

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.

Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.

Changes since v1
- push the re-check loop out of freeze_processes into
  check_frozen_processes and invert the condition to make the code more
  readable as per Rafael

Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman

freezer: Do not freeze tasks killed by OOM killer

2014-11-14T18:10:32+00:00

commit 51fae6da640edf9d266c94f36bc806c63c301991 upstream.

Since f660daac474c6f (oom: thaw threads if oom killed thread is frozen
before deferring) OOM killer relies on being able to thaw a frozen task
to handle OOM situation but a3201227f803 (freezer: make freezing() test
freeze conditions in effect instead of TIF_FREEZE) has reorganized the
code and stopped clearing freeze flag in __thaw_task. This means that
the target task only wakes up and goes into the fridge again because the
freezing condition hasn't changed for it. This reintroduces the bug
fixed by f660daac474c6f.

Fix the issue by checking for TIF_MEMDIE thread flag in
freezing_slow_path and exclude the task from freezing completely. If a
task was already frozen it would get woken by __thaw_task from OOM killer
and get out of freezer after rechecking freezing().

Changes since v1
- put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
  as per Oleg
- return __thaw_task into oom_scan_process_thread because
  oom_kill_process will not wake task in the fridge because it is
  sleeping uninterruptible

[mhocko@suse.cz: rewrote the changelog]
Fixes: a3201227f803 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
Signed-off-by: Cong Wang 
Signed-off-by: Michal Hocko 
Acked-by: Oleg Nesterov 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman