linux-toradex.git/include/linux/sched, branch v4.12-rc4

mm: introduce memalloc_noreclaim_{save,restore}

2017-05-09T00:15:15+00:00

The previous patch ("mm: prevent potential recursive reclaim due to
clearing PF_MEMALLOC") has shown that simply setting and clearing
PF_MEMALLOC in current->flags can result in wrongly clearing a
pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
Let's introduce helpers that support proper nesting by saving the
previous state of the flag, similar to the existing memalloc_noio_* and
memalloc_nofs_* helpers.  Convert existing setting/clearing of
PF_MEMALLOC within mm to the new helpers.
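
To illustrate the nesting problem, here is a minimal userspace sketch, not kernel code. It models current->flags as a plain global and uses an illustrative bit value instead of the real PF_MEMALLOC. The names naive_enter/naive_exit, noreclaim_save/noreclaim_restore and the two check functions are hypothetical. The save/restore pair follows the pattern described above: save reports whether the flag was already set, and restore puts back exactly that state.

```c
#include <assert.h>

/* Illustrative stand-in for PF_MEMALLOC; the value is not authoritative. */
#define PF_MEMALLOC_SKETCH 0x00000800u

static unsigned int current_flags;  /* models current->flags */

/* Naive pattern: set on entry, clear unconditionally on exit. */
static void naive_enter(void) { current_flags |= PF_MEMALLOC_SKETCH; }
static void naive_exit(void)  { current_flags &= ~PF_MEMALLOC_SKETCH; }

/* Nesting-safe pattern: return whether the flag was already set... */
static unsigned int noreclaim_save(void)
{
	unsigned int old = current_flags & PF_MEMALLOC_SKETCH;

	current_flags |= PF_MEMALLOC_SKETCH;
	return old;
}

/* ...and restore exactly that previous state on exit. */
static void noreclaim_restore(unsigned int old)
{
	current_flags = (current_flags & ~PF_MEMALLOC_SKETCH) | old;
}

/* Nested naive sections: returns 0, the inner exit drops the outer flag. */
static int naive_nested_keeps_flag(void)
{
	int kept;

	current_flags = 0;
	naive_enter();   /* outer section */
	naive_enter();   /* inner section */
	naive_exit();    /* wrongly clears the outer section's flag */
	kept = (current_flags & PF_MEMALLOC_SKETCH) != 0;
	naive_exit();
	return kept;
}

/* Nested save/restore sections: returns 1, the outer flag survives. */
static int saved_nested_keeps_flag(void)
{
	unsigned int outer, inner;
	int kept;

	current_flags = 0;
	outer = noreclaim_save();
	inner = noreclaim_save();
	noreclaim_restore(inner);   /* flag was already set: stays set */
	kept = (current_flags & PF_MEMALLOC_SKETCH) != 0;
	noreclaim_restore(outer);   /* flag was clear before: cleared now */
	assert((current_flags & PF_MEMALLOC_SKETCH) == 0);
	return kept;
}
```

naive_nested_keeps_flag() returns 0 because the inner exit clears a flag it did not set. saved_nested_keeps_flag() returns 1 because each restore only undoes its own change.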

There are no known issues with the converted code, but the change makes
it more robust.

Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Suggested-by: Michal Hocko 
Acked-by: Michal Hocko 
Acked-by: Hillf Danton 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Andrey Ryabinin 
Cc: Boris Brezillon 
Cc: Chris Leech 
Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Josef Bacik 
Cc: Lee Duncan 
Cc: Michal Hocko 
Cc: Richard Weinberger 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

2017-05-05T18:08:43+00:00

Pull namespace updates from Eric Biederman:
 "This is a set of small fixes that were mostly stumbled over during
  more significant development. This proc fix and the fix to
  posix-timers are the most significant of the lot.

  There is a lot of good development going on but unfortunately it
  didn't quite make the merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Fix unbalanced hard link numbers
  signal: Make kill_proc_info static
  rlimit: Properly call security_task_setrlimit
  signal: Remove unused definition of sig_user_definied
  ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
  ipc: Remove unused declaration of recompute_msgmni
  posix-timers: Correct sanity check in posix_cpu_nsleep
  sysctl: Remove dead register_sysctl_root

mm: introduce memalloc_nofs_{save,restore} API

2017-05-03T22:52:09+00:00

GFP_NOFS context is currently used for the following 5 reasons:

 - to prevent deadlocks when a lock held by the allocation
   context would be needed during memory reclaim

 - to prevent stack overflows during reclaim because the
   allocation is already performed from a deep context

 - to prevent lockups when the allocation context depends on other
   reclaimers to make forward progress indirectly

 - just in case, because this would be safe from the fs POV

 - to silence lockdep false positives

Unfortunately, overuse of this allocation context brings some problems
to the MM.  Memory reclaim is much weaker (especially during heavy FS
metadata workloads), and the OOM killer cannot be invoked because the MM
layer doesn't have enough information about how much memory is freeable
by the FS layer.

In many cases it is far from clear why the weaker context is even used,
so it might be used unnecessarily.  We would like to get rid of those
cases as much as possible.  One way to do that is to use the flag in
scopes rather than in isolated allocation requests.  Such a scope is
declared only where really necessary and is tracked per task, and all
allocation requests from within the scope simply inherit the GFP_NOFS
semantics.
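
A minimal userspace sketch of the scope idea follows. It assumes that a single global stands in for the per-task flags and that made-up bit values stand in for __GFP_IO, __GFP_FS and PF_MEMALLOC_NOFS. All function names are hypothetical. The helper in the "other layer" requests a full allocation mask. Inside the scope, the allocator still sees that mask with the FS bit removed, and no gfp argument has to be passed down the call chain.

```c
#include <assert.h>

/* Illustrative stand-ins; the bit values are not the kernel's. */
typedef unsigned int gfp_t;
#define GFP_IO_BIT      0x40u                     /* models __GFP_IO */
#define GFP_FS_BIT      0x80u                     /* models __GFP_FS */
#define GFP_FULL_SKETCH (GFP_IO_BIT | GFP_FS_BIT) /* a normal request */
#define TASK_NOFS       0x1u                      /* models PF_MEMALLOC_NOFS */

static unsigned int task_flags;   /* models current->flags */
static gfp_t last_effective_gfp;  /* mask the "allocator" actually used */

static unsigned int nofs_save(void)
{
	unsigned int old = task_flags & TASK_NOFS;

	task_flags |= TASK_NOFS;
	return old;
}

static void nofs_restore(unsigned int old)
{
	assert((old & ~TASK_NOFS) == 0);  /* must come from nofs_save() */
	task_flags = (task_flags & ~TASK_NOFS) | old;
}

/* Hypothetical allocator entry: the task's scope, not the caller,
 * decides whether the FS bit survives. */
static void sketch_alloc(gfp_t gfp)
{
	if (task_flags & TASK_NOFS)
		gfp &= ~GFP_FS_BIT;
	last_effective_gfp = gfp;
}

/* A helper in another layer (think crypto) that knows nothing about
 * filesystem locks and asks for a normal allocation. */
static void other_layer_helper(void)
{
	sketch_alloc(GFP_FULL_SKETCH);
}

/* A filesystem path that must not recurse into fs reclaim: it declares
 * one scope instead of passing a restricted mask to every callee. */
static gfp_t fs_transaction(void)
{
	unsigned int old = nofs_save();

	other_layer_helper();
	nofs_restore(old);
	return last_effective_gfp;
}
```

Outside fs_transaction(), the same helper allocates with the full mask again, because the scope has ended.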

Not only is this easier to understand and maintain, because there are
far fewer problematic contexts than specific allocation requests, but it
also helps code paths where the FS layer interacts with other layers
(e.g. crypto, security modules, MM etc.) and there is no easy way to
convey the allocation context between the layers.

Introduce the memalloc_nofs_{save,restore} API to control the scope of
the GFP_NOFS allocation context.  This basically copies the
memalloc_noio_{save,restore} API we have for the other restricted
allocation context, GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists
and is just an alias for PF_FSTRANS, which had been XFS-specific until
recently.  There are no PF_FSTRANS users anymore, so let's just drop it.

PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly, just as PF_MEMALLOC_NOIO drops __GFP_IO.  memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both the
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  XFS code paths preserve
their semantics.  kmem_flags_convert() doesn't need to evaluate the flag
anymore.
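
The masking described above can be sketched as a pure function. This illustration rests on several assumptions and is not the kernel's code: the task flags are passed in explicitly instead of being read from current->flags, the bit values are made up, and gfp_context_sketch() is a hypothetical name. NOIO is checked first because it is the more restrictive context and implies NOFS.

```c
#include <assert.h>

/* Illustrative stand-ins; the bit values are not the kernel's. */
typedef unsigned int gfp_t;
#define GFP_IO_BIT 0x40u  /* models __GFP_IO */
#define GFP_FS_BIT 0x80u  /* models __GFP_FS */
#define TASK_NOIO  0x1u   /* models PF_MEMALLOC_NOIO */
#define TASK_NOFS  0x2u   /* models PF_MEMALLOC_NOFS */

/* Drop the bits the current scope forbids. NOIO implies NOFS, so it
 * takes precedence and strips both bits. */
static gfp_t gfp_context_sketch(unsigned int task_flags, gfp_t flags)
{
	if (task_flags & TASK_NOIO)
		flags &= ~(GFP_IO_BIT | GFP_FS_BIT);
	else if (task_flags & TASK_NOFS)
		flags &= ~GFP_FS_BIT;
	return flags;
}

/* Expected behaviour for each combination of scope flags. */
static void gfp_context_demo(void)
{
	const gfp_t full = GFP_IO_BIT | GFP_FS_BIT;

	assert(gfp_context_sketch(0, full) == full);
	assert(gfp_context_sketch(TASK_NOFS, full) == GFP_IO_BIT);
	assert(gfp_context_sketch(TASK_NOIO, full) == 0);
	assert(gfp_context_sketch(TASK_NOIO | TASK_NOFS, full) == 0);
}
```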

This patch shouldn't introduce any functional changes.

Let's hope that filesystems will drop direct GFP_NOFS (resp.  ~__GFP_FS)
usage as much as possible and only use properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.

[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Cc: Dave Chinner 
Cc: Theodore Ts'o 
Cc: Chris Mason 
Cc: David Sterba 
Cc: Jan Kara 
Cc: Brian Foster 
Cc: Darrick J. Wong 
Cc: Nikolay Borisov 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signal: Make kill_proc_info static

2017-04-22T03:46:25+00:00

There are no users outside of signal.c so make the function static so
the compiler and other developers have that information.

Signed-off-by: "Eric W. Biederman"

Merge branch 'linus' into locking/core, to pick up fixes

2017-04-14T08:29:40+00:00

Signed-off-by: Ingo Molnar

sched/rtmutex: Refactor rt_mutex_setprio()

2017-04-04T09:44:06+00:00

With the introduction of SCHED_DEADLINE the whole notion that priority
is a single number is gone, therefore the @prio argument to
rt_mutex_setprio() doesn't make sense anymore.

So rework the code to pass a pi_task instead.
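
A small sketch shows why a single number is not enough. It relies on a simplified task model. As in the kernel, all deadline tasks share one prio value (modeled as -1 here), and ties between deadline tasks are broken by the earlier absolute deadline, which is roughly the rule the rtmutex waiter comparison applies. The struct and function names are hypothetical. Passing the donor task instead of its prio lets the boosted owner inherit the donor's deadline, which a bare int cannot carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified task: all deadline tasks share this prio. */
#define DL_PRIO_SKETCH (-1)

struct task_sketch {
	int prio;                     /* lower value = higher priority */
	unsigned long long deadline;  /* used only when prio is DL */
};

/* True if @a should be served before @b: prio first, then, for two
 * deadline tasks, the earlier deadline. */
static bool task_before(const struct task_sketch *a,
			const struct task_sketch *b)
{
	if (a->prio != b->prio)
		return a->prio < b->prio;
	if (a->prio == DL_PRIO_SKETCH)
		return a->deadline < b->deadline;
	return false;
}

struct owner_sketch {
	struct task_sketch self;
	const struct task_sketch *pi_task;  /* donor, NULL if not boosted */
};

/* setprio in the pi_task style: record the donor, not a number. */
static void setprio_sketch(struct owner_sketch *o,
			   const struct task_sketch *pi_task)
{
	o->pi_task = pi_task;
}

/* Parameters the owner is scheduled with: the donor's if boosted. */
static const struct task_sketch *effective(const struct owner_sketch *o)
{
	return o->pi_task ? o->pi_task : &o->self;
}

static void pi_task_demo(void)
{
	struct task_sketch donor = { DL_PRIO_SKETCH, 100 };
	struct task_sketch other = { DL_PRIO_SKETCH, 200 };
	struct owner_sketch owner = { { DL_PRIO_SKETCH, 300 }, NULL };

	/* Same prio value: a bare int cannot distinguish these tasks. */
	assert(donor.prio == other.prio);
	assert(!task_before(effective(&owner), &other));
	setprio_sketch(&owner, &donor);
	/* Boosted with the donor's deadline, the owner now goes first. */
	assert(task_before(effective(&owner), &other));
}
```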

Note this also fixes a problem with pi_top_task caching; previously we
would not set the pointer (call rt_mutex_update_top_task) if the
priority didn't change, which could lead to a stale pointer.

As for the XXX, I think it's fine to use pi_task->prio, because if it
differs from waiter->prio, a PI chain update is imminent.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170323150216.303827095@infradead.org
Signed-off-by: Thomas Gleixner

sched/rtmutex/deadline: Fix a PI crash for deadline tasks

2017-04-04T09:44:05+00:00

A crash happened while I was playing with deadline PI rtmutex.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    IP: [] rt_mutex_get_top_task+0x1f/0x30
    PGD 232a75067 PUD 230947067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 1 PID: 10994 Comm: a.out Not tainted

    Call Trace:
    [] enqueue_task+0x2c/0x80
    [] activate_task+0x23/0x30
    [] pull_dl_task+0x1d5/0x260
    [] pre_schedule_dl+0x16/0x20
    [] __schedule+0xd3/0x900
    [] schedule+0x29/0x70
    [] __rt_mutex_slowlock+0x4b/0xc0
    [] rt_mutex_slowlock+0xd1/0x190
    [] rt_mutex_timed_lock+0x53/0x60
    [] futex_lock_pi.isra.18+0x28c/0x390
    [] do_futex+0x190/0x5b0
    [] SyS_futex+0x80/0x180

This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi()
are only protected by pi_lock when operating on pi waiters, while
rt_mutex_get_top_task() accesses them with the rq lock held but
without holding pi_lock.

To tackle this, we introduce a new "pi_top_task" pointer cached in
task_struct and add a new rt_mutex_update_top_task() to update its
value. It can be called by rt_mutex_setprio(), which holds both the
owner's pi_lock and the rq lock. Thus "pi_top_task" can be safely
accessed by enqueue_task_dl() under the rq lock.
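
The locking rule above can be modeled in userspace with two pthread mutexes. This hypothetical sketch is not the kernel implementation: task_model, top_waiter and the helper functions are made up, and an int id stands in for the task pointer. The waiter field is protected only by pi_lock. The cached pi_top_task is written only while both locks are held, with pi_lock taken before rq_lock. A reader holding just rq_lock therefore sees a consistent value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of the two locks; an int id stands in for a task
 * pointer and 0 means "no top waiter". */
struct task_model {
	pthread_mutex_t pi_lock;  /* protects top_waiter */
	pthread_mutex_t rq_lock;  /* held by the enqueue path */
	int top_waiter;           /* authoritative, pi_lock only */
	int pi_top_task;          /* cache, written with both locks held */
};

static void task_model_init(struct task_model *t)
{
	pthread_mutex_init(&t->pi_lock, NULL);
	pthread_mutex_init(&t->rq_lock, NULL);
	t->top_waiter = 0;
	t->pi_top_task = 0;
}

/* Waiter enqueue/dequeue analogue: takes only pi_lock. */
static void set_top_waiter(struct task_model *t, int waiter)
{
	pthread_mutex_lock(&t->pi_lock);
	t->top_waiter = waiter;
	pthread_mutex_unlock(&t->pi_lock);
}

/* Cache refresh analogue: pi_lock first, then rq_lock, then copy. */
static void update_top_task(struct task_model *t)
{
	pthread_mutex_lock(&t->pi_lock);
	pthread_mutex_lock(&t->rq_lock);
	t->pi_top_task = t->top_waiter;
	pthread_mutex_unlock(&t->rq_lock);
	pthread_mutex_unlock(&t->pi_lock);
}

/* Enqueue-side reader: holds only rq_lock, so it reads the cache and
 * never touches top_waiter (that would race with set_top_waiter()). */
static int top_task_under_rq_lock(struct task_model *t)
{
	int v;

	pthread_mutex_lock(&t->rq_lock);
	v = t->pi_top_task;
	pthread_mutex_unlock(&t->rq_lock);
	return v;
}

static void pi_cache_demo(void)
{
	struct task_model t;

	task_model_init(&t);
	set_top_waiter(&t, 42);
	assert(top_task_under_rq_lock(&t) == 0);  /* not refreshed yet */
	update_top_task(&t);
	assert(top_task_under_rq_lock(&t) == 42);
	pthread_mutex_destroy(&t.rq_lock);
	pthread_mutex_destroy(&t.pi_lock);
}
```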

Originally-From: Peter Zijlstra 
Signed-off-by: Xunlei Pang 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Steven Rostedt 
Reviewed-by: Thomas Gleixner 
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170323150216.157682758@infradead.org
Signed-off-by: Thomas Gleixner

sched/clock, x86/perf: Fix "perf test tsc"

2017-03-23T06:31:49+00:00

People reported that commit:

  5680d8094ffa ("sched/clock: Provide better clock continuity")

broke "perf test tsc".

That commit added another offset to the reported clock value; so
take that into account when computing the provided offset values.
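
The arithmetic behind the fix can be sketched under one assumption: the reported clock is the scaled TSC value plus two offsets. The first is the existing cyc2ns offset. The second is the extra offset introduced by the clock-continuity commit, called extra_offset here and assumed to correspond to __sched_clock_offset. The scaling follows the quot/rem formula documented for time_zero/time_mult/time_shift in the perf_event_mmap_page comment. All function names are hypothetical. If the published time_zero leaves out the extra offset, the userspace reconstruction is off by exactly that amount.

```c
#include <assert.h>
#include <stdint.h>

/* Scale cycles to ns with the quot/rem split, avoiding overflow of
 * cyc * mult for large cycle counts. */
static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint16_t shift)
{
	uint64_t quot = cyc >> shift;
	uint64_t rem = cyc & (((uint64_t)1 << shift) - 1);

	return quot * mult + ((rem * mult) >> shift);
}

/* Hypothetical kernel-side clock: scaled cycles plus both offsets. */
static uint64_t kernel_clock(uint64_t cyc, uint32_t mult, uint16_t shift,
			     uint64_t cyc2ns_offset, uint64_t extra_offset)
{
	return cyc_to_ns(cyc, mult, shift) + cyc2ns_offset + extra_offset;
}

/* Offset published to userspace; include_extra == 0 models the bug. */
static uint64_t published_time_zero(uint64_t cyc2ns_offset,
				    uint64_t extra_offset, int include_extra)
{
	return cyc2ns_offset + (include_extra ? extra_offset : 0);
}

/* Userspace reconstruction from the published values. */
static uint64_t user_clock(uint64_t cyc, uint32_t mult, uint16_t shift,
			   uint64_t time_zero)
{
	return time_zero + cyc_to_ns(cyc, mult, shift);
}

static void tsc_offset_demo(void)
{
	const uint32_t mult = 1u << 10;  /* with shift 10: 1 cycle = 1 ns */
	const uint16_t shift = 10;
	const uint64_t cyc = 123456789;
	uint64_t k = kernel_clock(cyc, mult, shift, 1000, 500);

	assert(user_clock(cyc, mult, shift,
			  published_time_zero(1000, 500, 1)) == k);
	assert(k - user_clock(cyc, mult, shift,
			      published_time_zero(1000, 500, 0)) == 500);
}
```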

Reported-by: Adrian Hunter 
Reported-by: Arnaldo Carvalho de Melo 
Tested-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: 5680d8094ffa ("sched/clock: Provide better clock continuity")
Signed-off-by: Ingo Molnar

sched/headers: Remove the include from

2017-03-03T00:45:41+00:00

It's used only by a single (rarely used) inline function (task_node(p)),
which we can move to .

( Add , because we rely on that. )

Acked-by: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Ingo Molnar

sched/headers: Remove from

2017-03-03T00:45:38+00:00

The  file is a self-contained header and users of
it either don't need  - or have already included it.

This reduces the size of the header dependency graph.

Acked-by: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Ingo Molnar