linux-toradex.git/kernel/fork.c, branch v3.16.7

perf: fix perf bug in fork()

2014-10-09T19:23:47+00:00

commit 6c72e3501d0d62fc064d3680e5234f3463ec5a86 upstream.

Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process.  This is bad.

Suggested-by: Oleg Nesterov 
Reported-by: Oleg Nesterov 
Reported-by: Sylvain 'ythier' Hitier 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman
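
The ordering problem described in the commit above can be modelled outside the kernel. The sketch below is a user-space illustration only, not the actual fork.c code: struct task, struct ctx, copy_task() and the two failure switches are hypothetical names. It assumes the child starts as a plain copy of the parent, and shows why an error path taken before the inherited context pointer has been cleared must skip the perf cleanup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a perf context and a task structure. */
struct ctx { int value; };
struct task { struct ctx *perf_ctx; };

/*
 * Model of a fork-style copy with goto-based unwinding.
 * fail_early simulates a failure before the perf state is set up
 * (the sched_fork() failure in the commit message); fail_late
 * simulates a failure after it.
 */
int copy_task(const struct task *parent, struct task *child,
              int fail_early, int fail_late)
{
    assert(parent && child);

    *child = *parent;   /* child->perf_ctx still aliases the parent's context */

    if (fail_early)
        goto bad_early; /* deliberately skips the perf cleanup */

    child->perf_ctx = NULL;                       /* drop the inherited pointer */
    child->perf_ctx = malloc(sizeof(*child->perf_ctx));
    if (!child->perf_ctx)
        goto bad_early;
    child->perf_ctx->value = 0;

    if (fail_late)
        goto bad_perf;
    return 0;

bad_perf:
    /* Safe here: perf_ctx is the child's own allocation. */
    free(child->perf_ctx);
    child->perf_ctx = NULL;
bad_early:
    return -1;
}
```

If the early failure jumped to bad_perf instead, the model would free the context that the parent still uses, which is the bug the commit describes.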

tracing: Fix syscall_*regfunc() vs copy_process() race

2014-06-21T04:15:12+00:00

syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
the process/thread lists yet.

Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.

Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com

Cc: stable@vger.kernel.org # 2.6.33
Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints"
Acked-by: Frederic Weisbecker 
Acked-by: Paul E. McKenney 
Signed-off-by: Oleg Nesterov 
Signed-off-by: Steven Rostedt
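
The race and its fix can be illustrated with a small single-threaded model. Everything below is an assumption made for illustration: the names task_list, tracepoint_refcount, tracepoint_register() and task_link() are hypothetical, the boolean field stands in for TIF_SYSCALL_TRACEPOINT, and the locking is only indicated in comments. The point is that the child's flag is recomputed from the global registration state at the moment the child is linked into the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

struct task { bool syscall_tracepoint; };   /* models TIF_SYSCALL_TRACEPOINT */

static struct task *task_list[MAX_TASKS];
static size_t task_count;
static int tracepoint_refcount;             /* > 0 while a tracepoint is registered */

/* Models syscall_regfunc(): flag every task already on the list. */
void tracepoint_register(void)
{
    /* in the kernel this walk would run with the task-list lock held */
    if (tracepoint_refcount++ == 0)
        for (size_t i = 0; i < task_count; i++)
            task_list[i]->syscall_tracepoint = true;
}

/* Models syscall_unregfunc(): clear the flag once the last user goes away. */
void tracepoint_unregister(void)
{
    assert(tracepoint_refcount > 0);
    if (--tracepoint_refcount == 0)
        for (size_t i = 0; i < task_count; i++)
            task_list[i]->syscall_tracepoint = false;
}

/* Models the point in copy_process() where the child becomes visible. */
void task_link(struct task *child)
{
    assert(task_count < MAX_TASKS);
    /*
     * Both steps belong in one critical section under the task-list lock:
     * refresh the flag from the global state, then publish the child.
     */
    child->syscall_tracepoint = tracepoint_refcount > 0;
    task_list[task_count++] = child;
}
```

A child copied from its parent before tracepoint_register() runs carries a stale, cleared flag and is not yet on the list, so the walk misses it. The refresh in task_link() closes that window.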

ptrace: fix fork event messages across pid namespaces

2014-06-06T23:08:11+00:00

When tracing a process in another pid namespace, it's important for fork
event messages to contain the child's pid as seen from the tracer's pid
namespace, not the parent's.  Otherwise, the tracer won't be able to
correlate the fork event with later SIGTRAP signals it receives from the
child.

We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value.  However, sending a
bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.

Signed-off-by: Matthew Dempsky 
Acked-by: Oleg Nesterov 
Cc: Kees Cook 
Cc: Julien Tinnes 
Cc: Roland McGrath 
Cc: Jan Kratochvil 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memcg: kill CONFIG_MM_OWNER

2014-06-04T23:54:01+00:00

CONFIG_MM_OWNER makes no sense.  It is not user-selectable, it is only
selected by CONFIG_MEMCG automatically.  So we can kill this option in
init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally.

Signed-off-by: Oleg Nesterov 
Acked-by: Michal Hocko 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: get rid of __GFP_KMEMCG

2014-06-04T23:53:56+00:00

Currently to allocate a page that should be charged to kmemcg (e.g.
threadinfo), we pass __GFP_KMEMCG flag to the page allocator.  The page
allocated is then to be freed by free_memcg_kmem_pages.  Apart from
looking asymmetrical, this also requires intrusion to the general
allocation path.  So let's introduce separate functions that will
alloc/free pages charged to kmemcg.

The new functions are called alloc_kmem_pages and free_kmem_pages.  They
should be used when the caller actually would like to use kmalloc, but
has to fall back to the page allocator for the allocation is large.
They only differ from alloc_pages and free_pages in that besides
allocating or freeing pages they also charge them to the kmem resource
counter of the current memory cgroup.

[sfr@canb.auug.org.au: export kmalloc_order() to modules]
Signed-off-by: Vladimir Davydov 
Acked-by: Greg Thelen 
Cc: Johannes Weiner 
Acked-by: Michal Hocko 
Cc: Glauber Costa 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Signed-off-by: Stephen Rothwell 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kernel: use macros from compiler.h instead of __attribute__((...))

2014-04-07T23:36:11+00:00

To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs.  Eg: __weak for
__attribute__((weak)).  I've replaced all instances of gcc attributes
with the right macro in the kernel subsystem.

Signed-off-by: Gideon Israel Dsouza 
Cc: "Rafael J. Wysocki" 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
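
As a rough illustration of the substitution, the user-space sketch below defines local stand-ins for three such macros and uses them in place of raw attributes. Only __weak is named in the commit message; __packed and __aligned(x) are added here as further examples, and the function and struct names are hypothetical. The sketch assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring the kernel's convenience macros. */
#ifndef __weak
#define __weak        __attribute__((weak))
#endif
#ifndef __packed
#define __packed      __attribute__((packed))
#endif
#ifndef __aligned
#define __aligned(x)  __attribute__((aligned(x)))
#endif

/* A weak default that another object file may override with a strong symbol. */
__weak int arch_setup_hook(void)
{
    return 0;
}

/* Packed: no padding is inserted between type and length. */
struct wire_header {
    uint8_t  type;
    uint32_t length;
} __packed;

/* Aligned: each instance starts on a 64-byte boundary. */
struct cacheline_counter {
    uint64_t count;
} __aligned(64);

static_assert(sizeof(struct wire_header) == 5, "packed layout has no padding");
static_assert(_Alignof(struct cacheline_counter) == 64, "aligned to 64 bytes");
```

Keeping the attribute spelling behind a macro means a different compiler only needs a different macro definition, not a change at every use site.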

mm, mempolicy: remove per-process flag

2014-04-07T23:35:54+00:00

PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
There's no significant performance degradation to checking
current->mempolicy rather than current->flags & PF_MEMPOLICY in the
allocation path, especially since this is considered unlikely().

Running TCP_RR with netperf-2.4.5 through localhost on 16 cpu machine with
64GB of memory and without a mempolicy:

	threads		before		after
	16		1249409		1244487
	32		1281786		1246783
	48		1239175		1239138
	64		1244642		1241841
	80		1244346		1248918
	96		1266436		1254316
	112		1307398		1312135
	128		1327607		1326502

Per-process flags are a scarce resource so we should free them up whenever
possible and make them available.  We'll be using it shortly for memcg oom
reserves.

Signed-off-by: David Rientjes 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: KAMEZAWA Hiroyuki 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: Tejun Heo 
Cc: Mel Gorman 
Cc: Oleg Nesterov 
Cc: Rik van Riel 
Cc: Jianguo Wu 
Cc: Tim Hockin 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fork: collapse copy_flags into copy_process

2014-04-07T23:35:54+00:00

copy_flags() does not use the clone_flags formal and can be collapsed
into copy_process() for cleaner code.

Signed-off-by: David Rientjes 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: KAMEZAWA Hiroyuki 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: Tejun Heo 
Cc: Mel Gorman 
Cc: Oleg Nesterov 
Cc: Rik van Riel 
Cc: Jianguo Wu 
Cc: Tim Hockin 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: per-thread vma caching

2014-04-07T23:35:53+00:00

This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
so further comparison with other approaches was needed.  There are
two things to consider when dealing with this: the cache hit rate and
the latency of find_vma().  Improving the hit rate does not necessarily
translate into finding the vma any faster, as the overhead of any fancy
caching scheme can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme improves on the ~50% baseline hit rate just by adding a few more
   slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso 
Reviewed-by: Rik van Riel 
Acked-by: Linus Torvalds 
Reviewed-by: Michel Lespinasse 
Cc: Oleg Nesterov 
Tested-by: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
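
The scheme described in the commit above (a small per-thread cache, invalidated by bumping a sequence number, with the slot chosen from the page number of the address) can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: the struct and function names are hypothetical, the cache has four slots, pages are assumed to be 4 KiB, and the flush of all threads on sequence-number overflow is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT   12                  /* assumes 4 KiB pages */
#define CACHE_SLOTS  4U                  /* small power of two */
#define CACHE_MASK   (CACHE_SLOTS - 1U)

struct vma { uintptr_t start, end; };    /* covers [start, end) */
struct mm  { uint32_t seqnum; };         /* shared by all threads of a process */

struct thread {
    struct mm *mm;
    uint32_t seqnum;                     /* mm->seqnum when the slots were last valid */
    const struct vma *slots[CACHE_SLOTS];
};

/* Replacement policy: the slot is picked from the page number of addr. */
static unsigned slot_for(uintptr_t addr)
{
    return (unsigned)((addr >> PAGE_SHIFT) & CACHE_MASK);
}

/* Invalidate every thread's cache lazily by bumping the shared counter.
 * (Handling of counter wraparound is left out of this sketch.) */
void cache_invalidate(struct mm *mm)
{
    mm->seqnum++;
}

/* Returns 1 if the thread's slots are current; otherwise flushes them. */
static int cache_sync(struct thread *t)
{
    if (t->seqnum == t->mm->seqnum)
        return 1;
    t->seqnum = t->mm->seqnum;
    memset(t->slots, 0, sizeof(t->slots));
    return 0;
}

/* Look up addr in the per-thread cache; NULL means fall back to the slow path. */
const struct vma *cache_find(struct thread *t, uintptr_t addr)
{
    if (!cache_sync(t))
        return NULL;
    for (unsigned i = 0; i < CACHE_SLOTS; i++) {
        const struct vma *v = t->slots[i];
        if (v && v->start <= addr && addr < v->end)
            return v;
    }
    return NULL;
}

/* Record the result of a slow-path lookup for addr. */
void cache_update(struct thread *t, uintptr_t addr, const struct vma *v)
{
    assert(v && v->start <= addr && addr < v->end);
    cache_sync(t);                       /* resync first so the entry is not flushed later */
    t->slots[slot_for(addr)] = v;
}
```

Invalidation costs a single increment on the shared structure. Each thread pays for the flush only when it next touches its own cache, which is what keeps the maintenance cost low.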

mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE

2014-04-07T23:35:52+00:00

Add VM_INIT_DEF_MASK, to allow us to set the default flags for VMs.  It
also adds a prctl control which allows us to set the THP disable bit in
mm->def_flags so that VMs will pick up the setting as they are created.

Signed-off-by: Alex Thorlton 
Suggested-by: Oleg Nesterov 
Cc: Gerald Schaefer 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Christian Borntraeger 
Cc: Paolo Bonzini 
Cc: "Kirill A. Shutemov" 
Cc: Mel Gorman 
Acked-by: Rik van Riel 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Andrea Arcangeli 
Cc: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Cc: Alexander Viro 
Cc: Johannes Weiner 
Cc: David Rientjes 
Cc: Paolo Bonzini 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
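
From user space, the prctl control added here is reached through prctl(2). In the mainline headers the operations are named PR_SET_THP_DISABLE and PR_GET_THP_DISABLE rather than the PRCTL_THP_DISABLE of the commit title. The wrapper functions below are hypothetical helpers, and the fallback numeric values are only used when the installed headers predate the feature. This is a minimal Linux-only sketch, and the call may fail on kernels or sandboxes that do not support it.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/* Set (disable != 0) or clear the "THP disable" flag of the calling process.
 * Returns 0 on success, -1 on error with errno set. */
int thp_set_disabled(int disable)
{
    return prctl(PR_SET_THP_DISABLE, disable ? 1UL : 0UL, 0UL, 0UL, 0UL);
}

/* Returns the current flag (0 or 1), or -1 on error with errno set. */
int thp_get_disabled(void)
{
    return prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
}

/* Disable THP and confirm the flag reads back; -1 if the kernel refused. */
int thp_disable_and_verify(void)
{
    if (thp_set_disabled(1) != 0)
        return -1;
    assert(thp_get_disabled() == 1);
    return 0;
}
```

Because the bit lives in mm->def_flags, mappings created after the call pick up the setting, as the commit message describes.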
