<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/fork.c, branch v3.7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal</title>
<updated>2012-10-10T03:02:25+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-10T03:02:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=42859eea96ba6beabfb0369a1eeffa3c7d2bd9cb'/>
<id>42859eea96ba6beabfb0369a1eeffa3c7d2bd9cb</id>
<content type='text'>
Pull generic execve() changes from Al Viro:
 "This introduces the generic kernel_thread() and kernel_execve()
  functions, and switches x86, arm, alpha, um and s390 over to them."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
  s390: convert to generic kernel_execve()
  s390: switch to generic kernel_thread()
  s390: fold kernel_thread_helper() into ret_from_fork()
  s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
  um: switch to generic kernel_thread()
  x86, um/x86: switch to generic sys_execve and kernel_execve
  x86: split ret_from_fork
  alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  alpha: switch to generic kernel_thread()
  alpha: switch to generic sys_execve()
  arm: get rid of execve wrapper, switch to generic execve() implementation
  arm: optimized current_pt_regs()
  arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
  generic sys_execve()
  generic kernel_execve()
  new helper: current_pt_regs()
  preparation for generic kernel_thread()
  um: kill thread-&gt;forking
  um: let signal_delivered() do SIGTRAP on singlestepping into handler
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull generic execve() changes from Al Viro:
 "This introduces the generic kernel_thread() and kernel_execve()
  functions, and switches x86, arm, alpha, um and s390 over to them."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
  s390: convert to generic kernel_execve()
  s390: switch to generic kernel_thread()
  s390: fold kernel_thread_helper() into ret_from_fork()
  s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
  um: switch to generic kernel_thread()
  x86, um/x86: switch to generic sys_execve and kernel_execve
  x86: split ret_from_fork
  alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  alpha: switch to generic kernel_thread()
  alpha: switch to generic sys_execve()
  arm: get rid of execve wrapper, switch to generic execve() implementation
  arm: optimized current_pt_regs()
  arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
  generic sys_execve()
  generic kernel_execve()
  new helper: current_pt_regs()
  preparation for generic kernel_thread()
  um: kill thread-&gt;forking
  um: let signal_delivered() do SIGTRAP on singlestepping into handler
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: interval tree updates</title>
<updated>2012-10-09T07:22:40+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2012-10-08T23:31:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9826a516ff77c5820e591211e4f3e58ff36f46be'/>
<id>9826a516ff77c5820e591211e4f3e58ff36f46be</id>
<content type='text'>
Update the generic interval tree code that was introduced in "mm: replace
vma prio_tree with an interval tree".

Changes:

- fixed 'endpoing' typo noticed by Andrew Morton

- replaced include/linux/interval_tree_tmpl.h, which was used as a
  template (including it automatically defined the interval tree
  functions) with include/linux/interval_tree_generic.h, which only
  defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself
  defines the interval tree functions when invoked. Now that is a very
  long macro which is unfortunate, but it does make the usage sites
  (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously.

- make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
  instead of duplicating that code in the interval tree template.

- replaced vma_interval_tree_add(), which was actually handling the
  nonlinear and interval tree cases, with vma_interval_tree_insert_after()
  which handles only the interval tree case and has an API that is more
  consistent with the other interval tree handling functions.
  The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Daniel Santos &lt;daniel.santos@pobox.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update the generic interval tree code that was introduced in "mm: replace
vma prio_tree with an interval tree".

Changes:

- fixed 'endpoing' typo noticed by Andrew Morton

- replaced include/linux/interval_tree_tmpl.h, which was used as a
  template (including it automatically defined the interval tree
  functions) with include/linux/interval_tree_generic.h, which only
  defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself
  defines the interval tree functions when invoked. Now that is a very
  long macro which is unfortunate, but it does make the usage sites
  (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously.

- make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
  instead of duplicating that code in the interval tree template.

- replaced vma_interval_tree_add(), which was actually handling the
  nonlinear and interval tree cases, with vma_interval_tree_insert_after()
  which handles only the interval tree case and has an API that is more
  consistent with the other interval tree handling functions.
  The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Daniel Santos &lt;daniel.santos@pobox.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: replace vma prio_tree with an interval tree</title>
<updated>2012-10-09T07:22:39+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2012-10-08T23:31:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6b2dbba8b6ac4df26f72eda1e5ea7bab9f950e08'/>
<id>6b2dbba8b6ac4df26f72eda1e5ea7bab9f950e08</id>
<content type='text'>
Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: remove deprecated oom_adj</title>
<updated>2012-10-09T07:22:24+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@gnu.org</email>
</author>
<published>2012-10-08T23:29:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=01dc52ebdf472f77cca623ca693ca24cfc0f1bbe'/>
<id>01dc52ebdf472f77cca623ca693ca24cfc0f1bbe</id>
<content type='text'>
The deprecated /proc/&lt;pid&gt;/oom_adj is scheduled for removal this month.

Signed-off-by: Davidlohr Bueso &lt;dave@gnu.org&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The deprecated /proc/&lt;pid&gt;/oom_adj is scheduled for removal this month.

Signed-off-by: Davidlohr Bueso &lt;dave@gnu.org&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: kill vma flag VM_EXECUTABLE and mm-&gt;num_exe_file_vmas</title>
<updated>2012-10-09T07:22:18+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-10-08T23:28:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e9714acf8c439688884234dcac2bfc38bb607d38'/>
<id>e9714acf8c439688884234dcac2bfc38bb607d38</id>
<content type='text'>
Currently the kernel sets mm-&gt;exe_file during sys_execve() and then tracks
number of vmas with VM_EXECUTABLE flag in mm-&gt;num_exe_file_vmas, as soon
as this counter drops to zero kernel resets mm-&gt;exe_file to NULL.  Plus it
resets mm-&gt;exe_file at last mmput() when mm-&gt;mm_users drops to zero.

VMA with VM_EXECUTABLE flag appears after mapping file with flag
MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma
splitting, because sys_mmap ignores this flag.  Usually binfmt module sets
mm-&gt;exe_file and mmaps executable vmas with this file, they hold
mm-&gt;exe_file while task is running.

comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
where all this stuff was introduced:

&gt; The kernel implements readlink of /proc/pid/exe by getting the file from
&gt; the first executable VMA.  Then the path to the file is reconstructed and
&gt; reported as the result.
&gt;
&gt; Because of the VMA walk the code is slightly different on nommu systems.
&gt; This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
&gt; walking the VMAs to find the first executable file-backed VMA we store a
&gt; reference to the exec'd file in the mm_struct.
&gt;
&gt; That reference would prevent the filesystem holding the executable file
&gt; from being unmounted even after unmapping the VMAs.  So we track the number
&gt; of VM_EXECUTABLE VMAs and drop the new reference when the last one is
&gt; unmapped.  This avoids pinning the mounted filesystem.

exe_file's vma accounting is hooked into every file mmap/unmmap and vma
split/merge just to fix some hypothetical pinning fs from umounting by mm,
which already unmapped all its executable files, but still alive.

Seems like currently nobody depends on this behaviour.  We can try to
remove this logic and keep mm-&gt;exe_file until final mmput().

mm-&gt;exe_file is still protected with mm-&gt;mmap_sem, because we want to
change it via new sys_prctl(PR_SET_MM_EXE_FILE).  Also via this syscall
task can change its mm-&gt;exe_file and unpin mountpoint explicitly.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Kentaro Takeda &lt;takedakn@nttdata.co.jp&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the kernel sets mm-&gt;exe_file during sys_execve() and then tracks
number of vmas with VM_EXECUTABLE flag in mm-&gt;num_exe_file_vmas, as soon
as this counter drops to zero kernel resets mm-&gt;exe_file to NULL.  Plus it
resets mm-&gt;exe_file at last mmput() when mm-&gt;mm_users drops to zero.

VMA with VM_EXECUTABLE flag appears after mapping file with flag
MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma
splitting, because sys_mmap ignores this flag.  Usually binfmt module sets
mm-&gt;exe_file and mmaps executable vmas with this file, they hold
mm-&gt;exe_file while task is running.

comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
where all this stuff was introduced:

&gt; The kernel implements readlink of /proc/pid/exe by getting the file from
&gt; the first executable VMA.  Then the path to the file is reconstructed and
&gt; reported as the result.
&gt;
&gt; Because of the VMA walk the code is slightly different on nommu systems.
&gt; This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
&gt; walking the VMAs to find the first executable file-backed VMA we store a
&gt; reference to the exec'd file in the mm_struct.
&gt;
&gt; That reference would prevent the filesystem holding the executable file
&gt; from being unmounted even after unmapping the VMAs.  So we track the number
&gt; of VM_EXECUTABLE VMAs and drop the new reference when the last one is
&gt; unmapped.  This avoids pinning the mounted filesystem.

exe_file's vma accounting is hooked into every file mmap/unmmap and vma
split/merge just to fix some hypothetical pinning fs from umounting by mm,
which already unmapped all its executable files, but still alive.

Seems like currently nobody depends on this behaviour.  We can try to
remove this logic and keep mm-&gt;exe_file until final mmput().

mm-&gt;exe_file is still protected with mm-&gt;mmap_sem, because we want to
change it via new sys_prctl(PR_SET_MM_EXE_FILE).  Also via this syscall
task can change its mm-&gt;exe_file and unpin mountpoint explicitly.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Kentaro Takeda &lt;takedakn@nttdata.co.jp&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: use mm-&gt;exe_file instead of first VM_EXECUTABLE vma-&gt;vm_file</title>
<updated>2012-10-09T07:22:18+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-10-08T23:28:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2dd8ad81e31d0d36a5d448329c646ab43eb17788'/>
<id>2dd8ad81e31d0d36a5d448329c646ab43eb17788</id>
<content type='text'>
Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
a task's executable file.  After this patch they will use mm-&gt;exe_file
directly.  mm-&gt;exe_file is protected with mm-&gt;mmap_sem, so locking stays
the same.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Acked-by: Chris Metcalf &lt;cmetcalf@tilera.com&gt;			[arch/tile]
Acked-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;	[tomoyo]
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Kentaro Takeda &lt;takedakn@nttdata.co.jp&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
a task's executable file.  After this patch they will use mm-&gt;exe_file
directly.  mm-&gt;exe_file is protected with mm-&gt;mmap_sem, so locking stays
the same.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Acked-by: Chris Metcalf &lt;cmetcalf@tilera.com&gt;			[arch/tile]
Acked-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;	[tomoyo]
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Kentaro Takeda &lt;takedakn@nttdata.co.jp&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2012-10-02T20:38:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-02T20:38:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aecdc33e111b2c447b622e287c6003726daa1426'/>
<id>aecdc33e111b2c447b622e287c6003726daa1426</id>
<content type='text'>
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-10-01T17:43:39+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-01T17:43:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0b981cb94bc63a2d0e5eccccdca75fe57643ffce'/>
<id>0b981cb94bc63a2d0e5eccccdca75fe57643ffce</id>
<content type='text'>
Pull scheduler changes from Ingo Molnar:
 "Continued quest to clean up and enhance the cputime code by Frederic
  Weisbecker, in preparation for future tickless kernel features.

  Other than that, smallish changes."

Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  cputime: Make finegrained irqtime accounting generally available
  cputime: Gather time/stats accounting config options into a single menu
  ia64: Reuse system and user vtime accounting functions on task switch
  ia64: Consolidate user vtime accounting
  vtime: Consolidate system/idle context detection
  cputime: Use a proper subsystem naming for vtime related APIs
  sched: cpu_power: enable ARCH_POWER
  sched/nohz: Clean up select_nohz_load_balancer()
  sched: Fix load avg vs. cpu-hotplug
  sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
  sched: Fix nohz_idle_balance()
  sched: Remove useless code in yield_to()
  sched: Add time unit suffix to sched sysctl knobs
  sched/debug: Limit sd-&gt;*_idx range on sysctl
  sched: Remove AFFINE_WAKEUPS feature flag
  s390: Remove leftover account_tick_vtime() header
  cputime: Consolidate vtime handling on context switch
  sched: Move cputime code to its own file
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  tile: Remove SD_PREFER_LOCAL leftover
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler changes from Ingo Molnar:
 "Continued quest to clean up and enhance the cputime code by Frederic
  Weisbecker, in preparation for future tickless kernel features.

  Other than that, smallish changes."

Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  cputime: Make finegrained irqtime accounting generally available
  cputime: Gather time/stats accounting config options into a single menu
  ia64: Reuse system and user vtime accounting functions on task switch
  ia64: Consolidate user vtime accounting
  vtime: Consolidate system/idle context detection
  cputime: Use a proper subsystem naming for vtime related APIs
  sched: cpu_power: enable ARCH_POWER
  sched/nohz: Clean up select_nohz_load_balancer()
  sched: Fix load avg vs. cpu-hotplug
  sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
  sched: Fix nohz_idle_balance()
  sched: Remove useless code in yield_to()
  sched: Add time unit suffix to sched sysctl knobs
  sched/debug: Limit sd-&gt;*_idx range on sysctl
  sched: Remove AFFINE_WAKEUPS feature flag
  s390: Remove leftover account_tick_vtime() header
  cputime: Consolidate vtime handling on context switch
  sched: Move cputime code to its own file
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  tile: Remove SD_PREFER_LOCAL leftover
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>preparation for generic kernel_thread()</title>
<updated>2012-09-30T17:35:55+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-09-21T23:55:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2aa3a7f8660355c3dddead17e224545c1a3d5a5f'/>
<id>2aa3a7f8660355c3dddead17e224545c1a3d5a5f</id>
<content type='text'>
Let architectures select GENERIC_KERNEL_THREAD and have their copy_thread()
treat NULL regs as "it came from kernel_thread(), sp argument contains
the function new thread will be calling and stack_size - the argument for
that function".  Switching the architectures begins shortly...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let architectures select GENERIC_KERNEL_THREAD and have their copy_thread()
treat NULL regs as "it came from kernel_thread(), sp argument contains
the function new thread will be calling and stack_size - the argument for
that function".  Switching the architectures begins shortly...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: use a per task frag allocator</title>
<updated>2012-09-24T20:31:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-09-23T23:04:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81'/>
<id>5640f7685831e088fe6c2e1f863a6805962f8e81</id>
<content type='text'>
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk-&gt;sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Tested-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk-&gt;sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Tested-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
