<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/sys.c, branch v2.6.27-rc1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>kexec jump</title>
<updated>2008-07-26T19:00:04+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2008-07-26T02:45:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3ab83521378268044a448113c6aa9a9e245f4d2f'/>
<id>3ab83521378268044a448113c6aa9a9e245f4d2f</id>
<content type='text'>
This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying &lt;ying.huang@intel.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Nigel Cunningham &lt;nigel@nigel.suspend2.net&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying &lt;ying.huang@intel.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Nigel Cunningham &lt;nigel@nigel.suspend2.net&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unexport uts_sem</title>
<updated>2008-07-25T17:53:45+00:00</updated>
<author>
<name>Adrian Bunk</name>
<email>bunk@kernel.org</email>
</author>
<published>2008-07-25T08:48:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7394f0f6c0baab650ea9194cb1be847df646fb57'/>
<id>7394f0f6c0baab650ea9194cb1be847df646fb57</id>
<content type='text'>
With the removal of the Solaris binary emulation the export of
uts_sem became unused.

Signed-off-by: Adrian Bunk &lt;bunk@kernel.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the removal of the Solaris binary emulation the export of
uts_sem became unused.

Signed-off-by: Adrian Bunk &lt;bunk@kernel.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>call_usermodehelper(): increase reliability</title>
<updated>2008-07-25T17:53:28+00:00</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2008-07-25T08:45:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ac331d158e198d2a91a5b0a3ec4ca9991fdb57af'/>
<id>ac331d158e198d2a91a5b0a3ec4ca9991fdb57af</id>
<content type='text'>
Presently call_usermodehelper_setup() uses GFP_ATOMIC.  but it can return
NULL _very_ easily.

GFP_ATOMIC is needed only when we can't sleep.  and, GFP_KERNEL is robust
and better.

thus, I add gfp_mask argument to call_usermodehelper_setup().

So, its callers pass the gfp_t as below:

call_usermodehelper() and call_usermodehelper_keys():
	depend on 'wait' argument.
call_usermodehelper_pipe():
	always GFP_KERNEL because always run under process context.
orderly_poweroff():
	pass to GFP_ATOMIC because may run under interrupt context.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: "Paul Menage" &lt;menage@google.com&gt;
Reviewed-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Presently call_usermodehelper_setup() uses GFP_ATOMIC.  but it can return
NULL _very_ easily.

GFP_ATOMIC is needed only when we can't sleep.  and, GFP_KERNEL is robust
and better.

thus, I add gfp_mask argument to call_usermodehelper_setup().

So, its callers pass the gfp_t as below:

call_usermodehelper() and call_usermodehelper_keys():
	depend on 'wait' argument.
call_usermodehelper_pipe():
	always GFP_KERNEL because always run under process context.
orderly_poweroff():
	pass to GFP_ATOMIC because may run under interrupt context.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: "Paul Menage" &lt;menage@google.com&gt;
Reviewed-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sys_prctl(): fix return of uninitialized value</title>
<updated>2008-05-24T16:56:13+00:00</updated>
<author>
<name>Shi Weihua</name>
<email>shiwh@cn.fujitsu.com</email>
</author>
<published>2008-05-23T20:04:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7b26655f6208fdefa9ab0adc016116324f8d4ba8'/>
<id>7b26655f6208fdefa9ab0adc016116324f8d4ba8</id>
<content type='text'>
If none of the switch cases match, the PR_SET_PDEATHSIG and
PR_SET_DUMPABLE cases of the switch statement will never write to local
variable `error'.

Signed-off-by: Shi Weihua &lt;shiwh@cn.fujitsu.com&gt;
Cc: Andrew G. Morgan &lt;morgan@kernel.org&gt;
Acked-by: "Serge E. Hallyn" &lt;serue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If none of the switch cases match, the PR_SET_PDEATHSIG and
PR_SET_DUMPABLE cases of the switch statement will never write to local
variable `error'.

Signed-off-by: Shi Weihua &lt;shiwh@cn.fujitsu.com&gt;
Cc: Andrew G. Morgan &lt;morgan@kernel.org&gt;
Acked-by: "Serge E. Hallyn" &lt;serue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pids: sys_getpgid: fix unsafe *pid usage, s/tasklist/rcu/</title>
<updated>2008-04-30T15:29:49+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-04-30T07:54:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=12a3de0a965826096d8adc593bcf4392a7d5b459'/>
<id>12a3de0a965826096d8adc593bcf4392a7d5b459</id>
<content type='text'>
1. sys_getpgid() needs rcu_read_lock() to derive the pgrp _nr, even if
   the task is current, otherwise we can race with another thread which
   does sys_setpgid().

2. Use rcu_read_lock() instead of tasklist_lock when pid != 0, make sure
   that we don't use the NULL pid if the task exits right after successful
   find_task_by_vpid().

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1. sys_getpgid() needs rcu_read_lock() to derive the pgrp _nr, even if
   the task is current, otherwise we can race with another thread which
   does sys_setpgid().

2. Use rcu_read_lock() instead of tasklist_lock when pid != 0, make sure
   that we don't use the NULL pid if the task exits right after successful
   find_task_by_vpid().

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pids: sys_getsid: fix unsafe *pid usage, fix possible 0 instead of -ESRCH</title>
<updated>2008-04-30T15:29:49+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-04-30T07:54:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1dd768c0815334d2319d6377f0750ace075b6142'/>
<id>1dd768c0815334d2319d6377f0750ace075b6142</id>
<content type='text'>
1. sys_getsid() needs rcu_read_lock() to derive the session _nr, even if
   the task is current, otherwise we can race with another thread which
   does sys_setsid().

2. The task can exit between find_task_by_vpid() and task_session_vnr(),
   in that unlikely case sys_getsid() returns 0 instead of -ESRCH.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1. sys_getsid() needs rcu_read_lock() to derive the session _nr, even if
   the task is current, otherwise we can race with another thread which
   does sys_setsid().

2. The task can exit between find_task_by_vpid() and task_session_vnr(),
   in that unlikely case sys_getsid() returns 0 instead of -ESRCH.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pids: sys_setpgid: use change_pid() helper</title>
<updated>2008-04-30T15:29:49+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-04-30T07:54:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=83beaf3c6c75b36b7c9be7f555c8cf7797842cc5'/>
<id>83beaf3c6c75b36b7c9be7f555c8cf7797842cc5</id>
<content type='text'>
Use change_pid() instead of detach_pid() + attach_pid() in sys_setpgid().

This way task_pgrp() is not NULL in between.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc:  "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use change_pid() instead of detach_pid() + attach_pid() in sys_setpgid().

This way task_pgrp() is not NULL in between.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc:  "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>k_getrusage: don't take rcu_read_lock()</title>
<updated>2008-04-30T15:29:34+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-04-30T07:52:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d6cf723a142f63ccb92272bc0e9bfffd3c3a5cac'/>
<id>d6cf723a142f63ccb92272bc0e9bfffd3c3a5cac</id>
<content type='text'>
Just a trivial example, more to come.

k_getrusage() holds rcu_read_lock() because it was previously required by
lock_task_sighand().  Unneeded now.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Just a trivial example, more to come.

k_getrusage() holds rcu_read_lock() because it was previously required by
lock_task_sighand().  Unneeded now.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>add RUSAGE_THREAD</title>
<updated>2008-04-29T15:05:59+00:00</updated>
<author>
<name>Sripathi Kodi</name>
<email>sripathik@in.ibm.com</email>
</author>
<published>2008-04-29T07:58:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=679c9cd4acc2cf2872171813752eab3320273339'/>
<id>679c9cd4acc2cf2872171813752eab3320273339</id>
<content type='text'>
Add the RUSAGE_THREAD option for the getrusage system call.  This is
essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589, but the
line about RUSAGE_LWP line has been removed, as suggested by Ulrich and
Christoph.

Signed-off-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Sripathi Kodi &lt;sripathik@in.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the RUSAGE_THREAD option for the getrusage system call.  This is
essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589, but the
line about RUSAGE_LWP line has been removed, as suggested by Ulrich and
Christoph.

Signed-off-by: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Sripathi Kodi &lt;sripathik@in.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>capabilities: implement per-process securebits</title>
<updated>2008-04-28T15:58:26+00:00</updated>
<author>
<name>Andrew G. Morgan</name>
<email>morgan@kernel.org</email>
</author>
<published>2008-04-28T09:13:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3898b1b4ebff8dcfbcf1807e0661585e06c9a91c'/>
<id>3898b1b4ebff8dcfbcf1807e0661585e06c9a91c</id>
<content type='text'>
Filesystem capability support makes it possible to do away with (set)uid-0
based privilege and use capabilities instead.  That is, with filesystem
support for capabilities but without this present patch, it is (conceptually)
possible to manage a system with capabilities alone and never need to obtain
privilege via (set)uid-0.

Of course, conceptually isn't quite the same as currently possible since few
user applications, certainly not enough to run a viable system, are currently
prepared to leverage capabilities to exercise privilege.  Further, many
applications exist that may never get upgraded in this way, and the kernel
will continue to want to support their setuid-0 base privilege needs.

Where pure-capability applications evolve and replace setuid-0 binaries, it is
desirable that there be a mechanisms by which they can contain their
privilege.  In addition to leveraging the per-process bounding and inheritable
sets, this should include suppressing the privilege of the uid-0 superuser
from the process' tree of children.

The feature added by this patch can be leveraged to suppress the privilege
associated with (set)uid-0.  This suppression requires CAP_SETPCAP to
initiate, and only immediately affects the 'current' process (it is inherited
through fork()/exec()).  This reimplementation differs significantly from the
historical support for securebits which was system-wide, unwieldy and which
has ultimately withered to a dead relic in the source of the modern kernel.

With this patch applied a process, that is capable(CAP_SETPCAP), can now drop
all legacy privilege (through uid=0) for itself and all subsequently
fork()'d/exec()'d children with:

  prctl(PR_SET_SECUREBITS, 0x2f);

This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is
enabled at configure time.

[akpm@linux-foundation.org: fix uninitialised var warning]
[serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY]
Signed-off-by: Andrew G. Morgan &lt;morgan@kernel.org&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Reviewed-by: James Morris &lt;jmorris@namei.org&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: Paul Moore &lt;paul.moore@hp.com&gt;
Signed-off-by: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Filesystem capability support makes it possible to do away with (set)uid-0
based privilege and use capabilities instead.  That is, with filesystem
support for capabilities but without this present patch, it is (conceptually)
possible to manage a system with capabilities alone and never need to obtain
privilege via (set)uid-0.

Of course, conceptually isn't quite the same as currently possible since few
user applications, certainly not enough to run a viable system, are currently
prepared to leverage capabilities to exercise privilege.  Further, many
applications exist that may never get upgraded in this way, and the kernel
will continue to want to support their setuid-0 base privilege needs.

Where pure-capability applications evolve and replace setuid-0 binaries, it is
desirable that there be a mechanisms by which they can contain their
privilege.  In addition to leveraging the per-process bounding and inheritable
sets, this should include suppressing the privilege of the uid-0 superuser
from the process' tree of children.

The feature added by this patch can be leveraged to suppress the privilege
associated with (set)uid-0.  This suppression requires CAP_SETPCAP to
initiate, and only immediately affects the 'current' process (it is inherited
through fork()/exec()).  This reimplementation differs significantly from the
historical support for securebits which was system-wide, unwieldy and which
has ultimately withered to a dead relic in the source of the modern kernel.

With this patch applied a process, that is capable(CAP_SETPCAP), can now drop
all legacy privilege (through uid=0) for itself and all subsequently
fork()'d/exec()'d children with:

  prctl(PR_SET_SECUREBITS, 0x2f);

This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is
enabled at configure time.

[akpm@linux-foundation.org: fix uninitialised var warning]
[serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY]
Signed-off-by: Andrew G. Morgan &lt;morgan@kernel.org&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Reviewed-by: James Morris &lt;jmorris@namei.org&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: Paul Moore &lt;paul.moore@hp.com&gt;
Signed-off-by: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
