linux-toradex.git/security/commoncap.c, branch v2.6.25-rc2

capabilities: introduce per-process capability bounding set

2008-02-05T17:44:20+00:00

The capability bounding set is a set beyond which capabilities cannot grow.
 Currently cap_bset is per-system.  It can be manipulated through sysctl,
but only init can add capabilities.  Root can remove capabilities.  By
default it includes all caps except CAP_SETPCAP.

This patch makes the bounding set per-process when file capabilities are
enabled.  It is inherited at fork from parent.  Noone can add elements,
CAP_SETPCAP is required to remove them.

One example use of this is to start a safer container.  For instance, until
device namespaces or per-container device whitelists are introduced, it is
best to take CAP_MKNOD away from a container.

The bounding set will not affect pP and pE immediately.  It will only
affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
and exec() does not constrain pI'.  So to really start a shell with no way
of regain CAP_MKNOD, you would do

	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
	cap_t cap = cap_get_proc();
	cap_value_t caparray[1];
	caparray[0] = CAP_MKNOD;
	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
	cap_set_proc(cap);
	cap_free(cap);

The following test program will get and set the bounding
set (but not pI).  For instance

	./bset get
		(lists capabilities in bset)
	./bset drop cap_net_raw
		(starts shell with new bset)
		(use capset, setuid binary, or binary with
		file capabilities to try to increase caps)

************************************************************
cap_bound.c
************************************************************
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 

 #ifndef PR_CAPBSET_READ
 #define PR_CAPBSET_READ 23
 #endif

 #ifndef PR_CAPBSET_DROP
 #define PR_CAPBSET_DROP 24
 #endif

int usage(char *me)
{
	printf("Usage: %s get\n", me);
	printf("       %s drop \n", me);
	return 1;
}

 #define numcaps 32
char *captable[numcaps] = {
	"cap_chown",
	"cap_dac_override",
	"cap_dac_read_search",
	"cap_fowner",
	"cap_fsetid",
	"cap_kill",
	"cap_setgid",
	"cap_setuid",
	"cap_setpcap",
	"cap_linux_immutable",
	"cap_net_bind_service",
	"cap_net_broadcast",
	"cap_net_admin",
	"cap_net_raw",
	"cap_ipc_lock",
	"cap_ipc_owner",
	"cap_sys_module",
	"cap_sys_rawio",
	"cap_sys_chroot",
	"cap_sys_ptrace",
	"cap_sys_pacct",
	"cap_sys_admin",
	"cap_sys_boot",
	"cap_sys_nice",
	"cap_sys_resource",
	"cap_sys_time",
	"cap_sys_tty_config",
	"cap_mknod",
	"cap_lease",
	"cap_audit_write",
	"cap_audit_control",
	"cap_setfcap"
};

int getbcap(void)
{
	int comma=0;
	unsigned long i;
	int ret;

	printf("i know of %d capabilities\n", numcaps);
	printf("capability bounding set:");
	for (i=0; i
Signed-off-by: Andrew G. Morgan 
Cc: Stephen Smalley 
Cc: James Morris 
Cc: Chris Wright 
Cc: Casey Schaufler a
Signed-off-by: "Serge E. Hallyn" 
Tested-by: Jiri Slaby 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Add 64-bit capability support to the kernel

2008-02-05T17:44:20+00:00

The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities.  If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.

FWIW libcap-2.00 supports this change (and earlier capability formats)

 http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/

[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan 
Cc: Stephen Smalley 
Acked-by: Serge Hallyn 
Cc: Chris Wright 
Cc: James Morris 
Cc: Casey Schaufler 
Signed-off-by: Erez Zadok 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

revert "capabilities: clean up file capability reading"

2008-02-05T17:44:20+00:00

Revert b68680e4731abbd78863063aaa0dca2a6d8cc723 to make way for the next
patch: "Add 64-bit capability support to the kernel".

We want to keep the vfs_cap_data.data[] structure, using two 'data's for
64-bit caps (and later three for 96-bit caps), whereas
b68680e4731abbd78863063aaa0dca2a6d8cc723 had gotten rid of the 'data' struct
made its members inline.

The 64-bit caps patch keeps the stack abuse fix at get_file_caps(), which was
the more important part of that patch.

[akpm@linux-foundation.org: coding-style fixes]
Cc: Stephen Smalley 
Cc: Serge Hallyn 
Cc: Chris Wright 
Cc: James Morris 
Cc: Casey Schaufler 
Cc: Andrew Morgan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Fix filesystem capability support

2008-01-22T03:39:41+00:00

In linux-2.6.24-rc1, security/commoncap.c:cap_inh_is_capped() was
introduced. It has the exact reverse of its intended behavior. This
led to an unintended privilege esculation involving a process'
inheritable capability set.

To be exposed to this bug, you need to have Filesystem Capabilities
enabled and in use. That is:

- CONFIG_SECURITY_FILE_CAPABILITIES must be defined for the buggy code
  to be compiled in.

- You also need to have files on your system marked with fI bits raised.

Signed-off-by: Andrew G. Morgan 

Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

file capabilities: don't prevent signaling setuid root programs

2007-11-29T17:24:53+00:00

An unprivileged process must be able to kill a setuid root program started
by the same user.  This is legacy behavior needed for instance for xinit to
kill X when the window manager exits.

When an unprivileged user runs a setuid root program in !SECURE_NOROOT
mode, fP, fI, and fE are set full on, so pP' and pE' are full on.  Then
cap_task_kill() prevents the user from signaling the setuid root task.
This is a change in behavior compared to when
!CONFIG_SECURITY_FILE_CAPABILITIES.

This patch introduces a special check into cap_task_kill() just to check
whether a non-root user is signaling a setuid root program started by the
same user.  If so, then signal is allowed.

Signed-off-by: Serge E. Hallyn 
Cc: Andrew Morgan 
Cc: Stephen Smalley 
Cc: Chris Wright 
Cc: James Morris 
Cc: Casey Schaufler 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

file capabilities: allow sigcont within session

2007-11-15T02:45:44+00:00

Fix http://bugzilla.kernel.org/show_bug.cgi?id=9247

Allow sigcont to be sent to a process with greater capabilities if it is in
the same session.  Otherwise, a shell from which I've started a root shell
and done 'suspend' can't be restarted by the parent shell.

Also don't do file-capabilities signaling checks when uids for the
processes don't match, since the standard check_kill_permission will have
done those checks.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Serge E. Hallyn 
Acked-by: Andrew Morgan 
Cc: Chris Wright 
Tested-by: "Theodore Ts'o" 
Cc: Stephen Smalley 
Cc: "Rafael J. Wysocki" 
Cc: Chris Wright 
Cc: James Morris 
Cc: Stephen Smalley 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

capabilities: clean up file capability reading

2007-10-22T15:13:18+00:00

Simplify the vfs_cap_data structure.

Also fix get_file_caps which was declaring
__le32 v1caps[XATTR_CAPS_SZ] on the stack, but
XATTR_CAPS_SZ is already * sizeof(__le32).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn 
Cc: Andrew Morgan 
Cc: Chris Wright 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

pid namespaces: define is_global_init() and is_container_init()

2007-10-19T18:53:37+00:00

is_init() is an ambiguous name for the pid==1 check.  Split it into
is_global_init() and is_container_init().

A cgroup init has it's tsk->pid == 1.

A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns.  But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.

Changelog:

	2.6.22-rc4-mm2-pidns1:
	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
	  global init (/sbin/init) process. This would improve performance
	  and remove dependence on the task_pid().

	2.6.21-mm2-pidns2:

	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
	  This way, we kill only the cgroup if the cgroup's init has a
	  bug rather than force a kernel panic.

[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn 
Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Pavel Emelianov 
Cc: Eric W. Biederman 
Cc: Cedric Le Goater 
Cc: Dave Hansen 
Cc: Herbert Poetzel 
Cc: Kirill Korotaev 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

V3 file capabilities: alter behavior of cap_setpcap

2007-10-18T21:37:24+00:00

The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1,
can change the capabilities of another process, p2.  This is not the
meaning that was intended for this capability at all, and this
implementation came about purely because, without filesystem capabilities,
there was no way to use capabilities without one process bestowing them on
another.

Since we now have a filesystem support for capabilities we can fix the
implementation of CAP_SETPCAP.

The most significant thing about this change is that, with it in effect, no
process can set the capabilities of another process.

The capabilities of a program are set via the capability convolution
rules:

   pI(post-exec) = pI(pre-exec)
   pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI)
   pE(post-exec) = fE ? pP(post-exec) : 0

at exec() time.  As such, the only influence the pre-exec() program can
have on the post-exec() program's capabilities are through the pI
capability set.

The correct implementation for CAP_SETPCAP (and that enabled by this patch)
is that it can be used to add extra pI capabilities to the current process
- to be picked up by subsequent exec()s when the above convolution rules
are applied.

Here is how it works:

Let's say we have a process, p. It has capability sets, pE, pP and pI.
Generally, p, can change the value of its own pI to pI' where

   (pI' & ~pI) & ~pP = 0.

That is, the only new things in pI' that were not present in pI need to
be present in pP.

The role of CAP_SETPCAP is basically to permit changes to pI beyond
the above:

   if (pE & CAP_SETPCAP) {
      pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0  */
   }

This capability is useful for things like login, which (say, via
pam_cap) might want to raise certain inheritable capabilities for use
by the children of the logged-in user's shell, but those capabilities
are not useful to or needed by the login program itself.

One such use might be to limit who can run ping. You set the
capabilities of the 'ping' program to be "= cap_net_raw+i", and then
only shells that have (pI & CAP_NET_RAW) will be able to run
it. Without CAP_SETPCAP implemented as described above, login(pam_cap)
would have to also have (pP & CAP_NET_RAW) in order to raise this
capability and pass it on through the inheritable set.

Signed-off-by: Andrew Morgan 
Signed-off-by: Serge E. Hallyn 
Cc: Stephen Smalley 
Cc: James Morris 
Cc: Casey Schaufler 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

security/ cleanups

2007-10-17T15:43:07+00:00

This patch contains the following cleanups that are now possible:
- remove the unused security_operations->inode_xattr_getsuffix
- remove the no longer used security_operations->unregister_security
- remove some no longer required exit code
- remove a bunch of no longer used exports

Signed-off-by: Adrian Bunk 
Acked-by: James Morris 
Cc: Chris Wright 
Cc: Stephen Smalley 
Cc: Serge Hallyn 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds