linux-toradex.git/fs/namespace.c, branch v4.4.16

mnt: If fs_fully_visible fails call put_filesystem.

2016-07-27T16:47:28+00:00

commit 97c1df3e54e811aed484a036a798b4b25d002ecf upstream.

Add this trivial missing error handling.

Fixes: 1b852bceb0d1 ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace")
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

mnt: Account for MS_RDONLY in fs_fully_visible

2016-07-27T16:47:28+00:00

commit 695e9df010e40f407f4830dc11d53dce957710ba upstream.

In rare cases it is possible for s_flags & MS_RDONLY to be set but
MNT_READONLY to be clear.  This starting combination can cause
fs_fully_visible to fail to ensure that the new mount is readonly.
Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY
is set on the source filesystem of the mount.

In general both MS_RDONLY and MNT_READONLY are set at the same for
mounts so I don't expect any programs to care.  Nor do I expect
MS_RDONLY to be set on proc or sysfs in the initial user namespace,
which further decreases the likelyhood of problems.

Which means this change should only affect system configurations by
paranoid sysadmins who should welcome the additional protection
as it keeps people from wriggling out of their policies.

Fixes: 8c6cf9cc829f ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime")
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

mnt: fs_fully_visible test the proper mount for MNT_LOCKED

2016-07-27T16:47:28+00:00

commit d71ed6c930ac7d8f88f3cef6624a7e826392d61f upstream.

MNT_LOCKED implies on a child mount implies the child is locked to the
parent.  So while looping through the children the children should be
tested (not their parent).

Typically an unshare of a mount namespace locks all mounts together
making both the parent and the slave as locked but there are a few
corner cases where other things work.

Fixes: ceeb0e5d39fc ("vfs: Ignore unlocked mounts in fs_fully_visible")
Reported-by: Seth Forshee 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

2015-09-01T23:13:25+00:00

Pull user namespace updates from Eric Biederman:
 "This finishes up the changes to ensure proc and sysfs do not start
  implementing executable files, as the there are application today that
  are only secure because such files do not exist.

  It akso fixes a long standing misfeature of /proc//mountinfo that
  did not show the proper source for files bind mounted from
  /proc//ns/*.

  It also straightens out the handling of clone flags related to user
  namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER)
  when files such as /proc//environ are read while  is calling
  unshare.  This winds up fixing a minor bug in unshare flag handling
  that dates back to the first version of unshare in the kernel.

  Finally, this fixes a minor regression caused by the introduction of
  sysfs_create_mount_point, which broke someone's in house application,
  by restoring the size of /sys/fs/cgroup to 0 bytes.  Apparently that
  application uses the directory size to determine if a tmpfs is mounted
  on /sys/fs/cgroup.

  The bind mount escape fixes are present in Al Viros for-next branch.
  and I expect them to come from there.  The bind mount escape is the
  last of the user namespace related security bugs that I am aware of"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs: Set the size of empty dirs to 0.
  userns,pidns: Force thread group sharing, not signal handler sharing.
  unshare: Unsharing a thread does not require unsharing a vm
  nsfs: Add a show_path method to fix mountinfo
  mnt: fs_fully_visible enforce noexec and nosuid  if !SB_I_NOEXEC
  vfs: Commit to never having exectuables on proc and sysfs.

mnt: In detach_mounts detach the appropriate unmounted mount

2015-07-23T16:31:15+00:00

The handling of in detach_mounts of unmounted but connected mounts is
buggy and can lead to an infinite loop.

Correct the handling of unmounted mounts in detach_mount.  When the
mountpoint of an unmounted but connected mount is connected to a
dentry, and that dentry is deleted we need to disconnect that mount
from the parent mount and the deleted dentry.

Nothing changes for the unmounted and connected children.  They can be
safely ignored.

Cc: stable@vger.kernel.org
Fixes: ce07d891a0891d3c0d0c2d73d577490486b809e1 mnt: Honor MNT_LOCKED when detaching mounts
Signed-off-by: "Eric W. Biederman"

mnt: Clarify and correct the disconnect logic in umount_tree

2015-07-23T01:33:27+00:00

rmdir mntpoint will result in an infinite loop when there is
a mount locked on the mountpoint in another mount namespace.

This is because the logic to test to see if a mount should
be disconnected in umount_tree is buggy.

Move the logic to decide if a mount should remain connected to
it's mountpoint into it's own function disconnect_mount so that
clarity of expression instead of terseness of expression becomes
a virtue.

When the conditions where it is invalid to leave a mount connected
are first ruled out, the logic for deciding if a mount should
be disconnected becomes much clearer and simpler.

Fixes: e0c9c0afd2fc958ffa34b697972721d81df8a56f mnt: Update detach_mounts to leave mounts connected
Fixes: ce07d891a0891d3c0d0c2d73d577490486b809e1 mnt: Honor MNT_LOCKED when detaching mounts
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

mnt: fs_fully_visible enforce noexec and nosuid if !SB_I_NOEXEC

2015-07-10T15:41:13+00:00

The filesystems proc and sysfs do not have executable files do not
have exectuable files today and portions of userspace break if we do
enforce nosuid and noexec consistency of nosuid and noexec flags
between previous mounts and new mounts of proc and sysfs.

Add the code to enforce consistency of the nosuid and noexec flags,
and use the presence of SB_I_NOEXEC to signal that there is no need to
bother.

This results in a completely userspace invisible change that makes it
clear fs_fully_visible can only skip the enforcement of noexec and
nosuid because it is known the filesystems in question do not support
executables.

Signed-off-by: "Eric W. Biederman"

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

2015-07-03T22:20:57+00:00

Pull user namespace updates from Eric Biederman:
 "Long ago and far away when user namespaces where young it was realized
  that allowing fresh mounts of proc and sysfs with only user namespace
  permissions could violate the basic rule that only root gets to decide
  if proc or sysfs should be mounted at all.

  Some hacks were put in place to reduce the worst of the damage could
  be done, and the common sense rule was adopted that fresh mounts of
  proc and sysfs should allow no more than bind mounts of proc and
  sysfs.  Unfortunately that rule has not been fully enforced.

  There are two kinds of gaps in that enforcement.  Only filesystems
  mounted on empty directories of proc and sysfs should be ignored but
  the test for empty directories was insufficient.  So in my tree
  directories on proc, sysctl and sysfs that will always be empty are
  created specially.  Every other technique is imperfect as an ordinary
  directory can have entries added even after a readdir returns and
  shows that the directory is empty.  Special creation of directories
  for mount points makes the code in the kernel a smidge clearer about
  it's purpose.  I asked container developers from the various container
  projects to help test this and no holes were found in the set of mount
  points on proc and sysfs that are created specially.

  This set of changes also starts enforcing the mount flags of fresh
  mounts of proc and sysfs are consistent with the existing mount of
  proc and sysfs.  I expected this to be the boring part of the work but
  unfortunately unprivileged userspace winds up mounting fresh copies of
  proc and sysfs with noexec and nosuid clear when root set those flags
  on the previous mount of proc and sysfs.  So for now only the atime,
  read-only and nodev attributes which userspace happens to keep
  consistent are enforced.  Dealing with the noexec and nosuid
  attributes remains for another time.

  This set of changes also addresses an issue with how open file
  descriptors from /proc//ns/* are displayed.  Recently readlink of
  /proc//fd has been triggering a WARN_ON that has not been
  meaningful since it was added (as all of the code in the kernel was
  converted) and is not now actively wrong.

  There is also a short list of issues that have not been fixed yet that
  I will mention briefly.

  It is possible to rename a directory from below to above a bind mount.
  At which point any directory pointers below the renamed directory can
  be walked up to the root directory of the filesystem.  With user
  namespaces enabled a bind mount of the bind mount can be created
  allowing the user to pick a directory whose children they can rename
  to outside of the bind mount.  This is challenging to fix and doubly
  so because all obvious solutions must touch code that is in the
  performance part of pathname resolution.

  As mentioned above there is also a question of how to ensure that
  developers by accident or with purpose do not introduce exectuable
  files on sysfs and proc and in doing so introduce security regressions
  in the current userspace that will not be immediately obvious and as
  such are likely to require breaking userspace in painful ways once
  they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Remove incorrect debugging WARN in prepend_path
  mnt: Update fs_fully_visible to test for permanently empty directories
  sysfs: Create mountpoints with sysfs_create_mount_point
  sysfs: Add support for permanently empty directories to serve as mount points.
  kernfs: Add support for always empty directories.
  proc: Allow creating permanently empty directories that serve as mount points
  sysctl: Allow creating permanently empty directories that serve as mountpoints.
  fs: Add helper functions for permanently empty directories.
  vfs: Ignore unlocked mounts in fs_fully_visible
  mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
  mnt: Refactor the logic for mounting sysfs and proc in a user namespace

mnt: Update fs_fully_visible to test for permanently empty directories

2015-07-01T15:36:49+00:00

fs_fully_visible attempts to make fresh mounts of proc and sysfs give
the mounter no more access to proc and sysfs than if they could have
by creating a bind mount.  One aspect of proc and sysfs that makes
this particularly tricky is that there are other filesystems that
typically mount on top of proc and sysfs.  As those filesystems are
mounted on empty directories in practice it is safe to ignore them.
However testing to ensure filesystems are mounted on empty directories
has not been something the in kernel data structures have supported so
the current test for an empty directory which checks to see
if nlink <= 2 is a bit lacking.

proc and sysfs have recently been modified to use the new empty_dir
infrastructure to create all of their dedicated mount points.  Instead
of testing for S_ISDIR(inode->i_mode) && i_nlink <= 2 to see if a
directory is empty, test for is_empty_dir_inode(inode).  That small
change guaranteess mounts found on proc and sysfs really are safe to
ignore, because the directories are not only empty but nothing can
ever be added to them.  This guarantees there is nothing to worry
about when mounting proc and sysfs.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

vfs: Ignore unlocked mounts in fs_fully_visible

2015-07-01T15:36:35+00:00

Limit the mounts fs_fully_visible considers to locked mounts.
Unlocked can always be unmounted so considering them adds hassle
but no security benefit.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"