From c818c03b661cd769e035e41673d5543ba2ebda64 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 13 May 2020 14:11:26 -0700 Subject: seccomp: Report number of loaded filters in /proc/$pid/status A common question asked when debugging seccomp filters is "how many filters are attached to your process?" Provide a way to easily answer this question through /proc/$pid/status with a "Seccomp_filters" line. Signed-off-by: Kees Cook --- include/linux/seccomp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 4192369b8418..2ec2720f83cc 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -13,6 +13,7 @@ #ifdef CONFIG_SECCOMP #include +#include #include struct seccomp_filter; @@ -29,6 +30,7 @@ struct seccomp_filter; */ struct seccomp { int mode; + atomic_t filter_count; struct seccomp_filter *filter; }; -- cgit v1.2.3 From 3a15fb6ed92cb32b0a83f406aa4a96f28c9adbc3 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Sun, 31 May 2020 13:50:29 +0200 Subject: seccomp: release filter after task is fully dead The seccomp filter used to be released in free_task() which is called asynchronously via call_rcu() and assorted mechanisms. Since we need to inform tasks waiting on the seccomp notifier when a filter goes empty we will notify them as soon as a task has been marked fully dead in release_task(). To not split seccomp cleanup into two parts, move filter release out of free_task() and into release_task() after we've unhashed struct task from struct pid, exited signals, and unlinked it from the threadgroups' thread list. We'll put the empty filter notification infrastructure into it in a follow up patch. This also renames put_seccomp_filter() to seccomp_filter_release() which is a more descriptive name of what we're doing here especially once we've added the empty filter notification mechanism in there. We're also NULL-ing the task's filter tree entrypoint which seems cleaner than leaving a dangling pointer in there. Note that this shouldn't need any memory barriers since we're calling this when the task is in release_task() which means it's EXIT_DEAD. So it can't modify its seccomp filters anymore. You can also see this from the point where we're calling seccomp_filter_release(). It's after __exit_signal() and at this point, tsk->sighand will already have been NULLed which is required for thread-sync and filter installation alike. Cc: Tycho Andersen Cc: Kees Cook Cc: Matt Denton Cc: Sargun Dhillon Cc: Jann Horn Cc: Chris Palmer Cc: Aleksa Sarai Cc: Robert Sesek Cc: Jeffrey Vander Stoep Cc: Linux Containers Signed-off-by: Christian Brauner Link: https://lore.kernel.org/r/20200531115031.391515-2-christian.brauner@ubuntu.com Signed-off-by: Kees Cook --- include/linux/seccomp.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 2ec2720f83cc..babcd6c02d09 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -84,10 +84,10 @@ static inline int seccomp_mode(struct seccomp *s) #endif /* CONFIG_SECCOMP */ #ifdef CONFIG_SECCOMP_FILTER -extern void put_seccomp_filter(struct task_struct *tsk); +extern void seccomp_filter_release(struct task_struct *tsk); extern void get_seccomp_filter(struct task_struct *tsk); #else /* CONFIG_SECCOMP_FILTER */ -static inline void put_seccomp_filter(struct task_struct *tsk) +static inline void seccomp_filter_release(struct task_struct *tsk) { return; } -- cgit v1.2.3 From 6659061045cc93f609e100b128f30581e5f012e9 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 10 Jun 2020 08:20:05 -0700 Subject: fs: Move __scm_install_fd() to __receive_fd() In preparation for users of the "install a received file" logic outside of net/ (pidfd and seccomp), relocate and rename __scm_install_fd() from net/core/scm.c to __receive_fd() in fs/file.c, and provide a wrapper named receive_fd_user(), as future patches will change the interface to __receive_fd(). Additionally add a comment to fd_install() as a counterpoint to how __receive_fd() interacts with fput(). Cc: Alexander Viro Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Dmitry Kadashev Cc: Jens Axboe Cc: Arnd Bergmann Cc: Sargun Dhillon Cc: Ido Schimmel Cc: Ioana Ciornei Cc: linux-fsdevel@vger.kernel.org Cc: netdev@vger.kernel.org Reviewed-by: Sargun Dhillon Acked-by: Christian Brauner Signed-off-by: Kees Cook --- include/linux/file.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/file.h b/include/linux/file.h index 122f80084a3e..b14ff2ffd0bd 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -91,6 +91,14 @@ extern void put_unused_fd(unsigned int fd); extern void fd_install(unsigned int fd, struct file *file); +extern int __receive_fd(struct file *file, int __user *ufd, + unsigned int o_flags); +static inline int receive_fd_user(struct file *file, int __user *ufd, + unsigned int o_flags) +{ + return __receive_fd(file, ufd, o_flags); +} + extern void flush_delayed_fput(void); extern void __fput_sync(struct file *); -- cgit v1.2.3 From deefa7f3505ae2fb6a7cb75f50134b65a1dd1494 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 10 Jun 2020 20:47:45 -0700 Subject: fs: Add receive_fd() wrapper for __receive_fd() For both pidfd and seccomp, the __user pointer is not used. Update __receive_fd() to make writing to ufd optional via a NULL check. However, for the receive_fd_user() wrapper, ufd is NULL checked so an -EFAULT can be returned to avoid changing the SCM_RIGHTS interface behavior. Add new wrapper receive_fd() for pidfd and seccomp that does not use the ufd argument. For the new helper, the allocated fd needs to be returned on success. Update the existing callers to handle it. Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Sargun Dhillon Acked-by: Christian Brauner Signed-off-by: Kees Cook --- include/linux/file.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/file.h b/include/linux/file.h index b14ff2ffd0bd..d9fee9f5c8da 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -9,6 +9,7 @@ #include #include #include +#include struct file; @@ -96,8 +97,14 @@ extern int __receive_fd(struct file *file, int __user *ufd, static inline int receive_fd_user(struct file *file, int __user *ufd, unsigned int o_flags) { + if (ufd == NULL) + return -EFAULT; return __receive_fd(file, ufd, o_flags); } +static inline int receive_fd(struct file *file, unsigned int o_flags) +{ + return __receive_fd(file, NULL, o_flags); +} extern void flush_delayed_fput(void); extern void __fput_sync(struct file *); -- cgit v1.2.3 From 173817151b15d5a72a9bef1d2df7e6e7f6750f2e Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 10 Jun 2020 08:46:58 -0700 Subject: fs: Expand __receive_fd() to accept existing fd Expand __receive_fd() with support for replace_fd() for the coming seccomp "addfd" ioctl(). Add new wrapper receive_fd_replace() for the new behavior and update existing wrappers to retain old behavior. Thanks to Colin Ian King for pointing out an uninitialized variable exposure in an earlier version of this patch. Cc: Alexander Viro Cc: Dmitry Kadashev Cc: Jens Axboe Cc: Arnd Bergmann Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Sargun Dhillon Acked-by: Christian Brauner Signed-off-by: Kees Cook --- include/linux/file.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/file.h b/include/linux/file.h index d9fee9f5c8da..225982792fa2 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -92,18 +92,22 @@ extern void put_unused_fd(unsigned int fd); extern void fd_install(unsigned int fd, struct file *file); -extern int __receive_fd(struct file *file, int __user *ufd, +extern int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flags); static inline int receive_fd_user(struct file *file, int __user *ufd, unsigned int o_flags) { if (ufd == NULL) return -EFAULT; - return __receive_fd(file, ufd, o_flags); + return __receive_fd(-1, file, ufd, o_flags); } static inline int receive_fd(struct file *file, unsigned int o_flags) { - return __receive_fd(file, NULL, o_flags); + return __receive_fd(-1, file, NULL, o_flags); +} +static inline int receive_fd_replace(int fd, struct file *file, unsigned int o_flags) +{ + return __receive_fd(fd, file, NULL, o_flags); } extern void flush_delayed_fput(void); -- cgit v1.2.3 From 7cf97b12545503992020796c74bd84078eb39299 Mon Sep 17 00:00:00 2001 From: Sargun Dhillon Date: Tue, 2 Jun 2020 18:10:43 -0700 Subject: seccomp: Introduce addfd ioctl to seccomp user notifier The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over an fd. It is often used in settings where a supervising task emulates syscalls on behalf of a supervised task in userspace, either to further restrict the supervisee's syscall abilities or to circumvent kernel enforced restrictions the supervisor deems safe to lift (e.g. actually performing a mount(2) for an unprivileged container). While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall, only a certain subset of syscalls could be correctly emulated. Over the last few development cycles, the set of syscalls which can't be emulated has been reduced due to the addition of pidfd_getfd(2). With this we are now able to, for example, intercept syscalls that require the supervisor to operate on file descriptors of the supervisee such as connect(2). However, syscalls that cause new file descriptors to be installed can not currently be correctly emulated since there is no way for the supervisor to inject file descriptors into the supervisee. This patch adds a new addfd ioctl to remove this restriction by allowing the supervisor to install file descriptors into the intercepted task. By implementing this feature via seccomp the supervisor effectively instructs the supervisee to install a set of file descriptors into its own file descriptor table during the intercepted syscall. This way it is possible to intercept syscalls such as open() or accept(), and install (or replace, like dup2(2)) the supervisor's resulting fd into the supervisee. One replacement use-case would be to redirect the stdout and stderr of a supervisee into log file descriptors opened by the supervisor. The ioctl handling is based on the discussions[1] of how Extensible Arguments should interact with ioctls. Instead of building size into the addfd structure, make it a function of the ioctl command (which is how sizes are normally passed to ioctls). To support forward and backward compatibility, just mask out the direction and size, and match everything. The size (and any future direction) checks are done along with copy_struct_from_user() logic. As a note, the seccomp_notif_addfd structure is laid out based on 8-byte alignment without requiring packing as there have been packing issues with uapi highlighted before[2][3]. Although we could overload the newfd field and use -1 to indicate that it is not to be used, doing so requires changing the size of the fd field, and introduces struct packing complexity. [1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/ [2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/ [3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal Cc: Christoph Hellwig Cc: Christian Brauner Cc: Tycho Andersen Cc: Jann Horn Cc: Robert Sesek Cc: Chris Palmer Cc: Al Viro Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-api@vger.kernel.org Suggested-by: Matt Denton Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me Signed-off-by: Sargun Dhillon Reviewed-by: Will Drewry Co-developed-by: Kees Cook Signed-off-by: Kees Cook --- include/linux/seccomp.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index babcd6c02d09..881c90b6aa25 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -10,6 +10,10 @@ SECCOMP_FILTER_FLAG_NEW_LISTENER | \ SECCOMP_FILTER_FLAG_TSYNC_ESRCH) +/* sizeof() the first published struct seccomp_notif_addfd */ +#define SECCOMP_NOTIFY_ADDFD_SIZE_VER0 24 +#define SECCOMP_NOTIFY_ADDFD_SIZE_LATEST SECCOMP_NOTIFY_ADDFD_SIZE_VER0 + #ifdef CONFIG_SECCOMP #include -- cgit v1.2.3