summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-01-08Linux 3.18.2v3.18.2Greg Kroah-Hartman
2015-01-08Btrfs: fix fs corruption on transaction abort if device supports discardFilipe Manana
commit 678886bdc6378c1cbd5072da2c5a3035000214e3 upstream. When we abort a transaction we iterate over all the ranges marked as dirty in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them from those trees, add them back (unpin) to the free space caches and, if the fs was mounted with "-o discard", perform a discard on those regions. Also, after adding the regions to the free space caches, a fitrim ioctl call can see those ranges in a block group's free space cache and perform a discard on the ranges, so the same issue can happen without "-o discard" as well. This causes corruption, affecting one or multiple btree nodes (in the worst case leaving the fs unmountable) because some of those ranges (the ones in the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are referred by the last committed super block - breaking the rule that anything that was committed by a transaction is untouched until the next transaction commits successfully. I ran into this while running in a loop (for several hours) the fstest that I recently submitted: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim The corruption always happened when a transaction aborted and then fsck complained like this: _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 read block failed check_tree_block Couldn't open file system In this case 94945280 corresponded to the root of a tree. Using frace what I observed was the following sequence of steps happened: 1) transaction N started, fs_info->pinned_extents pointed to fs_info->freed_extents[0]; 2) node/eb 94945280 is created; 3) eb is persisted to disk; 4) transaction N commit starts, fs_info->pinned_extents now points to fs_info->freed_extents[1], and transaction N completes; 5) transaction N + 1 starts; 6) eb is COWed, and btrfs_free_tree_block() called for this eb; 7) eb range (94945280 to 94945280 + 16Kb) is added to fs_info->pinned_extents (fs_info->freed_extents[1]); 8) Something goes wrong in transaction N + 1, like hitting ENOSPC for example, and the transaction is aborted, turning the fs into readonly mode. The stack trace I got for example: [112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66 [112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98 [112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50 [112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs] [112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs] [112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs] [112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs] (...) [112065.264493] ---[ end trace dd7903a975a31a08 ]--- [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left [112065.264997] BTRFS info (device sdc): forced readonly 9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in fs_info->fs_state and calls btrfs_cleanup_transaction(), which in turn calls btrfs_destroy_pinned_extent(); 10) Then btrfs_destroy_pinned_extent() iterates over all the ranges marked as dirty in fs_info->freed_extents[], and for each one it calls discard, if the fs was mounted with "-o discard", and adds the range to the free space cache of the respective block group; 11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path, sees the free space entries and performs a discard; 12) After an umount and mount (or fsck), our eb's location on disk was full of zeroes, and it should have been untouched, because it was marked as dirty in the fs_info->pinned_extents tree, and therefore used by the trees that the last committed superblock points to. Fix this by not performing a discard and not adding the ranges to the free space caches - it's useless from this point since the fs is now in readonly mode and we won't write free space caches to disk anymore (otherwise we would leak space) nor any new superblock. By not adding the ranges to the free space caches, it prevents other code paths from allocating that space and write to it as well, therefore being safer and simpler. This isn't a new problem, as it's been present since 2011 (git commit acce952b0263825da32cf10489413dec78053347). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08Btrfs: make sure logged extents complete in the current transaction V3Josef Bacik
commit 50d9aa99bd35c77200e0e3dd7a72274f8304701f upstream. Liu Bo pointed out that my previous fix would lose the generation update in the scenario I described. It is actually much worse than that, we could lose the entire extent if we lose power right after the transaction commits. Consider the following write extent 0-4k log extent in log tree commit transaction < power fail happens here ordered extent completes We would lose the 0-4k extent because it hasn't updated the actual fs tree, and the transaction commit will reset the log so it isn't replayed. If we lose power before the transaction commit we are save, otherwise we are not. Fix this by keeping track of all extents we logged in this transaction. Then when we go to commit the transaction make sure we wait for all of those ordered extents to complete before proceeding. This will make sure that if we lose power after the transaction commit we still have our data. This also fixes the problem of the improperly updated extent generation. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08Btrfs: do not move em to modified list when unpinningJosef Bacik
commit a28046956c71985046474283fa3bcd256915fb72 upstream. We use the modified list to keep track of which extents have been modified so we know which ones are candidates for logging at fsync() time. Newly modified extents are added to the list at modification time, around the same time the ordered extent is created. We do this so that we don't have to wait for ordered extents to complete before we know what we need to log. The problem is when something like this happens log extent 0-4k on inode 1 copy csum for 0-4k from ordered extent into log sync log commit transaction log some other extent on inode 1 ordered extent for 0-4k completes and adds itself onto modified list again log changed extents see ordered extent for 0-4k has already been logged at this point we assume the csum has been copied sync log crash On replay we will see the extent 0-4k in the log, drop the original 0-4k extent which is the same one that we are replaying which also drops the csum, and then we won't find the csum in the log for that bytenr. This of course causes us to have errors about not having csums for certain ranges of our inode. So remove the modified list manipulation in unpin_extent_cache, any modified extents should have been added well before now, and we don't want them re-logged. This fixes my test that I could reliably reproduce this problem with. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08btrfs: fix wrong accounting of raid1 data profile in statfsDavid Sterba
commit 0d95c1bec906dd1ad951c9c001e798ca52baeb0f upstream. The sizes that are obtained from space infos are in raw units and have to be adjusted according to the raid factor. This was missing for f_bavail and df reported doubled size for raid1. Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Fixes: ba7b6e62f420 ("btrfs: adjust statfs calculations according to raid profiles") Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08Btrfs: make sure we wait on logged extents when fsycning two subvolsJosef Bacik
commit 9dba8cf128ef98257ca719722280c9634e7e9dc7 upstream. If we have two fsync()'s race on different subvols one will do all of its work to get into the log_tree, wait on it's outstanding IO, and then allow the log_tree to finish it's commit. The problem is we were just free'ing that subvols logged extents instead of waiting on them, so whoever lost the race wouldn't really have their data on disk. Fix this by waiting properly instead of freeing the logged extents. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08eCryptfs: Remove buggy and unnecessary write in file name decode routineMichael Halcrow
commit 942080643bce061c3dd9d5718d3b745dcb39a8bc upstream. Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the end of the allocated buffer during encrypted filename decoding. This fix corrects the issue by getting rid of the unnecessary 0 write when the current bit offset is 2. Signed-off-by: Michael Halcrow <mhalcrow@google.com> Reported-by: Dmitry Chernenkov <dmitryc@google.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08eCryptfs: Force RO mount when encrypted view is enabledTyler Hicks
commit 332b122d39c9cbff8b799007a825d94b2e7c12f2 upstream. The ecryptfs_encrypted_view mount option greatly changes the functionality of an eCryptfs mount. Instead of encrypting and decrypting lower files, it provides a unified view of the encrypted files in the lower filesystem. The presence of the ecryptfs_encrypted_view mount option is intended to force a read-only mount and modifying files is not supported when the feature is in use. See the following commit for more information: e77a56d [PATCH] eCryptfs: Encrypted passthrough This patch forces the mount to be read-only when the ecryptfs_encrypted_view mount option is specified by setting the MS_RDONLY flag on the superblock. Additionally, this patch removes some broken logic in ecryptfs_open() that attempted to prevent modifications of files when the encrypted view feature was in use. The check in ecryptfs_open() was not sufficient to prevent file modifications using system calls that do not operate on a file descriptor. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Priya Bansal <p.bansal@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08udf: Check component length before reading itJan Kara
commit e237ec37ec154564f8690c5bd1795339955eeef9 upstream. Check that length specified in a component of a symlink fits in the input buffer we are reading. Also properly ignore component length for component types that do not use it. Otherwise we read memory after end of buffer for corrupted udf image. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08udf: Verify symlink size before loading itJan Kara
commit a1d47b262952a45aae62bd49cfaf33dd76c11a2c upstream. UDF specification allows arbitrarily large symlinks. However we support only symlinks at most one block large. Check the length of the symlink so that we don't access memory beyond end of the symlink block. Reported-by: Carl Henrik Lunde <chlunde@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08udf: Verify i_size when loading inodeJan Kara
commit e159332b9af4b04d882dbcfe1bb0117f0a6d4b58 upstream. Verify that inode size is sane when loading inode with data stored in ICB. Otherwise we may get confused later when working with the inode and inode size is too big. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08udf: Check path length when reading symlinkJan Kara
commit 0e5cc9a40ada6046e6bc3bdfcd0c0d7e4b706b14 upstream. Symlink reading code does not check whether the resulting path fits into the page provided by the generic code. This isn't as easy as just checking the symlink size because of various encoding conversions we perform on path. So we have to check whether there is still enough space in the buffer on the fly. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exitingOleg Nesterov
commit 24c037ebf5723d4d9ab0996433cee4f96c292a4d upstream. alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mm/CMA: fix boot regression due to physical address of high_memoryJoonsoo Kim
commit 6b101e2a3ce4d2a0312087598bd1ab4a1db2ac40 upstream. high_memory isn't direct mapped memory so retrieving it's physical address isn't appropriate. But, it would be useful to check physical address of highmem boundary so it's justfiable to get physical address from it. In x86, there is a validation check if CONFIG_DEBUG_VIRTUAL and it triggers following boot failure reported by Ingo. ... BUG: Int 6: CR2 00f06f53 ... Call Trace: dump_stack+0x41/0x52 early_idt_handler+0x6b/0x6b cma_declare_contiguous+0x33/0x212 dma_contiguous_reserve_area+0x31/0x4e dma_contiguous_reserve+0x11d/0x125 setup_arch+0x7b5/0xb63 start_kernel+0xb8/0x3e6 i386_start_kernel+0x79/0x7d To fix boot regression, this patch implements workaround to avoid validation check in x86 when retrieving physical address of high_memory. __pa_nodebug() used by this patch is implemented only in x86 so there is no choice but to use dirty #ifdef. [akpm@linux-foundation.org: tweak comment] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08ncpfs: return proper error from NCP_IOC_SETROOT ioctlJan Kara
commit a682e9c28cac152e6e54c39efcf046e0c8cfcf63 upstream. If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error return value is then (in most cases) just overwritten before we return. This can result in reporting success to userspace although error happened. This bug was introduced by commit 2e54eb96e2c8 ("BKL: Remove BKL from ncpfs"). Propagate the errors correctly. Coverity id: 1226925. Fixes: 2e54eb96e2c80 ("BKL: Remove BKL from ncpfs") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08crypto: af_alg - fix backlog handlingRabin Vincent
commit 7e77bdebff5cb1e9876c561f69710b9ab8fa1f7e upstream. If a request is backlogged, it's complete() handler will get called twice: once with -EINPROGRESS, and once with the final error code. af_alg's complete handler, unlike other users, does not handle the -EINPROGRESS but instead always completes the completion that recvmsg() is waiting on. This can lead to a return to user space while the request is still pending in the driver. If userspace closes the sockets before the requests are handled by the driver, this will lead to use-after-frees (and potential crashes) in the kernel due to the tfm having been freed. The crashes can be easily reproduced (for example) by reducing the max queue length in cryptod.c and running the following (from http://www.chronox.de/libkcapi.html) on AES-NI capable hardware: $ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \ -k 00000000000000000000000000000000 \ -p 00000000000000000000000000000000 >/dev/null & done Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08audit: restore AUDIT_LOGINUID unset ABIRichard Guy Briggs
commit 041d7b98ffe59c59fdd639931dea7d74f9aa9a59 upstream. A regression was caused by commit 780a7654cee8: audit: Make testing for a valid loginuid explicit. (which in turn attempted to fix a regression caused by e1760bd) When audit_krule_to_data() fills in the rules to get a listing, there was a missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID. This broke userspace by not returning the same information that was sent and expected. The rule: auditctl -a exit,never -F auid=-1 gives: auditctl -l LIST_RULES: exit,never f24=0 syscall=all when it should give: LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all Tag it so that it is reported the same way it was set. Create a new private flags audit_krule field (pflags) to store it that won't interact with the public one from the API. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08audit: don't attempt to lookup PIDs when changing PID filtering audit rulesPaul Moore
commit 3640dcfa4fd00cd91d88bb86250bdb496f7070c0 upstream. Commit f1dc4867 ("audit: anchor all pid references in the initial pid namespace") introduced a find_vpid() call when adding/removing audit rules with PID/PPID filters; unfortunately this is problematic as find_vpid() only works if there is a task with the associated PID alive on the system. The following commands demonstrate a simple reproducer. # auditctl -D # auditctl -l # autrace /bin/true # auditctl -l This patch resolves the problem by simply using the PID provided by the user without any additional validation, e.g. no calls to check to see if the task/PID exists. Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skbRichard Guy Briggs
commit 54dc77d974a50147d6639dac6f59cb2c29207161 upstream. Eric Paris explains: Since kauditd_send_multicast_skb() gets called in audit_log_end(), which can come from any context (aka even a sleeping context) GFP_KERNEL can't be used. Since the audit_buffer knows what context it should use, pass that down and use that. See: https://lkml.org/lkml/2014/12/16/542 BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin 2 locks held by sulogin/885: #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b #1: (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30 Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014 ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375 ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006 0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38 Call Trace: [<ffffffff916ba529>] dump_stack+0x50/0xa8 [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be [<ffffffff910632a6>] __might_sleep+0x119/0x128 [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9 [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3 [<ffffffff914e2b62>] skb_copy+0x3e/0xa3 [<ffffffff910c263e>] audit_log_end+0x83/0x100 [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103 [<ffffffff91252a73>] common_lsm_audit+0x441/0x450 [<ffffffff9123c163>] slow_avc_audit+0x63/0x67 [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3 [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65 [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10 [<ffffffff911515e6>] install_exec_creds+0xe/0x79 [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7 [<ffffffff9115198e>] search_binary_handler+0x81/0x18c [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7 [<ffffffff91153669>] do_execve+0x1f/0x21 [<ffffffff91153967>] SyS_execve+0x25/0x29 [<ffffffff916c61a9>] stub_execve+0x69/0xa0 Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Unbreak the unprivileged remount testsEric W. Biederman
commit db86da7cb76f797a1a8b445166a15cb922c6ff85 upstream. A security fix in caused the way the unprivileged remount tests were using user namespaces to break. Tweak the way user namespaces are being used so the test works again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Allow setting gid_maps without privilege when setgroups is disabledEric W. Biederman
commit 66d2f338ee4c449396b6f99f5e75cd18eb6df272 upstream. Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Add a knob to disable setgroups on a per user namespace basisEric W. Biederman
commit 9cc46516ddf497ea16e8d7cb986ae03a0f6b92f8 upstream. - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already can not call setgroups so this is a noop. Prodcess with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Rename id_map_mutex to userns_state_mutexEric W. Biederman
commit f0d62aec931e4ae3333c797d346dc4f188f454ba upstream. Generalize id_map_mutex so it can be used for more state of a user namespace. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Only allow the creator of the userns unprivileged mappingsEric W. Biederman
commit f95d7918bd1e724675de4940039f2865e5eec5fe upstream. If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace makes it easier to verify all credentials obtained with the uid mapping can be obtained without the uid mapping without privilege. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Check euid no fsuid when establishing an unprivileged uid mappingEric W. Biederman
commit 80dd00a23784b384ccea049bfb3f259d3f973b9d upstream. setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Don't allow unprivileged creation of gid mappingsEric W. Biederman
commit be7c6dba2332cef0677fbabb606e279ae76652c3 upstream. As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Don't allow setgroups until a gid mapping has been setablishedEric W. Biederman
commit 273d2c67c3e179adb1e74f403d1e9a06e3f841b5 upstream. setgroups is unique in not needing a valid mapping before it can be called, in the case of setgroups(0, NULL) which drops all supplemental groups. The design of the user namespace assumes that CAP_SETGID can not actually be used until a gid mapping is established. Therefore add a helper function to see if the user namespace gid mapping has been established and call that function in the setgroups permission check. This is part of the fix for CVE-2014-8989, being able to drop groups without privilege using user namespaces. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08userns: Document what the invariant required for safe unprivileged mappings.Eric W. Biederman
commit 0542f17bf2c1f2430d368f44c8fcf2f82ec9e53e upstream. The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08groups: Consolidate the setgroups permission checksEric W. Biederman
commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream. Today there are 3 instances of setgroups and due to an oversight their permission checking has diverged. Add a common function so that they may all share the same permission checking code. This corrects the current oversight in the current permission checks and adds a helper to avoid this in the future. A user namespace security fix will update this new helper, shortly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08umount: Disallow unprivileged mount forceEric W. Biederman
commit b2f5d4dc38e034eecb7987e513255265ff9aa1cf upstream. Forced unmount affects not just the mount namespace but the underlying superblock as well. Restrict forced unmount to the global root user for now. Otherwise it becomes possible a user in a less privileged mount namespace to force the shutdown of a superblock of a filesystem in a more privileged mount namespace, allowing a DOS attack on root. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mnt: Update unprivileged remount testEric W. Biederman
commit 4a44a19b470a886997d6647a77bb3e38dcbfa8c5 upstream. - MNT_NODEV should be irrelevant except when reading back mount flags, no longer specify MNT_NODEV on remount. - Test MNT_NODEV on devpts where it is meaningful even for unprivileged mounts. - Add a test to verify that remount of a prexisting mount with the same flags is allowed and does not change those flags. - Cleanup up the definitions of MS_REC, MS_RELATIME, MS_STRICTATIME that are used when the code is built in an environment without them. - Correct the test error messages when tests fail. There were not 5 tests that tested MS_RELATIME. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mountEric W. Biederman
commit 3e1866410f11356a9fd869beb3e95983dc79c067 upstream. Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount when nodev was implicitly added on mount. Tested-by: Cedric Bosdonnat <cbosdonnat@suse.com> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08thermal: Fix error path in thermal_init()Luis Henriques
commit 9d367e5e7b05c71a8c1ac4e9b6e00ba45a79f2fc upstream. thermal_unregister_governors() and class_unregister() were being called in the wrong order. Fixes: 80a26a5c22b9 ("Thermal: build thermal governors into thermal_sys module") Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mnt: Fix a memory stomp in umountEric W. Biederman
commit c297abfdf15b4480704d6b566ca5ca9438b12456 upstream. While reviewing the code of umount_tree I realized that when we append to a preexisting unmounted list we do not change pprev of the former first item in the list. Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on the former first item of the list will stomp unmounted.first leaving it set to some random mount point which we are likely to free soon. This isn't likely to hit, but if it does I don't know how anyone could track it down. [ This happened because we don't have all the same operations for hlist's as we do for normal doubly-linked lists. In particular, list_splice() is easy on our standard doubly-linked lists, while hlist_splice() doesn't exist and needs both start/end entries of the hlist. And commit 38129a13e6e7 incorrectly open-coded that missing hlist_splice(). We should think about making these kinds of "mindless" conversions easier to get right by adding the missing hlist helpers - Linus ] Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mac80211: free management frame keys when removing stationJohannes Berg
commit 28a9bc68124c319b2b3dc861e80828a8865fd1ba upstream. When writing the code to allow per-station GTKs, I neglected to take into account the management frame keys (index 4 and 5) when freeing the station and only added code to free the first four data frame keys. Fix this by iterating the array of keys over the right length. Fixes: e31b82136d1a ("cfg80211/mac80211: allow per-station GTKs") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mac80211: fix multicast LED blinking and counterAndreas Müller
commit d025933e29872cb1fe19fc54d80e4dfa4ee5779c upstream. As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount" stopped being incremented after the use-after-free fix. Furthermore, the RX-LED will be triggered by every multicast frame (which wouldn't happen before) which wouldn't allow the LED to rest at all. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the patch. Fixes: b8fff407a180 ("mac80211: fix use-after-free in defragmentation") Signed-off-by: Andreas Müller <goo@stapelspeicher.org> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mac80211: avoid using uninitialized stack dataJes Sorensen
commit 7e6225a1604d0c6aa4140289bf5761868ffc9c83 upstream. Avoid a case where we would access uninitialized stack data if the AP advertises HT support without 40MHz channel support. Fixes: f3000e1b43f1 ("mac80211: fix broken use of VHT/20Mhz with some APs") Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08mac80211: copy chandef from AP vif to VLANsFelix Fietkau
commit 2967e031d4d737d9cc8252d878a17924d7b704f0 upstream. Instead of keeping track of all those special cases where VLAN interfaces have no bss_conf.chandef, just make sure they have the same as the AP interface they belong to. Among others, this fixes a crash getting a VLAN's channel from userspace since a NULL channel is returned as a good result (return value 0) for VLANs since the commit below. Fixes: c12bc4885f4b3 ("mac80211: return the vif's chandef in ieee80211_cfg_get_channel()") Signed-off-by: Felix Fietkau <nbd@openwrt.org> [rewrite commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08KEYS: Fix stale key registration at error pathTakashi Iwai
commit b26bdde5bb27f3f900e25a95e33a0c476c8c2c48 upstream. When loading encrypted-keys module, if the last check of aes_get_sizes() in init_encrypted() fails, the driver just returns an error without unregistering its key type. This results in the stale entry in the list. In addition to memory leaks, this leads to a kernel crash when registering a new key type later. This patch fixes the problem by swapping the calls of aes_get_sizes() and register_key_type(), and releasing resources properly at the error paths. Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86/microcode/intel: Fish out the stashed microcode for the BSPBorislav Petkov
commit 25cdb9c86826f8d035d8aaa07fc36832e76bd8a0 upstream. I'm such a moron! The simple solution of saving the BSP patch for use on resume was too simple (and wrong!), hint: sizeof(struct microcode_intel). What needs to be done instead is to fish out the microcode patch we have stashed previously and apply that on the BSP in case the late loader hasn't been utilized. So do that instead. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141208110820.GB20057@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86, microcode: Reload microcode on resumeBorislav Petkov
commit fbae4ba8c4a387e306adc9c710e5c225cece7678 upstream. Normally, we do reapply microcode on resume. However, in the cases where that microcode comes from the early loader and the late loader hasn't been utilized yet, there's no easy way for us to go and apply the patch applied during boot by the early loader. Thus, reuse the patch stashed by the early loader for the BSP. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86, microcode: Don't initialize microcode code on paravirtBoris Ostrovsky
commit a18a0f6850d4b286a5ebf02cd5b22fe496b86349 upstream. Paravirtual guests are not expected to load microcode into processors and therefore it is not necessary to initialize microcode loading logic. In fact, under certain circumstances initializing this logic may cause the guest to crash. Specifically, 32-bit kernels use __pa_nodebug() macro which does not work in Xen (the code path that leads to this macro happens during resume when we call mc_bp_resume()->load_ucode_ap() ->check_loader_disabled_ap()) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1417469264-31470-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86, microcode, intel: Drop unused parameterBorislav Petkov
commit 47768626c6db42cd06ff077ba12dd2cb10ab818b upstream. apply_microcode_early() doesn't use mc_saved_data, kill it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86, microcode, AMD: Do not use smp_processor_id() in preemtible contextBorislav Petkov
commit 2ef84b3bb97f03332f0c1edb4466b1750dcf97b5 upstream. Hand down the cpu number instead, otherwise lockdep screams when doing echo 1 > /sys/devices/system/cpu/microcode/reload. BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470 caller is debug_smp_processor_id+0x12/0x20 CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26 ... Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08isofs: Fix unchecked printing of ER recordsJan Kara
commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream. We didn't check length of rock ridge ER records before printing them. Thus corrupted isofs image can cause us to access and print some memory behind the buffer with obvious consequences. Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86/tls: Don't validate lm in set_thread_area() after allAndy Lutomirski
commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream. It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regsAndy Lutomirski
commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream. These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08ARM: mvebu: fix ordering in Armada 370 .dtsiUwe Kleine-König
commit ab1e85372168892387dd1ac171158fc8c3119be4 upstream. Commit a095b1c78a35 ("ARM: mvebu: sort DT nodes by address") missed placing the system-controller in the correct order. Fixes: a095b1c78a35 ("ARM: mvebu: sort DT nodes by address") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Andrew Lunn <andrew@lunn.ch> Link: https://lkml.kernel.org/r/20141114204333.GS27002@pengutronix.de Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08ARM: mvebu: remove conflicting muxing on Armada 370 DBThomas Petazzoni
commit b4607572ef86b288a856b9df410ea593c5371dec upstream. Back when audio was enabled, the muxing of some MPP pins was causing problems. However, since commit fea038ed55ae ("ARM: mvebu: Add proper pin muxing on the Armada 370 DB board"), those problematic MPP pins have been assigned a proper muxing for the Ethernet interfaces. This proper muxing is now conflicting with the hog pins muxing that had been added as part of 249f3822509b ("ARM: mvebu: add audio support to Armada 370 DB"). Therefore, this commit simply removes the hog pins muxing, which solves a warning a boot time due to the conflicting muxing requirements. Fixes: fea038ed55ae ("ARM: mvebu: Add proper pin muxing on the Armada 370 DB board") Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Link: https://lkml.kernel.org/r/1414512524-24466-5-git-send-email-thomas.petazzoni@free-electrons.com Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XPThomas Petazzoni
commit e55355453600a33bb5ca4f71f2d7214875f3b061 upstream. Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada 38x and Armada XP requires a certain number of conditions: - On Armada 370, the cache policy must be set to write-allocate. - On Armada 375, 38x and XP, the cache policy must be set to write-allocate, the pages must be mapped with the shareable attribute, and the SMP bit must be set Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none of these conditions are met. With Armada 370, the situation is worse: since the processor is single core, regardless of whether CONFIG_SMP or !CONFIG_SMP is used, the cache policy will be set to write-back by the kernel and not write-allocate. Since solving this problem turns out to be quite complicated, and we don't want to let users with a mainline kernel known to have infrequent but existing data corruptions, this commit proposes to simply disable hardware I/O coherency in situations where it is known not to work. And basically, the is_smp() function of the kernel tells us whether it is OK to enable hardware I/O coherency or not, so this commit slightly refactors the coherency_type() function to return COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate type of the coherency fabric in the other case. Thanks to this, the I/O coherency fabric will no longer be used at all in !CONFIG_SMP configurations. It will continue to be used in CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x (which are multiple cores processors), but will no longer be used on Armada 370 (which is a single core processor). In the process, it simplifies the implementation of the coherency_type() function, and adds a missing call to of_node_put(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: e60304f8cb7bb545e79fe62d9b9762460c254ec2 ("arm: mvebu: Add hardware I/O Coherency support") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.com Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>