Age | Commit message (Collapse) | Author |
|
commit 2a6fba6d2311151598abaa1e7c9abd5f8d024a43 upstream.
Today, the put_listent formatters return either 1 or 0; if
they return 1, some callers treat this as an error and return
it up the stack, despite "1" not being a valid (negative)
error code.
The intent seems to be that if the input buffer is full,
we set seen_enough or set count = -1, and return 1;
but some callers check the return before checking the
seen_enough or count fields of the context.
Fix this by only returning non-zero for actual errors
encountered, and rely on the caller to first check the
return value, then check the values in the context to
decide what to do.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0facef7fb053be4353c0a48c2f48c9dbee91cb19 upstream.
When we're iterating inode xattrs by handle, we have to copy the
cursor back to userspace so that a subsequent invocation actually
retrieves subsequent contents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a4d768e702de224cc85e0c8eac9311763403b368 upstream.
This structure copy was throwing unaligned access warnings on sparc64:
Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]
xfs_btree_copy_ptrs does a memcpy, which avoids it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 892d2a5f705723b2cb488bfb38bcbdcf83273184 upstream.
By run fsstress long enough time enough in RHEL-7, I find an
assertion failure (harder to reproduce on linux-4.11, but problem
is still there):
XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c
The assertion is in xfs_getbmap() funciton:
if (map[i].br_startblock == DELAYSTARTBLOCK &&
--> map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
ASSERT((iflags & BMV_IF_DELALLOC) != 0);
When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the
startoff is just at EOF. But we only need to make sure delalloc
extents that are within EOF, not include EOF.
Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0daaecacb83bc6b656a56393ab77a31c28139bc7 upstream.
The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.
If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).
The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.
Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e20c8a517f259cb4d258e10b0cd5d4b30d4167a0 upstream.
The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.
To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ae2c4ac2dd39b23a87ddb14ceddc3f2872c6aef5 upstream.
The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.
Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 756baca27fff3ecaeab9dbc7a5ee35a1d7bc0c7f upstream.
Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.
The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 20e8a063786050083fe05b4f45be338c60b49126 upstream.
The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.
Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cb52ee334a45ae6c78a3999e4b473c43ddc528f4 upstream.
Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.
This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 88..95 0 (88..95) 8
1: [8..15]: 80..87 0 (80..87) 8
2: [16..39]: 168..191 0 (168..191) 24
3: [40..63]: 5242952..5242975 1 (72..95) 24
Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.
Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.
Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 023cc840b40fad95c6fe26fff1d380a8c9d45939 upstream.
Carlos had a case where "find" seemed to start spinning
forever and never return.
This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:
0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...
i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.
At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size. During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes. If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.
At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.
The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block. This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.
The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.
There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.
Huge appreciation for Carlos for debugging and isolating
the problem.
Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit be6324c00c4d1e0e665f03ed1fc18863a88da119 upstream.
In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel. struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8affebe16d79ebefb1d9d6d56a46dc89716f9453 upstream.
xfs_find_get_desired_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.
When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size XFS on x86_64 host.
# xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
-c "seek -d 0" /mnt/xfs/testfile
wrote 1024/1024 bytes at offset 2048
1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
Whence Result
DATA EOF
Data at offset 2k was missed, and lseek(2) returned ENXIO.
This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5375023ae1266553a7baa0845e82917d8803f48c upstream.
XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
can be seen by the following command:
xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
-c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
Whence Result
HOLE 139264
Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
implementation. The bug is in xfs_find_get_desired_pgoff() which does
not properly detect the case when pages are not contiguous.
Fix the problem by properly detecting when found page has larger offset
than expected.
Fixes: d126d43f631f996daeee5006714fed914be32368
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f961e3f2acae94b727380c0b74e2d3954d0edf79 upstream.
In error cases, lgp->lg_layout_type may be out of bounds; so we
shouldn't be using it until after the check of nfserr.
This was seen to crash nfsd threads when the server receives a LAYOUTGET
request with a large layout type.
GETDEVICEINFO has the same problem.
Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8179a101eb5f4ef0ac9a915fcea9a9d3109efa90 upstream.
ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().
The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.
Cc: stable@vger.kernel.org # needs backporting for < 4.9
Link: http://tracker.ceph.com/issues/19688
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
[luis: introduce __ceph_setattr() and make ceph_set_acl() call it, as
suggested by Yan.]
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: “Yan, Zheng” <zyan@redhat.com>
|
|
commit 6b06cdee81d68a8a829ad8e8d0f31d6836744af9 upstream.
When accessing an encrypted directory without the key, userspace must
operate on filenames derived from the ciphertext names, which contain
arbitrary bytes. Since we must support filenames as long as NAME_MAX,
we can't always just base64-encode the ciphertext, since that may make
it too long. Currently, this is solved by presenting long names in an
abbreviated form containing any needed filesystem-specific hashes (e.g.
to identify a directory block), then the last 16 bytes of ciphertext.
This needs to be sufficient to identify the actual name on lookup.
However, there is a bug. It seems to have been assumed that due to the
use of a CBC (ciphertext block chaining)-based encryption mode, the last
16 bytes (i.e. the AES block size) of ciphertext would depend on the
full plaintext, preventing collisions. However, we actually use CBC
with ciphertext stealing (CTS), which handles the last two blocks
specially, causing them to appear "flipped". Thus, it's actually the
second-to-last block which depends on the full plaintext.
This caused long filenames that differ only near the end of their
plaintexts to, when observed without the key, point to the wrong inode
and be undeletable. For example, with ext4:
# echo pass | e4crypt add_key -p 16 edir/
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
2004
# rm -rf edir/
rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
...
To fix this, when presenting long encrypted filenames, encode the
second-to-last block of ciphertext rather than the last 16 bytes.
Although it would be nice to solve this without depending on a specific
encryption mode, that would mean doing a cryptographic hash like SHA-256
which would be much less efficient. This way is sufficient for now, and
it's still compatible with encryption modes like HEH which are strong
pseudorandom permutations. Also, changing the presented names is still
allowed at any time because they are only provided to allow applications
to do things like delete encrypted directories. They're not designed to
be used to persistently identify files --- which would be hard to do
anyway, given that they're encrypted after all.
For ease of backports, this patch only makes the minimal fix to both
ext4 and f2fs. It leaves ubifs as-is, since ubifs doesn't compare the
ciphertext block yet. Follow-on patches will clean things up properly
and make the filesystems use a shared helper function.
Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption")
Reported-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6332cd32c8290a80e929fc044dc5bdba77396e33 upstream.
If user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checks its hash value with
first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances to check entire dentry bytes likewise ext4.
Eric reported how to reproduce this issue by:
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999
Cc: <stable@vger.kernel.org>
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(fixed f2fs_dentry_hash() to work even when the hash is 0)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 272f98f6846277378e1758a49a49d7bf39343c02 upstream.
To mitigate some types of offline attacks, filesystem encryption is
designed to enforce that all files in an encrypted directory tree use
the same encryption policy (i.e. the same encryption context excluding
the nonce). However, the fscrypt_has_permitted_context() function which
enforces this relies on comparing struct fscrypt_info's, which are only
available when we have the encryption keys. This can cause two
incorrect behaviors:
1. If we have the parent directory's key but not the child's key, or
vice versa, then fscrypt_has_permitted_context() returned false,
causing applications to see EPERM or ENOKEY. This is incorrect if
the encryption contexts are in fact consistent. Although we'd
normally have either both keys or neither key in that case since the
master_key_descriptors would be the same, this is not guaranteed
because keys can be added or removed from keyrings at any time.
2. If we have neither the parent's key nor the child's key, then
fscrypt_has_permitted_context() returned true, causing applications
to see no error (or else an error for some other reason). This is
incorrect if the encryption contexts are in fact inconsistent, since
in that case we should deny access.
To fix this, retrieve and compare the fscrypt_contexts if we are unable
to set up both fscrypt_infos.
While this slightly hurts performance when accessing an encrypted
directory tree without the key, this isn't a case we really need to be
optimizing for; access *with* the key is much more important.
Furthermore, the performance hit is barely noticeable given that we are
already retrieving the fscrypt_context and doing two keyring searches in
fscrypt_get_encryption_info(). If we ever actually wanted to optimize
this case we might start by caching the fscrypt_contexts.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4762cc3fbbd89e5fd316d6e4d3244a8984444f8d upstream.
We should be testing for -ENOMEM but the minus sign is missing.
Fixes: c9af28fdd449 ('ext4 crypto: don't let data integrity writebacks fail with ENOMEM')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c9af28fdd44922a6c10c9f8315718408af98e315 upstream.
We don't want the writeback triggered from the journal commit (in
data=writeback mode) to cause the journal to abort due to
generic_writepages() returning an ENOMEM error. In addition, if
fsync() fails with ENOMEM, most applications will probably not do the
right thing.
So if we are doing a data integrity sync, and ext4_encrypt() returns
ENOMEM, we will submit any queued I/O to date, and then retry the
allocation using GFP_NOFAIL.
Google-Bug-Id: 27641567
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d66bb1607e2d8d384e53f3d93db5c18483c8c4f7 upstream.
proc_create_mount_point() forgot to increase the parent's nlink, and
it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one
less than expected.
Fixes: eb6d38d5427b ("proc: Allow creating permanently empty directories...")
Reported-by: Tristan Ye <tristan.ye@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 85435d7a15294f9f7ef23469e6aaf7c5dfcc54f0 upstream.
SFM is mapping doublequote to 0xF020
Without this patch creating files with doublequote fails to Windows/Mac
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d8a6e505d6bba2250852fbc1c1c86fe68aaf9af3 upstream.
An open directory may have a NULL private_data pointer prior to readdir.
Fixes: 0de1f4c6f6c0 ("Add way to query server fs info for smb3")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b704e70b7cf48f9b67c07d585168e102dfa30bb4 upstream.
- trailing space maps to 0xF028
- trailing period maps to 0xF029
This fix corrects the mapping of file names which have a trailing character
that would otherwise be illegal (period or space) but is allowed by POSIX.
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7db0a6efdc3e990cdfd4b24820d010e9eb7890ad upstream.
Macs send the maximum buffer size in response on ioctl to validate
negotiate security information, which causes us to fail the mount
as the response buffer is larger than the expected response.
Changed ioctl response processing to allow for padding of validate
negotiate ioctl response and limit the maximum response size to
maximum buffer size.
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 26c9cb668c7fbf9830516b75d8bee70b699ed449 upstream.
Mac requires the unicode flag to be set for cifs, even for the smb
echo request (which doesn't have strings).
Without this Mac rejects the periodic echo requests (when mounting
with cifs) that we use to check if server is down
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a5f6a6a9c72eac38a7fadd1a038532bc8516337c upstream.
invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0
which doen't make any sense.
Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
regardless of mapping->nrpages value.
Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit eeca958dce0a9231d1969f86196653eb50fcc9b3 upstream.
The ceph_inode_xattr needs to be released when removing an xattr. Easily
reproducible running the 'generic/020' test from xfstests or simply by
doing:
attr -s attr0 -V 0 /mnt/test && attr -r attr0 /mnt/test
While there, also fix the error path.
Here's the kmemleak splat:
unreferenced object 0xffff88001f86fbc0 (size 64):
comm "attr", pid 244, jiffies 4294904246 (age 98.464s)
hex dump (first 32 bytes):
40 fa 86 1f 00 88 ff ff 80 32 38 1f 00 88 ff ff @........28.....
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................
backtrace:
[<ffffffff81560199>] kmemleak_alloc+0x49/0xa0
[<ffffffff810f3e5b>] kmem_cache_alloc+0x9b/0xf0
[<ffffffff812b157e>] __ceph_setxattr+0x17e/0x820
[<ffffffff812b1c57>] ceph_set_xattr_handler+0x37/0x40
[<ffffffff8111fb4b>] __vfs_removexattr+0x4b/0x60
[<ffffffff8111fd37>] vfs_removexattr+0x77/0xd0
[<ffffffff8111fdd1>] removexattr+0x41/0x60
[<ffffffff8111fe65>] path_removexattr+0x75/0xa0
[<ffffffff81120aeb>] SyS_lremovexattr+0xb/0x10
[<ffffffff81564b20>] entry_SYSCALL_64_fastpath+0x13/0x94
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81be3dee96346fbe08c31be5ef74f03f6b63cf68 upstream.
getxattr uses vmalloc to allocate memory if kzalloc fails. This is
filled by vfs_getxattr and then copied to the userspace. vmalloc,
however, doesn't zero out the memory so if the specific implementation
of the xattr handler is sloppy we can theoretically expose a kernel
memory. There is no real sign this is really the case but let's make
sure this will not happen and use vzalloc instead.
Fixes: 779302e67835 ("fs/xattr.c:getxattr(): improve handling of allocation failures")
Link: http://lkml.kernel.org/r/20170306103327.2766-1-mhocko@kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7b4cc9787fe35b3ee2dfb1c35e22eafc32e00c33 upstream.
Currently the case of writing via mmap to a file with inline data is not
handled. This is maybe a rare case since it requires a writable memory
map of a very small file, but it is trivial to trigger with on
inline_data filesystem, and it causes the
'BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));' in
ext4_writepages() to be hit:
mkfs.ext4 -O inline_data /dev/vdb
mount /dev/vdb /mnt
xfs_io -f /mnt/file \
-c 'pwrite 0 1' \
-c 'mmap -w 0 1m' \
-c 'mwrite 0 1' \
-c 'fsync'
kernel BUG at fs/ext4/inode.c:2723!
invalid opcode: 0000 [#1] SMP
CPU: 1 PID: 2532 Comm: xfs_io Not tainted 4.11.0-rc1-xfstests-00301-g071d9acf3d1f #633
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
task: ffff88003d3a8040 task.stack: ffffc90000300000
RIP: 0010:ext4_writepages+0xc89/0xf8a
RSP: 0018:ffffc90000303ca0 EFLAGS: 00010283
RAX: 0000028410000000 RBX: ffff8800383fa3b0 RCX: ffffffff812afcdc
RDX: 00000a9d00000246 RSI: ffffffff81e660e0 RDI: 0000000000000246
RBP: ffffc90000303dc0 R08: 0000000000000002 R09: 869618e8f99b4fa5
R10: 00000000852287a2 R11: 00000000a03b49f4 R12: ffff88003808e698
R13: 0000000000000000 R14: 7fffffffffffffff R15: 7fffffffffffffff
FS: 00007fd3e53094c0(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd3e4c51000 CR3: 000000003d554000 CR4: 00000000003406e0
Call Trace:
? _raw_spin_unlock+0x27/0x2a
? kvm_clock_read+0x1e/0x20
do_writepages+0x23/0x2c
? do_writepages+0x23/0x2c
__filemap_fdatawrite_range+0x80/0x87
filemap_write_and_wait_range+0x67/0x8c
ext4_sync_file+0x20e/0x472
vfs_fsync_range+0x8e/0x9f
? syscall_trace_enter+0x25b/0x2d0
vfs_fsync+0x1c/0x1e
do_fsync+0x31/0x4a
SyS_fsync+0x10/0x14
do_syscall_64+0x69/0x131
entry_SYSCALL64_slow_path+0x25/0x25
We could try to be smart and keep the inline data in this case, or at
least support delayed allocation when allocating the block, but these
solutions would be more complicated and don't seem worthwhile given how
rare this case seems to be. So just fix the bug by calling
ext4_convert_inline_data() when we're asked to make a page writable, so
that any inline data gets evicted, with the block allocated immediately.
Reported-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream.
Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:
if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time. This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.
Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there. This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.
Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
[idryomov@gmail.com: backport to < 4.11: bdi is embedded in queue]
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9dd46188edc2f0d1f37328637860bb65a771124 upstream.
F2FS uses 4 bytes to represent block address. As a result, supported
size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments.
Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b5c66bab72a6a65edb15beb60b90d3cb84c5763b upstream.
posix_acl_update_mode() could possibly clear 'acl', if so we leak the
memory pointed by 'acl'. Save this pointer before calling
posix_acl_update_mode() and release the memory if 'acl' really gets
cleared.
Link: http://lkml.kernel.org/r/1486678332-2430-1-git-send-email-xiyou.wangcong@gmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Mark Salyzyn <salyzyn@android.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 13bf9fbff0e5e099e2b6f003a0ab8ae145436309 upstream.
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit db44bac41bbfc0c0d9dd943092d8bded3c9db19b upstream.
Use a couple shortcuts that will simplify a following bugfix.
(Minor backporting required to account for a change from f34b95689d2c
"The NFSv2/NFSv3 server does not handle zero length WRITE requests
correctly".)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 03a8bb0e53d9562276045bdfcf2b5de2e4cff5a1 upstream.
As Al pointed, d_revalidate should return RCU lookup before using d_inode.
This was originally introduced by:
commit 34286d666230 ("fs: rcu-walk aware d_revalidate method").
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3d43bcfef5f0548845a425365011c499875491b0 upstream.
This avoids potential problems caused by a race where the inode gets
renamed out from its parent directory and the parent directory is
deleted while ext4_d_revalidate() is running.
Fixes: 28b4c263961c
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 28b4c263961c47da84ed8b5be0b5116bad1133eb upstream.
Add a validation check for dentries for encrypted directory to make
sure we're not caching stale data after a key has been added or removed.
Also check to make sure that status of the encryption key is updated
when readdir(2) is executed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9a200d075e5d05be1fcad4547a0f8aee4e2f9a04 upstream.
...otherwise an user can enable encryption for certain files even
when the filesystem is unable to support it.
Such a case would be a filesystem created by mkfs.ext4's default
settings, 1KiB block size. Ext4 supports encyption only when block size
is equal to PAGE_SIZE.
But this constraint is only checked when the encryption feature flag
is set.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 38bd49064a1ecb67baad33598e3d824448ab11ec upstream.
A signal can interrupt a SendReceive call which result in incoming
responses to the call being ignored. This is a problem for calls such as
open which results in the successful response being ignored. This
results in an open file resource on the server.
The patch looks into responses which were cancelled after being sent and
in case of successful open closes the open fids.
For this patch, the check is only done in SendReceive2()
RH-bz: 1403319
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1e38da300e1e395a15048b0af1e5305bd91402f6 upstream.
The handling of the might_cancel queueing is not properly protected, so
parallel operations on the file descriptor can race with each other and
lead to list corruptions or use after free.
Protect the context for these operations with a seperate lock.
The wait queue lock cannot be reused for this because that would create a
lock inversion scenario vs. the cancel lock. Replacing might_cancel with an
atomic (atomic_t or atomic bit) does not help either because it still can
race vs. the actual list operation.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "linux-fsdevel@vger.kernel.org"
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311521430.3457@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e6838a29ecb484c97e4efef9429643b9851fba6e upstream.
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35 upstream.
We aren't checking to see if the in-inode extended attribute is
corrupted before we try to expand the inode's extra isize fields.
This can lead to potential crashes caused by the BUG_ON() check in
ext4_xattr_shift_entries().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9a59b62fd88196844cee5fff851bee2cfd7afb6e upstream.
Do more sanity check for superblock during ->mount.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d29216842a85c7970c536108e093963f02714498 upstream.
CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.
mkdir /tmp/1 /tmp/2
mount --make-rshared /
for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done
Will create create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.
As such CVE-2016-6213 was assigned.
Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:
> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.
So I am setting the default number of mounts allowed per mount
namespace at 100,000. This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts. Which should be perfect to catch misconfigurations and
malfunctioning programs.
For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 67893f12e5374bbcaaffbc6e570acbc2714ea884 upstream.
We get a bogus warning about a potential uninitialized variable
use in gfs2, because the compiler does not figure out that we
never use the leaf number if get_leaf_nr() returns an error:
fs/gfs2/dir.c: In function 'get_first_leaf':
fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/gfs2/dir.c: In function 'dir_split_leaf':
fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
sufficient to let gcc understand that this is exactly the same
condition as in IS_ERR() so it can optimize the code path enough
to understand it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a0918f1ce6a43ac980b42b300ec443c154970979 upstream.
STATUS_BAD_NETWORK_NAME can be received during node failover,
causing the flag to be set and making the reconnect thread
always unsuccessful, thereafter.
Once the only place where it is set is removed, the remaining
bits are rendered moot.
Removing it does not prevent "mount" from failing when a non
existent share is passed.
What happens when the share really ceases to exist while the
share is mounted is undefined now as much as it was before.
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 62a6cfddcc0a5313e7da3e8311ba16226fe0ac10 upstream.
commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect
long after socket reconnect") added support for Negotiate requests to
be initiated by echo calls.
To avoid delays in calling echo after a reconnect, I added the patch
introduced by the commit b8c600120fc8 ("Call echo service immediately
after socket reconnect").
This has however caused a regression with cifs shares which do not have
support for echo calls to trigger Negotiate requests. On connections
which need to call Negotiation, the echo calls trigger an error which
triggers a reconnect which in turn triggers another echo call. This
results in a loop which is only broken when an operation is performed on
the cifs share. For an idle share, it can DOS a server.
The patch uses the smb_operation can_echo() for cifs so that it is
called only if connection has been already been setup.
kernel bz: 194531
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Jonathan Liu <net147@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05ac5aa18abd7db341e54df4ae2b4c98ea0e43b7 upstream.
We've fixed the race condition problem in calculating ext4 checksum
value in commit b47820edd163 ("ext4: avoid modifying checksum fields
directly during checksum veficationon"). However, by this change,
when calculating the checksum value of inode whose i_extra_size is
less than 4, we couldn't calculate the checksum value in a proper way.
This problem was found and reported by Nix, Thank you.
Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|