linux-toradex.git/fs/xfs, branch v4.9.54

xfs: remove kmem_zalloc_greedy

2017-10-08T08:26:11+00:00

[ Upstream commit 08b005f1333154ae5b404ca28766e0ffb9f1c150 ]

The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
it to grab 1-4 pages for staging of inobt records.  The infinite loop in
the greedy allocation function is causing hangs[1] in generic/269, so
just get rid of the greedy allocator in favor of kmem_zalloc_large.
This makes bulkstat somewhat more likely to ENOMEM if there's really no
pages to spare, but eliminates a source of hangs.

[1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com

Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Greg Kroah-Hartman

xfs: validate bdev support for DAX inode flag

2017-10-05T07:44:03+00:00

commit 6851a3db7e224bbb85e23b3c64a506c9e0904382 upstream.

Currently only the blocksize is checked, but we should really be calling
bdev_dax_supported() which also tests to make sure we can get a
struct dax_device and that the dax_direct_access() path is working.

This is the same check that we do for the "-o dax" mount option in
xfs_fs_fill_super().

This does not fix the race issues that caused the XFS DAX inode option to
be disabled, so that option will still be disabled.  If/when we re-enable
it, though, I think we will want this issue to have been fixed.  I also do
think that we want to fix this in stable kernels.

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: fix compiler warnings

2017-09-20T06:20:02+00:00

commit 7bf7a193a90cadccaad21c5970435c665c40fe27 upstream.

Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Cc: Arnd Bergmann 
Signed-off-by: Greg Kroah-Hartman

xfs: use kmem_free to free return value of kmem_zalloc

2017-09-20T06:20:02+00:00

commit 6c370590cfe0c36bcd62d548148aa65c984540b7 upstream.

In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(). But it is better to use kmem_free().

Signed-off-by: Pan Bian 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: open code end_buffer_async_write in xfs_finish_page_writeback

2017-09-20T06:20:02+00:00

commit 8353a814f2518dcfa79a5bb77afd0e7dfa391bb1 upstream.

Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight
is not only inefficient, but also potentially dangerous as
end_buffer_async_write can cause the page and all buffers to be freed.

Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Brian Foster 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: don't set v3 xflags for v2 inodes

2017-09-20T06:20:02+00:00

commit dd60687ee541ca3f6df8758f38e6f22f57c42a37 upstream.

Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: fix incorrect log_flushed on fsync

2017-09-20T06:20:02+00:00

commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.

When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value

Cc: Christoph Hellwig 
Cc: Josef Bacik 
Cc: 
Signed-off-by: Amir Goldstein 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: disable per-inode DAX flag

2017-09-20T06:20:02+00:00

commit 742d84290739ae908f1b61b7d17ea382c8c0073a upstream.

Currently flag switching can be used to easily crash the kernel.  Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: relog dirty buffers during swapext bmbt owner change

2017-09-20T06:20:02+00:00

commit 2dd3d709fc4338681a3aa61658122fa8faa5a437 upstream.

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster 
Reviewed-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

xfs: disallow marking previously dirty buffers as ordered

2017-09-20T06:20:02+00:00

commit a5814bceea48ee1c57c4db2bd54b0c0246daf54a upstream.

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster 
Reviewed-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman