<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/xfs, branch v4.9.54</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>xfs: remove kmem_zalloc_greedy</title>
<updated>2017-10-08T08:26:11+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>darrick.wong@oracle.com</email>
</author>
<published>2017-03-06T19:58:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d86f4ea83626d4714c946ffef643a54519987934'/>
<id>d86f4ea83626d4714c946ffef643a54519987934</id>
<content type='text'>
[ Upstream commit 08b005f1333154ae5b404ca28766e0ffb9f1c150 ]

The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
it to grab 1-4 pages for staging of inobt records.  The infinite loop in
the greedy allocation function is causing hangs[1] in generic/269, so
just get rid of the greedy allocator in favor of kmem_zalloc_large.
This makes bulkstat somewhat more likely to ENOMEM if there's really no
pages to spare, but eliminates a source of hangs.

[1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 08b005f1333154ae5b404ca28766e0ffb9f1c150 ]

The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
it to grab 1-4 pages for staging of inobt records.  The infinite loop in
the greedy allocation function is causing hangs[1] in generic/269, so
just get rid of the greedy allocator in favor of kmem_zalloc_large.
This makes bulkstat somewhat more likely to ENOMEM if there's really no
pages to spare, but eliminates a source of hangs.

[1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: validate bdev support for DAX inode flag</title>
<updated>2017-10-05T07:44:03+00:00</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2017-09-18T21:46:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=02c7d98bec6cf34ee7079fa503faffc82e17f575'/>
<id>02c7d98bec6cf34ee7079fa503faffc82e17f575</id>
<content type='text'>
commit 6851a3db7e224bbb85e23b3c64a506c9e0904382 upstream.

Currently only the blocksize is checked, but we should really be calling
bdev_dax_supported() which also tests to make sure we can get a
struct dax_device and that the dax_direct_access() path is working.

This is the same check that we do for the "-o dax" mount option in
xfs_fs_fill_super().

This does not fix the race issues that caused the XFS DAX inode option to
be disabled, so that option will still be disabled.  If/when we re-enable
it, though, I think we will want this issue to have been fixed.  I also do
think that we want to fix this in stable kernels.

Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6851a3db7e224bbb85e23b3c64a506c9e0904382 upstream.

Currently only the blocksize is checked, but we should really be calling
bdev_dax_supported() which also tests to make sure we can get a
struct dax_device and that the dax_direct_access() path is working.

This is the same check that we do for the "-o dax" mount option in
xfs_fs_fill_super().

This does not fix the race issues that caused the XFS DAX inode option to
be disabled, so that option will still be disabled.  If/when we re-enable
it, though, I think we will want this issue to have been fixed.  I also do
think that we want to fix this in stable kernels.

Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: fix compiler warnings</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>darrick.wong@oracle.com</email>
</author>
<published>2017-08-31T22:11:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ae04a8c4c6fc5b4aabfb166588045e2845b4d4e7'/>
<id>ae04a8c4c6fc5b4aabfb166588045e2845b4d4e7</id>
<content type='text'>
commit 7bf7a193a90cadccaad21c5970435c665c40fe27 upstream.

Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7bf7a193a90cadccaad21c5970435c665c40fe27 upstream.

Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: use kmem_free to free return value of kmem_zalloc</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Pan Bian</name>
<email>bianpan2016@163.com</email>
</author>
<published>2017-09-17T21:07:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=81cb6f1a2a1964ed4d93604d1a3d49d92db2a01b'/>
<id>81cb6f1a2a1964ed4d93604d1a3d49d92db2a01b</id>
<content type='text'>
commit 6c370590cfe0c36bcd62d548148aa65c984540b7 upstream.

In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(). But it is better to use kmem_free().

Signed-off-by: Pan Bian &lt;bianpan2016@163.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6c370590cfe0c36bcd62d548148aa65c984540b7 upstream.

In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(). But it is better to use kmem_free().

Signed-off-by: Pan Bian &lt;bianpan2016@163.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: open code end_buffer_async_write in xfs_finish_page_writeback</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-09-17T21:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=772003c6a4282211487c9d33958594d7f2be7dd2'/>
<id>772003c6a4282211487c9d33958594d7f2be7dd2</id>
<content type='text'>
commit 8353a814f2518dcfa79a5bb77afd0e7dfa391bb1 upstream.

Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight
is not only inefficient, but also potentially dangerous as
end_buffer_async_write can cause the page and all buffers to be freed.

Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8353a814f2518dcfa79a5bb77afd0e7dfa391bb1 upstream.

Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight
is not only inefficient, but also potentially dangerous as
end_buffer_async_write can cause the page and all buffers to be freed.

Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: don't set v3 xflags for v2 inodes</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-09-17T21:07:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bb69e8a228a74c9aa7b70f6624e5c4fa1af70533'/>
<id>bb69e8a228a74c9aa7b70f6624e5c4fa1af70533</id>
<content type='text'>
commit dd60687ee541ca3f6df8758f38e6f22f57c42a37 upstream.

Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit dd60687ee541ca3f6df8758f38e6f22f57c42a37 upstream.

Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: fix incorrect log_flushed on fsync</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2017-09-17T21:07:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f46a61f686b0a8042ee4b7cb108ece81e3fb9401'/>
<id>f46a61f686b0a8042ee4b7cb108ece81e3fb9401</id>
<content type='text'>
commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.

When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Josef Bacik &lt;jbacik@fb.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.

When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Josef Bacik &lt;jbacik@fb.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: disable per-inode DAX flag</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-09-17T21:07:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0e8d7e364ec546c44762664d30f4b1f6fd912197'/>
<id>0e8d7e364ec546c44762664d30f4b1f6fd912197</id>
<content type='text'>
commit 742d84290739ae908f1b61b7d17ea382c8c0073a upstream.

Currently flag switching can be used to easily crash the kernel.  Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 742d84290739ae908f1b61b7d17ea382c8c0073a upstream.

Currently flag switching can be used to easily crash the kernel.  Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: relog dirty buffers during swapext bmbt owner change</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Brian Foster</name>
<email>bfoster@redhat.com</email>
</author>
<published>2017-09-17T21:07:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a46cf59265cf5282be0a488abc913e94db924e87'/>
<id>a46cf59265cf5282be0a488abc913e94db924e87</id>
<content type='text'>
commit 2dd3d709fc4338681a3aa61658122fa8faa5a437 upstream.

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2dd3d709fc4338681a3aa61658122fa8faa5a437 upstream.

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: disallow marking previously dirty buffers as ordered</title>
<updated>2017-09-20T06:20:02+00:00</updated>
<author>
<name>Brian Foster</name>
<email>bfoster@redhat.com</email>
</author>
<published>2017-09-17T21:07:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e2bb92633615ad801c4ab56fdb3eba3c701b2a3c'/>
<id>e2bb92633615ad801c4ab56fdb3eba3c701b2a3c</id>
<content type='text'>
commit a5814bceea48ee1c57c4db2bd54b0c0246daf54a upstream.

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a5814bceea48ee1c57c4db2bd54b0c0246daf54a upstream.

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
