<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs, branch v4.4.103</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>btrfs: return the actual error value from from btrfs_uuid_tree_iterate</title>
<updated>2017-11-30T08:37:28+00:00</updated>
<author>
<name>Pan Bian</name>
<email>bianpan2016@163.com</email>
</author>
<published>2016-12-04T04:51:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b63209c78f9ec90b1cb43e458918b89f53e0db1b'/>
<id>b63209c78f9ec90b1cb43e458918b89f53e0db1b</id>
<content type='text'>
[ Upstream commit 73ba39ab9307340dc98ec3622891314bbc09cc2e ]

In function btrfs_uuid_tree_iterate(), errno is assigned to variable ret
on errors. However, it directly returns 0. It may be better to return
ret. This patch also removes the warning, because the caller already
prints a warning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731
Signed-off-by: Pan Bian &lt;bianpan2016@163.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
[ edited subject ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;

Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 73ba39ab9307340dc98ec3622891314bbc09cc2e ]

In function btrfs_uuid_tree_iterate(), errno is assigned to variable ret
on errors. However, it directly returns 0. It may be better to return
ret. This patch also removes the warning, because the caller already
prints a warning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731
Signed-off-by: Pan Bian &lt;bianpan2016@163.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
[ edited subject ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;

Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fscrypt: lock mutex before checking for bounce page pool</title>
<updated>2017-11-30T08:37:25+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2017-10-29T10:30:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=91bd72dd8c72c603132bbbfd348a4216b8c83f58'/>
<id>91bd72dd8c72c603132bbbfd348a4216b8c83f58</id>
<content type='text'>
commit a0b3bc855374c50b5ea85273553485af48caf2f7 upstream.

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.

Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a0b3bc855374c50b5ea85273553485af48caf2f7 upstream.

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.

Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>fs/9p: Compare qid.path in v9fs_test_inode</title>
<updated>2017-11-30T08:37:22+00:00</updated>
<author>
<name>Tuomas Tynkkynen</name>
<email>tuomas@tuxera.com</email>
</author>
<published>2017-09-06T14:59:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8709c5386109557e6cb9e3c38f011ce627fe3ce8'/>
<id>8709c5386109557e6cb9e3c38f011ce627fe3ce8</id>
<content type='text'>
commit 8ee031631546cf2f7859cc69593bd60bbdd70b46 upstream.

Commit fd2421f54423 ("fs/9p: When doing inode lookup compare qid details
and inode mode bits.") transformed v9fs_qid_iget() to use iget5_locked()
instead of iget_locked(). However, the test() callback is not checking
fid.path at all, which means that a lookup in the inode cache can now
accidentally locate a completely wrong inode from the same inode hash
bucket if the other fields (qid.type and qid.version) match.

Fixes: fd2421f54423 ("fs/9p: When doing inode lookup compare qid details and inode mode bits.")
Reviewed-by: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
Signed-off-by: Tuomas Tynkkynen &lt;tuomas@tuxera.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8ee031631546cf2f7859cc69593bd60bbdd70b46 upstream.

Commit fd2421f54423 ("fs/9p: When doing inode lookup compare qid details
and inode mode bits.") transformed v9fs_qid_iget() to use iget5_locked()
instead of iget_locked(). However, the test() callback is not checking
fid.path at all, which means that a lookup in the inode cache can now
accidentally locate a completely wrong inode from the same inode hash
bucket if the other fields (qid.type and qid.version) match.

Fixes: fd2421f54423 ("fs/9p: When doing inode lookup compare qid details and inode mode bits.")
Reviewed-by: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
Signed-off-by: Tuomas Tynkkynen &lt;tuomas@tuxera.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix interaction between i_size, fallocate, and delalloc after a crash</title>
<updated>2017-11-30T08:37:21+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2017-10-07T03:09:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=db12d9b5a18143712362b5ccc4b077bf57b040ec'/>
<id>db12d9b5a18143712362b5ccc4b077bf57b040ec</id>
<content type='text'>
commit 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 upstream.

If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).

If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:

Inode 12, end of extent exceeds allowed value
	(logical block 33, physical block 33441, len 7)

This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size.  Since this situation is
quite rare, and the window in which the crash must take place is
typically &lt; 30 seconds, in practice this condition will rarely happen.

Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.

Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reported-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 51e3ae81ec58e95f10a98ef3dd6d7bce5d8e35a2 upstream.

If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).

If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:

Inode 12, end of extent exceeds allowed value
	(logical block 33, physical block 33441, len 7)

This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size.  Since this situation is
quite rare, and the window in which the crash must take place is
typically &lt; 30 seconds, in practice this condition will rarely happen.

Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.

Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reported-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>nfsd: deal with revoked delegations appropriately</title>
<updated>2017-11-30T08:37:21+00:00</updated>
<author>
<name>Andrew Elble</name>
<email>aweits@rit.edu</email>
</author>
<published>2017-11-03T18:06:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a8b8ab79ca4fd69182f5864ddcd59994717e0186'/>
<id>a8b8ab79ca4fd69182f5864ddcd59994717e0186</id>
<content type='text'>
commit 95da1b3a5aded124dd1bda1e3cdb876184813140 upstream.

If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the &gt;4.1
case, and NFS4ERR_BAD_STATEID otherwise.

The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply.  That can cause the Linux 4.1 client to loop in its
stage manager.

Signed-off-by: Andrew Elble &lt;aweits@rit.edu&gt;
Reviewed-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 95da1b3a5aded124dd1bda1e3cdb876184813140 upstream.

If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the &gt;4.1
case, and NFS4ERR_BAD_STATEID otherwise.

The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply.  That can cause the Linux 4.1 client to loop in its
stage manager.

Signed-off-by: Andrew Elble &lt;aweits@rit.edu&gt;
Reviewed-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>nfs: Fix ugly referral attributes</title>
<updated>2017-11-30T08:37:21+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2017-11-05T20:45:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2a2d4b41472c73439adc6b8b55fed212d85f4faa'/>
<id>2a2d4b41472c73439adc6b8b55fed212d85f4faa</id>
<content type='text'>
commit c05cefcc72416a37eba5a2b35f0704ed758a9145 upstream.

Before traversing a referral and performing a mount, the mounted-on
directory looks strange:

dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31  1969 dir.0

nfs4_get_referral is wiping out any cached attributes with what was
returned via GETATTR(fs_locations), but the bit mask for that
operation does not request any file attributes.

Retrieve owner and timestamp information so that the memcpy in
nfs4_get_referral fills in more attributes.

Changes since v1:
- Don't request attributes that the client unconditionally replaces
- Request only MOUNTED_ON_FILEID or FILEID attribute, not both
- encode_fs_locations() doesn't use the third bitmask word

Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral")
Suggested-by: Pradeep Thomas &lt;pradeepthomas@gmail.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c05cefcc72416a37eba5a2b35f0704ed758a9145 upstream.

Before traversing a referral and performing a mount, the mounted-on
directory looks strange:

dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31  1969 dir.0

nfs4_get_referral is wiping out any cached attributes with what was
returned via GETATTR(fs_locations), but the bit mask for that
operation does not request any file attributes.

Retrieve owner and timestamp information so that the memcpy in
nfs4_get_referral fills in more attributes.

Changes since v1:
- Don't request attributes that the client unconditionally replaces
- Request only MOUNTED_ON_FILEID or FILEID attribute, not both
- encode_fs_locations() doesn't use the third bitmask word

Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral")
Suggested-by: Pradeep Thomas &lt;pradeepthomas@gmail.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Fix typo in nomigration mount option</title>
<updated>2017-11-30T08:37:20+00:00</updated>
<author>
<name>Joshua Watt</name>
<email>jpewhacker@gmail.com</email>
</author>
<published>2017-11-07T22:25:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ab33df42eb3c53711c487c839d36379ffca3aeb9'/>
<id>ab33df42eb3c53711c487c839d36379ffca3aeb9</id>
<content type='text'>
commit f02fee227e5f21981152850744a6084ff3fa94ee upstream.

The option was incorrectly masking off all other options.

Signed-off-by: Joshua Watt &lt;JPEWhacker@gmail.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f02fee227e5f21981152850744a6084ff3fa94ee upstream.

The option was incorrectly masking off all other options.

Signed-off-by: Joshua Watt &lt;JPEWhacker@gmail.com&gt;
Signed-off-by: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>isofs: fix timestamps beyond 2027</title>
<updated>2017-11-30T08:37:20+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2017-10-19T14:47:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4e23be6169767d21f95d5bfd5df8a2c6274ee2d7'/>
<id>4e23be6169767d21f95d5bfd5df8a2c6274ee2d7</id>
<content type='text'>
commit 34be4dbf87fc3e474a842305394534216d428f5d upstream.

isofs uses a 'char' variable to load the number of years since
1900 for an inode timestamp. On architectures that use a signed
char type by default, this results in an invalid date for
anything beyond 2027.

This changes the function argument to a 'u8' array, which
is defined the same way on all architectures, and unambiguously
lets us use years until 2155.

This should be backported to all kernels that might still be
in use by that date.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 34be4dbf87fc3e474a842305394534216d428f5d upstream.

isofs uses a 'char' variable to load the number of years since
1900 for an inode timestamp. On architectures that use a signed
char type by default, this results in an invalid date for
anything beyond 2027.

This changes the function argument to a 'u8' array, which
is defined the same way on all architectures, and unambiguously
lets us use years until 2155.

This should be backported to all kernels that might still be
in use by that date.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>eCryptfs: use after free in ecryptfs_release_messaging()</title>
<updated>2017-11-30T08:37:20+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2017-08-22T20:41:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9c093a258350e86f781f23fbbef4e3d761c6b1e3'/>
<id>9c093a258350e86f781f23fbbef4e3d761c6b1e3</id>
<content type='text'>
commit db86be3a12d0b6e5c5b51c2ab2a48f06329cb590 upstream.

We're freeing the list iterator so we should be using the _safe()
version of hlist_for_each_entry().

Fixes: 88b4a07e6610 ("[PATCH] eCryptfs: Public key transport mechanism")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit db86be3a12d0b6e5c5b51c2ab2a48f06329cb590 upstream.

We're freeing the list iterator so we should be using the _safe()
version of hlist_for_each_entry().

Fixes: 88b4a07e6610 ("[PATCH] eCryptfs: Public key transport mechanism")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix race condition that causes file system corruption</title>
<updated>2017-11-30T08:37:20+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2017-11-17T23:29:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d7b05e4ffd5ad7cf54b57dfc2f377d9a543c8c6'/>
<id>7d7b05e4ffd5ad7cf54b57dfc2f377d9a543c8c6</id>
<content type='text'>
commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream.

There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode.  It calls __nilfs_mark_inode_dirty() in a
separate transaction.  __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.

Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field.
If there is a cached buffer_head in i_bh it is not marked as dirty
again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in-between the two.  If this happens the inode is not
on the ns_dirty_files list, but its ifile block is still marked as dirty
and written out.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree.  Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.

As a result the bmap update can be lost, which leads to file system
corruption.  Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.

The error can remain undetected for a long time.  A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.

Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Tested-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream.

There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode.  It calls __nilfs_mark_inode_dirty() in a
separate transaction.  __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.

Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field.
If there is a cached buffer_head in i_bh it is not marked as dirty
again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in-between the two.  If this happens the inode is not
on the ns_dirty_files list, but its ifile block is still marked as dirty
and written out.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree.  Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.

As a result the bmap update can be lost, which leads to file system
corruption.  Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.

The error can remain undetected for a long time.  A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.

Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Tested-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
