<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/btrfs, branch v4.2.1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2015-08-09T02:56:31+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-08-09T02:56:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=af0b3152bbfebd3f8291fd61988c12ece4f60f57'/>
<id>af0b3152bbfebd3f8291fd61988c12ece4f60f57</id>
<content type='text'>
Pull btrfs fix from Chris Mason:
 "We have a btrfs quota regression fix.

  I merged this one on Thursday and have run it through tests against
  current master.

  Normally I wouldn't have sent this while you were finalizing rc6, but
  I'm feeding mosquitoes in the adirondacks next week, so I wanted to
  get this one out before leaving.  I'll leave longer tests running and
  check on things during the week, but I don't expect any problems"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: qgroup: Fix a regression in qgroup reserved space.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fix from Chris Mason:
 "We have a btrfs quota regression fix.

  I merged this one on Thursday and have run it through tests against
  current master.

  Normally I wouldn't have sent this while you were finalizing rc6, but
  I'm feeding mosquitoes in the adirondacks next week, so I wanted to
  get this one out before leaving.  I'll leave longer tests running and
  check on things during the week, but I don't expect any problems"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: qgroup: Fix a regression in qgroup reserved space.
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: qgroup: Fix a regression in qgroup reserved space.</title>
<updated>2015-08-06T21:51:15+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2015-08-03T06:44:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c05f9429e12da7c7de2649ef8c8c16bf8c12061f'/>
<id>c05f9429e12da7c7de2649ef8c8c16bf8c12061f</id>
<content type='text'>
During the change to new btrfs extent-oriented qgroup implement, due to
it doesn't use the old __qgroup_excl_accounting() for exclusive extent,
it didn't free the reserved bytes.

The bug will cause limit function go crazy as the reserved space is
never freed, increasing limit will have no effect and still cause
EQOUT.

The fix is easy, just free reserved bytes for newly created exclusive
extent as what it does before.

Reported-by: Tsutomu Itoh &lt;t-itoh@jp.fujitsu.com&gt;
Signed-off-by: Yang Dongsheng &lt;yangds.fnst@cn.fujitsu.com&gt;
Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During the change to new btrfs extent-oriented qgroup implement, due to
it doesn't use the old __qgroup_excl_accounting() for exclusive extent,
it didn't free the reserved bytes.

The bug will cause limit function go crazy as the reserved space is
never freed, increasing limit will have no effect and still cause
EQOUT.

The fix is easy, just free reserved bytes for newly created exclusive
extent as what it does before.

Reported-by: Tsutomu Itoh &lt;t-itoh@jp.fujitsu.com&gt;
Signed-off-by: Yang Dongsheng &lt;yangds.fnst@cn.fujitsu.com&gt;
Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2015-08-01T00:05:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-08-01T00:05:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=acea568fa9eaeffbf949d15b2f7c9c346e16aae3'/>
<id>acea568fa9eaeffbf949d15b2f7c9c346e16aae3</id>
<content type='text'>
Pull btrfs fixes from Chris Mason:
 "Filipe fixed up a hard to trigger ENOSPC regression from our merge
  window pull, and we have a few other smaller fixes"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix quick exhaustion of the system array in the superblock
  btrfs: its btrfs_err() instead of btrfs_error()
  btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
  btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fixes from Chris Mason:
 "Filipe fixed up a hard to trigger ENOSPC regression from our merge
  window pull, and we have a few other smaller fixes"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix quick exhaustion of the system array in the superblock
  btrfs: its btrfs_err() instead of btrfs_error()
  btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
  btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix quick exhaustion of the system array in the superblock</title>
<updated>2015-07-23T01:20:54+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-07-20T13:56:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=00d80e342c0f4f1990ab69f594ee1e2348e51dd9'/>
<id>00d80e342c0f4f1990ab69f594ee1e2348e51dd9</id>
<content type='text'>
Omar reported that after commit 4fbcdf669454 ("Btrfs: fix -ENOSPC when
finishing block group creation"), introduced in 4.2-rc1, the following
test was failing due to exhaustion of the system array in the superblock:

  #!/bin/bash

  truncate -s 100T big.img
  mkfs.btrfs big.img
  mount -o loop big.img /mnt/loop

  num=5
  sz=10T
  for ((i = 0; i &lt; $num; i++)); do
      echo fallocate $i $sz
      fallocate -l $sz /mnt/loop/testfile$i
  done
  btrfs filesystem sync /mnt/loop

  for ((i = 0; i &lt; $num; i++)); do
        echo rm $i
        rm /mnt/loop/testfile$i
        btrfs filesystem sync /mnt/loop
  done
  umount /mnt/loop

This made btrfs_add_system_chunk() fail with -EFBIG due to excessive
allocation of system block groups. This happened because the test creates
a large number of data block groups per transaction and when committing
the transaction we start the writeout of the block group caches for all
the new new (dirty) block groups, which results in pre-allocating space
for each block group's free space cache using the same transaction handle.
That in turn often leads to creation of more block groups, and all get
attached to the new_bgs list of the same transaction handle to the point
of getting a list with over 1500 elements, and creation of new block groups
leads to the need of reserving space in the chunk block reserve and often
creating a new system block group too.

So that made us quickly exhaust the chunk block reserve/system space info,
because as of the commit mentioned before, we do reserve space for each
new block group in the chunk block reserve, unlike before where we would
not and would at most allocate one new system block group and therefore
would only ensure that there was enough space in the system space info to
allocate 1 new block group even if we ended up allocating thousands of
new block groups using the same transaction handle. That worked most of
the time because the computed required space at check_system_chunk() is
very pessimistic (assumes a chunk tree height of BTRFS_MAX_LEVEL/8 and
that all nodes/leafs in a path will be COWed and split) and since the
updates to the chunk tree all happen at btrfs_create_pending_block_groups
it is unlikely that a path needs to be COWed more than once (unless
writepages() for the btree inode is called by mm in between) and that
compensated for the need of creating any new nodes/leads in the chunk
tree.

So fix this by ensuring we don't accumulate a too large list of new block
groups in a transaction's handles new_bgs list, inserting/updating the
chunk tree for all accumulated new block groups and releasing the unused
space from the chunk block reserve whenever the list becomes sufficiently
large. This is a generic solution even though the problem currently can
only happen when starting the writeout of the free space caches for all
dirty block groups (btrfs_start_dirty_block_groups()).

Reported-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Tested-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Omar reported that after commit 4fbcdf669454 ("Btrfs: fix -ENOSPC when
finishing block group creation"), introduced in 4.2-rc1, the following
test was failing due to exhaustion of the system array in the superblock:

  #!/bin/bash

  truncate -s 100T big.img
  mkfs.btrfs big.img
  mount -o loop big.img /mnt/loop

  num=5
  sz=10T
  for ((i = 0; i &lt; $num; i++)); do
      echo fallocate $i $sz
      fallocate -l $sz /mnt/loop/testfile$i
  done
  btrfs filesystem sync /mnt/loop

  for ((i = 0; i &lt; $num; i++)); do
        echo rm $i
        rm /mnt/loop/testfile$i
        btrfs filesystem sync /mnt/loop
  done
  umount /mnt/loop

This made btrfs_add_system_chunk() fail with -EFBIG due to excessive
allocation of system block groups. This happened because the test creates
a large number of data block groups per transaction and when committing
the transaction we start the writeout of the block group caches for all
the new new (dirty) block groups, which results in pre-allocating space
for each block group's free space cache using the same transaction handle.
That in turn often leads to creation of more block groups, and all get
attached to the new_bgs list of the same transaction handle to the point
of getting a list with over 1500 elements, and creation of new block groups
leads to the need of reserving space in the chunk block reserve and often
creating a new system block group too.

So that made us quickly exhaust the chunk block reserve/system space info,
because as of the commit mentioned before, we do reserve space for each
new block group in the chunk block reserve, unlike before where we would
not and would at most allocate one new system block group and therefore
would only ensure that there was enough space in the system space info to
allocate 1 new block group even if we ended up allocating thousands of
new block groups using the same transaction handle. That worked most of
the time because the computed required space at check_system_chunk() is
very pessimistic (assumes a chunk tree height of BTRFS_MAX_LEVEL/8 and
that all nodes/leafs in a path will be COWed and split) and since the
updates to the chunk tree all happen at btrfs_create_pending_block_groups
it is unlikely that a path needs to be COWed more than once (unless
writepages() for the btree inode is called by mm in between) and that
compensated for the need of creating any new nodes/leads in the chunk
tree.

So fix this by ensuring we don't accumulate a too large list of new block
groups in a transaction's handles new_bgs list, inserting/updating the
chunk tree for all accumulated new block groups and releasing the unused
space from the chunk block reserve whenever the list becomes sufficiently
large. This is a generic solution even though the problem currently can
only happen when starting the writeout of the free space caches for all
dirty block groups (btrfs_start_dirty_block_groups()).

Reported-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Tested-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: its btrfs_err() instead of btrfs_error()</title>
<updated>2015-07-23T01:20:53+00:00</updated>
<author>
<name>Anand Jain</name>
<email>Anand.Jain@oracle.com</email>
</author>
<published>2015-07-17T15:45:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3e303ea60db3f0222c25a13f39a7cca7bf860df0'/>
<id>3e303ea60db3f0222c25a13f39a7cca7bf860df0</id>
<content type='text'>
sorry I indented to use btrfs_err() and I have no idea
how btrfs_error() got there.
infact I was thinking about these kind of oversights
since these two func are too closely named.

Signed-off-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sorry I indented to use btrfs_err() and I have no idea
how btrfs_error() got there.
infact I was thinking about these kind of oversights
since these two func are too closely named.

Signed-off-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail</title>
<updated>2015-07-23T01:20:52+00:00</updated>
<author>
<name>Zhao Lei</name>
<email>zhaolei@cn.fujitsu.com</email>
</author>
<published>2015-07-15T13:02:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=95ab1f64908795a2edd6b847eca94a0c63a44be4'/>
<id>95ab1f64908795a2edd6b847eca94a0c63a44be4</id>
<content type='text'>
When read_tree_block() failed, we can see following dmesg:
 [  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000063
 [  134.372236] IP: [&lt;ffffffff813a4a51&gt;] free_extent_buffer+0x21/0x90
 [  134.372236] PGD 0
 [  134.372236] Oops: 0000 [#1] SMP
 [  134.372236] Modules linked in:
 [  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115
 [  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
 [  134.372236] task: ffff88003b6e1a00 ti: ffff880011e60000 task.ti: ffff880011e60000
 [  134.372236] RIP: 0010:[&lt;ffffffff813a4a51&gt;]  [&lt;ffffffff813a4a51&gt;] free_extent_buffer+0x21/0x90
 ...
 [  134.372236] Call Trace:
 [  134.372236]  [&lt;ffffffff81379aa1&gt;] free_root_extent_buffers+0x91/0xb0
 [  134.372236]  [&lt;ffffffff81379c3d&gt;] free_root_pointers+0x17d/0x190
 [  134.372236]  [&lt;ffffffff813801b0&gt;] open_ctree+0x1ca0/0x25b0
 [  134.372236]  [&lt;ffffffff8144d017&gt;] ? disk_name+0x97/0xb0
 [  134.372236]  [&lt;ffffffff813558aa&gt;] btrfs_mount+0x8fa/0xab0
 ...

Reason:
 read_tree_block() changed to return error number on fail,
 and this value(not NULL) is set to tree_root-&gt;node, then subsequent
 code will run to:
  free_root_pointers()
  -&gt;free_root_extent_buffers()
  -&gt;free_extent_buffer()
  -&gt;atomic_read((extent_buffer *)(-E_XXX)-&gt;refs);
 and trigger above error.

Fix:
 Set tree_root-&gt;node to NULL on fail to make error_handle code
 happy.

Signed-off-by: Zhao Lei &lt;zhaolei@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When read_tree_block() failed, we can see following dmesg:
 [  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000063
 [  134.372236] IP: [&lt;ffffffff813a4a51&gt;] free_extent_buffer+0x21/0x90
 [  134.372236] PGD 0
 [  134.372236] Oops: 0000 [#1] SMP
 [  134.372236] Modules linked in:
 [  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115
 [  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
 [  134.372236] task: ffff88003b6e1a00 ti: ffff880011e60000 task.ti: ffff880011e60000
 [  134.372236] RIP: 0010:[&lt;ffffffff813a4a51&gt;]  [&lt;ffffffff813a4a51&gt;] free_extent_buffer+0x21/0x90
 ...
 [  134.372236] Call Trace:
 [  134.372236]  [&lt;ffffffff81379aa1&gt;] free_root_extent_buffers+0x91/0xb0
 [  134.372236]  [&lt;ffffffff81379c3d&gt;] free_root_pointers+0x17d/0x190
 [  134.372236]  [&lt;ffffffff813801b0&gt;] open_ctree+0x1ca0/0x25b0
 [  134.372236]  [&lt;ffffffff8144d017&gt;] ? disk_name+0x97/0xb0
 [  134.372236]  [&lt;ffffffff813558aa&gt;] btrfs_mount+0x8fa/0xab0
 ...

Reason:
 read_tree_block() changed to return error number on fail,
 and this value(not NULL) is set to tree_root-&gt;node, then subsequent
 code will run to:
  free_root_pointers()
  -&gt;free_root_extent_buffers()
  -&gt;free_extent_buffer()
  -&gt;atomic_read((extent_buffer *)(-E_XXX)-&gt;refs);
 and trigger above error.

Fix:
 Set tree_root-&gt;node to NULL on fail to make error_handle code
 happy.

Signed-off-by: Zhao Lei &lt;zhaolei@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()</title>
<updated>2015-07-23T01:20:51+00:00</updated>
<author>
<name>Zhao Lei</name>
<email>zhaolei@cn.fujitsu.com</email>
</author>
<published>2015-07-15T03:48:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8a7330130470e173832385861c1aa6e37a4e50d2'/>
<id>8a7330130470e173832385861c1aa6e37a4e50d2</id>
<content type='text'>
Liu Bo &lt;bo.li.liu@oracle.com&gt; reported a lockdep warning of
delayed_iput_sem in xfstests generic/241:
  [ 2061.345955] =============================================
  [ 2061.346027] [ INFO: possible recursive locking detected ]
  [ 2061.346027] 4.1.0+ #268 Tainted: G        W
  [ 2061.346027] ---------------------------------------------
  [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
  [ 2061.346027]  (&amp;fs_info-&gt;delayed_iput_sem){++++..}, at:
  [&lt;ffffffff814063ab&gt;] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] but task is already holding lock:
  [ 2061.346027]  (&amp;fs_info-&gt;delayed_iput_sem){++++..}, at: [&lt;ffffffff814063ab&gt;] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] other info that might help us debug this:
  [ 2061.346027]  Possible unsafe locking scenario:

  [ 2061.346027]        CPU0
  [ 2061.346027]        ----
  [ 2061.346027]   lock(&amp;fs_info-&gt;delayed_iput_sem);
  [ 2061.346027]   lock(&amp;fs_info-&gt;delayed_iput_sem);
  [ 2061.346027]
   *** DEADLOCK ***
It is rarely happened, about 1/400 in my test env.

The reason is recursion of btrfs_run_delayed_iputs():
  cleaner_kthread
  -&gt; btrfs_run_delayed_iputs() *1
  -&gt; get delayed_iput_sem lock *2
  -&gt; iput()
  -&gt; ...
  -&gt; btrfs_commit_transaction()
  -&gt; btrfs_run_delayed_iputs() *1
  -&gt; get delayed_iput_sem lock (dead lock) *2
  *1: recursion of btrfs_run_delayed_iputs()
  *2: warning of lockdep about delayed_iput_sem

When fs is in high stress, new iputs may added into fs_info-&gt;delayed_iputs
list when btrfs_run_delayed_iputs() is running, which cause
second btrfs_run_delayed_iputs() run into down_read(&amp;fs_info-&gt;delayed_iput_sem)
again, and cause above lockdep warning.

Actually, it will not cause real problem because both locks are read lock,
but to avoid lockdep warning, we can do a fix.

Fix:
  Don't do btrfs_run_delayed_iputs() in btrfs_commit_transaction() for
  cleaner_kthread thread to break above recursion path.
  cleaner_kthread is calling btrfs_run_delayed_iputs() explicitly in code,
  and don't need to call btrfs_run_delayed_iputs() again in
  btrfs_commit_transaction(), it also give us a bonus to avoid stack overflow.

Test:
  No above lockdep warning after patch in 1200 generic/241 tests.

Reported-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Zhao Lei &lt;zhaolei@cn.fujitsu.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Liu Bo &lt;bo.li.liu@oracle.com&gt; reported a lockdep warning of
delayed_iput_sem in xfstests generic/241:
  [ 2061.345955] =============================================
  [ 2061.346027] [ INFO: possible recursive locking detected ]
  [ 2061.346027] 4.1.0+ #268 Tainted: G        W
  [ 2061.346027] ---------------------------------------------
  [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
  [ 2061.346027]  (&amp;fs_info-&gt;delayed_iput_sem){++++..}, at:
  [&lt;ffffffff814063ab&gt;] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] but task is already holding lock:
  [ 2061.346027]  (&amp;fs_info-&gt;delayed_iput_sem){++++..}, at: [&lt;ffffffff814063ab&gt;] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] other info that might help us debug this:
  [ 2061.346027]  Possible unsafe locking scenario:

  [ 2061.346027]        CPU0
  [ 2061.346027]        ----
  [ 2061.346027]   lock(&amp;fs_info-&gt;delayed_iput_sem);
  [ 2061.346027]   lock(&amp;fs_info-&gt;delayed_iput_sem);
  [ 2061.346027]
   *** DEADLOCK ***
It is rarely happened, about 1/400 in my test env.

The reason is recursion of btrfs_run_delayed_iputs():
  cleaner_kthread
  -&gt; btrfs_run_delayed_iputs() *1
  -&gt; get delayed_iput_sem lock *2
  -&gt; iput()
  -&gt; ...
  -&gt; btrfs_commit_transaction()
  -&gt; btrfs_run_delayed_iputs() *1
  -&gt; get delayed_iput_sem lock (dead lock) *2
  *1: recursion of btrfs_run_delayed_iputs()
  *2: warning of lockdep about delayed_iput_sem

When fs is in high stress, new iputs may added into fs_info-&gt;delayed_iputs
list when btrfs_run_delayed_iputs() is running, which cause
second btrfs_run_delayed_iputs() run into down_read(&amp;fs_info-&gt;delayed_iput_sem)
again, and cause above lockdep warning.

Actually, it will not cause real problem because both locks are read lock,
but to avoid lockdep warning, we can do a fix.

Fix:
  Don't do btrfs_run_delayed_iputs() in btrfs_commit_transaction() for
  cleaner_kthread thread to break above recursion path.
  cleaner_kthread is calling btrfs_run_delayed_iputs() explicitly in code,
  and don't need to call btrfs_run_delayed_iputs() again in
  btrfs_commit_transaction(), it also give us a bonus to avoid stack overflow.

Test:
  No above lockdep warning after patch in 1200 generic/241 tests.

Reported-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Zhao Lei &lt;zhaolei@cn.fujitsu.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs</title>
<updated>2015-07-18T04:46:57+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-07-18T04:46:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8be5701342790eb42c67f4f8609e203d6a461a4a'/>
<id>8be5701342790eb42c67f4f8609e203d6a461a4a</id>
<content type='text'>
Pull btrfs fixes from Chris Mason:
 "These are all from Filipe, and cover a few problems we've had reported
  on the list recently (along with ones he found on his own)"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix file corruption after cloning inline extents
  Btrfs: fix order by which delayed references are run
  Btrfs: fix list transaction-&gt;pending_ordered corruption
  Btrfs: fix memory leak in the extent_same ioctl
  Btrfs: fix shrinking truncate when the no_holes feature is enabled
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fixes from Chris Mason:
 "These are all from Filipe, and cover a few problems we've had reported
  on the list recently (along with ones he found on his own)"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix file corruption after cloning inline extents
  Btrfs: fix order by which delayed references are run
  Btrfs: fix list transaction-&gt;pending_ordered corruption
  Btrfs: fix memory leak in the extent_same ioctl
  Btrfs: fix shrinking truncate when the no_holes feature is enabled
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix file corruption after cloning inline extents</title>
<updated>2015-07-14T15:09:39+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-07-14T15:09:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ed958762644b404654a6f5d23e869f496fe127c6'/>
<id>ed958762644b404654a6f5d23e869f496fe127c6</id>
<content type='text'>
Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs &gt;&gt;$seqres.full 2&gt;&amp;1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
      -c "pwrite -S 0xbb 4k 2K" \
      -c "pwrite -S 0xcc 8K 4K" \
      $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
     $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
      $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] ------------[ cut here ]------------
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
  [152154.036424] RIP: 0010:[&lt;ffffffff8111a9d5&gt;]  [&lt;ffffffff8111a9d5&gt;] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
  [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
  [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
  [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
  [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
  [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
  [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
  [152154.036424] Stack:
  [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
  [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
  [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
  [152154.036424] Call Trace:
  [152154.036424]  [&lt;ffffffffa04e97c1&gt;] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [&lt;ffffffffa04ea82c&gt;] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
  [152154.036424]  [&lt;ffffffffa04ed14b&gt;] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
  [152154.036424]  [&lt;ffffffffa04ed15a&gt;] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
  [152154.036424]  [&lt;ffffffffa04ed2c7&gt;] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
  [152154.036424]  [&lt;ffffffff81165a4a&gt;] __vfs_write+0x7c/0xa5
  [152154.036424]  [&lt;ffffffff81165f89&gt;] vfs_write+0xa0/0xe4
  [152154.036424]  [&lt;ffffffff81166855&gt;] SyS_pwrite64+0x64/0x82
  [152154.036424]  [&lt;ffffffff81465197&gt;] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 &lt;0f&gt; 0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [&lt;ffffffff8111a9d5&gt;] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP &lt;ffff880429effc68&gt;
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:

   00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
   3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs &gt;&gt;$seqres.full 2&gt;&amp;1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
      -c "pwrite -S 0xbb 4k 2K" \
      -c "pwrite -S 0xcc 8K 4K" \
      $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
     $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
      $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] ------------[ cut here ]------------
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
  [152154.036424] RIP: 0010:[&lt;ffffffff8111a9d5&gt;]  [&lt;ffffffff8111a9d5&gt;] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
  [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
  [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
  [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
  [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
  [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
  [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
  [152154.036424] Stack:
  [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
  [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
  [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
  [152154.036424] Call Trace:
  [152154.036424]  [&lt;ffffffffa04e97c1&gt;] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [&lt;ffffffffa04ea82c&gt;] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
  [152154.036424]  [&lt;ffffffffa04ed14b&gt;] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
  [152154.036424]  [&lt;ffffffffa04ed15a&gt;] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
  [152154.036424]  [&lt;ffffffffa04ed2c7&gt;] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
  [152154.036424]  [&lt;ffffffff81165a4a&gt;] __vfs_write+0x7c/0xa5
  [152154.036424]  [&lt;ffffffff81165f89&gt;] vfs_write+0xa0/0xe4
  [152154.036424]  [&lt;ffffffff81166855&gt;] SyS_pwrite64+0x64/0x82
  [152154.036424]  [&lt;ffffffff81465197&gt;] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 &lt;0f&gt; 0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [&lt;ffffffff8111a9d5&gt;] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP &lt;ffff880429effc68&gt;
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:

   00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
   3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix order by which delayed references are run</title>
<updated>2015-07-11T21:36:44+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-07-09T12:13:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cffc3374e567ef42954f3c7070b3fa83f20f9684'/>
<id>cffc3374e567ef42954f3c7070b3fa83f20f9684</id>
<content type='text'>
When we have an extent that got N references removed and N new references
added in the same transaction, we must run the insertion of the references
first because otherwise the last removed reference will remove the extent
item from the extent tree, resulting in a failure for the insertions.

This is a regression introduced in the 4.2-rc1 release and this fix just
brings back the behaviour of selecting reference additions before any
reference removals.

The following test case for fstests reproduces the issue:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_flakey
  _require_cloner
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs &gt;&gt;$seqres.full 2&gt;&amp;1
  _init_flakey
  _mount_flakey

  # Create prealloc extent covering range [160K, 620K[
  $XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo

  # Now write to the last 80K of the prealloc extent plus 40K to the unallocated
  # space that immediately follows it. This creates a new extent of 40K that spans
  # the range [620K, 660K[.
  $XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io

  # At this point, there are now 2 back references to the prealloc extent in our
  # extent tree. Both are for our file offset 160K and one relates to a file
  # extent item with a data offset of 0 and a length of 380K, while the other
  # relates to a file extent item with a data offset of 380K and a length of 80K.

  # Make sure everything done so far is durably persisted (all back references are
  # in the extent tree, etc).
  sync

  # Now clone all extents of our file that cover the offset 160K up to its eof
  # (660K at this point) into itself at offset 2M. This leaves a hole in the file
  # covering the range [660K, 2M[. The prealloc extent will now be referenced by
  # the file twice, once for offset 160K and once for offset 2M. The 40K extent
  # that follows the prealloc extent will also be referenced twice by our file,
  # once for offset 620K and once for offset 2M + 460K.
  $CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
	$SCRATCH_MNT/foo

  # Now create one new extent in our file with a size of 100Kb. It will span the
  # range [3M, 3M + 100K[. It also will cause creation of a hole spanning the
  # range [2M + 460K, 3M[. Our new file size is 3M + 100K.
  $XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io

  # At this point, there are now (in memory) 4 back references to the prealloc
  # extent.
  #
  # Two of them are for file offset 160K, related to file extent items
  # matching the file offsets 160K and 540K respectively, with data offsets of
  # 0 and 380K respectively, and with lengths of 380K and 80K respectively.
  #
  # The other two references are for file offset 2M, related to file extent items
  # matching the file offsets 2M and 2M + 380K respectively, with data offsets of
  # 0 and 380K respectively, and with lengths of 389K and 80K respectively.
  #
  # The 40K extent has 2 back references, one for file offset 620K and the other
  # for file offset 2M + 460K.
  #
  # The 100K extent has a single back reference and it relates to file offset 3M.

  # Now clone our 100K extent into offset 600K. That offset covers the last 20K
  # of the prealloc extent, the whole 40K extent and 40K of the hole starting at
  # offset 660K.
  $CLONER_PROG -s $((3 * 1024 * 1024)) -d $((600 * 1024)) -l $((100 * 1024)) \
      $SCRATCH_MNT/foo $SCRATCH_MNT/foo

  # At this point there's only one reference to the 40K extent, at file offset
  # 2M + 460K, we have 4 references for the prealloc extent (2 for file offset
  # 160K and 2 for file offset 2M) and 2 references for the 100K extent (1 for
  # file offset 3M and a new one for file offset 600K).

  # Now fsync our file to make all its new data and metadata updates are durably
  # persisted and present if a power failure/crash happens after a successful
  # fsync and before the next transaction commit.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  echo "File digest before power failure:"
  md5sum $SCRATCH_MNT/foo | _filter_scratch

  # Silently drop all writes and ummount to simulate a crash/power failure.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again, mount to trigger log replay and validate file contents.
  # During log replay, the btrfs delayed references implementation used to run the
  # deletion of back references before the addition of new back references, which
  # made the addition fail as it didn't find the key in the extent tree that it
  # was looking for. The failure triggered by this test was related to the 40K
  # extent, which got 1 reference dropped and 1 reference added during the fsync
  # log replay - when running the delayed references at transaction commit time,
  # btrfs was applying the deletion before the insertion, resulting in a failure
  # of the insertion that ended up turning the fs into read-only mode.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  echo "File digest after log replay:"
  md5sum $SCRATCH_MNT/foo | _filter_scratch

  _unmount_flakey

  status=0
  exit

This issue turned the filesystem into read-only mode (current transaction
aborted) and produced the following traces:

  [ 8247.578385] ------------[ cut here ]------------
  [ 8247.579947] WARNING: CPU: 0 PID: 11341 at fs/btrfs/extent-tree.c:1547 lookup_inline_extent_backref+0x17d/0x45d [btrfs]()
  (...)
  [ 8247.601697] Call Trace:
  [ 8247.602222]  [&lt;ffffffff8145f077&gt;] dump_stack+0x4f/0x7b
  [ 8247.604320]  [&lt;ffffffff8104b3b0&gt;] warn_slowpath_common+0xa1/0xbb
  [ 8247.605488]  [&lt;ffffffffa0506c8d&gt;] ? lookup_inline_extent_backref+0x17d/0x45d [btrfs]
  [ 8247.608226]  [&lt;ffffffffa0506c8d&gt;] lookup_inline_extent_backref+0x17d/0x45d [btrfs]
  [ 8247.617061]  [&lt;ffffffffa0507957&gt;] insert_inline_extent_backref+0x41/0xb2 [btrfs]
  [ 8247.621856]  [&lt;ffffffffa0507c4f&gt;] __btrfs_inc_extent_ref+0x8c/0x20a [btrfs]
  [ 8247.624366]  [&lt;ffffffffa050ee60&gt;] __btrfs_run_delayed_refs+0xb0c/0xd49 [btrfs]
  [ 8247.626176]  [&lt;ffffffffa0510dcd&gt;] btrfs_run_delayed_refs+0x6d/0x1d4 [btrfs]
  [ 8247.627435]  [&lt;ffffffff81155c9b&gt;] ? __cache_free+0x4a7/0x4b6
  [ 8247.628531]  [&lt;ffffffffa0520482&gt;] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
  (...)
  [ 8247.648430] ---[ end trace 2461e55f92c2ac2d ]---

  [ 8247.727263] WARNING: CPU: 3 PID: 11341 at fs/btrfs/extent-tree.c:2771 btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]()
  [ 8247.728954] BTRFS: Transaction aborted (error -5)
  (...)
  [ 8247.760866] Call Trace:
  [ 8247.761534]  [&lt;ffffffff8145f077&gt;] dump_stack+0x4f/0x7b
  [ 8247.764271]  [&lt;ffffffff8104b3b0&gt;] warn_slowpath_common+0xa1/0xbb
  [ 8247.767582]  [&lt;ffffffffa0510e04&gt;] ? btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
  [ 8247.769373]  [&lt;ffffffff8104b410&gt;] warn_slowpath_fmt+0x46/0x48
  [ 8247.770836]  [&lt;ffffffffa0510e04&gt;] btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
  [ 8247.772532]  [&lt;ffffffff81155c9b&gt;] ? __cache_free+0x4a7/0x4b6
  [ 8247.773664]  [&lt;ffffffffa0520482&gt;] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
  [ 8247.775047]  [&lt;ffffffff81087310&gt;] ? trace_hardirqs_on+0xd/0xf
  [ 8247.776176]  [&lt;ffffffff81155dd5&gt;] ? kmem_cache_free+0x12b/0x189
  [ 8247.777427]  [&lt;ffffffffa055a920&gt;] btrfs_recover_log_trees+0x2da/0x33d [btrfs]
  [ 8247.778575]  [&lt;ffffffffa055898e&gt;] ? replay_one_extent+0x4fc/0x4fc [btrfs]
  [ 8247.779838]  [&lt;ffffffffa051e265&gt;] open_ctree+0x1cc0/0x201a [btrfs]
  [ 8247.781020]  [&lt;ffffffff81120f48&gt;] ? register_shrinker+0x56/0x81
  [ 8247.782285]  [&lt;ffffffffa04fb12c&gt;] btrfs_mount+0x5f0/0x734 [btrfs]
  (...)
  [ 8247.793394] ---[ end trace 2461e55f92c2ac2e ]---
  [ 8247.794276] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2771: errno=-5 IO failure
  [ 8247.797335] BTRFS: error (device dm-0) in btrfs_replay_log:2375: errno=-5 IO failure (Failed to recover log tree)

Fixes: c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Acked-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we have an extent that got N references removed and N new references
added in the same transaction, we must run the insertion of the references
first because otherwise the last removed reference will remove the extent
item from the extent tree, resulting in a failure for the insertions.

This is a regression introduced in the 4.2-rc1 release and this fix just
brings back the behaviour of selecting reference additions before any
reference removals.

The following test case for fstests reproduces the issue:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_flakey
  _require_cloner
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs &gt;&gt;$seqres.full 2&gt;&amp;1
  _init_flakey
  _mount_flakey

  # Create prealloc extent covering range [160K, 620K[
  $XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo

  # Now write to the last 80K of the prealloc extent plus 40K to the unallocated
  # space that immediately follows it. This creates a new extent of 40K that spans
  # the range [620K, 660K[.
  $XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io

  # At this point, there are now 2 back references to the prealloc extent in our
  # extent tree. Both are for our file offset 160K and one relates to a file
  # extent item with a data offset of 0 and a length of 380K, while the other
  # relates to a file extent item with a data offset of 380K and a length of 80K.

  # Make sure everything done so far is durably persisted (all back references are
  # in the extent tree, etc).
  sync

  # Now clone all extents of our file that cover the offset 160K up to its eof
  # (660K at this point) into itself at offset 2M. This leaves a hole in the file
  # covering the range [660K, 2M[. The prealloc extent will now be referenced by
  # the file twice, once for offset 160K and once for offset 2M. The 40K extent
  # that follows the prealloc extent will also be referenced twice by our file,
  # once for offset 620K and once for offset 2M + 460K.
  $CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
	$SCRATCH_MNT/foo

  # Now create one new extent in our file with a size of 100Kb. It will span the
  # range [3M, 3M + 100K[. It also will cause creation of a hole spanning the
  # range [2M + 460K, 3M[. Our new file size is 3M + 100K.
  $XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io

  # At this point, there are now (in memory) 4 back references to the prealloc
  # extent.
  #
  # Two of them are for file offset 160K, related to file extent items
  # matching the file offsets 160K and 540K respectively, with data offsets of
  # 0 and 380K respectively, and with lengths of 380K and 80K respectively.
  #
  # The other two references are for file offset 2M, related to file extent items
  # matching the file offsets 2M and 2M + 380K respectively, with data offsets of
  # 0 and 380K respectively, and with lengths of 389K and 80K respectively.
  #
  # The 40K extent has 2 back references, one for file offset 620K and the other
  # for file offset 2M + 460K.
  #
  # The 100K extent has a single back reference and it relates to file offset 3M.

  # Now clone our 100K extent into offset 600K. That offset covers the last 20K
  # of the prealloc extent, the whole 40K extent and 40K of the hole starting at
  # offset 660K.
  $CLONER_PROG -s $((3 * 1024 * 1024)) -d $((600 * 1024)) -l $((100 * 1024)) \
      $SCRATCH_MNT/foo $SCRATCH_MNT/foo

  # At this point there's only one reference to the 40K extent, at file offset
  # 2M + 460K, we have 4 references for the prealloc extent (2 for file offset
  # 160K and 2 for file offset 2M) and 2 references for the 100K extent (1 for
  # file offset 3M and a new one for file offset 600K).

  # Now fsync our file to make all its new data and metadata updates are durably
  # persisted and present if a power failure/crash happens after a successful
  # fsync and before the next transaction commit.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  echo "File digest before power failure:"
  md5sum $SCRATCH_MNT/foo | _filter_scratch

  # Silently drop all writes and ummount to simulate a crash/power failure.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again, mount to trigger log replay and validate file contents.
  # During log replay, the btrfs delayed references implementation used to run the
  # deletion of back references before the addition of new back references, which
  # made the addition fail as it didn't find the key in the extent tree that it
  # was looking for. The failure triggered by this test was related to the 40K
  # extent, which got 1 reference dropped and 1 reference added during the fsync
  # log replay - when running the delayed references at transaction commit time,
  # btrfs was applying the deletion before the insertion, resulting in a failure
  # of the insertion that ended up turning the fs into read-only mode.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  echo "File digest after log replay:"
  md5sum $SCRATCH_MNT/foo | _filter_scratch

  _unmount_flakey

  status=0
  exit

This issue turned the filesystem into read-only mode (current transaction
aborted) and produced the following traces:

  [ 8247.578385] ------------[ cut here ]------------
  [ 8247.579947] WARNING: CPU: 0 PID: 11341 at fs/btrfs/extent-tree.c:1547 lookup_inline_extent_backref+0x17d/0x45d [btrfs]()
  (...)
  [ 8247.601697] Call Trace:
  [ 8247.602222]  [&lt;ffffffff8145f077&gt;] dump_stack+0x4f/0x7b
  [ 8247.604320]  [&lt;ffffffff8104b3b0&gt;] warn_slowpath_common+0xa1/0xbb
  [ 8247.605488]  [&lt;ffffffffa0506c8d&gt;] ? lookup_inline_extent_backref+0x17d/0x45d [btrfs]
  [ 8247.608226]  [&lt;ffffffffa0506c8d&gt;] lookup_inline_extent_backref+0x17d/0x45d [btrfs]
  [ 8247.617061]  [&lt;ffffffffa0507957&gt;] insert_inline_extent_backref+0x41/0xb2 [btrfs]
  [ 8247.621856]  [&lt;ffffffffa0507c4f&gt;] __btrfs_inc_extent_ref+0x8c/0x20a [btrfs]
  [ 8247.624366]  [&lt;ffffffffa050ee60&gt;] __btrfs_run_delayed_refs+0xb0c/0xd49 [btrfs]
  [ 8247.626176]  [&lt;ffffffffa0510dcd&gt;] btrfs_run_delayed_refs+0x6d/0x1d4 [btrfs]
  [ 8247.627435]  [&lt;ffffffff81155c9b&gt;] ? __cache_free+0x4a7/0x4b6
  [ 8247.628531]  [&lt;ffffffffa0520482&gt;] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
  (...)
  [ 8247.648430] ---[ end trace 2461e55f92c2ac2d ]---

  [ 8247.727263] WARNING: CPU: 3 PID: 11341 at fs/btrfs/extent-tree.c:2771 btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]()
  [ 8247.728954] BTRFS: Transaction aborted (error -5)
  (...)
  [ 8247.760866] Call Trace:
  [ 8247.761534]  [&lt;ffffffff8145f077&gt;] dump_stack+0x4f/0x7b
  [ 8247.764271]  [&lt;ffffffff8104b3b0&gt;] warn_slowpath_common+0xa1/0xbb
  [ 8247.767582]  [&lt;ffffffffa0510e04&gt;] ? btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
  [ 8247.769373]  [&lt;ffffffff8104b410&gt;] warn_slowpath_fmt+0x46/0x48
  [ 8247.770836]  [&lt;ffffffffa0510e04&gt;] btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
  [ 8247.772532]  [&lt;ffffffff81155c9b&gt;] ? __cache_free+0x4a7/0x4b6
  [ 8247.773664]  [&lt;ffffffffa0520482&gt;] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
  [ 8247.775047]  [&lt;ffffffff81087310&gt;] ? trace_hardirqs_on+0xd/0xf
  [ 8247.776176]  [&lt;ffffffff81155dd5&gt;] ? kmem_cache_free+0x12b/0x189
  [ 8247.777427]  [&lt;ffffffffa055a920&gt;] btrfs_recover_log_trees+0x2da/0x33d [btrfs]
  [ 8247.778575]  [&lt;ffffffffa055898e&gt;] ? replay_one_extent+0x4fc/0x4fc [btrfs]
  [ 8247.779838]  [&lt;ffffffffa051e265&gt;] open_ctree+0x1cc0/0x201a [btrfs]
  [ 8247.781020]  [&lt;ffffffff81120f48&gt;] ? register_shrinker+0x56/0x81
  [ 8247.782285]  [&lt;ffffffffa04fb12c&gt;] btrfs_mount+0x5f0/0x734 [btrfs]
  (...)
  [ 8247.793394] ---[ end trace 2461e55f92c2ac2e ]---
  [ 8247.794276] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2771: errno=-5 IO failure
  [ 8247.797335] BTRFS: error (device dm-0) in btrfs_replay_log:2375: errno=-5 IO failure (Failed to recover log tree)

Fixes: c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Acked-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
