<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/btrfs/ordered-data.h, branch v3.0.18</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>btrfs: Allow to add new compression algorithm</title>
<updated>2010-12-22T15:15:45+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2010-12-17T06:21:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=261507a02ccba9afda919852263b6bc1581ce1ef'/>
<id>261507a02ccba9afda919852263b6bc1581ce1ef</id>
<content type='text'>
Make the code aware of compression type, instead of always assuming
zlib compression.

Also make the zlib workspace function as common code for all
compression types.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make the code aware of compression type, instead of always assuming
zlib compression.

Also make the zlib workspace function as common code for all
compression types.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: deal with DIO bios that span more than one ordered extent</title>
<updated>2010-11-29T00:56:33+00:00</updated>
<author>
<name>Chris Mason</name>
<email>chris.mason@oracle.com</email>
</author>
<published>2010-11-29T00:56:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=163cf09c2a0ee5cac6285f9347975bd1e97725da'/>
<id>163cf09c2a0ee5cac6285f9347975bd1e97725da</id>
<content type='text'>
The new DIO bio splitting code has problems when the bio
spans more than one ordered extent.  This will happen as the
generic DIO code merges our get_blocks calls together into
a bigger single bio.

This fixes things by walking forward in the ordered extent
code finding all the overlapping ordered extents and completing them
all at once.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The new DIO bio splitting code has problems when the bio
spans more than one ordered extent.  This will happen as the
generic DIO code merges our get_blocks calls together into
a bigger single bio.

This fixes things by walking forward in the ordered extent
code finding all the overlapping ordered extents and completing them
all at once.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: add basic DIO read/write support</title>
<updated>2010-05-25T14:34:57+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2010-05-23T15:00:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4b46fce23349bfca781a32e2707a18328ca5ae22'/>
<id>4b46fce23349bfca781a32e2707a18328ca5ae22</id>
<content type='text'>
This provides basic DIO support for reading and writing.  It does not do the
work to recover from mismatching checksums, that will come later.  A few design
changes have been made from Jim's code (sorry Jim!)

1) Use the generic direct-io code.  Jim originally re-wrote all the generic DIO
code in order to account for all of BTRFS's oddities, but thanks to that work it
seems like the best bet is to just ignore compression and such and just opt to
fallback on buffered IO.

2) Fallback on buffered IO for compressed or inline extents.  Jim's code did
it's own buffering to make dio with compressed extents work.  Now we just
fallback onto normal buffered IO.

3) Use ordered extents for the writes so that all of the

lock_extent()
lookup_ordered()

type checks continue to work.

4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
DIO writes.

I've tested this with fsx and everything works great.  This patch depends on my
dio and filemap.c patches to work.  Thanks,

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This provides basic DIO support for reading and writing.  It does not do the
work to recover from mismatching checksums, that will come later.  A few design
changes have been made from Jim's code (sorry Jim!)

1) Use the generic direct-io code.  Jim originally re-wrote all the generic DIO
code in order to account for all of BTRFS's oddities, but thanks to that work it
seems like the best bet is to just ignore compression and such and just opt to
fallback on buffered IO.

2) Fallback on buffered IO for compressed or inline extents.  Jim's code did
it's own buffering to make dio with compressed extents work.  Now we just
fallback onto normal buffered IO.

3) Use ordered extents for the writes so that all of the

lock_extent()
lookup_ordered()

type checks continue to work.

4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
DIO writes.

I've tested this with fsx and everything works great.  This patch depends on my
dio and filemap.c patches to work.  Thanks,

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: cache ordered extent when completing io</title>
<updated>2010-03-15T15:00:13+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2010-02-02T20:51:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5a1a3df1f6c86926cfe8657e6f9b4b4c2f467d60'/>
<id>5a1a3df1f6c86926cfe8657e6f9b4b4c2f467d60</id>
<content type='text'>
When finishing io we run btrfs_dec_test_ordered_pending, and then immediately
run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that
already, so we're searching twice when we don't have to.  This patch lets us
pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do
complete io on that ordered extent we can just use the one we found then instead
of having to do another btrfs_lookup_ordered_extent.  This made my fio job with
the other patch go from 24 mb/s to 29 mb/s.

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When finishing io we run btrfs_dec_test_ordered_pending, and then immediately
run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that
already, so we're searching twice when we don't have to.  This patch lets us
pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do
complete io on that ordered extent we can just use the one we found then instead
of having to do another btrfs_lookup_ordered_extent.  This made my fio job with
the other patch go from 24 mb/s to 29 mb/s.

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: change the ordered tree to use a spinlock instead of a mutex</title>
<updated>2010-03-15T15:00:12+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2010-02-02T21:48:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=49958fd7dbb83cd4d65179d025940e01fe1fbacd'/>
<id>49958fd7dbb83cd4d65179d025940e01fe1fbacd</id>
<content type='text'>
The ordered tree used to need a mutex, but currently all we use it for is to
protect the rb_tree, and a spin_lock is just fine for that.  Using a spin_lock
instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and have
less latency, 3445.138 ms instead of 3820.633 ms.

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ordered tree used to need a mutex, but currently all we use it for is to
protect the rb_tree, and a spin_lock is just fine for that.  Using a spin_lock
instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and have
less latency, 3445.138 ms instead of 3820.633 ms.

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: use RB_ROOT to intialize rb_trees instead of setting rb_node to NULL</title>
<updated>2010-03-08T21:26:50+00:00</updated>
<author>
<name>Eric Paris</name>
<email>eparis@redhat.com</email>
</author>
<published>2010-02-23T19:43:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6bef4d317193d3badbbfa3f3c593758ace84a629'/>
<id>6bef4d317193d3badbbfa3f3c593758ace84a629</id>
<content type='text'>
btrfs inialize rb trees in quite a number of places by settin rb_node =
NULL;  The problem with this is that 17d9ddc72fb8bba0d4f678 in the
linux-next tree adds a new field to that struct which needs to be NULL for
the new rbtree library code to work properly.  This patch uses RB_ROOT as
the intializer so all of the relevant fields will be NULL'd.  Without the
patch I get a panic.

Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Acked-by: Venkatesh Pallipadi &lt;venkatesh.pallipadi@intel.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
btrfs inialize rb trees in quite a number of places by settin rb_node =
NULL;  The problem with this is that 17d9ddc72fb8bba0d4f678 in the
linux-next tree adds a new field to that struct which needs to be NULL for
the new rbtree library code to work properly.  This patch uses RB_ROOT as
the intializer so all of the relevant fields will be NULL'd.  Without the
patch I get a panic.

Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Acked-by: Venkatesh Pallipadi &lt;venkatesh.pallipadi@intel.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: Add delayed iput</title>
<updated>2009-12-17T17:33:35+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.yan@oracle.com</email>
</author>
<published>2009-11-12T09:36:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=24bbcf0442ee04660a5a030efdbb6d03f1c275cb'/>
<id>24bbcf0442ee04660a5a030efdbb6d03f1c275cb</id>
<content type='text'>
iput() can trigger new transactions if we are dropping the
final reference, so calling it in btrfs_commit_transaction
may end up deadlock. This patch adds delayed iput to avoid
the issue.

Signed-off-by: Yan Zheng &lt;zheng.yan@oracle.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iput() can trigger new transactions if we are dropping the
final reference, so calling it in btrfs_commit_transaction
may end up deadlock. This patch adds delayed iput to avoid
the issue.

Signed-off-by: Yan Zheng &lt;zheng.yan@oracle.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: Fix disk_i_size update corner case</title>
<updated>2009-12-17T17:33:24+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.yan@oracle.com</email>
</author>
<published>2009-11-12T09:34:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c216775458a2ee345d9412a2770c2916acfb5d30'/>
<id>c216775458a2ee345d9412a2770c2916acfb5d30</id>
<content type='text'>
There are some cases file extents are inserted without involving
ordered struct. In these cases, we update disk_i_size directly,
without checking pending ordered extent and DELALLOC bit. This
patch extends btrfs_ordered_update_i_size() to handle these cases.

Signed-off-by: Yan Zheng &lt;zheng.yan@oracle.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are some cases file extents are inserted without involving
ordered struct. In these cases, we update disk_i_size directly,
without checking pending ordered extent and DELALLOC bit. This
patch extends btrfs_ordered_update_i_size() to handle these cases.

Signed-off-by: Yan Zheng &lt;zheng.yan@oracle.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: remove duplicates of filemap_ helpers</title>
<updated>2009-10-01T16:58:30+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2009-10-01T16:58:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8aa38c31b7659e338fee4d9af4c3805acbd9806f'/>
<id>8aa38c31b7659e338fee4d9af4c3805acbd9806f</id>
<content type='text'>
Use filemap_fdatawrite_range and filemap_fdatawait_range instead of
local copies of the functions.  For filemap_fdatawait_range that
also means replacing the awkward old wait_on_page_writeback_range
calling convention with the regular filemap byte offsets.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use filemap_fdatawrite_range and filemap_fdatawait_range instead of
local copies of the functions.  For filemap_fdatawait_range that
also means replacing the awkward old wait_on_page_writeback_range
calling convention with the regular filemap byte offsets.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: Use PagePrivate2 to track pages in the data=ordered code.</title>
<updated>2009-09-11T17:31:07+00:00</updated>
<author>
<name>Chris Mason</name>
<email>chris.mason@oracle.com</email>
</author>
<published>2009-09-02T20:53:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b62b72b26bcd72082c4a69d179dd906bcc22200'/>
<id>8b62b72b26bcd72082c4a69d179dd906bcc22200</id>
<content type='text'>
Btrfs writes go through delalloc to the data=ordered code.  This
makes sure that all of the data is on disk before the metadata
that references it.  The tracking means that we have to make sure
each page in an extent is fully written before we add that extent into
the on-disk btree.

This was done in the past by setting the EXTENT_ORDERED bit for the
range of an extent when it was added to the data=ordered code, and then
clearing the EXTENT_ORDERED bit in the extent state tree as each page
finished IO.

One of the reasons we had to do this was because sometimes pages are
magically dirtied without page_mkwrite being called.  The EXTENT_ORDERED
bit is checked at writepage time, and if it isn't there, our page become
dirty without going through the proper path.

These bit operations make for a number of rbtree searches for each page,
and can cause considerable lock contention.

This commit switches from the EXTENT_ORDERED bit to use PagePrivate2.
As pages go into the ordered code, PagePrivate2 is set on each one.
This is a cheap operation because we already have all the pages locked
and ready to go.

As IO finishes, the PagePrivate2 bit is cleared and the ordered
accoutning is updated for each page.

At writepage time, if the PagePrivate2 bit is missing, we go into the
writepage fixup code to handle improperly dirtied pages.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Btrfs writes go through delalloc to the data=ordered code.  This
makes sure that all of the data is on disk before the metadata
that references it.  The tracking means that we have to make sure
each page in an extent is fully written before we add that extent into
the on-disk btree.

This was done in the past by setting the EXTENT_ORDERED bit for the
range of an extent when it was added to the data=ordered code, and then
clearing the EXTENT_ORDERED bit in the extent state tree as each page
finished IO.

One of the reasons we had to do this was because sometimes pages are
magically dirtied without page_mkwrite being called.  The EXTENT_ORDERED
bit is checked at writepage time, and if it isn't there, our page become
dirty without going through the proper path.

These bit operations make for a number of rbtree searches for each page,
and can cause considerable lock contention.

This commit switches from the EXTENT_ORDERED bit to use PagePrivate2.
As pages go into the ordered code, PagePrivate2 is set on each one.
This is a cheap operation because we already have all the pages locked
and ready to go.

As IO finishes, the PagePrivate2 bit is cleared and the ordered
accoutning is updated for each page.

At writepage time, if the PagePrivate2 bit is missing, we go into the
writepage fixup code to handle improperly dirtied pages.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
