<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/xfs/xfs_alloc.c, branch v3.4.96</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>xfs: fix fstrim offset calculations</title>
<updated>2012-03-27T21:07:03+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-03-22T05:15:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a66d636385d621e98a915233250356c394a437de'/>
<id>a66d636385d621e98a915233250356c394a437de</id>
<content type='text'>
xfs_ioc_fstrim() doesn't treat the incoming offset and length
correctly. It treats them as a filesystem block address, rather than
a disk address. This is wrong because the range passed in is a
linear representation, while the filesystem block address notation
is a sparse representation. Hence we cannot convert the range direct
to filesystem block units and then use that for calculating the
range to trim.

While this sounds dangerous, the problem is limited to calculating
what AGs need to be trimmed. The code that calcuates the actual
ranges to trim gets the right result (i.e. only ever discards free
space), even though it uses the wrong ranges to limit what is
trimmed. Hence this is not a bug that endangers user data.

Fix this by treating the range as a disk address range and use the
appropriate functions to convert the range into the desired formats
for calculations.

Further, fix the first free extent lookup (the longest) to actually
find the largest free extent. Currently this lookup uses a &lt;=
lookup, which results in finding the extent to the left of the
largest because we can never get an exact match on the largest
extent. This is due to the fact that while we know it's size, we
don't know it's location and so the exact match fails and we move
one record to the left to get the next largest extent. Instead, use
a &gt;= search so that the lookup returns the largest extent regardless
of the fact we don't get an exact match on it.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
xfs_ioc_fstrim() doesn't treat the incoming offset and length
correctly. It treats them as a filesystem block address, rather than
a disk address. This is wrong because the range passed in is a
linear representation, while the filesystem block address notation
is a sparse representation. Hence we cannot convert the range direct
to filesystem block units and then use that for calculating the
range to trim.

While this sounds dangerous, the problem is limited to calculating
what AGs need to be trimmed. The code that calcuates the actual
ranges to trim gets the right result (i.e. only ever discards free
space), even though it uses the wrong ranges to limit what is
trimmed. Hence this is not a bug that endangers user data.

Fix this by treating the range as a disk address range and use the
appropriate functions to convert the range into the desired formats
for calculations.

Further, fix the first free extent lookup (the longest) to actually
find the largest free extent. Currently this lookup uses a &lt;=
lookup, which results in finding the extent to the left of the
largest because we can never get an exact match on the largest
extent. This is due to the fact that while we know it's size, we
don't know it's location and so the exact match fails and we move
one record to the left to get the next largest extent. Instead, use
a &gt;= search so that the lookup returns the largest extent regardless
of the fact we don't get an exact match on it.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: introduce an allocation workqueue</title>
<updated>2012-03-22T21:12:24+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-03-22T05:15:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c999a223c2f0d31c64ef7379814cea1378b2b800'/>
<id>c999a223c2f0d31c64ef7379814cea1378b2b800</id>
<content type='text'>
We currently have significant issues with the amount of stack that
allocation in XFS uses, especially in the writeback path. We can
easily consume 4k of stack between mapping the page, manipulating
the bmap btree and allocating blocks from the free list. Not to
mention btree block readahead and other functionality that issues IO
in the allocation path.

As a result, we can no longer fit allocation in the writeback path
in the stack space provided on x86_64. To alleviate this problem,
introduce an allocation workqueue and move all allocations to a
seperate context. This can be easily added as an interposing layer
into xfs_alloc_vextent(), which takes a single argument structure
and does not return until the allocation is complete or has failed.

To do this, add a work structure and a completion to the allocation
args structure. This allows xfs_alloc_vextent to queue the args onto
the workqueue and wait for it to be completed by the worker. This
can be done completely transparently to the caller.

The worker function needs to ensure that it sets and clears the
PF_TRANS flag appropriately as it is being run in an active
transaction context. Work can also be queued in a memory reclaim
context, so a rescuer is needed for the workqueue.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We currently have significant issues with the amount of stack that
allocation in XFS uses, especially in the writeback path. We can
easily consume 4k of stack between mapping the page, manipulating
the bmap btree and allocating blocks from the free list. Not to
mention btree block readahead and other functionality that issues IO
in the allocation path.

As a result, we can no longer fit allocation in the writeback path
in the stack space provided on x86_64. To alleviate this problem,
introduce an allocation workqueue and move all allocations to a
seperate context. This can be easily added as an interposing layer
into xfs_alloc_vextent(), which takes a single argument structure
and does not return until the allocation is complete or has failed.

To do this, add a work structure and a completion to the allocation
args structure. This allows xfs_alloc_vextent to queue the args onto
the workqueue and wait for it to be completed by the worker. This
can be done completely transparently to the caller.

The worker function needs to ensure that it sets and clears the
PF_TRANS flag appropriately as it is being run in an active
transaction context. Work can also be queued in a memory reclaim
context, so a rescuer is needed for the workqueue.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF</title>
<updated>2011-10-12T02:15:09+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-10-10T16:52:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=38f23232449c9d2c0bc8e9541cb8ab08b7c2b9ce'/>
<id>38f23232449c9d2c0bc8e9541cb8ab08b7c2b9ce</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: Remove the macro XFS_BUF_ERROR and family</title>
<updated>2011-07-25T19:57:46+00:00</updated>
<author>
<name>Chandra Seetharaman</name>
<email>sekharan@us.ibm.com</email>
</author>
<published>2011-07-22T23:39:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5a52c2a581cddcb676a54a95d99cd39f5577c33b'/>
<id>5a52c2a581cddcb676a54a95d99cd39f5577c33b</id>
<content type='text'>
Remove the definitions and usage of the macros XFS_BUF_ERROR,
XFS_BUF_GETERROR and XFS_BUF_ISERROR.

Signed-off-by: Chandra Seetharaman &lt;sekharan@us.ibm.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the definitions and usage of the macros XFS_BUF_ERROR,
XFS_BUF_GETERROR and XFS_BUF_ISERROR.

Signed-off-by: Chandra Seetharaman &lt;sekharan@us.ibm.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact()</title>
<updated>2011-07-08T16:32:51+00:00</updated>
<author>
<name>Chandra Seetharaman</name>
<email>sekharan@us.ibm.com</email>
</author>
<published>2011-06-09T16:47:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=81463b1ca8dbd2f4f180feac3f49c7640e2b5f79'/>
<id>81463b1ca8dbd2f4f180feac3f49c7640e2b5f79</id>
<content type='text'>
Remove two variables that serve no purpose in
xfs_alloc_ag_vextent_exact().

Signed-off-by: Chandra Seetharaman &lt;sekharan@us.ibm.com&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove two variables that serve no purpose in
xfs_alloc_ag_vextent_exact().

Signed-off-by: Chandra Seetharaman &lt;sekharan@us.ibm.com&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: byteswap constants instead of variables</title>
<updated>2011-07-08T12:36:05+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2011-07-08T12:36:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=69ef921b55cc3788d1d2a27b33b27d04acd0090a'/>
<id>69ef921b55cc3788d1d2a27b33b27d04acd0090a</id>
<content type='text'>
Micro-optimize various comparisms by always byteswapping the constant
instead of the variable, which allows to do the swap at compile instead
of runtime.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Alex Elder &lt;aelder@sgi.com&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Micro-optimize various comparisms by always byteswapping the constant
instead of the variable, which allows to do the swap at compile instead
of runtime.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Alex Elder &lt;aelder@sgi.com&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: do not discard alloc btree blocks</title>
<updated>2011-05-24T16:17:22+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-05-04T18:55:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=55a7bc5a30ff2d30d8a34fea2af9fc601b32e61a'/>
<id>55a7bc5a30ff2d30d8a34fea2af9fc601b32e61a</id>
<content type='text'>
Blocks for the allocation btree are allocated from and released to
the AGFL, and thus frequently reused.  Even worse we do not have an
easy way to avoid using an AGFL block when it is discarded due to
the simple FILO list of free blocks, and thus can frequently stall
on blocks that are currently undergoing a discard.

Add a flag to the busy extent tracking structure to skip the discard
for allocation btree blocks.  In normal operation these blocks are
reused frequently enough that there is no need to discard them
anyway, but if they spill over to the allocation btree as part of a
balance we "leak" blocks that we would otherwise discard.  We could
fix this by adding another flag and keeping these block in the
rbtree even after they aren't busy any more so that we could discard
them when they migrate out of the AGFL.  Given that this would cause
significant overhead I don't think it's worthwile for now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Blocks for the allocation btree are allocated from and released to
the AGFL, and thus frequently reused.  Even worse we do not have an
easy way to avoid using an AGFL block when it is discarded due to
the simple FILO list of free blocks, and thus can frequently stall
on blocks that are currently undergoing a discard.

Add a flag to the busy extent tracking structure to skip the discard
for allocation btree blocks.  In normal operation these blocks are
reused frequently enough that there is no need to discard them
anyway, but if they spill over to the allocation btree as part of a
balance we "leak" blocks that we would otherwise discard.  We could
fix this by adding another flag and keeping these block in the
rbtree even after they aren't busy any more so that we could discard
them when they migrate out of the AGFL.  Given that this would cause
significant overhead I don't think it's worthwile for now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: add online discard support</title>
<updated>2011-05-24T16:17:13+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-05-20T13:45:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e84661aa84e2e003738563f65155d4f12dc474e7'/>
<id>e84661aa84e2e003738563f65155d4f12dc474e7</id>
<content type='text'>
Now that we have reliably tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard.  Note that we don't bother
supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we have reliably tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard.  Note that we don't bother
supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: obey minleft values during extent allocation correctly</title>
<updated>2011-05-19T17:03:48+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2011-04-21T09:34:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bf59170a66bc3eaf3ee513aa6ce9774aa2ab5188'/>
<id>bf59170a66bc3eaf3ee513aa6ce9774aa2ab5188</id>
<content type='text'>
When allocating an extent that is long enough to consume the
remaining free space in an AG, we need to ensure that the allocation
leaves enough space in the AG for any subsequent bmap btree blocks
that are needed to track the new extent. These have to be allocated
in the same AG as we only reserve enough blocks in an allocation
transaction for modification of the freespace trees in a single AG.

xfs_alloc_fix_minleft() has been considering blocks on the AGFL as
free blocks available for extent and bmbt block allocation, which is
not correct - blocks on the AGFL are there exclusively for the use
of the free space btrees. As a result, when minleft is less than the
number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim
the given extent to leave minleft blocks available for bmbt
allocation, and hence we can fail allocation during bmbt record
insertion.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When allocating an extent that is long enough to consume the
remaining free space in an AG, we need to ensure that the allocation
leaves enough space in the AG for any subsequent bmap btree blocks
that are needed to track the new extent. These have to be allocated
in the same AG as we only reserve enough blocks in an allocation
transaction for modification of the freespace trees in a single AG.

xfs_alloc_fix_minleft() has been considering blocks on the AGFL as
free blocks available for extent and bmbt block allocation, which is
not correct - blocks on the AGFL are there exclusively for the use
of the free space btrees. As a result, when minleft is less than the
number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim
the given extent to leave minleft blocks available for bmbt
allocation, and hence we can fail allocation during bmbt record
insertion.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy</title>
<updated>2011-04-28T18:18:09+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-04-24T19:06:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8a072a4d4c6a5b6ec32836c467d2996393c76c6f'/>
<id>8a072a4d4c6a5b6ec32836c467d2996393c76c6f</id>
<content type='text'>
Instead of finding the per-ag and then taking and releasing the pagb_lock
for every single busy extent completed sort the list of busy extents and
only switch betweens AGs where nessecary.  This becomes especially important
with the online discard support which will hit this lock more often.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of finding the per-ag and then taking and releasing the pagb_lock
for every single busy extent completed sort the list of busy extents and
only switch betweens AGs where nessecary.  This becomes especially important
with the online discard support which will hit this lock more often.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
