<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/ocfs2/suballoc.h, branch v3.2.26</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ocfs2: allow return of new inode block location before allocation of the inode</title>
<updated>2010-09-08T06:25:59+00:00</updated>
<author>
<name>Mark Fasheh</name>
<email>mfasheh@suse.com</email>
</author>
<published>2010-08-13T22:15:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e49e27674d1dd2717ad90b21ece8f83102153315'/>
<id>e49e27674d1dd2717ad90b21ece8f83102153315</id>
<content type='text'>
This allows code which needs to know the eventual block number of an inode
but can't allocate it yet due to transaction or lock ordering. For example,
ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation
of the orphan dir because it can't yet know where the actual inode is placed
- that code is actually in ocfs2_mknod_locked. This is a problem when the
orphan dirs are indexed as the junk inode number will create an index entry
which goes unused (and fails the later removal from the orphan dir).  Now
with these interfaces, ocfs2_create_inode_in_orphan() can run the block
group search (and get back the inode block number) *before* any actual
allocation occurs.

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows code which needs to know the eventual block number of an inode
but can't allocate it yet due to transaction or lock ordering. For example,
ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation
of the orphan dir because it can't yet know where the actual inode is placed
- that code is actually in ocfs2_mknod_locked. This is a problem when the
orphan dirs are indexed as the junk inode number will create an index entry
which goes unused (and fails the later removal from the orphan dir).  Now
with these interfaces, ocfs2_create_inode_in_orphan() can run the block
group search (and get back the inode block number) *before* any actual
allocation occurs.

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: Set suballoc_loc on allocated metadata.</title>
<updated>2010-03-26T02:09:15+00:00</updated>
<author>
<name>Joel Becker</name>
<email>joel.becker@oracle.com</email>
</author>
<published>2010-03-26T02:09:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2b6cb576aa80611f1f6a3c88708d1e68a8d97985'/>
<id>2b6cb576aa80611f1f6a3c88708d1e68a8d97985</id>
<content type='text'>
Get the suballoc_loc from ocfs2_claim_new_inode() or
ocfs2_claim_metadata().  Store it on the appropriate field of the block
we just allocated.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Get the suballoc_loc from ocfs2_claim_new_inode() or
ocfs2_claim_metadata().  Store it on the appropriate field of the block
we just allocated.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: ocfs2_claim_*() don't need an ocfs2_super argument.</title>
<updated>2010-05-06T05:59:06+00:00</updated>
<author>
<name>Joel Becker</name>
<email>joel.becker@oracle.com</email>
</author>
<published>2010-05-06T05:59:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1ed9b777f77929ae961d6f9cdf828a07200ba71c'/>
<id>1ed9b777f77929ae961d6f9cdf828a07200ba71c</id>
<content type='text'>
They all take an ocfs2_alloc_context, which has the allocation inode.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
They all take an ocfs2_alloc_context, which has the allocation inode.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: Pass suballocation results back via a structure.</title>
<updated>2010-04-13T06:30:19+00:00</updated>
<author>
<name>Joel Becker</name>
<email>joel.becker@oracle.com</email>
</author>
<published>2010-04-13T06:30:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d1fe093bf04124dcc50c5dde1765bd098464bfa'/>
<id>7d1fe093bf04124dcc50c5dde1765bd098464bfa</id>
<content type='text'>
We're going to be adding more info to a suballocator allocation.  Rather
than growing every function in the chain, let's pass a result structure
around.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We're going to be adding more info to a suballocator allocation.  Rather
than growing every function in the chain, let's pass a result structure
around.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: allocation reservations</title>
<updated>2010-05-06T01:17:30+00:00</updated>
<author>
<name>Mark Fasheh</name>
<email>mfasheh@suse.com</email>
</author>
<published>2009-12-07T21:10:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d02f00cc057809d96c044cc72d5b9809d59f7d49'/>
<id>d02f00cc057809d96c044cc72d5b9809d59f7d49</id>
<content type='text'>
This patch improves Ocfs2 allocation policy by allowing an inode to
reserve a portion of the local alloc bitmap for itself. The reserved
portion (allocation window) is advisory in that other allocation
windows might steal it if the local alloc bitmap becomes
full. Otherwise, the reservations are honored and guaranteed to be
free. When the local alloc window is moved to a different portion of
the bitmap, existing reservations are discarded.

Reservation windows are represented internally by a red-black
tree. Within that tree, each node represents the reservation window of
one inode. An LRU of active reservations is also maintained. When new
data is written, we allocate it from the inodes window. When all bits
in a window are exhausted, we allocate a new one as close to the
previous one as possible. Should we not find free space, an existing
reservation is pulled off the LRU and cannibalized.

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch improves Ocfs2 allocation policy by allowing an inode to
reserve a portion of the local alloc bitmap for itself. The reserved
portion (allocation window) is advisory in that other allocation
windows might steal it if the local alloc bitmap becomes
full. Otherwise, the reservations are honored and guaranteed to be
free. When the local alloc window is moved to a different portion of
the bitmap, existing reservations are discarded.

Reservation windows are represented internally by a red-black
tree. Within that tree, each node represents the reservation window of
one inode. An LRU of active reservations is also maintained. When new
data is written, we allocate it from the inodes window. When all bits
in a window are exhausted, we allocate a new one as close to the
previous one as possible. Should we not find free space, an existing
reservation is pulled off the LRU and cannibalized.

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: Clear undo bits when local alloc is freed</title>
<updated>2010-03-24T01:22:40+00:00</updated>
<author>
<name>Mark Fasheh</name>
<email>mfasheh@suse.com</email>
</author>
<published>2010-03-12T02:31:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b4414eea0e7b9c134262c801a87e338bf675962c'/>
<id>b4414eea0e7b9c134262c801a87e338bf675962c</id>
<content type='text'>
When the local alloc file changes windows, unused bits are freed back to the
global bitmap. By defnition, those bits can not be in use by any file. Also,
the local alloc will never have been able to allocate those bits if they
were part of a previous truncate. Therefore it makes sense that we should
clear unused local alloc bits in the undo buffer so that they can be used
immediatly.

[ Modified to call it ocfs2_release_clusters() -- Joel ]

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the local alloc file changes windows, unused bits are freed back to the
global bitmap. By defnition, those bits can not be in use by any file. Also,
the local alloc will never have been able to allocate those bits if they
were part of a previous truncate. Therefore it makes sense that we should
clear unused local alloc bits in the undo buffer so that they can be used
immediatly.

[ Modified to call it ocfs2_release_clusters() -- Joel ]

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: add extent block stealing for ocfs2 v5</title>
<updated>2010-02-26T23:41:07+00:00</updated>
<author>
<name>Tiger Yang</name>
<email>tiger.yang@oracle.com</email>
</author>
<published>2010-01-25T06:11:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b89c54282db0c8634a2d2dc200f196d571750ce5'/>
<id>b89c54282db0c8634a2d2dc200f196d571750ce5</id>
<content type='text'>
This patch add extent block (metadata) stealing mechanism for
extent allocation. This mechanism is same as the inode stealing.
if no room in slot specific extent_alloc, we will try to
allocate extent block from the next slot.

Signed-off-by: Tiger Yang &lt;tiger.yang@oracle.com&gt;
Acked-by: Tao Ma &lt;tao.ma@oracle.com&gt;
Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch add extent block (metadata) stealing mechanism for
extent allocation. This mechanism is same as the inode stealing.
if no room in slot specific extent_alloc, we will try to
allocate extent block from the next slot.

Signed-off-by: Tiger Yang &lt;tiger.yang@oracle.com&gt;
Acked-by: Tao Ma &lt;tao.ma@oracle.com&gt;
Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: fix rare stale inode errors when exporting via nfs</title>
<updated>2009-04-03T18:39:25+00:00</updated>
<author>
<name>wengang wang</name>
<email>wen.gang.wang@oracle.com</email>
</author>
<published>2009-03-06T13:29:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6ca497a83e592d64e050c4d04b6dedb8c915f39a'/>
<id>6ca497a83e592d64e050c4d04b6dedb8c915f39a</id>
<content type='text'>
For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross cluster lock. this leads to the file system loading a
stale inode.

This patch fixes above problem.

Solution is that in case of inode is not in memory, we get the cluster
lock(PR) of alloc inode where the inode in question is allocated from (this
causes node on which deletion is done sync the alloc inode) before reading
out the inode itsself. then we check the bitmap in the group (the inode in
question allcated from) to see if the bit is clear. if it's clear then it's
stale. if the bit is set, we then check generation as the existing code
does.

We have to read out the inode in question from disk first to know its alloc
slot and allot bit. And if its not stale we read it out using ocfs2_iget().
The second read should then be from cache.

And also we have to add a per superblock nfs_sync_lock to cover the lock for
alloc inode and that for inode in question. this is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
that mutliple ocfs2_delete_inode() can run concurrently in normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang &lt;wen.gang.wang@oracle.com&gt;
Acked-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross cluster lock. this leads to the file system loading a
stale inode.

This patch fixes above problem.

Solution is that in case of inode is not in memory, we get the cluster
lock(PR) of alloc inode where the inode in question is allocated from (this
causes node on which deletion is done sync the alloc inode) before reading
out the inode itsself. then we check the bitmap in the group (the inode in
question allcated from) to see if the bit is clear. if it's clear then it's
stale. if the bit is set, we then check generation as the existing code
does.

We have to read out the inode in question from disk first to know its alloc
slot and allot bit. And if its not stale we read it out using ocfs2_iget().
The second read should then be from cache.

And also we have to add a per superblock nfs_sync_lock to cover the lock for
alloc inode and that for inode in question. this is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
that mutliple ocfs2_delete_inode() can run concurrently in normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang &lt;wen.gang.wang@oracle.com&gt;
Acked-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: Optimize inode allocation by remembering last group</title>
<updated>2009-04-03T18:39:17+00:00</updated>
<author>
<name>Tao Ma</name>
<email>tao.ma@oracle.com</email>
</author>
<published>2009-02-24T16:53:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=138211515c102807a16c02fdc15feef1f6ef8124'/>
<id>138211515c102807a16c02fdc15feef1f6ef8124</id>
<content type='text'>
In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ocfs2: Validate metadata only when it's read from disk.</title>
<updated>2009-01-05T16:36:53+00:00</updated>
<author>
<name>Joel Becker</name>
<email>joel.becker@oracle.com</email>
</author>
<published>2008-11-13T22:49:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=970e4936d7d15f35d00fd15a14f5343ba78b2fc8'/>
<id>970e4936d7d15f35d00fd15a14f5343ba78b2fc8</id>
<content type='text'>
Add an optional validation hook to ocfs2_read_blocks().  Now the
validation function is only called when a block was actually read off of
disk.  It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers.  It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly.  The group_descriptor validator needs to
be split into two pieces.  The first part only needs the gd buffer and
is passed to ocfs2_read_block().  The second part requires the dinode as
well, and is called every time.  It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an optional validation hook to ocfs2_read_blocks().  Now the
validation function is only called when a block was actually read off of
disk.  It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers.  It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly.  The group_descriptor validator needs to
be split into two pieces.  The first part only needs the gd buffer and
is passed to ocfs2_read_block().  The second part requires the dinode as
well, and is called every time.  It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker &lt;joel.becker@oracle.com&gt;
Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
