linux-toradex.git/fs/ext4, branch v2.6.30.2

ext4: Fix race in ext4_inode_info.i_cached_extent

2009-05-15T13:07:28+00:00

If two CPU's simultaneously call ext4_ext_get_blocks() at the same
time, there is nothing protecting the i_cached_extent structure from
being used and updated at the same time.  This could potentially cause
the wrong location on disk to be read or written to, including
potentially causing the corruption of the block group descriptors
and/or inode table.

This bug has been in the ext4 code since almost the very beginning of
ext4's development.  Fortunately once the data is stored in the page
cache cache, ext4_get_blocks() doesn't need to be called, so trying to
replicate this problem to the point where we could identify its root
cause was *extremely* difficult.  Many thanks to Kevin Shanahan for
working over several months to be able to reproduce this easily so we
could finally nail down the cause of the corruption.

Signed-off-by: "Theodore Ts'o" 
Reviewed-by: "Aneesh Kumar K.V"

ext4: Clear the unwritten buffer_head flag after the extent is initialized

2009-05-14T21:05:39+00:00

The BH_Unwritten flag indicates that the buffer is allocated on disk
but has not been written; that is, the disk was part of a persistent
preallocation area.  That flag should only be set when a get_blocks()
function is looking up a inode's logical to physical block mapping.

When ext4_get_blocks_wrap() is called with create=1, the uninitialized
extent is converted into an initialized one, so the BH_Unwritten flag
is no longer appropriate.  Hence, we need to make sure the
BH_Unwritten is not left set, since the combination of BH_Mapped and
BH_Unwritten is not allowed; among other things, it will result ext4's
get_block() to be called over and over again during the write_begin
phase of write(2).

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: "Theodore Ts'o"

ext4: Use a fake block number for delayed new buffer_head

2009-05-12T18:40:37+00:00

Use a very large unsigned number (~0xffff) as as the fake block number
for the delayed new buffer. The VFS should never try to write out this
number, but if it does, this will make it obvious.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: "Theodore Ts'o"

ext4: Fix sub-block zeroing for writes into preallocated extents

2009-05-13T22:36:58+00:00

We need to mark the buffer_head mapping preallocated space as new
during write_begin. Otherwise we don't zero out the page cache content
properly for a partial write. This will cause file corruption with
preallocation.

Now that we mark the buffer_head new we also need to have a valid
buffer_head blocknr so that unmap_underlying_metadata() unmaps the
correct block.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: "Theodore Ts'o"

ext4: Do not try to validate extents on special files

2009-04-24T22:45:35+00:00

The EXTENTS_FL flag should never be set on special files, but if it
is, don't bother trying to validate that the extents tree is valid,
since only files, directories, and non-fast symlinks will ever have an
extent data structure.  We perhaps should flag the filesystem as being
corrupted if we see a special file (named pipes, device nodes, Unix
domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't
currently check this case, so we'll just ignore this for now, since
it's harmless.

Without this fix, a special device with the extents flag is flagged as
an error by the kernel, so it is impossible to access or delete the
inode, but e2fsck doesn't see it as a problem, leading to
confused/frustrated users.

Signed-off-by: "Theodore Ts'o"

ext4: Ignore i_file_acl_high unless EXT4_FEATURE_INCOMPAT_64BIT is present

2009-04-24T20:11:18+00:00

Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature
bit is set.  The field is normally zero, but older versions of e2fsck
didn't automatically check to make sure of this, so in the spirit of
"be liberal in what you accept", don't look at i_file_acl_high unless
we are using a 64-bit filesystem.

Signed-off-by: "Theodore Ts'o"

ext4: Fix softlockup caused by illegal i_file_acl value in on-disk inode

2009-04-24T17:43:20+00:00

If the block containing external extended attributes (which is stored
in i_file_acl and i_file_acl_high) is larger than the on-disk
filesystem, the process which tried to access the extended attributes
will endlessly issue kernel printks complaining that
"__find_get_block_slow() failed", locking up that CPU until the system
is forcibly rebooted.

So when we read in the inode, make sure the i_file_acl value is legal,
and if not, flag the filesystem as being corrupted.

Signed-off-by: "Theodore Ts'o"

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

2009-04-24T15:37:40+00:00

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix potential inode allocation soft lockup in Orlov allocator
  ext4: Make the extent validity check more paranoid
  jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
  jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
  ext4: really print the find_group_flex fallback warning only once

ext4: Fix potential inode allocation soft lockup in Orlov allocator

2009-04-23T01:00:36+00:00

If the Orlov allocator is having trouble finding an appropriate block
group, the fallback code could loop forever, causing a soft lockup
warning in find_group_orlov():

BUG: soft lockup - CPU#0 stuck for 61s! [cp:11728]
     ...
Pid: 11728, comm: cp Not tainted (2.6.30-rc1-dirty #77) Lenovo          
EIP: 0060:[] EFLAGS: 00000246 CPU: 0
EIP is at ext4_get_group_desc+0x54/0x9d
    ...
Call Trace:
 [] find_group_orlov+0x2ee/0x334
 [] ? sched_clock+0x8/0xb
 [] ext4_new_inode+0x2cf/0xb1a

Signed-off-by: "Theodore Ts'o"

ext4: Make the extent validity check more paranoid

2009-04-23T00:52:25+00:00

Instead of just checking that the extent block number is greater or
equal than s_first_data_block, make sure it it is not pointing into
the block group descriptors, since that is clearly wrong.  This helps
prevent filesystem from getting very badly corrupted in case an extent
block is corrupted.

Signed-off-by: "Theodore Ts'o"