linux-toradex.git/drivers/md, branch v2.6.28.8

md/raid10: Don't skip more than 1 bitmap-chunk at a time during recovery.

2009-03-17T00:32:09+00:00

commit 09b4068a7fe442efc40e9dcbcf5ff37c3338ab15 upstream.

When doing recovery on a raid10 with a write-intent bitmap, we only
need to recovery chunks that are flagged in the bitmap.

However if we choose to skip a chunk as it isn't flag, the code
currently skips the whole raid10-chunk, thus it might not recovery
some blocks that need recovering.

This patch fixes it.

In case that is confusing, it might help to understand that there
is a 'raid10 chunk size' which guides how data is distributed across
the devices, and a 'bitmap chunk size' which says how much data
corresponds to a single bit in the bitmap.

This bug only affects cases where the bitmap chunk size is smaller
than the raid10 chunk size.



Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid10: Don't call bitmap_cond_end_sync when we are doing recovery.

2009-03-17T00:32:09+00:00

commit 78200d45cde2a79c0d0ae0407883bb264caa3c18 upstream.

For raid1/4/5/6, resync (fixing inconsistencies between devices) is
very similar to recovery (rebuilding a failed device onto a spare).
The both walk through the device addresses in order.

For raid10 it can be quite different.  resync follows the 'array'
address, and makes sure all copies are the same.  Recover walks
through 'device' addresses and recreates each missing block.

The 'bitmap_cond_end_sync' function allows the write-intent-bitmap
(When present) to be updated to reflect a partially completed resync.
It makes assumptions which mean that it does not work correctly for
raid10 recovery at all.

In particularly, it can cause bitmap-directed recovery of a raid10 to
not recovery some of the blocks that need to be recovered.

So move the call to bitmap_cond_end_sync into the resync path, rather
than being in the common "resync or recovery" path.


Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: avoid races when stopping resync.

2009-03-17T00:32:08+00:00

commit 73d5c38a9536142e062c35997b044e89166e063b upstream.

There has been a race in raid10 and raid1 for a long time
which has only recently started showing up due to a scheduler changed.

When a sync_read request finishes, as soon as reschedule_retry
is called, another thread can mark the resync request as having
completed, so md_do_sync can finish, ->stop can be called, and
->conf can be freed.  So using conf after reschedule_retry is not
safe.

Similarly, when finishing a sync_write, calling md_done_sync must be
the last thing we do, as it allows a chain of events which will free
conf and other data structures.

The first of these requires action in raid10.c
The second requires action in raid1.c and raid10.c

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: Fix a bug in linear.c causing which_dev() to return the wrong device.

2009-02-12T17:50:25+00:00

commit 852c8bf484a0e17ee27f413ef26e87f522af5607 upstream.

ab5bd5cbc8d4b868378d062eed3d4240930fbb86 introduced the following
bug in linear software raid for large arrays on 32 bit machines:

which_dev() computes the device holding a given sector by shifting
down the sector number to a 32 bit range, dividing by the array
spacing and looking up the resulting index in the hash table of
the array.

Because the computed index might be slightly too small, a loop at
the end of which_dev() increases the index until the given sector
actually falls into the range of the device associated with that index.

The changes of the above mentioned commit caused this loop to check
whether the _index_ rather than the sector number is small enough,
effectively bypassing the loop and thus possibly returning the wrong
device.

As reported by Simon Kirby, this leads to errors such as

	linear_make_request: Sector 2340486136 out of bounds on dev sdi: 156301312 sectors, offset 2109870464

Fix this bug by introducing a local variable for the index so that
the variable containing the passed sector is left unchanged.

Signed-off-by: Andre Noll 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: Ensure an md array never has too many devices.

2009-02-12T17:50:24+00:00

commit de01dfadf25bf83cfe3d85c163005c4320532658 upstream.

Each different metadata format supported by md supports a
different maximum number of devices.
We really should be enforcing this maximum in the kernel, but
we aren't quite doing that properly.

We currently only enforce it at the 'hot_add' point, which is an
older interface which is not used by current userspace.

We need to also enforce it at 'add_new_disk' time for active arrays
and at 'do_md_run' time when starting a new array.

So move the test from 'hot_add' into 'bind_rdev_to_array' which is
called from both 'hot_add' and 'add_new_disk, and add a new
test in 'analyse_sbs' which is called from 'do_md_run'.

This bug (or missing feature) has been around "forever" and so
the patch is suitable for any -stable that is currently maintained.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: fix bitmap-on-external-file bug.

2009-01-18T18:43:47+00:00

commit 538452700d95480c16e7aa6b10ff77cd937d33f4 upstream.

commit a2ed9615e3222645007fc19991aedf30eed3ecfd
fixed a bug with 'internal' bitmaps, but in the process broke
'in a file' bitmaps.  So they are broken in 2.6.28

This fixes it, and needs to go in 2.6.28-stable.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

dm log: fix dm_io_client leak on error paths

2009-01-18T18:43:47+00:00

commit c7a2bd19b7c1e0bd2c7604c53d2583e91e536948 upstream.

In create_log_context function, dm_io_client_destroy function needs
to be called, when memory allocation of disk_header, sync_bits and
recovering_bits failed, but dm_io_client_destroy is not called.

Signed-off-by: Takahiro Yasui 
Acked-by: Jonathan Brassow 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm raid1: fix error count

2009-01-18T18:43:46+00:00

commit d460c65a6a9ec9e0d284864ec3a9a2d1b73f0e43 upstream.

Always increase the error count when I/O on a leg of a mirror fails.

The error count is used to decide whether to select an alternative
mirror leg.  If the target doesn't use the "handle_errors" feature, the
error count is not updated and the bio can get requeued forever by the
read callback.

Fix it by increasing error_count before the handle_errors feature
checking.

Signed-off-by: Milan Broz 
Signed-off-by: Jonathan Brassow 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

md: Don't read past end of bitmap when reading bitmap.

2008-12-19T05:25:01+00:00

When we read the write-intent-bitmap off the device, we currently
read a whole number of pages.
When PAGE_SIZE is 4K, this works due to the alignment we enforce
on the superblock and bitmap.
When PAGE_SIZE is 64K, this case read past the end-of-device
which causes an error.

When we write the superblock, we ensure to clip the last page
to just be the required size.  Copy that code into the read path
to just read the required number of sectors.

Signed-off-by: Neil Brown 
Cc: stable@kernel.org

block: fix setting of max_segment_size and seg_boundary mask

2008-12-03T11:55:55+00:00

Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
devices.

When stacking devices (LVM over MD over SCSI) some of the request queue
parameters are not set up correctly in some cases by default, namely
max_segment_size and and seg_boundary mask.

If you create MD device over SCSI, these attributes are zeroed.

Problem become when there is over this mapping next device-mapper mapping
- queue attributes are set in DM this way:

request_queue   max_segment_size  seg_boundary_mask
SCSI                65536             0xffffffff
MD RAID1                0                      0
LVM                 65536                 -1 (64bit)

Unfortunately bio_add_page (resp.  bio_phys_segments) calculates number of
physical segments according to these parameters.

During the generic_make_request() is segment cout recalculated and can
increase bio->bi_phys_segments count over the allowed limit.  (After
bio_clone() in stack operation.)

Thi is specially problem in CCISS driver, where it produce OOPS here

    BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

(MAXSEGENTRIES is 31 by default.)

Sometimes even this command is enough to cause oops:

  dd iflag=direct if=/dev// of=/dev/null bs=128000 count=10

This command generates bios with 250 sectors, allocated in 32 4k-pages
(last page uses only 1024 bytes).

For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
unfortunatelly on lower layer it is recalculated to 32 segments and this
violates CCISS restriction and triggers BUG_ON().

The patch tries to fix it by:

 * initializing attributes above in queue request constructor
   blk_queue_make_request()

 * make sure that blk_queue_stack_limits() inherits setting

 (DM uses its own function to set the limits because it
 blk_queue_stack_limits() was introduced later.  It should probably switch
 to use generic stack limit function too.)

 * sets the default seg_boundary value in one place (blkdev.h)

 * use this mask as default in DM (instead of -1, which differs in 64bit)

Bugs related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=471639
http://bugzilla.kernel.org/show_bug.cgi?id=8672

Signed-off-by: Milan Broz 
Reviewed-by: Alasdair G Kergon 
Cc: Neil Brown 
Cc: FUJITA Tomonori 
Cc: Tejun Heo 
Cc: Mike Miller 
Signed-off-by: Jens Axboe