linux-toradex.git/drivers/md, branch v2.6.27.12

md: fix bitmap-on-external-file bug.

2009-01-18T18:35:27+00:00

commit 538452700d95480c16e7aa6b10ff77cd937d33f4 upstream.

commit a2ed9615e3222645007fc19991aedf30eed3ecfd
fixed a bug with 'internal' bitmaps, but in the process broke
'in a file' bitmaps.  So they are broken in 2.6.28.

This fixes it, and needs to go in 2.6.28-stable.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

dm raid1: fix error count

2009-01-18T18:35:26+00:00

commit d460c65a6a9ec9e0d284864ec3a9a2d1b73f0e43 upstream.

Always increase the error count when I/O on a leg of a mirror fails.

The error count is used to decide whether to select an alternative
mirror leg.  If the target doesn't use the "handle_errors" feature, the
error count is not updated and the bio can get requeued forever by the
read callback.

Fix it by increasing error_count before the handle_errors feature
check.
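
The ordering described above can be sketched in user-space C.  This is
only an illustration: struct leg, fail_leg and choose_leg are
hypothetical names, not the dm-raid1 code, and C11 atomics stand in for
the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one mirror leg; not the kernel's struct. */
struct leg {
	atomic_int error_count;
	bool failed;
};

/*
 * Record an I/O failure on a leg.  The count is bumped unconditionally;
 * only the extra failure handling depends on handle_errors.
 */
static void fail_leg(struct leg *l, bool handle_errors)
{
	atomic_fetch_add(&l->error_count, 1);

	if (!handle_errors)
		return;

	l->failed = true;
}

/* Pick the first leg that has seen no errors, or -1 if none is left. */
static int choose_leg(struct leg *legs, int n)
{
	for (int i = 0; i < n; i++)
		if (atomic_load(&legs[i].error_count) == 0)
			return i;
	return -1;
}
```

Because the count changes even without handle_errors, a retried read
moves on to another leg, or gives up, instead of being requeued to the
same failing leg.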

Signed-off-by: Milan Broz 
Signed-off-by: Jonathan Brassow 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm log: fix dm_io_client leak on error paths

2009-01-18T18:35:26+00:00

commit c7a2bd19b7c1e0bd2c7604c53d2583e91e536948 upstream.

In the create_log_context function, dm_io_client_destroy needs to be
called when memory allocation of disk_header, sync_bits or
recovering_bits fails, but it is not called.
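
A minimal user-space sketch of the cleanup pattern, assuming the client
is created before the three buffers.  io_client_create,
io_client_destroy, alloc_buf and the fail_at parameter are hypothetical
test scaffolding, not the dm-log code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dm_io client. */
struct io_client { int unused; };
static int live_clients;	/* clients created but not yet destroyed */

static struct io_client *io_client_create(void)
{
	struct io_client *c = malloc(sizeof(*c));

	if (c)
		live_clients++;
	return c;
}

static void io_client_destroy(struct io_client *c)
{
	free(c);
	live_clients--;
}

struct log_ctx {
	struct io_client *client;
	void *disk_header;
	void *sync_bits;
	void *recovering_bits;
};

/* fail_at simulates failure of the n-th buffer allocation (0 = none). */
static void *alloc_buf(size_t size, int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(size);
}

static int create_ctx(struct log_ctx *lc, int fail_at)
{
	lc->client = io_client_create();
	if (!lc->client)
		return -1;

	lc->disk_header = alloc_buf(512, 1, fail_at);
	if (!lc->disk_header)
		goto bad_client;
	lc->sync_bits = alloc_buf(64, 2, fail_at);
	if (!lc->sync_bits)
		goto bad_header;
	lc->recovering_bits = alloc_buf(64, 3, fail_at);
	if (!lc->recovering_bits)
		goto bad_sync;
	return 0;

	/* Unwind in reverse order; every path ends by destroying the client. */
bad_sync:
	free(lc->sync_bits);
bad_header:
	free(lc->disk_header);
bad_client:
	io_client_destroy(lc->client);
	return -1;
}

static void destroy_ctx(struct log_ctx *lc)
{
	free(lc->recovering_bits);
	free(lc->sync_bits);
	free(lc->disk_header);
	io_client_destroy(lc->client);
}
```

Each error label releases what was acquired before it, so no failure
point can skip the client teardown.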

Signed-off-by: Takahiro Yasui 
Acked-by: Jonathan Brassow 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

md: Don't read past end of bitmap when reading bitmap.

2009-01-14T17:44:05+00:00

commit a2ed9615e3222645007fc19991aedf30eed3ecfd upstream.

When we read the write-intent-bitmap off the device, we currently
read a whole number of pages.
When PAGE_SIZE is 4K, this works due to the alignment we enforce
on the superblock and bitmap.
When PAGE_SIZE is 64K, this can read past the end of the device,
which causes an error.

When we write the superblock, we take care to clip the last page
to just the required size.  Copy that code into the read path
to read only the required number of sectors.
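
The clipping can be illustrated as below.  This is a sketch, not the
bitmap code: page_io_size is a hypothetical helper, and a 512-byte
sector and a page size that is a multiple of it are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE 512u	/* assumed hardware sector size */

/*
 * Bytes to transfer for page `index` of a bitmap that is `total` bytes
 * long.  page_size is assumed to be a multiple of SECTOR_SIZE.
 */
static size_t page_io_size(size_t total, size_t index, size_t page_size)
{
	size_t start = index * page_size;
	size_t left;

	if (start >= total)
		return 0;	/* page lies entirely past the bitmap */

	left = total - start;
	if (left >= page_size)
		return page_size;	/* full page */

	/* Last, partial page: round up to a whole sector, no further. */
	return (left + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}
```

For a 5000-byte bitmap, a single 64K page is clipped to 5120 bytes
(ten sectors) rather than read in full.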

Signed-off-by: Neil Brown 
Signed-off-by: Greg Kroah-Hartman

dm raid1: flush workqueue before destruction

2008-11-20T22:54:51+00:00

commit 18776c7316545482a02bfaa2629a2aa1afc48357 upstream.

We queue work on the keventd queue --- so this queue must be flushed in the
destructor. Otherwise, keventd could access mirror_set after it was freed.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

md: fix bug in raid10 recovery.

2008-11-13T17:55:57+00:00

commit a53a6c85756339f82ff19e001e90cfba2d6299a8 upstream

Adding a spare to a raid10 doesn't cause recovery to start.
This is due to a silly typo in
  commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
and so is a bug in 2.6.27 and .28-rc.

Thanks to Thomas Backlund for bisecting to find this.

Cc: Thomas Backlund 
Cc: George Spelvin 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: linear: Fix a division by zero bug for very small arrays.

2008-11-13T17:55:57+00:00

commit f1cd14ae52985634d0389e934eba25b5ecf24565 upstream

Date: Thu, 6 Nov 2008 19:41:24 +1100

We currently oops with a divide error on starting a linear software
raid array consisting of at least two very small (< 500K) devices.

The bug is caused by the calculation of the hash table size which
tries to compute sector_div(sz, base) with "base" being zero due to
the small size of the component devices of the array.

Fix this by requiring the hash spacing to be at least one, which
implies that "base" is also non-zero.
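
The idea can be shown with a deliberately simplified sizing
calculation.  This is not the linear.c computation: hash_spacing,
hash_table_entries and max_entries are hypothetical, and max_entries is
assumed to be non-zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified spacing: for a very small array the division rounds down
 * to zero, so clamp the result to at least one.
 */
static uint64_t hash_spacing(uint64_t array_size, uint64_t max_entries)
{
	uint64_t spacing = array_size / max_entries;

	if (spacing == 0)
		spacing = 1;
	return spacing;
}

/* Number of hash entries; the divisor can no longer be zero. */
static uint64_t hash_table_entries(uint64_t array_size, uint64_t max_entries)
{
	uint64_t base = hash_spacing(array_size, max_entries);

	return (array_size + base - 1) / base;
}
```

Without the clamp, a 100-unit array with 512 possible entries would
divide by zero when sizing the table.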

This bug has existed since about 2.6.14.

Signed-off-by: Andre Noll 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

dm snapshot: fix primary_pe race

2008-10-25T21:32:39+00:00

commit 7c5f78b9d7f21937e46c26db82976df4b459c95c upstream

Fix a race condition with primary_pe ref_count handling.

put_pending_exception runs under dm_snapshot->lock. It does atomic_dec_and_test
on primary_pe->ref_count, and later does atomic_read on primary_pe->ref_count.

__origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
dm_snapshot->lock.

This opens the following race condition:
Assume two CPUs, CPU1 is executing put_pending_exception (and holding
dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
primary_pe->ref_count == 2.

CPU1:
if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
	origin_bios = bio_list_get(&primary_pe->origin_bios);
.. decrements primary_pe->ref_count to 1. Doesn't load origin_bios

CPU2:
if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
	flush_bios(bio_list_get(&primary_pe->origin_bios));
	free_pending_exception(primary_pe);
	/* If we got here, pe_queue is necessarily empty. */
	return r;
}
.. decrements primary_pe->ref_count to 0, submits pending bios, frees
primary_pe.

CPU1:
if (!primary_pe || primary_pe != pe)
	free_pending_exception(pe);
.. this has no effect.
if (primary_pe && !atomic_read(&primary_pe->ref_count))
	free_pending_exception(primary_pe);
.. sees ref_count == 0 (written by CPU 2), does double free !!

This bug can happen only if someone is simultaneously writing to both the
origin and the snapshot.

If someone is writing only to the origin, __origin_write submits the kcopyd
request after it decrements primary_pe->ref_count (so the finished copy
cannot race with the primary_pe->ref_count decrement).

If someone is writing only to the snapshot, __origin_write isn't invoked at all
and the race can't happen.

The race happens when someone writes to the snapshot --- this creates a
pending_exception with primary_pe == NULL and starts copying. Then, someone
writes to the same chunk in the origin, and __origin_write races with
termination of the already submitted request in pending_complete (which calls
put_pending_exception).

This race may be the reason for these bugs:
  http://bugzilla.kernel.org/show_bug.cgi?id=11636
  https://bugzilla.redhat.com/show_bug.cgi?id=465825

The patch fixes the code to make sure that:
1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
must no longer dereference primary_pe (because someone else may free it under
us).
2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
is responsible for freeing primary_pe.
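
The two rules above can be sketched with C11 atomics in user space.
struct pe, pe_put and the frees counter are hypothetical and only
illustrate the ownership rule; this is not the dm-snap code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pending exception; only the count matters. */
struct pe {
	atomic_int ref_count;
};

/* Counts frees so a double free would show up in a single-threaded test. */
static int frees;

static void pe_free(struct pe *p)
{
	frees++;
	free(p);
}

/*
 * Drop one reference.  Returns true if the caller dropped the last one,
 * in which case the object has been freed here.  After a false return
 * the caller must not touch p again: another holder may free it at any
 * moment.
 */
static bool pe_put(struct pe *p)
{
	/* atomic_fetch_sub returns the old value; 1 means we reached zero. */
	if (atomic_fetch_sub(&p->ref_count, 1) != 1)
		return false;

	pe_free(p);
	return true;
}
```

The decision to free is taken from the result of the decrement itself,
never from a later read of the counter, so exactly one caller frees.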

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm kcopyd: avoid queue shuffle

2008-10-25T21:32:39+00:00

commit b673c3a8192e28f13e2050a4b82c1986be92cc15 upstream

Write throughput to an LVM snapshot origin volume is an order
of magnitude lower than that to an LV without snapshots or to
snapshot target volumes, especially in the case of sequential
writes with O_SYNC on.

The following patch, originally written by Kevin Jamieson and
Jan Blunck and slightly modified for the current RCs by myself,
tries to improve the performance by modifying the behaviour
of kcopyd, so that it pushes back an I/O job to the head of
the job queue instead of the tail as process_jobs() currently
does when it has to wait for free pages. This way, write
requests aren't shuffled to cause extra seeks.
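
The difference between the two requeue policies can be seen in a small
user-space sketch.  struct job, struct queue, push_tail, push_head and
pop are hypothetical and use a plain singly linked list, not the
kernel's list_head or the kcopyd job lists.

```c
#include <assert.h>
#include <stddef.h>

struct job {
	int id;
	struct job *next;
};

struct queue {
	struct job *head, *tail;
};

/* Append a job: the normal way new work enters the queue. */
static void push_tail(struct queue *q, struct job *j)
{
	j->next = NULL;
	if (q->tail)
		q->tail->next = j;
	else
		q->head = j;
	q->tail = j;
}

/* Put a job back at the front so it keeps its place in line. */
static void push_head(struct queue *q, struct job *j)
{
	j->next = q->head;
	q->head = j;
	if (!q->tail)
		q->tail = j;
}

/* Take the first job, or NULL if the queue is empty. */
static struct job *pop(struct queue *q)
{
	struct job *j = q->head;

	if (j) {
		q->head = j->next;
		if (!q->head)
			q->tail = NULL;
		j->next = NULL;
	}
	return j;
}
```

With jobs 1, 2, 3 queued, popping job 1 and requeueing it with
push_head keeps the order 1, 2, 3; requeueing with push_tail turns it
into 2, 3, 1, which is the shuffle that causes the extra seeks.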

I tested the patch against 2.6.27-rc5 and got the following results.
The test is a dd command writing to snapshot origin followed by fsync
to the file just created/updated.  A couple of filesystem benchmarks
gave me similar results in case of sequential writes, while random
writes didn't suffer much.

dd if=/dev/zero of= bs=4096 count=...
   [conv=notrunc when updating]

1) linux 2.6.27-rc5 without the patch, write to snapshot origin,
average throughput (MB/s)
                     10M     100M    1000M
create,dd         511.46   610.72    11.81
create,dd+fsync     7.10     6.77     8.13
update,dd         431.63   917.41    12.75
update,dd+fsync     7.79     7.43     8.12

Compared with write throughput to an LV without any snapshots
(the next table), all dd+fsync and 1000 MiB writes perform very poorly.

                     10M     100M    1000M
create,dd         555.03   608.98   123.29
create,dd+fsync   114.27    72.78    76.65
update,dd         152.34  1267.27   124.04
update,dd+fsync   130.56    77.81    77.84

2) linux 2.6.27-rc5 with the patch, write to snapshot origin,
average throughput (MB/s)

                     10M     100M    1000M
create,dd         537.06   589.44    46.21
create,dd+fsync    31.63    29.19    29.23
update,dd         487.59   897.65    37.76
update,dd+fsync    34.12    30.07    26.85

Although still not on par with plain LV performance
(which cannot be avoided, because it's copy-on-write anyway),
this simple patch successfully improves the throughput
of dd+fsync while not affecting the rest.

Signed-off-by: Jan Blunck 
Signed-off-by: Kazuo Ito 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

md: Fix rdev_size_store with size == 0

2008-10-22T21:21:08+00:00

commit 7d3c6f8717ee6c2bf6cba5fa0bda3b28fbda6015 upstream

Fix rdev_size_store with size == 0.
size == 0 means to use the largest size allowed by the
underlying device and is used when modifying an active array.
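
The intended meaning of size == 0 can be sketched as follows.
resolve_size is a hypothetical helper, the units are arbitrary, and the
real rdev_size_store performs further checks that are not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Resolve a requested component size.  Zero is not an error: it asks
 * for the largest size the underlying device allows.
 * Returns 0 on success, -1 if the request exceeds the device.
 */
static int resolve_size(uint64_t requested, uint64_t device_max,
			uint64_t *out)
{
	if (requested == 0)
		requested = device_max;
	if (requested > device_max)
		return -1;

	*out = requested;
	return 0;
}
```

The zero case is handled before any range check, so it cannot be
rejected as being too small.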

This fixes a regression introduced by
 commit d7027458d68b2f1752a28016dcf2ffd0a7e8f567

Signed-off-by: Chris Webb 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman