Age | Commit message (Collapse) | Author |
|
commit ebfd32bba9b518d684009d9d21a56742337ca1b3 upstream.
This patch fixes two bugs that revolve around the miscalculation and
misuse of the variable 'overhead_size'. 'overhead_size' is the size of
the various header structures used during communication.
The first bug is the use of 'sizeof' with the pointer of a structure
instead of the structure itself - resulting in the wrong size being
computed. This is then used in a check to see if the payload
(data_size) would be to large for the preallocated structure. Since the
bug produces a smaller value for the overhead, it was possible for the
structure to be breached. (Although the current users of the code do
not currently send enough data to trigger this bug.)
The second bug is that the 'overhead_size' value is used to compute how
much of the preallocated space should be cleared before populating it
with fresh data. This should have simply been 'sizeof(struct cn_msg)'
not overhead_size. The fact that 'overhead_size' was computed
incorrectly made this problem "less bad" - leaving only a pointer's
worth of space at the end uncleared. Thus, this bug was never producing
a bad result, but still needs to be fixed - especially now that the
value is computed correctly.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 781248c1b50c776a9ef4be1130f84ced1cba42fe upstream.
If a table containing zero as stripe count is passed into stripe_ctr
the code attempts to divide by zero.
This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
is zero.
We now get the following error messages:
device-mapper: table: 253:0: striped: Invalid stripe count
device-mapper: ioctl: error adding target to table
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 9eb07c259207d048e3ee8be2a77b2a4680b1edd4 upstream.
This code was written long ago when it was not possible to
reshape a degraded array. Now it is so the current level of
degraded-ness needs to be taken in to account. Also newly addded
devices should only reduce degradedness if they are deemed to be
in-sync.
In particular, if you convert a RAID5 to a RAID6, and increase the
number of devices at the same time, then the 5->6 conversion will
make the array degraded so the current code will produce a wrong
value for 'degraded' - "-1" to be precise.
If the reshape runs to completion end_reshape will calculate a correct
new value for 'degraded', but if a device fails during the reshape an
incorrect decision might be made based on the incorrect value of
"degraded".
This patch is suitable for 2.6.32-stable and if they are still open,
2.6.31-stable and 2.6.30-stable as well.
Reported-by: Michael Evans <mjevans1983@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0f9552b5dc4fe10da37fa3f4a4ca185d90fa41c9 upstream.
The start_ro modules parameter can be used to force arrays to be
started in 'auto-readonly' in which they are read-only until the first
write. This ensures that no resync/recovery happens until something
else writes to the device. This is important for resume-from-disk
off an md array.
However if an array is started 'readonly' (by writing 'readonly' to
the 'array_state' sysfs attribute) we want it to be really 'readonly',
not 'auto-readonly'.
So strengthen the condition to only set auto-readonly if the
array is not already read-only.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b27d7f16d3c6c27345d4280a739809c1c2c4c0b5 upstream.
Make DM use bdev_stack_limits() function so that partition offsets get
taken into account when calculating alignment. Clarify stacking
warnings.
Also remove obsolete clearing of final alignment_offset and misalignment
flag.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair G. Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cbd1998377504df005302ac90d49db72a48552a6 upstream.
evms configures md arrays by:
open device
send ioctl
close device
for each different ioctl needed.
Since 2.6.29, the device can disappear after the 'close'
unless a significant configuration has happened to the device.
The change made by "SET_ARRAY_INFO" can too minor to stop the device
from disappearing, but important enough that losing the change is bad.
So: make sure SET_ARRAY_INFO sets mddev->ctime, and keep the device
active as long as ctime is non-zero (it gets zeroed with lots of other
things when the array is stopped).
This is suitable for -stable kernels since 2.6.29.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 542da317668c35036e8471822a564b609d05af66 upstream.
The "wipe key" message is used to wipe the volume key from memory
temporarily, for example when suspending to RAM.
But the initialisation vector in ESSIV mode is calculated from the
hashed volume key, so the wipe message should wipe this IV key too and
reinitialise it when the volume key is reinstated.
This patch adds an IV wipe method called from a wipe message callback.
ESSIV is then reinitialised using the init function added by the
last patch.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b95bf2d3d5a48b095bffe2a0cd8c40453cf59557 upstream.
This patch separates the construction of IV from its initialisation.
(For ESSIV it is a hash calculation based on volume key.)
Constructor code now preallocates hash tfm and salt array
and saves it in a private IV structure.
The next patch requires this to reinitialise the wiped IV
without reallocating memory when resuming a suspended device.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8e87b9b81b3c370f7e53c1ab6e1c3519ef37a644 upstream.
Under some special conditions the snapshot hash_size is calculated as zero.
This patch instead sets a minimum value of 64, the same as for the
pending exception table.
rounddown_pow_of_two(0) is an undefined operation (it expands to shift
by -1). init_exception_table with an argument of 0 would fail with -ENOMEM.
The way to trigger the problem is to create a snapshot with a chunk size
that is larger than the origin device.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6076905b5ef39e0ea58db32583c9e0036c05e47b upstream.
Fix a reported deadlock if there are still unprocessed multipath events
on a device that is being removed.
_hash_lock is held during dev_remove while trying to send the
outstanding events. Sending the events requests the _hash_lock
again in dm_copy_name_and_uuid.
This patch introduces a separate lock around regions that modify the
link to the hash table (dm_set_mdptr) or the name or uuid so that
dm_copy_name_and_uuid no longer needs _hash_lock.
Additionally, dm_copy_name_and_uuid can only be called if md exists
so we can drop the dm_get() and dm_put() which can lead to a BUG()
while md is being freed.
The deadlock:
#0 [ffff8106298dfb48] schedule at ffffffff80063035
#1 [ffff8106298dfc20] __down_read at ffffffff8006475d
#2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740
#3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685
#4 [ffff8106298dfcd0] event_callback at ffffffff8824c678
#5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01
#6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad
#7 [ffff8106298dfd30] dev_remove at ffffffff88250865
#8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80
#9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4
#10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9
#11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf
#12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call)
Reported-by: guy keren <choo@actcom.co.il>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5861f1be00b3b70f8ab5e5a81392a6cf69666cd2 upstream.
Use kzfree for salt deallocation because it is derived from the volume
key. Use a common error path in ESSIV constructor.
Required by a later patch which fixes the way key material is wiped
from memory.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6047359277517c4e56d8bfd6ea4966d7a3924151 upstream.
Define private structures for IV so it's easy to add further attributes
in a following patch which fixes the way key material is wiped from
memory. Also move ESSIV destructor and remove unnecessary 'status'
operation.
There are no functional changes in this patch.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 94e76572b5dd37b1f0f4b3742ee8a565daead932 upstream.
Take snapshot lock only for STATUSTYPE_INFO, not STATUSTYPE_TABLE.
Commit 4c6fff445d7aa753957856278d4d93bcad6e2c14
(dm-snapshot-lock-snapshot-while-supplying-status.patch)
introduced this use of the lock, but userspace applications using
libdevmapper have been found to request STATUSTYPE_TABLE while the device
is suspended and the lock is already held, leading to deadlock. Since
the lock is not necessary in this case, don't try to take it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 613978f8711c7fd4d0aa856872375d2abd7c92ff upstream.
Error handling code following a kmalloc should free the allocated data.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit aa5cbd103887011b4830355f88fb055f9ad2d556 upstream.
A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap. If it is, it can dereference the
bitmap after it has been freed.
So introduce a new mutex to protect bitmap_daemon_work and get it
before destroying a bitmap.
This is suitable for any current -stable kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4706b349f was a forward port of a fix that was needed
for SLES10. But in fact it is not needed in mainline because
the earlier commit dd00a99e7a fixes the same problem in a
better way.
Further, this commit introduces a bug in the way it interacts with
the automatic read-error-correction. If, after a read error is
successfully corrected, the same disk is chosen to re-read - the
re-read won't be attempted but an error will be returned instead.
After reverting that commit, there is the possibility that a
read error on a read-only array (where read errors cannot
be corrected as that requires a write) will repeatedly read the same
device and continue to get an error.
So in the "Array is readonly" case, fail the drive immediately on
a read error.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
|
|
degraded.
Normally is it not safe to allow a raid5 that is both dirty and
degraded to be assembled without explicit request from that admin, as
it can cause hidden data corruption.
This is because 'dirty' means that the parity cannot be trusted, and
'degraded' means that the parity needs to be used.
However, if the device that is missing contains only parity, then
there is no issue and assembly can continue.
This particularly applies when a RAID5 is being converted to a RAID6
and there is an unclean shutdown while the conversion is happening.
So check for whether the degraded space only contains parity, and
in that case, allow the assembly.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
When a reshape finds that it can add spare devices into the array,
those devices might already be 'in_sync' if they are beyond the old
size of the array, or they might not if they are within the array.
The first case happens when we change an N-drive RAID5 to an
N+1-drive RAID5.
The second happens when we convert an N-drive RAID5 to an
N+1-drive RAID6.
So set the flag more carefully.
Also, ->recovery_offset is only meaningful when the flag is clear,
so only set it in that case.
This change needs the preceding two to ensure that the non-in_sync
device doesn't get evicted from the array when it is stopped, in the
case where v0.90 metadata is used.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This is a combination that didn't really make sense before.
However when a reshape is converting e.g. raid5 -> raid6, the extra
device is not fully in-sync, but is certainly active and contains
important data.
So allow that start to be meaningful and in particular get
the 'recovery_offset' value (which is needed for any non-in-sync
active device) from the reshape_position.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Each device has its own 'recovery_offset' showing how far
recovery has progressed on the device.
As the only real significance of this is that fact that it can
be stored in the metadata and recovered at restart, and as
only 1.x metadata can do this, we were only updating
'recovery_offset' to 'curr_resync_completed' when updating
v1.x metadata.
But this is wrong, and we will shortly make limited use of this
field in v0.90 metadata.
So move the update into common code.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This value is visible through sysfs and is used by mdadm
when it manages a reshape (backing up data that is about to be
rearranged). So it is important that it is always correct.
Current it does not get updated properly when a reshape
starts which can cause problems when assembling an array
that is in the middle of being reshaped.
This is suitable for 2.6.31.y stable kernels.
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
If a 'sync_max' has been set (via sysfs), it is wrong to clear it
until a resync (or reshape or recovery ...) actually reached that
point.
So if a resync is interrupted (e.g. by device failure),
leave 'resync_max' unchanged.
This is particularly important for 'reshape' operations that do not
change the size of the array. For such operations mdadm needs to
monitor the reshape taking rolling backups of the section being
reshaped. If resync_max gets cleared, the reshape can get ahead of
mdadm and then the backups that mdadm creates are useless.
This is suitable for 2.6.31.y stable kernels.
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
* 'for-linus' of git://neil.brown.name/md:
async_tx: fix asynchronous raid6 recovery for ddf layouts
async_pq: rename scribble page
async_pq: kill a stray dma_map() call and other cleanups
md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning
raid6/async_tx: handle holes in block list in async_syndrome_val
md/async: don't pass a memory pointer as a page pointer.
md: Fix handling of raid5 array which is being reshaped to fewer devices.
md: fix problems with RAID6 calculations for DDF.
md/raid456: downlevel multicore operations to raid_run_ops
md: drivers/md/unroll.pl replaced with awk analog
md: remove clumsy usage of do_sync_mapping_range from bitmap code
md: raid1/raid10: handle allocation errors during array setup.
md/raid5: initialize conf->device_lock earlier
md/raid1/raid10: add a cond_resched
Revert "md: do not progress the resync process if the stripe was blocked"
|
|
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Allow the snapshot chunk size to be smaller than the page size
The code is now capable of handling this due to some previous
fixes and enhancements.
As the page size varies between computers, prior to this patch,
the chunk size of a snapshot dictated which machines could read it:
Snapshots created on one machine might not be readable on another.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Use unsigned integer chunk size.
Maximum chunk size is 512kB, there won't ever be need to use 4GB chunk size,
so the number can be 32-bit. This fixes compiler failure on 32-bit systems
with large block devices.
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
This patch locks the snapshot when returning status. It fixes a race
when it could return an invalid number of free chunks if someone
was simultaneously modifying it.
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Properly close the device if failing because of an invalid chunk size.
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
If we are creating snapshot with memory-stored exception store, fail if
the user didn't specify chunk size. Zero chunk size would probably crash
a lot of places in the rest of snapshot code.
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Multiple instances of dec_pending() can run concurrently so a lock is
needed when it saves the first error code.
I have never experienced actual problem without locking and just found
this during code inspection while implementing the barrier support
patch for request-based dm.
This patch adds the locking.
I've done compile, boot and basic I/O testings.
Cc: stable@kernel.org
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Add missing del_gendisk() to error path when creation of workqueue fails.
Otherwice there is a resource leak and following warning is shown:
WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x160()
sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0'
Cc: stable@kernel.org
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
mips:
drivers/md/dm-log-userspace-base.c: In function `userspace_ctr':
drivers/md/dm-log-userspace-base.c:159: warning: cast from pointer to integer of different size
Cc: stable@kernel.org
Cc: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
While initializing the snapshot module, if we fail to register
the snapshot target then we must back-out the exception store
module initialization.
Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Avoid a race causing corruption when snapshots of the same origin have
different chunk sizes by sorting the internal list of snapshots by chunk
size, largest first.
https://bugzilla.redhat.com/show_bug.cgi?id=182659
For example, let's have two snapshots with different chunk sizes. The
first snapshot (1) has small chunk size and the second snapshot (2) has
large chunk size. Let's have chunks A, B, C in these snapshots:
snapshot1: ====A==== ====B====
snapshot2: ==========C==========
(Chunk size is a power of 2. Chunks are aligned.)
A write to the origin at a position within A and C comes along. It
triggers reallocation of A, then reallocation of C and links them
together using A as the 'primary' exception.
Then another write to the origin comes along at a position within B and
C. It creates pending exception for B. C already has a reallocation in
progress and it already has a primary exception (A), so nothing is done
to it: B and C are not linked.
If the reallocation of B finishes before the reallocation of C, because
there is no link with the pending exception for C it does not know to
wait for it and, the second write is dispatched to the origin and causes
data corruption in the chunk C in snapshot2.
To avoid this situation, we maintain snapshots sorted in descending
order of chunk size. This leads to a guaranteed ordering on the links
between the pending exceptions and avoids the problem explained above -
both A and B now get linked to C.
Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
md/raid6 passes a list of 'struct page *' to the async_tx routines,
which then either DMA map them for offload, or take the page_address
for CPU based calculations.
For RAID6 we sometime leave 'blanks' in the list of pages.
For CPU based calcs, we want to treat theses as a page of zeros.
For offloaded calculations, we simply don't pass a page to the
hardware.
Currently the 'blanks' are encoded as a pointer to
raid6_empty_zero_page. This is a 4096 byte memory region, not a
'struct page'. This is mostly handled correctly but is rather ugly.
So change the code to pass and expect a NULL pointer for the blanks.
When taking page_address of a page, we need to check for a NULL and
in that case use raid6_empty_zero_page.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
When a raid5 (or raid6) array is being reshaped to have fewer devices,
conf->raid_disks is the latter and hence smaller number of devices.
However sometimes we want to use a number which is the total number of
currently required devices - the larger of the 'old' and 'new' sizes.
Before we implemented reducing the number of devices, this was always
'new' i.e. ->raid_disks.
Now we need max(raid_disks, previous_raid_disks) in those places.
This particularly affects assembling an array that was shutdown while
in the middle of a reshape to fewer devices.
md.c needs a similar fix when interpreting the md metadata.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The percpu conversion allowed a straightforward handoff of stripe
processing to the async subsytem that initially showed some modest gains
(+4%). However, this model is too simplistic and leads to stripes
bouncing between raid5d and the async thread pool for every invocation
of handle_stripe(). As reported by Holger this can fall into a
pathological situation severely impacting throughput (6x performance
loss).
By downleveling the parallelism to raid_run_ops the pathological
stripe_head bouncing is eliminated. This version still exhibits an
average 11% throughput loss for:
mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6
echo 1024 > /sys/block/md0/md/stripe_cache_size
dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
...but the results are at least stable and can be used as a base for
further multicore experimentation.
Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
drivers/md/unroll.pl replaced by awk script to drop build-time
dependency on perl
Signed-off-by: Vladimir Dronnikov <dronnikov@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
and replace with vfs_fsync which is much neater (but wasn't exported,
or even in existence at the time the code was written).
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.
So ensure that unplug_slaves doesn't get called unless we know
that the conf structure if fully initialised.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Deallocating a raid5_conf_t structure requires taking 'device_lock'.
Ensure it is initialized before it is used, i.e. initialize the lock
before attempting any further initializations that might fail.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).
Reported-By: Lee Howard <faxguy@howardsilvan.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This reverts commit df10cfbc4d7ab93260d997df754219d390d62a9d.
This patch was based on a misunderstanding and risks introducing a busy-wait loop.
So revert it.
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Commit a9327cac440be4d8333bba975cbbf76045096275 added seperate read
and write statistics of in_flight requests. And exported the number
of read and write requests in progress seperately through sysfs.
But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time is higher than normal.
So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b
The problem was in part_round_stats_single(), I missed the following:
if (now == part->stamp)
return;
- if (part->in_flight) {
+ if (part_in_flight(part)) {
__part_stat_add(cpu, part, time_in_queue,
part_in_flight(part) * (now - part->stamp));
__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
With this chunk included, the reported regression gets fixed.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
--
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
Revert "Seperate read and write statistics of in_flight requests"
cfq-iosched: don't delay async queue if it hasn't dispatched at all
block: Topology ioctls
cfq-iosched: use assigned slice sync value, not default
cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
cfq-iosched: implement slower async initiate and queue ramp up
cfq-iosched: delay async IO dispatch, if sync IO was just done
cfq-iosched: add a knob for desktop interactiveness
Add a tracepoint for block request remapping
block: allow large discard requests
block: use normal I/O path for discard requests
swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
fs/bio.c: move EXPORT* macros to line after function
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
cciss: fix build when !PROC_FS
block: Do not clamp max_hw_sectors for stacking devices
block: Set max_sectors correctly for stacking devices
cciss: cciss_host_attr_groups should be const
cciss: Dynamically allocate the drive_info_struct for each logical drive.
cciss: Add usage_count attribute to each logical drive in /sys
...
|
|
This reverts commit a9327cac440be4d8333bba975cbbf76045096275.
Corrado Zoccolo <czoccolo@gmail.com> reports:
"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
10,70 0,00 3,16 15,75 0,00 70,38
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 18,22 0,00 0,67 0,01 14,77 0,02
43,94 0,01 10,53 39043915,03 2629219,87
sdb 60,89 9,68 50,79 3,04 1724,43 50,52
65,95 0,70 13,06 488437,47 2629219,87
avg-cpu: %user %nice %system %iowait %steal %idle
2,72 0,00 0,74 0,00 0,00 96,53
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
avg-cpu: %user %nice %system %iowait %steal %idle
6,68 0,00 0,99 0,00 0,00 92,33
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
avg-cpu: %user %nice %system %iowait %steal %idle
4,40 0,00 0,73 1,47 0,00 93,40
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 4,00 0,00 3,00 0,00 28,00
18,67 0,06 19,50 333,33 100,00
Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.
I bisected it down to:
[a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."
So until this is debugged, revert the bad commit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|