summaryrefslogtreecommitdiff
path: root/drivers/md/raid5.c
AgeCommit message (Collapse)Author
2014-09-13md/raid6: avoid data corruption during recovery of double-degraded RAID6NeilBrown
commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd upstream. During recovery of a double-degraded RAID6 it is possible for some blocks not to be recovered properly, leading to corruption. If a write happens to one block in a stripe that would be written to a missing device, and at the same time that stripe is recovering data to the other missing device, then that recovered data may not be written. This patch skips, in the double-degraded case, an optimisation that is only safe for single-degraded arrays. Bug was introduced in 2.6.32 and fix is suitable for any kernel since then. In an older kernel with separate handle_stripe5() and handle_stripe6() functions the patch must change handle_stripe6(). Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8 Cc: Yuri Tikhonov <yur@emcraft.com> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: "Manibalan P" <pmanibalan@amiindia.co.in> Tested-by: "Manibalan P" <pmanibalan@amiindia.co.in> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423 Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02md/raid5: Fix CPU hotplug callback registrationOleg Nesterov
commit 789b5e0315284463617e106baad360cb9e8db3ac upstream. Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Interestingly, the raid5 code can actually prevent double initialization and hence can use the following simplified form of callback registration: register_cpu_notifier(&foobar_cpu_notifier); get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); put_online_cpus(); A hotplug operation that occurs between registering the notifier and calling get_online_cpus(), won't disrupt anything, because the code takes care to perform the memory allocations only once. So reorganize the code in raid5 this way to fix the deadlock with callback registration. Cc: linux-raid@vger.kernel.org Fixes: 36d1c6476be51101778882897b315bd928c8c7b5 Signed-off-by: Oleg Nesterov <oleg@redhat.com> [Srivatsa: Fixed the unregister_cpu_notifier() deadlock, added the free_scratch_buffer() helper to condense code further and wrote the changelog.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02md/raid5: fix long-standing problem with bitmap handling on write failure.NeilBrown
commit 9f97e4b128d2ea90a5f5063ea0ee3b0911f4c669 upstream. Before a write starts we set a bit in the write-intent bitmap. When the write completes we clear that bit if the write was successful to all devices. However if the write wasn't fully successful we should not clear the bit. If the faulty drive is subsequently re-added, the fact that the bit is still set ensure that we will re-write the data that is missing. This logic is mediated by the STRIPE_DEGRADED flag - we only clear the bitmap bit when this flag is not set. Currently we correctly set the flag if a write starts when some devices are failed or missing. But we do *not* set the flag if some device failed during the write attempt. This is wrong and can result in clearing the bit inappropriately. So: set the flag when a write fails. This bug has been present since bitmaps were introduces, so the fix is suitable for any -stable kernel. Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org> Signed-off-by: NeilBrown <neilb@suse.de> [bwh: Backported to 3.2: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15md/raid5: Fix possible confusion when multiple write errors occur.NeilBrown
commit 1cc03eb93245e63b0b7a7832165efdc52e25b4e6 upstream. commit 5d8c71f9e5fbdd95650be00294d238e27a363b5c md: raid5 crash during degradation Fixed a crash in an overly simplistic way which could leave R5_WriteError or R5_MadeGood set in the stripe cache for devices for which it is no longer relevant. When those devices are removed and spares added the flags are still set and can cause incorrect behaviour. commit 14a75d3e07c784c004b4b44b34af996b8e4ac453 md/raid5: preferentially read from replacement device if possible. Fixed the same bug if a more effective way, so we can now revert the original commit. Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Fixes: 5d8c71f9e5fbdd95650be00294d238e27a363b5c Signed-off-by: NeilBrown <neilb@suse.de> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-25raid5: delayed stripe fixShaohua Li
commit fab363b5ff502d1b39ddcfec04271f5858d9f26e upstream. There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but the two bits have relationship. A delayed stripe can be moved to hold list only when preread active stripe count is below IO_THRESHOLD. If a stripe has both the bits set, such stripe will be in delayed list and preread count not 0, which will make such stripe never leave delayed list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-12md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdevmajianpeng
commit 1850753d2e6d9ca7856581ca5d3cf09521e6a5d7 upstream. In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement nr_pending so we lose the reference we hold on the rdev. So atomic_inc it first to maintain the reference. This bug was introduced by commit 73e92e51b7969ef5477d md/raid5. Don't write to known bad block on doubtful devices. which appeared in 3.0, so patch is suitable for stable kernels since then. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-12md/raid5: Do not add data_offset before call to is_badblockmajianpeng
commit 6c0544e255dd6582a9899572e120fb55d9f672a4 upstream. In chunk_aligned_read() we are adding data_offset before calling is_badblock. But is_badblock also adds data_offset, so that is bad. So move the addition of data_offset to after the call to is_badblock. This bug was introduced by commit 31c176ecdf3563140e639 md/raid5: avoid reading from known bad blocks. which first appeared in 3.0. So that patch is suitable for any -stable kernel from 3.0.y onwards. However it will need minor revision for most of those (as the comment didn't appear until recently). Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> [bwh: Backported to 3.2: ignored missing comment] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2011-12-23md/raid5: ensure correct assessment of drives during degraded reshape.NeilBrown
While reshaping a degraded array (as when reshaping a RAID0 by first converting it to a degraded RAID4) we currently get confused about which devices are in_sync. In most cases we get it right, but in the region that is being reshaped we need to treat non-failed devices as in-sync when we have the data but haven't actually written it out yet. Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-09md: raid5 crash during degradationAdam Kwolek
NULL pointer access causes crash in raid5 module. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-08md/raid5: never wait for bad-block acks on failed device.NeilBrown
Once a device is failed we really want to completely ignore it. It should go away soon anyway. In particular the presence of bad blocks on it should not cause us to block as we won't be trying to write there anyway. So as soon as we can check if a device is Faulty, do so and pretend that it is already gone if it is Faulty. Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-08md/raid5: STRIPE_ACTIVE has lock semantics, add barriersDan Williams
All updates that occur under STRIPE_ACTIVE should be globally visible when STRIPE_ACTIVE clears. test_and_set_bit() implies a barrier, but clear_bit() does not. This is suitable for 3.1-stable. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
2011-11-08md/raid5: abort any pending parity operations when array fails.NeilBrown
When the number of failed devices exceeds the allowed number we must abort any active parity operations (checks or updates) as they are no longer meaningful, and can lead to a BUG_ON in handle_parity_checks6. This bug was introduce by commit 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8 in 2.6.29. Reported-by: Manish Katiyar <mkatiyar@gmail.com> Tested-by: Manish Katiyar <mkatiyar@gmail.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-11-04Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c
2011-10-31md: Add module.h to all files using it implicitlyPaul Gortmaker
A pending cleanup will mean that module.h won't be implicitly everywhere anymore. Make sure the modular drivers in md dir are actually calling out for <module.h> explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-26md: Fix some bugs in recovery_disabled handling.NeilBrown
In 3.0 we changed the way recovery_disabled was handle so that instead of testing against zero, we test an mddev-> value against a conf-> value. Two problems: 1/ one place in raid1 was missed and still sets to '1'. 2/ We didn't explicitly set the conf-> value at array creation time. It defaulted to '0' just like the mddev value does so they could appear equal and thus disable recovery. This did not affect normal 'md' as it calls bind_rdev_to_array which changes the mddev value. However the dmraid interface doesn't call this and so doesn't change ->recovery_disabled; so at array start all recovery is incorrectly disabled. So initialise the 'conf' value to one less that the mddev value, so the will only be the same when explicitly set that way. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-26md/raid5: fix bug that could result in reads from a failed device.NeilBrown
This bug was introduced in 415e72d034c50520ddb7ff79e7d1792c1306f0c9 which was in 2.6.36. There is a small window of time between when a device fails and when it is removed from the array. During this time we might still read from it, but we won't write to it - so it is possible that we could read stale data. We didn't need the test of 'Faulty' before because the test on In_sync is sufficient. Since we started allowing reads from the early part of non-In_sync devices we need a test on Faulty too. This is suitable for any kernel from 2.6.36 onwards, though the patch might need a bit of tweaking in 3.0 and earlier. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-19Merge branch 'v3.1-rc10' into for-3.2/coreJens Axboe
Conflicts: block/blk-core.c include/linux/blkdev.h Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-11md: rename "mdk_personality" to "md_personality"NeilBrown
"mdk" doesn't mean anything any more. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md/raid5: typedef removal: raid5_conf_t -> struct r5confNeilBrown
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md/raid0: typedef removal: raid0_conf_t -> struct r0confNeilBrown
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: remove typedefs: mddev_t -> struct mddevNeilBrown
Having mddev_t and 'struct mddev_s' is ugly and not preferred Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: removing typedefs: mdk_rdev_t -> struct md_rdevNeilBrown
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h' which used to be an include file that defined this thing. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md: remove some old DEBUGging code.NeilBrown
This code is not really helpful and is hard to maintain, so just discard it. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md/raid5: convert to macros into inline functions.NeilBrown
More type-safety. Easier to read. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md/raid5: remove pointless NULL test.NeilBrown
In the 'abort' branch of run(), 'conf' cannot possibly be NULL, so remove the test. Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21md: Avoid waking up a thread after it has been freed.NeilBrown
Two related problems: 1/ some error paths call "md_unregister_thread(mddev->thread)" without subsequently clearing ->thread. A subsequent call to mddev_unlock will try to wake the thread, and crash. 2/ Most calls to md_wakeup_thread are protected against the thread disappeared either by: - holding the ->mutex - having an active request, so something else must be keeping the array active. However mddev_unlock calls md_wakeup_thread after dropping the mutex and without any certainty of an active request, so the ->thread could theoretically disappear. So we need a spinlock to provide some protections. So change md_unregister_thread to take a pointer to the thread pointer, and ensure that it always does the required locking, and clears the pointer properly. Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com> Signed-off-by: NeilBrown <neilb@suse.de> cc: stable@kernel.org
2011-09-12block: remove support for bio remapping from ->make_requestChristoph Hellwig
There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-31md/raid5: fix a hang on device failure.NeilBrown
Waiting for a 'blocked' rdev to become unblocked in the raid5d thread cannot work with internal metadata as it is the raid5d thread which will clear the blocked flag. This wasn't a problem in 3.0 and earlier as we only set the blocked flag when external metadata was used then. However we now set it always, so we need to be more careful. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md/raid5: Clear bad blocks on successful write.NeilBrown
On a successful write to a known bad block, flag the sh so that raid5d can remove the known bad block from the list. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md/raid5. Don't write to known bad block on doubtful devices.NeilBrown
If a device has seen write errors, don't write to any known bad blocks on that device. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md/raid5: write errors should be recorded as bad blocks if possible.NeilBrown
When a write error is detected, don't mark the device as failed immediately but rather record the fact for handle_stripe to deal with. Handle_stripe then attempts to record a bad block. Only if that fails does the device get marked as faulty. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md/raid5: use bad-block log to improve handling of uncorrectable read errors.NeilBrown
If we get an uncorrectable read error - record a bad block rather than failing the device. And if these errors (which may be due to known bad blocks) cause recovery to be impossible, record a bad block on the recovering devices, or abort the recovery. As we might abort a recovery without failing a device we need to teach RAID5 about recovery_disabled handling. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md/raid5: avoid reading from known bad blocks.NeilBrown
There are two times that we might read in raid5: 1/ when a read request fits within a chunk on a single working device. In this case, if there is any bad block in the range of the read, we simply fail the cache-bypass read and perform the read though the stripe cache. 2/ when reading into the stripe cache. In this case we mark as failed any device which has a bad block in that strip (1 page wide). Note that we will both avoid reading and avoid writing. This is correct (as we will never read from the block, there is no point writing), but not optimal (as writing could 'fix' the error) - that will be addressed later. If we have not seen any write errors on the device yet, we treat a bad block like a recent read error. This will encourage an attempt to fix the read error which will either generate a write error, or will ensure good data is stored there. We don't yet forget the bad block in that case. That comes later. Now that we honour bad blocks when reading we can allow devices with bad blocks into the array. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md: make it easier to wait for bad blocks to be acknowledged.NeilBrown
It is only safe to choose not to write to a bad block if that bad block is safely recorded in metadata - i.e. if it has been 'acknowledged'. If it hasn't we need to wait for the acknowledgement. We support that using rdev->blocked wait and md_wait_for_blocked_rdev by introducing a new device flag 'BlockedBadBlock'. This flag is only advisory. It is cleared whenever we acknowledge a bad block, so that a waiter can re-check the particular bad blocks that it is interested it. It should be set by a caller when they find they need to wait. This (set after test) is inherently racy, but as md_wait_for_blocked_rdev already has a timeout, losing the race will have minimal impact. When we clear "Blocked" was also clear "BlockedBadBlocks" incase it was set incorrectly (see above race). We also modify the way we manage 'Blocked' to fit better with the new handling of 'BlockedBadBlocks' and to make it consistent between externally managed and internally managed metadata. This requires that each raidXd loop checks if the metadata needs to be written and triggers a write (md_check_recovery) if needed. Otherwise a queued write request might cause raidXd to wait for the metadata to write, and only that thread can write it. Before writing metadata, we set FaultRecorded for all devices that are Faulty, then after writing the metadata we clear Blocked for any device for which the Fault was certainly Recorded. The 'faulty' device flag now appears in sysfs if the device is faulty *or* it has unacknowledged bad blocks. So user-space which does not understand bad blocks can continue to function correctly. User space which does, should not assume a device is faulty until it sees the 'faulty' flag, and then sees the list of unacknowledged bad blocks is empty. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md: don't allow arrays to contain devices with bad blocks.NeilBrown
As no personality understand bad block lists yet, we must reject any device that is known to contain bad blocks. As the personalities get taught, these tests can be removed. This only applies to raid1/raid5/raid10. For linear/raid0/multipath/faulty the whole concept of bad blocks doesn't mean anything so there is no point adding the checks. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: Avoid BUG caused by multiple failures.NeilBrown
While preparing to write a stripe we keep the parity block or blocks locked (R5_LOCKED) - towards the end of schedule_reconstruction. If the array is discovered to have failed before this write completes we can leave those blocks LOCKED, and init_stripe will notice that a free stripe still has a locked block and will complain. So clear the R5_LOCKED flag in handle_failed_stripe, and demote the 'BUG' to a 'WARN_ON'. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27md/raid5: move rdev->corrected_errors countingNamhyung Kim
Read errors are considered to corrected if write-back and re-read cycle is finished without further problems. Thus moving the rdev-> corrected_errors counting after the re-reading looks more reasonable IMHO. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27md: introduce link/unlink_rdev() helpersNamhyung Kim
There are places where sysfs links to rdev are handled in a same way. Add the helper functions to consolidate them. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27md/raid: use printk_ratelimited instead of printk_ratelimitChristian Dietrich
As per printk_ratelimit comment, it should not be used. Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de> Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27md/raid5: finalise new merged handle_stripe.NeilBrown
handle_stripe5() and handle_stripe6() are now virtually identical. So discard one and rename the other to 'analyse_stripe()'. It always returns 0, so change it to 'void' and remove the 'done' variable in handle_stripe(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: move some more common code into handle_stripeNeilBrown
The RAID6 version of this code is usable for RAID5 providing: - we test "conf->max_degraded" rather than "2" as appropriate - we make sure s->failed_num[1] is meaningful (and not '-1') when s->failed > 1 The 'return 1' must become 'goto finish' in the new location. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: move more common code into handle_stripeNeilBrown
Apart from 'prexor' which can only be set for RAID5, and 'qd_idx' which can only be meaningful for RAID6, these two chunks of code are nearly the same. So combine them into one adding a test to call either handle_parity_checks5 or handle_parity_checks6 as appropriate. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: unite handle_stripe_dirtying5 and handle_stripe_dirtying6NeilBrown
RAID6 is only allowed to choose 'reconstruct-write' while RAID5 is also allow 'read-modify-write' Apart from this difference, handle_stripe_dirtying[56] are nearly identical. So resolve these differences and create just one function. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: unite fetch_block5 and fetch_block6NeilBrown
Provided that ->failed_num[1] is not a valid device number (which is easily achieved) fetch_block6 provides all the functionality of fetch_block5. So remove the latter and rename the former to simply "fetch_block". Then handle_stripe_fill5 and handle_stripe_fill6 become the same and can similarly be united. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: rearrange a test in fetch_block6.NeilBrown
Next patch will unite fetch_block5 and fetch_block6. First I want to make the differences a little more clear. For RAID6 if we are writing at all and there is a failed device, then we need to load or compute every block so we can do a reconstruct-write. This case isn't needed for RAID5 - we will do a read-modify-write in that case. So make that test a separate test in fetch_block6 rather than merged with two other tests. Make a similar change in fetch_block5 so the one bit that is not needed for RAID6 is clearly separate. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: move more code into common handle_stripeNeilBrown
The difference between the RAID5 and RAID6 code here is easily resolved using conf->max_degraded. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: Move code for finishing a reconstruction into handle_stripe.NeilBrown
Prior to commit ab69ae12ceef7 the code in handle_stripe5 and handle_stripe6 to "Finish reconstruct operations initiated by the expansion process" was identical. That commit added an identical stanza of code to each function, but in different places. That was careless. The raid5 code was correct, so move that out into handle_stripe and remove raid6 version. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27md/raid5: Remove stripe_head_state arg from handle_stripe_expansion.NeilBrown
This arg is only used to differentiate between RAID5 and RAID6 but that is not needed. For RAID5, raid5_compute_sector will set qd_idx to "~0" so j with certainly not equals qd_idx, so there is no need for a guard on that condition. So remove the guard and remove the arg from the declaration and callers of handle_stripe_expansion. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26md/raid5: move stripe_head_state and more code into handle_stripe.NeilBrown
By defining the 'stripe_head_state' in 'handle_stripe', we can move some common code out of handle_stripe[56]() and into handle_stripe. The means that all accesses for stripe_head_state in handle_stripe[56] need to be 's->' instead of 's.', but the compiler should inline those functions and just use a direct stack reference, and future patches while hoist most of this code up into handle_stripe() so we will revert to "s.". Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>