linux-toradex.git/drivers/scsi/scsi_error.c, branch v3.2.26

fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)

2012-08-02T13:37:56+00:00

commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream.

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain.  When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.  Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.
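
The ordering described above can be sketched in userspace as follows.
This is a simplified model, not the kernel code: struct host_model,
restart_operations() and complete_one() are hypothetical names, and the
"recheck" flag exists only to contrast the behaviour with and without
the extra check.

```c
#include <stdbool.h>

/* Stand-ins for the two host states named in the commit message. */
enum host_state { SHOST_RUNNING, SHOST_RECOVERY };

/* Hypothetical, simplified model of the fields the commit discusses. */
struct host_model {
	enum host_state state;
	unsigned int host_busy;         /* commands currently in flight */
	unsigned int host_failed;       /* commands waiting for eh */
	unsigned int host_eh_scheduled; /* eh requested via scsi_schedule_eh() */
};

/*
 * Model of the end of one eh pass: return to SHOST_RUNNING, release the
 * blocked i/o, and only then (if recheck is set) look at whether another
 * trip through eh was requested in the meantime.
 */
static void restart_operations(struct host_model *h,
			       unsigned int released_io, bool recheck)
{
	h->state = SHOST_RUNNING;
	h->host_busy += released_io; /* pent-up i/o gets to run first */

	if (recheck && h->host_eh_scheduled)
		h->state = SHOST_RECOVERY;
}

/*
 * Model of one command completing.  Returns true when the eh thread
 * should be woken: the host is in recovery, eh work is pending, and
 * every command still in flight is a failed one.
 */
static bool complete_one(struct host_model *h)
{
	h->host_busy--;
	return h->state == SHOST_RECOVERY &&
	       (h->host_failed || h->host_eh_scheduled) &&
	       h->host_busy == h->host_failed;
}
```

In this model, a host with host_eh_scheduled set and two released
commands wakes the eh thread when the second command completes, but only
if recheck is true.  With recheck false the state stays SHOST_RUNNING and
no completion ever wakes the thread, which is the hang described above.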

Reported-by: Tom Jackson 
Tested-by: Tom Jackson 
Signed-off-by: Dan Williams 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

[SCSI] Fix out of spec CD-ROM problem with media change

2011-08-27T14:36:41+00:00

Some CD-ROMs fail to report a media change correctly.  The specific
one for this patch simply fails to respond to commands, then gives a
UNIT ATTENTION after being reset which returns ASC/ASCQ 28/00.  This
is out of spec behaviour, but add a check in the eat CC/UA on reset
path to catch this case so the CD-ROM will function somewhat properly.

[jejb: fixed up white space and accepted without signoff]
Signed-off-by: James Bottomley

[SCSI] Reduce error recovery time by reducing use of TURs

2011-05-24T16:51:53+00:00

In error recovery, most scsi error recovery stages will send a TUR
(TEST UNIT READY) command for every bad command when a driver's error
handler reports success.  When several bad commands were issued to the
same device, this results in the device being probed multiple times.

This becomes very problematic if the device or connection is in a state
where the device still doesn't respond to commands even after a recovery
function returns success.  The error handler must wait for the test
commands to time out.  The time waiting for the redundant commands can
drastically lengthen error recovery.

This patch alters the scsi mid-layer's error routines to send test commands
once per device instead of once per bad command.  This can drastically
lower error recovery time.
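
A rough userspace sketch of why probing once per device helps.  The
types and function names here are hypothetical, and the timeout is a
caller-supplied parameter rather than any real kernel value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a failed command: only the device matters. */
struct failed_cmd {
	int device_id;
};

/*
 * Count the test commands needed when each device is probed once,
 * instead of once per failed command.
 */
static size_t turs_per_device(const struct failed_cmd *cmds, size_t n)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		/* Was this device already probed for an earlier command? */
		for (size_t j = 0; j < i; j++) {
			if (cmds[j].device_id == cmds[i].device_id) {
				seen = true;
				break;
			}
		}
		if (!seen)
			sent++;
	}
	return sent;
}

/* Worst-case wait if every test command has to time out. */
static unsigned long worst_case_wait(size_t test_cmds,
				     unsigned long timeout_secs)
{
	return (unsigned long)test_cmds * timeout_secs;
}
```

With four failed commands spread over two devices, the per-command
scheme waits for four timeouts in the worst case, while the per-device
scheme waits for two.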

[jejb: fixed up whitespace and formatting]
Signed-off-by: David Jeffery 
Signed-off-by: James Bottomley

[SCSI] Log thin provisioning threshold event

2011-04-15T21:29:25+00:00

At least log the message that we received a THIN PROVISIONING SOFT
THRESHOLD REACHED Unit Attention.  Also added it to unit attention
decodes.

Signed-off-by: Shyam Iyer 
Signed-off-by: James Bottomley

Reduce sequential pointer derefs in scsi_error.c and reduce size as well

2011-03-21T22:54:35+00:00

This patch reduces the number of sequential pointer derefs in
drivers/scsi/scsi_error.c

This has been submitted a number of times over a couple of years.  I
believe this version addresses all comments it has gathered over time.
Please apply or reject with a reason.

The benefits are:

 - makes the code easier to read.  Lots of sequential derefs of the same
   pointers is not easy on the eye.

 - theoretically at least, just dereferencing the pointers once can
   allow the compiler to generate slightly faster code, so in theory
   this could also be a micro speed optimization.

 - reduces size of object file (tiny effect: on x86-64, in at least one
   configuration, the text size decreased from 9439 bytes to 9400)

 - removes some pointless (mostly trailing) whitespace.
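
To illustrate the kind of change meant here, a small standalone sketch.
The struct and function names are made up for the example and are not
taken from scsi_error.c.

```c
/* Hypothetical three-level pointer chain, similar in shape to cmd->device->host. */
struct host_model {
	unsigned int host_no;
	unsigned int host_failed;
};

struct device_model {
	struct host_model *host;
};

struct cmd_model {
	struct device_model *device;
};

/* Before: the same chain is walked for every access. */
static unsigned int note_failure_chained(struct cmd_model *cmd)
{
	cmd->device->host->host_failed++;
	return cmd->device->host->host_no + cmd->device->host->host_failed;
}

/* After: the chain is walked once and the result kept in a local. */
static unsigned int note_failure_cached(struct cmd_model *cmd)
{
	struct host_model *host = cmd->device->host;

	host->host_failed++;
	return host->host_no + host->host_failed;
}
```

Both functions do the same work; the second is shorter to read and
leaves the compiler less to prove about aliasing.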

Signed-off-by: Jesper Juhl 
Signed-off-by: Linus Torvalds

[SCSI] Add detailed SCSI I/O errors

2011-02-12T16:33:08+00:00

Instead of just passing 'EIO' for any I/O error we should be
notifying the upper layers with more details about the cause
of this error.

Update the possible I/O errors to:

- ENOLINK: Link failure between host and target
- EIO: Retryable I/O error
- EREMOTEIO: Non-retryable I/O error
- EBADE: I/O error restricted to the I_T_L nexus

'Retryable' in this context means that an I/O error _might_ be
restricted to the I_T_L nexus (vulgo: path), so retrying on another
nexus / path might succeed.

'Non-retryable' in general refers to a target failure, so this
error will always be generated regardless of the I_T_L nexus
it was sent on.

I/O errors restricted to the I_T_L nexus might be retried
on another nexus / path, but they should _not_ be queued
if no paths are available.
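
How an upper layer might act on these codes can be sketched as below.
The enum and helper names are hypothetical.  The treatment of ENOLINK
as retryable on another path, and of EIO as queueable when no paths are
available, is an assumption of this sketch and is not stated above.
EREMOTEIO and EBADE are Linux errno values, so this assumes a Linux
<errno.h>.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of the failure causes listed above. */
enum io_failure {
	IO_LINK_FAILURE,   /* link failure between host and target */
	IO_RETRYABLE,      /* retryable I/O error */
	IO_NON_RETRYABLE,  /* non-retryable I/O error (target failure) */
	IO_NEXUS_FAILURE,  /* error restricted to the I_T_L nexus */
};

/* Map a failure cause to the errno the upper layers would see. */
static int io_failure_to_errno(enum io_failure f)
{
	switch (f) {
	case IO_LINK_FAILURE:
		return -ENOLINK;
	case IO_RETRYABLE:
		return -EIO;
	case IO_NON_RETRYABLE:
		return -EREMOTEIO;
	case IO_NEXUS_FAILURE:
		return -EBADE;
	}
	return -EIO;
}

/* A target failure would recur on any path, so retrying cannot help. */
static bool may_retry_on_other_path(int err)
{
	return err != -EREMOTEIO;
}

/* Nexus-restricted errors should not be queued when no path is left. */
static bool may_queue_without_paths(int err)
{
	return err != -EBADE && err != -EREMOTEIO;
}
```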

Signed-off-by: Hannes Reinecke 
Signed-off-by: Mike Snitzer 
Signed-off-by: James Bottomley

[SCSI] fix id computation in scsi_eh_target_reset()

2010-12-21T18:23:56+00:00

The current code in scsi_eh_target_reset() has an off by one error
that actually sends spurious extra resets.  Since there's no real need
to reset the targets in numerical order, simply chunk up the command
recovery list doing target resets and pulling matching targets out of
the list (that also makes the loop O(N) instead of O(N^2)).
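
The chunking approach can be sketched in userspace with a plain array
in place of the kernel's command list.  reset_targets() is a
hypothetical name, and the caller is assumed to provide room for n
entries in reset_ids.

```c
#include <stddef.h>

/*
 * Take the target id of the first remaining command, "reset" that
 * target once, then pull every command for the same target out of the
 * work list.  Repeat until the list is empty.
 *
 * target_ids is compacted in place.  reset_ids receives the ids that
 * were reset, in the order they were reset.  Returns the reset count.
 */
static size_t reset_targets(int *target_ids, size_t n, int *reset_ids)
{
	size_t resets = 0;

	while (n > 0) {
		int id = target_ids[0];
		size_t kept = 0;

		reset_ids[resets++] = id; /* one reset per target */

		/* Keep only commands that belong to other targets. */
		for (size_t i = 0; i < n; i++) {
			if (target_ids[i] != id)
				target_ids[kept++] = target_ids[i];
		}
		n = kept;
	}
	return resets;
}
```

Because the loop is driven by the list contents rather than by counting
target ids up to a maximum, there is no id bound to get wrong, and each
target is reset exactly once regardless of numerical order.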

[mike christie found and fixed a list_splice -> list_splice_init problem]

Reported-by: Hillf Danton
Signed-off-by: James Bottomley

[SCSI] Eliminate error handler overload of the SCSI serial number

2010-12-09T15:41:16+00:00

The error handler is using the test cmd->serial_number == 0 in the
abort routines to signal that the command to be aborted has already
completed normally.  This design was to close a race window in the
original error handler where a command could go through the normal
completion routines after it timed out but before error handling was
started.

Mike Anderson pointed out that when we converted our timeout and
softirq completions, we picked up atomicity here because the block
layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
that *either* the command times out or our done routine is called, but
ensures we can't get both occurring.  That makes the serial number
zero check redundant and it can be removed.
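
The either/or guarantee can be modelled in userspace with a C11
atomic_flag standing in for REQ_ATOM_COMPLETE.  This is only an
illustration of the idea; struct request_model and the helpers are
hypothetical and are not the block layer's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum outcome { OUTCOME_NONE, OUTCOME_DONE, OUTCOME_TIMED_OUT };

/* Hypothetical request with a single "completion claimed" flag. */
struct request_model {
	atomic_flag complete; /* stands in for REQ_ATOM_COMPLETE */
	enum outcome outcome;
};

static void request_init(struct request_model *rq)
{
	atomic_flag_clear(&rq->complete);
	rq->outcome = OUTCOME_NONE;
}

/* Returns true for exactly one caller: the first to set the flag. */
static bool claim_completion(struct request_model *rq)
{
	return !atomic_flag_test_and_set(&rq->complete);
}

/* Normal completion path: only proceeds if it wins the claim. */
static void on_done(struct request_model *rq)
{
	if (claim_completion(rq))
		rq->outcome = OUTCOME_DONE;
}

/* Timeout path: only proceeds if it wins the claim. */
static void on_timeout(struct request_model *rq)
{
	if (claim_completion(rq))
		rq->outcome = OUTCOME_TIMED_OUT;
}
```

Whichever path claims the flag first decides the outcome and the other
becomes a no-op, so the abort path has no need for a separate "already
completed" marker.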

Signed-off-by: James Bottomley

SCSI host lock push-down

2010-11-16T21:33:23+00:00

Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers that don't need this lock taken anyway.

The patch below presents a simple SCSI host lock push-down as an
equivalent transformation.  No locking or other behavior should change
with this patch.  All existing bugs and locking orders are preserved.

Additionally, add one parameter to queuecommand,
	struct Scsi_Host *
and remove one parameter from queuecommand,
	void (*done)(struct scsi_cmnd *)

Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.
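
The shape of the push-down can be sketched in userspace.  Every type
and function below is a hypothetical stand-in (a boolean replaces the
real host lock), so this shows only the calling convention, not driver
code.

```c
#include <stdbool.h>

struct cmd_model;
typedef void (*done_fn)(struct cmd_model *);

/* Hypothetical host: a flag stands in for the host lock. */
struct host_model {
	bool lock_held;
};

/* Hypothetical command carrying its own completion callback. */
struct cmd_model {
	struct host_model *host;
	done_fn scsi_done;
	bool saw_lock;  /* was the lock held inside the driver body? */
	bool completed;
};

/* Driver body written for the old convention: lock held, done passed in. */
static int demo_queuecommand_locked(struct cmd_model *cmd, done_fn done)
{
	cmd->saw_lock = cmd->host->lock_held;
	done(cmd); /* complete immediately, for the sake of the sketch */
	return 0;
}

/*
 * New-style entry point: called without the lock, takes the host as a
 * parameter, and takes the lock itself around the unchanged body.
 */
static int demo_queuecommand(struct host_model *host, struct cmd_model *cmd)
{
	int rc;

	host->lock_held = true;
	rc = demo_queuecommand_locked(cmd, cmd->scsi_done);
	host->lock_held = false;
	return rc;
}

static void demo_done(struct cmd_model *cmd)
{
	cmd->completed = true;
}
```

The driver body still runs with the lock held, so behaviour is
unchanged.  A driver that does not need the lock could later drop the
wrapper and do its work directly in the new-style entry point.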

Minimal code disturbance was attempted with this change.  Most drivers
needed only two one-line modifications for their host lock push-down.

Signed-off-by: Jeff Garzik 
Acked-by: James Bottomley 
Signed-off-by: Linus Torvalds

block: remove REQ_HARDBARRIER

2010-11-10T13:54:09+00:00

REQ_HARDBARRIER is dead now, so remove the leftovers.  What's left
at this point is:

 - various checks inside the block layer.
 - sanity checks in bio based drivers.
 - now unused bio_empty_barrier helper.
 - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it has been dead for a
   while, but Xen really needs to sort out its barrier situation.
 - setting of ordered tags in uas - dead code copied from old scsi
   drivers.
 - different scsi retry handling for barriers - it's dead and should
   have been removed when flushes were converted to FS requests.
 - blktrace handling of barriers - removed.  Someone who knows blktrace
   better should add support for REQ_FLUSH and REQ_FUA, though.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Jens Axboe