linux-toradex.git/drivers/scsi, branch v4.4.85

scsi: qla2xxx: Get mutex lock before checking optrom_state

2017-08-11T16:08:57+00:00

[ Upstream commit c7702b8c22712a06080e10f1d2dee1a133ec8809 ]

There is a race condition with qla2xxx optrom functions where one thread
might modify optrom buffer, optrom_state while other thread is still
reading from it.

In couple of crashes, it was found that we had successfully passed the
following 'if' check where we confirm optrom_state to be
QLA_SREADING. But by the time we acquired mutex lock to proceed with
memory_read_from_buffer function, some other thread/process had already
modified that option rom buffer and optrom_state from QLA_SREADING to
QLA_SWAITING. Then we got ha->optrom_buffer 0x0 and crashed the system:

        if (ha->optrom_state != QLA_SREADING)
                return 0;

        mutex_lock(&ha->optrom_mutex);
        rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
            ha->optrom_region_size);
        mutex_unlock(&ha->optrom_mutex);

With current optrom function we get following crash due to a race
condition:

[ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1479.466707] IP: [] memcpy+0x6/0x110
[...]
[ 1479.473673] Call Trace:
[ 1479.474296]  [] ? memory_read_from_buffer+0x3c/0x60
[ 1479.474941]  [] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
[ 1479.475571]  [] read+0xdb/0x1f0
[ 1479.476206]  [] vfs_read+0x9e/0x170
[ 1479.476839]  [] SyS_read+0x7f/0xe0
[ 1479.477466]  [] system_call_fastpath+0x16/0x1b

Below patch modifies qla2x00_sysfs_read_optrom,
qla2x00_sysfs_write_optrom functions to get the mutex_lock before
checking ha->optrom_state to avoid similar crashes.

The patch was applied and tested and same crashes were no longer
observed again.

Tested-by: Milan P. Gandhi 
Signed-off-by: Milan P. Gandhi 
Reviewed-by: Laurence Oberman 
Acked-by: Himanshu Madhani 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: snic: Return error code on memory allocation failure

2017-08-07T02:19:47+00:00

[ Upstream commit 0371adcdaca92912baaa3256ed13e058a016e62d ]

If a call to mempool_create_slab_pool() in snic_probe() returns NULL,
return -ENOMEM to indicate failure. mempool_creat_slab_pool() only fails
if it cannot allocate memory.

https://bugzilla.kernel.org/show_bug.cgi?id=189061

Reported-by: bianpan2010@ruc.edu.cn
Signed-off-by: Burak Ok 
Signed-off-by: Andreas Schaertl 
Acked-by: Narsimhulu Musini 
Reviewed-by: Ewan D. Milne 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: fnic: Avoid sending reset to firmware when another reset is in progress

2017-08-07T02:19:47+00:00

[ Upstream commit 9698b6f473555a722bf81a3371998427d5d27bde ]

This fix is to avoid calling fnic_fw_reset_handler through
fnic_host_reset when a finc reset is alreay in progress.

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

mpt3sas: Don't overreach ioc->reply_post[] during initialization

2017-08-07T02:19:41+00:00

commit 5ec8a1753bc29efa7e4b1391d691c9c719b30257 upstream.

In _base_make_ioc_operational(), we walk ioc->reply_queue_list and pull
a pointer out of successive elements of ioc->reply_post[] for each entry
in that list if RDPQ is enabled.

Since the code pulls the pointer for the next iteration at the bottom of
the loop, it triggers the a KASAN dump on the final iteration:

    BUG: KASAN: slab-out-of-bounds in _base_make_ioc_operational+0x47b7/0x47e0 [mpt3sas] at addr ffff880754816ab0
    Read of size 8 by task modprobe/305
    
    Call Trace:
     [] dump_stack+0x4d/0x6c
     [] print_trailer+0xf9/0x150
     [] object_err+0x34/0x40
     [] kasan_report_error+0x221/0x530
     [] __asan_report_load8_noabort+0x43/0x50
     [] _base_make_ioc_operational+0x47b7/0x47e0 [mpt3sas]
     [] mpt3sas_base_attach+0x1991/0x2120 [mpt3sas]
     [] _scsih_probe+0xeb3/0x16b0 [mpt3sas]
     [] local_pci_probe+0xc7/0x170
     [] pci_device_probe+0x20f/0x290
     [] really_probe+0x17d/0x600
     [] __driver_attach+0x153/0x190
     [] bus_for_each_dev+0x11c/0x1a0
     [] driver_attach+0x3d/0x50
     [] bus_add_driver+0x44a/0x5f0
     [] driver_register+0x18c/0x3b0
     [] __pci_register_driver+0x156/0x200
     [] _mpt3sas_init+0x135/0x1000 [mpt3sas]
     [] do_one_initcall+0x113/0x2b0
     [] do_init_module+0x1d0/0x4d8
     [] load_module+0x6729/0x8dc0
     [] SYSC_init_module+0x183/0x1a0
     [] SyS_init_module+0xe/0x10
     [] entry_SYSCALL_64_fastpath+0x12/0x6a

Fix this by pulling the value at the beginning of the loop.

Signed-off-by: Calvin Owens 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Jens Axboe 
Acked-by: Chaitra Basappa 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Amit Pundir 
Signed-off-by: Greg Kroah-Hartman

scsi: lpfc: avoid double free of resource identifiers

2017-07-05T12:37:20+00:00

[ Upstream commit cd60be4916ae689387d04b86b6fc15931e4c95ae ]

Set variables initialized in lpfc_sli4_alloc_resource_identifiers() to
NULL if an error occurred. Otherwise, lpfc_sli4_driver_resource_unset()
attempts to free the memory again.

Signed-off-by: Roberto Sassu 
Signed-off-by: Johannes Thumshirn 
Acked-by: James Smart 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: virtio_scsi: Reject commands when virtqueue is broken

2017-07-05T12:37:19+00:00

[ Upstream commit 773c7220e22d193e5667c352fcbf8d47eefc817f ]

In the case of a graceful set of detaches, where the virtio-scsi-ccw
disk is removed from the guest prior to the controller, the guest
behaves quite normally.  Specifically, the detach gets us into
sd_sync_cache to issue a Synchronize Cache(10) command, which
immediately fails (and is retried a couple of times) because the device
has been removed.  Later, the removal of the controller sees two CRWs
presented, but there's no further indication of the removal from the
guest viewpoint.

 [   17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
 [   21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

However, on s390, the SCSI disks can be removed "by surprise" when an
entire controller (host) is removed and all associated disks are removed
via the loop in scsi_forget_host.  The same call to sd_sync_cache is
made, but because the controller has already been removed, the
Synchronize Cache(10) command is neither issued (and then failed) nor
rejected.

That the I/O isn't returned means the guest cannot have other devices
added nor removed, and other tasks (such as shutdown or reboot) issued
by the guest will not complete either.  The virtio ring has already been
marked as broken (via virtio_break_device in virtio_ccw_remove), but we
still attempt to queue the command only to have it remain there.  The
calling sequence provides a bit of distinction for us:

  virtscsi_queuecommand()
   -> virtscsi_kick_cmd()
    -> virtscsi_add_cmd()
     -> virtqueue_add_sgs()
      -> virtqueue_add()
         if success
           return 0
         elseif vq->broken or vring_mapping_error()
           return -EIO
         else
           return -ENOSPC

A return of ENOSPC is generally a temporary condition, so returning
"host busy" from virtscsi_queuecommand makes sense here, to have it
redriven in a moment or two.  But the EIO return code is more of a
permanent error and so it would be wise to return the I/O itself and
allow the calling thread to finish gracefully.  The result is these four
kernel messages in the guest (the fourth one does not occur prior to
this patch):

 [   22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
 [   22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

I opted to fill in the same response data that is returned from the more
graceful device detach, where the disk device is removed prior to the
controller device.

Signed-off-by: Eric Farman 
Reviewed-by: Fam Zheng 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

qla2xxx: Fix erroneous invalid handle message

2017-07-05T12:37:16+00:00

[ Upstream commit 4f060736f29a960aba8e781a88837464756200a8 ]

Termination of Immediate Notify IOCB was using wrong
IOCB handle. IOCB completion code was unable to find
appropriate code path due to wrong handle.

Following message is seen in the logs.

"Error entry - invalid handle/queue (ffff)."

Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Christoph Hellwig 
[ bvanassche: Fixed word order in patch title ]
Signed-off-by: Bart Van Assche 

Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: lpfc: Set elsiocb contexts to NULL after freeing it

2017-07-05T12:37:16+00:00

[ Upstream commit 8667f515952feefebb3c0f8d9a9266c91b101a46 ]

Set the elsiocb contexts to NULL after freeing as others depend on it.

Signed-off-by: Johannes Thumshirn 
Acked-by: Dick Kennedy 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type

2017-07-05T12:37:16+00:00

[ Upstream commit 26f2819772af891dee2843e1f8662c58e5129d5f ]

Zoned block devices force the use of READ/WRITE(16) commands by setting
sdkp->use_16_for_rw and clearing sdkp->use_10_for_rw. This result in
DPOFUA always being disabled for these drives as the assumed use of
the deprecated READ/WRITE(6) commands only looks at sdkp->use_10_for_rw.
Strenghten the test by also checking that sdkp->use_16_for_rw is false.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: qla2xxx: don't disable a not previously enabled PCI device

2017-06-14T11:16:25+00:00

commit ddff7ed45edce4a4c92949d3c61cd25d229c4a14 upstream.

When pci_enable_device() or pci_enable_device_mem() fail in
qla2x00_probe_one() we bail out but do a call to
pci_disable_device(). This causes the dev_WARN_ON() in
pci_disable_device() to trigger, as the device wasn't enabled
previously.

So instead of taking the 'probe_out' error path we can directly return
*iff* one of the pci_enable_device() calls fails.

Additionally rename the 'probe_out' goto label's name to the more
descriptive 'disable_device'.

Signed-off-by: Johannes Thumshirn 
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Reviewed-by: Bart Van Assche 
Reviewed-by: Giridhar Malavali 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman