Age | Commit message (Collapse) | Author |
|
commit 64ce60cab24603ac0fcd59c9fbc3be78f4c4d229 upstream.
The SA controller spins down RAID drive spares.
A REGNEWD event causes an inquiry to be sent to all physical
drives. This causes the SA controller to spin up the spare.
The controller suspends all I/O to a logical volume until
the spare is spun up. The spin-up can take over 50 seconds.
This can result in one or both of the following:
- SML sends down aborts and resets to the logical volume
and can cause the logical volume to be off-lined.
- a negative impact on the logical volume's I/O performance
each time a REGNEWD is triggered.
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6b7e9cde49691e04314342b7dce90c67ad567fcc upstream.
For historic reasons, io_opt is in bytes and max_sectors in block layer
sectors. This interface inconsistency is error prone and should be
fixed. But for 4.4--4.7 let's make the unit difference explicit via a
wrapper function.
Fixes: d0eb20a863ba ("sd: Optimal I/O size is in bytes, not sectors")
Reported-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bcd8f2e94808fcddf6ef3af5f060a36820dcc432 upstream.
This patch fixes one use-after-free report[1] by KASAN.
In __scsi_scan_target(), when a type 31 device is probed,
SCSI_SCAN_TARGET_PRESENT is returned and the target will be scanned
again.
Inside the following scsi_report_lun_scan(), one new scsi_device
instance is allocated, and scsi_probe_and_add_lun() is called again to
probe the target and still see type 31 device, finally
__scsi_remove_device() is called to remove & free the device at the end
of scsi_probe_and_add_lun(), so cause use-after-free in
scsi_report_lun_scan().
And the following SCSI log can be observed:
scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
scsi 0:0:2:0: scsi scan: Sending REPORT LUNS to (try 0)
scsi 0:0:2:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
scsi 0:0:2:0: scsi scan: REPORT LUN scan
scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
BUG: KASAN: use-after-free in __scsi_scan_target+0xbf8/0xe40 at addr ffff88007b44a104
This patch fixes the issue by moving the putting reference at
the end of scsi_report_lun_scan().
[1] KASAN report
==================================================================
[ 3.274597] PM: Adding info for serio:serio1
[ 3.275127] BUG: KASAN: use-after-free in __scsi_scan_target+0xd87/0xdf0 at addr ffff880254d8c304
[ 3.275653] Read of size 4 by task kworker/u10:0/27
[ 3.275903] CPU: 3 PID: 27 Comm: kworker/u10:0 Not tainted 4.8.0 #2121
[ 3.276258] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3.276797] Workqueue: events_unbound async_run_entry_fn
[ 3.277083] ffff880254d8c380 ffff880259a37870 ffffffff94bbc6c1 ffff880078402d80
[ 3.277532] ffff880254d8bb80 ffff880259a37898 ffffffff9459fec1 ffff880259a37930
[ 3.277989] ffff880254d8bb80 ffff880078402d80 ffff880259a37920 ffffffff945a0165
[ 3.278436] Call Trace:
[ 3.278528] [<ffffffff94bbc6c1>] dump_stack+0x65/0x84
[ 3.278797] [<ffffffff9459fec1>] kasan_object_err+0x21/0x70
[ 3.279063] device: 'psaux': device_add
[ 3.279616] [<ffffffff945a0165>] kasan_report_error+0x205/0x500
[ 3.279651] PM: Adding info for No Bus:psaux
[ 3.280202] [<ffffffff944ecd22>] ? kfree_const+0x22/0x30
[ 3.280486] [<ffffffff94bc2dc9>] ? kobject_release+0x119/0x370
[ 3.280805] [<ffffffff945a0543>] __asan_report_load4_noabort+0x43/0x50
[ 3.281170] [<ffffffff9507e1f7>] ? __scsi_scan_target+0xd87/0xdf0
[ 3.281506] [<ffffffff9507e1f7>] __scsi_scan_target+0xd87/0xdf0
[ 3.281848] [<ffffffff9507d470>] ? scsi_add_device+0x30/0x30
[ 3.282156] [<ffffffff94f7f660>] ? pm_runtime_autosuspend_expiration+0x60/0x60
[ 3.282570] [<ffffffff956ddb07>] ? _raw_spin_lock+0x17/0x40
[ 3.282880] [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[ 3.283200] [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[ 3.283563] [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[ 3.283882] [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[ 3.284173] [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[ 3.284492] [<ffffffff941a8954>] ? pwq_dec_nr_in_flight+0x124/0x2a0
[ 3.284876] [<ffffffff941d1770>] ? preempt_count_add+0x130/0x160
[ 3.285207] [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[ 3.285526] [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[ 3.285844] [<ffffffff941aa810>] ? process_one_work+0x12d0/0x12d0
[ 3.286182] [<ffffffff941bb365>] kthread+0x1c5/0x260
[ 3.286443] [<ffffffff940855cd>] ? __switch_to+0x88d/0x1430
[ 3.286745] [<ffffffff941bb1a0>] ? kthread_worker_fn+0x5a0/0x5a0
[ 3.287085] [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[ 3.287368] [<ffffffff941bb1a0>] ? kthread_worker_fn+0x5a0/0x5a0
[ 3.287697] Object at ffff880254d8bb80, in cache kmalloc-2048 size: 2048
[ 3.288064] Allocated:
[ 3.288147] PID = 27
[ 3.288218] [<ffffffff940b27ab>] save_stack_trace+0x2b/0x50
[ 3.288531] [<ffffffff9459f246>] save_stack+0x46/0xd0
[ 3.288806] [<ffffffff9459f4bd>] kasan_kmalloc+0xad/0xe0
[ 3.289098] [<ffffffff9459c07e>] __kmalloc+0x13e/0x250
[ 3.289378] [<ffffffff95078e5a>] scsi_alloc_sdev+0xea/0xcf0
[ 3.289701] [<ffffffff9507de76>] __scsi_scan_target+0xa06/0xdf0
[ 3.290034] [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[ 3.290362] [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[ 3.290724] [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[ 3.291055] [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[ 3.291354] [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[ 3.291695] [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[ 3.292022] [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[ 3.292325] [<ffffffff941bb365>] kthread+0x1c5/0x260
[ 3.292594] [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[ 3.292886] Freed:
[ 3.292945] PID = 27
[ 3.293016] [<ffffffff940b27ab>] save_stack_trace+0x2b/0x50
[ 3.293327] [<ffffffff9459f246>] save_stack+0x46/0xd0
[ 3.293600] [<ffffffff9459fa61>] kasan_slab_free+0x71/0xb0
[ 3.293916] [<ffffffff9459bac2>] kfree+0xa2/0x1f0
[ 3.294168] [<ffffffff9508158a>] scsi_device_dev_release_usercontext+0x50a/0x730
[ 3.294598] [<ffffffff941ace9a>] execute_in_process_context+0xda/0x130
[ 3.294974] [<ffffffff9508107c>] scsi_device_dev_release+0x1c/0x20
[ 3.295322] [<ffffffff94f566f6>] device_release+0x76/0x1e0
[ 3.295626] [<ffffffff94bc2db7>] kobject_release+0x107/0x370
[ 3.295942] [<ffffffff94bc29ce>] kobject_put+0x4e/0xa0
[ 3.296222] [<ffffffff94f56e17>] put_device+0x17/0x20
[ 3.296497] [<ffffffff9505201c>] scsi_device_put+0x7c/0xa0
[ 3.296801] [<ffffffff9507e1bc>] __scsi_scan_target+0xd4c/0xdf0
[ 3.297132] [<ffffffff9507e505>] scsi_scan_channel+0x105/0x160
[ 3.297458] [<ffffffff9507e8a2>] scsi_scan_host_selected+0x212/0x2f0
[ 3.297829] [<ffffffff9507eb3c>] do_scsi_scan_host+0x1bc/0x250
[ 3.298156] [<ffffffff9507efc1>] do_scan_async+0x41/0x450
[ 3.298453] [<ffffffff941c1fee>] async_run_entry_fn+0xfe/0x610
[ 3.298777] [<ffffffff941a9a84>] process_one_work+0x544/0x12d0
[ 3.299105] [<ffffffff941aa8e9>] worker_thread+0xd9/0x12f0
[ 3.299408] [<ffffffff941bb365>] kthread+0x1c5/0x260
[ 3.299676] [<ffffffff956dde9f>] ret_from_fork+0x1f/0x40
[ 3.299967] Memory state around the buggy address:
[ 3.300209] ffff880254d8c200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3.300608] ffff880254d8c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3.300986] >ffff880254d8c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3.301408] ^
[ 3.301550] ffff880254d8c380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3.301987] ffff880254d8c400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3.302396]
==================================================================
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07d0e9a847401ffd2f09bd450d41644cd090e81d upstream.
If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests and not a resulting unblock until the init
complete happens, which may never occur, and we end up hanging I/O
requests. This patch ensures the host action stay set to
IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and
unblock unless we receive an init complete.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4bd173c30792791a6daca8c64793ec0a4ae8324f upstream.
Do the user_len check first and then the ver_addr allocation so that we
can save us the kfree() on the error path when user_len is >
ARCMSR_API_DATA_BUFLEN.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Marco Grassi <marco.gra@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7bc2b55a5c030685b399bb65b6baa9ccc3d1f167 upstream.
We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dd7328e4c53649c1c7ec36bc1cf5b229b8662047 upstream.
pci_dma_mapping_error() returns true on error and false on success.
Fixes: fd6ddfa4c1dd ('fnic: check pci_map_single() return value')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 546e559c79b1a8d27c23262907a00fc209e392a0 upstream.
The pd_seq_sync pointer can't be NULL, we have to check its entries
instead.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a87eeb900dbb9f8202f96604d56e47e67c936b9d upstream.
Commit 655ee63cf371 ("scsi constants: command, sense key + additional
sense string") added a "Completed" sense string with key 0xF to
snstext[], but failed to updated the upper bounds check of the sense key
in scsi_sense_key_string().
Fixes: 655ee63cf371 ("[SCSI] scsi constants: command, sense key + additional sense strings")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ea76543127da32dec28af0a13ea1b06625fc085e ]
While profiling the cxlflash_queuecommand() path under a heavy load it
was found that number of retries to find cmd_room was fairly high.
There are two problems with the current back-off:
a) It starts with a udelay of 0
b) It backs-off linearly
Tried several approaches (a higher multiple 10*n, 100*n, as well as n^2,
2^n) and found that the exponential back-off(2^n) approach had the least
overall cost. Cost as being defined as overall time spent waiting.
The fix is to change the linear back-off to an exponential back-off.
This solution also takes care of the problem with the initial
delay (starts with 1 usec).
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d5e26bb1d812ba74f29b6bcbc88c3dbfb3eed824 ]
Applications which use virtual LUN's that are backed by a physical LUN
over both adapter ports may experience an I/O failure in the event of a
link loss (e.g. cable pull).
Virtual LUNs may be accessed through one or both ports of the adapter.
This access is encoded in the translation entries that comprise the
virtual LUN and used by the AFU for load-balancing I/O and handling
failover scenarios. In a link loss scenario, even though the AFU is able
to maintain connectivity to the LUN, it is up to the application to
retry the failed I/O. When applications are unaware of the virtual LUN's
underlying topology, they are unable to make a sound decision of when to
retry an I/O and therefore are forced to make their reaction to a failed
I/O absolute. The result is either a failure to retry I/O or increased
latency for scenarios where a retry is pointless.
To remedy this scenario, provide feedback back to the application on
virtual LUN creation as to which ports the LUN may be accessed. LUN's
spanning both ports are candidates for a retry in a presence of an I/O
failure.
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a9be294ecb3b9dc82b15625631b153f871181d16 ]
The original fix to escalate a 'login timed out' error to a LINK_RESET
was only made for one of the two ports on the card. This fix resolves
the same issue for the second port (port 1).
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ae09c765109293b600ba9169aa3d632e1ac1a843 ]
Driver didn't program the REG_VFI mailbox correctly, giving the adapter
bad addresses.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 635f6b0893cff193a1774881ebb1e4a4b9a7fead ]
When a cxlflash adapter goes into EEH recovery and multiple processes
(each having established its own context) are active, the EEH recovery
can hang if the processes attempt to recover in parallel. The symptom
logged after a couple of minutes is:
INFO: task eehd:48 blocked for more than 120 seconds.
Not tainted 4.5.0-491-26f710d+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd 0 48 2
Call Trace:
__switch_to+0x2f0/0x410
__schedule+0x300/0x980
schedule+0x48/0xc0
rwsem_down_write_failed+0x294/0x410
down_write+0x88/0xb0
cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
cxl_vphb_error_detected+0x88/0x110 [cxl]
cxl_pci_error_detected+0xb0/0x1d0 [cxl]
eeh_report_error+0xbc/0x130
eeh_pe_dev_traverse+0x94/0x160
eeh_handle_normal_event+0x17c/0x450
eeh_handle_event+0x184/0x370
eeh_event_handler+0x1c8/0x1d0
kthread+0x110/0x130
ret_from_kernel_thread+0x5c/0xa4
INFO: task blockio:33215 blocked for more than 120 seconds.
Not tainted 4.5.0-491-26f710d+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blockio 0 33215 33213
Call Trace:
0x1 (unreliable)
__switch_to+0x2f0/0x410
__schedule+0x300/0x980
schedule+0x48/0xc0
rwsem_down_read_failed+0x124/0x1d0
down_read+0x68/0x80
cxlflash_ioctl+0x70/0x6f0 [cxlflash]
scsi_ioctl+0x3b0/0x4c0
sg_ioctl+0x960/0x1010
do_vfs_ioctl+0xd8/0x8c0
SyS_ioctl+0xd4/0xf0
system_call+0x38/0xb4
INFO: task eehd:48 blocked for more than 120 seconds.
The hang is because of a 3 way dead-lock:
Process A holds the recovery mutex, and waits for eehd to complete.
Process B holds the semaphore and waits for the recovery mutex.
eehd waits for semaphore.
The fix is to have Process B above release the semaphore before
attempting to acquire the recovery mutex. This will allow
eehd to proceed to completion.
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 603ecce95f4817074a724a889cd88c3c8210f933 ]
When switching to the internal LUN defined on the
IBM CXL flash adapter, there is an unnecessary
scan occurring on the second port. This scan leads
to the following extra lines in the log:
Dec 17 10:09:00 tul83p1 kernel: [ 3708.561134] cxlflash 0008:00:00.0: cxlflash_queuecommand: (scp=c0000000fc1f0f00) 11/1/0/0 cdb=(A0000000-00000000-10000000-00000000)
Dec 17 10:09:00 tul83p1 kernel: [ 3708.561147] process_cmd_err: cmd failed afu_rc=32 scsi_rc=0 fc_rc=0 afu_extra=0xE, scsi_extra=0x0, fc_extra=0x0
By definition, both of the internal LUNs are on the first port/channel.
When the lun_mode is switched to internal LUN the
same value for host->max_channel is retained. This
causes an unnecessary scan over the second port/channel.
This fix alters the host->max_channel to 0 (1 port), if internal
LUNs are configured and switches it back to 1 (2 ports) while
going back to external LUNs.
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 98f90debc2b64a40a416dd9794ac2d8de6b43af2 ]
Releasing allocated resource if get configuration data failed.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 251e2d25bfb72b69edd414abfa42a41191d9657a ]
Fixed getting wrong configuration data of adapter type B and type D.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d7236ac368212bd6fc8b45f050136ee53e6a6f2d ]
The function value inside se_cmd can change if the TMR is cancelled.
Use original ATIO Type to correctly determine CTIO response.
Signed-off-by: Swapnil Nagle <swapnil.nagle@purestroage.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
timedout IO.
[ Upstream commit 03d1fb3a65783979f23bd58b5a0387e6992d9e26 ]
Track msix of each IO and use the same msix for issuing abort to timed
out IO. With this driver will process IO's reply first followed by TM.
Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com>
Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5f985d88bac34e7f3b4403118eab072902a0b392 ]
It might happen that we try to free an already freed pointer.
Reported-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Chaitra P B <chaitra.basappa@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b99dbe56d511eb07de33bfa1b99ac5a6ff76ae08 ]
A barrier should be added to ensure proper ordering of memory mapped
writes.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ea1c928bb6051ec4ccf24826898aa2361eaa71e5 ]
Inside compat IOCTL hook of driver, driver was using wrong address of
ioc->frame.raw which leads sense_ioc_ptr to be calculated wrongly and
failing IOCTL.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 11c71cb4ab7cd901b9d6f0ff267c102778c1c8ef ]
This patch will do synhronization between OCR function and AEN function
using "reset_mutex" lock. reset_mutex will be acquired only in the
first half of the AEN function which issues a DCMD. Second half of the
function which calls SCSI API (scsi_add_device/scsi_remove_device)
should be out of reset_mutex to avoid deadlock between scsi_eh thread
and driver.
During chip reset (inside OCR function), there should not be any PCI
access and AEN function (which is called in delayed context) may be
firing DCMDs (doing PCI writes) when chip reset is happening in parallel
which will cause FW fault. This patch will solve the problem by making
AEN thread and OCR thread mutually exclusive.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4360ca9c24388e44cb0e14861a62fff43cf225c0 ]
Fix external loopback failure.
Rx sequence reassembly was incorrect.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 01c73bbcd7cc4f31f45a1b0caeacdba46acd9c9c ]
Fix mbox reuse in PLOGI completion. Moved allocations so that buffer
properly init'd.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 81e7517723fc17396ba91f59312b3177266ddbda ]
Fix RDP Speed reporting.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c90261dcd86e4eb5c9c1627fde037e902db8aefa ]
Fix crash in fcp command completion path.
Missed null check.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6690e0d4fc5cccf74534abe0c9f9a69032bc02f0 ]
Fix driver crash when module parameter lpfc_fcp_io_channel set to 16
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4b7789b71c916f79a3366da080101014473234c3 ]
Fix RegLogin failed error seen on Lancer FC during port bounce
Fix the statemachine and ref counting.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d6de08cc46269899988b4f40acc7337279693d4b ]
Fix the FLOGI discovery logic to comply with T11 standards
We weren't properly setting fabric parameters, such as R_A_TOV and E_D_TOV,
when we registered the vfi object in default configs and pt2pt configs.
Revise to now pass service params with the values to the firmware and
ensure they are reset on link bounce. Required reworking the call sequence
in the discovery threads.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f5cb5304eb26d307c9b30269fb0e007e0b262b7d ]
Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a2746fb16e41b7c8f02aa4d2605ecce97abbebbd ]
This drop enables a future card with a device id of 0x0600 to be
recognized by the cxlflash driver.
As per the design, the Accelerator Function Unit (AFU) for this new IBM
CXL Flash Adapter retains the same host interface as the previous
generation. For the early prototypes of the new card, the driver with
this change behaves exactly as the driver prior to this behaved with the
earlier generation card. Therefore, no card specific programming has
been added. These card specific changes can be staged in later if
needed.
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b45cdbaf9f7f0486847c52f60747fb108724652a ]
If an async error interrupt is generated, and the error requires the FC
link to be reset, it cannot be performed in the interrupt context. So a
work element is scheduled to complete the link reset in a process
context. If either an EEH event or an escalation occurs in between when
the interrupt is generated and the scheduled work is started, the MMIO
space may no longer be available. This will cause an oops in the worker
thread.
[ 606.806583] NIP kthread_data+0x28/0x40
[ 606.806633] LR wq_worker_sleeping+0x30/0x100
[ 606.806694] Call Trace:
[ 606.806721] 0x50 (unreliable)
[ 606.806796] wq_worker_sleeping+0x30/0x100
[ 606.806884] __schedule+0x69c/0x8a0
[ 606.806959] schedule+0x44/0xc0
[ 606.807034] do_exit+0x770/0xb90
[ 606.807109] die+0x300/0x460
[ 606.807185] bad_page_fault+0xd8/0x150
[ 606.807259] handle_page_fault+0x2c/0x30
[ 606.807338] wait_port_offline.constprop.12+0x60/0x130 [cxlflash]
To prevent the problem space area from being unmapped, when there is
pending work, a mapcount (using the kref mechanism) is held. The
mapcount is released only when the work is completed. The last
reference release is tied to the unmapping service.
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ee91e332a6e6e9b939f60f6e1bd72fb2def5290d ]
After a few iterations of resetting the card, either during EEH
recovery, or a host_reset the following is seen in the logs. cxlflash
0008:00: cxlflash_queuecommand: could not get a free command
At every reset of the card, the commands that are outstanding are being
leaked. No effort is being made to reap these commands. A few more
resets later, the above error message floods the logs and the card is
rendered totally unusable as no free commands are available.
Iterated through the 'cmd' queue and printed out the 'free' counter and
found that on each reset certain commands were in-use and stayed in-use
through subsequent resets.
To resolve this issue, when the card is reset, reap all the commands
that are active/outstanding.
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e37390bee6fe7dfbe507a9d50cdc11344b53fa08 ]
The "> MAX_CONTEXT" should be ">= MAX_CONTEXT". Otherwise we go one
step beyond the end of the cfg->ctx_tbl[] array.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7f851684efb3377e9c93aca7fae6e76212e5680 upstream.
Found one megaraid_sas HBA probe fails,
[ 187.235190] scsi host2: Avago SAS based MegaRAID driver
[ 191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io 0x0000-0x00ff]
[ 191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!
and the card has resource like,
[ 125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
[ 125.104446] pci 0000:89:00.0: reg 0x10: [io 0x0000-0x00ff]
[ 125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
[ 125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
[ 125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]
that does not io port resource allocated from BIOS, and kernel can not
assign one as io port shortage.
The driver is only looking for MEM, and should not fail.
It turns out megasas_init_fw() etc are using bar index as mask. index 1
is used as mask 1, so that pci_request_selected_regions() is trying to
request BAR0 instead of BAR1.
Fix all related reference.
Fixes: b6d5d8808b4c ("megaraid_sas: Use lowest memory bar for SR-IOV VF support")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ce7c6c9e1d997a2670aead3a7b87f4df32c11118 upstream.
mpt3sas crashes on resume after suspend with WarpDrive flash cards. The
reply_post_host_index array is not set back up after the resume, and we
deference a stale pointer in _base_interrupt().
[ 47.309711] BUG: unable to handle kernel paging request at ffffc90001f8006c
[ 47.318289] IP: [<ffffffffc00863ef>] _base_interrupt+0x49f/0xa30 [mpt3sas]
[ 47.326749] PGD 41ccaa067 PUD 41ccab067 PMD 3466c067 PTE 0
[ 47.333848] Oops: 0002 [#1] SMP
...
[ 47.452708] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0 #6
[ 47.460506] Hardware name: Dell Inc. OptiPlex 990/06D7TR, BIOS A18 09/24/2013
[ 47.469629] task: ffffffff81c0d500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 47.479112] RIP: 0010:[<ffffffffc00863ef>] [<ffffffffc00863ef>] _base_interrupt+0x49f/0xa30 [mpt3sas]
[ 47.490466] RSP: 0018:ffff88041d203e30 EFLAGS: 00010002
[ 47.497801] RAX: 0000000000000001 RBX: ffff880033f4c000 RCX: 0000000000000001
[ 47.506973] RDX: ffffc90001f8006c RSI: 0000000000000082 RDI: 0000000000000082
[ 47.516141] RBP: ffff88041d203eb0 R08: ffff8804118e2820 R09: 0000000000000001
[ 47.525300] R10: 0000000000000001 R11: 00000000100c0000 R12: 0000000000000000
[ 47.534457] R13: ffff880412c487e0 R14: ffff88041a8987d8 R15: 0000000000000001
[ 47.543632] FS: 0000000000000000(0000) GS:ffff88041d200000(0000) knlGS:0000000000000000
[ 47.553796] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.561632] CR2: ffffc90001f8006c CR3: 0000000001c06000 CR4: 00000000000406f0
[ 47.570883] Stack:
[ 47.575015] 000000001d211228 ffff88041d2100c0 ffff8800c47d8130 0000000000000100
[ 47.584625] ffff8804100c0000 100c000000000000 ffff88041a8992a0 ffff88041a8987f8
[ 47.594230] ffff88041d203e00 ffffffff81111e55 000000000000038c ffff880414ad4280
[ 47.603862] Call Trace:
[ 47.608474] <IRQ>
[ 47.610413] [<ffffffff81111e55>] ? call_timer_fn+0x35/0x120
[ 47.620539] [<ffffffff81100a1f>] handle_irq_event_percpu+0x7f/0x1c0
[ 47.629061] [<ffffffff81100b8c>] handle_irq_event+0x2c/0x50
[ 47.636859] [<ffffffff81103fff>] handle_edge_irq+0x6f/0x130
[ 47.644654] [<ffffffff8102fbf3>] handle_irq+0x73/0x120
[ 47.652011] [<ffffffff810c6ada>] ? atomic_notifier_call_chain+0x1a/0x20
[ 47.660854] [<ffffffff817e374b>] do_IRQ+0x4b/0xd0
[ 47.667777] [<ffffffff817e160c>] common_interrupt+0x8c/0x8c
[ 47.675635] <EOI>
Move the reply_post_host_index array setup into
mpt3sas_base_map_resources(), which is also in the resume path.
Signed-off-by: Greg Edwards <gedwards@fireweed.org>
Acked-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fa00c437eef8dc2e7b25f8cd868cfa405fcc2bb3 upstream.
In aacraid's ioctl_send_fib() we do two fetches from userspace, one the
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different than from the first
fix, we may encounter an out-of- bounds access in aac_fib_send(). We
also check the sender size to insure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)'
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05a05872c8d4b4357c9d913e6d73ae64882bddf5 upstream.
The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd
'lpfc_cmd->pCmd' not to be null, and point to the midlayer command.
That's not true in the .eh_(device|target|bus)_reset_handler path,
because lpfc_send_taskmgmt() sends commands not from the midlayer, so
does not set 'lpfc_cmd->pCmd'.
That is true in the .queuecommand path because lpfc_queuecommand()
stores the scsi_cmnd from midlayer in lpfc_cmd->pCmd; and lpfc_cmd is
stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is passed
to lpfc_sli4_scmd_to_wqidx_distr() as lpfc_cmd parameter.
This problem can be hit on SCSI EH, and immediately with sg_reset.
These 2 test-cases demonstrate the problem/fix with next-20160601.
Test-case 1) sg_reset
# strace sg_reset --device /dev/sdm
<...>
open("/dev/sdm", O_RDWR|O_NONBLOCK) = 3
ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994 <unfinished ...>
+++ killed by SIGSEGV +++
Segmentation fault
# dmesg
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xd00000001c88442c
Oops: Kernel access of bad area, sig: 11 [#1]
<...>
CPU: 104 PID: 16333 Comm: sg_reset Tainted: G W 4.7.0-rc1-next-20160601-00004-g95b89dc #6
<...>
NIP [d00000001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
LR [d00000001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
Call Trace:
[c000003c9ec876f0] [c000003c9ec87770] 0xc000003c9ec87770 (unreliable)
[c000003c9ec87720] [d00000001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
[c000003c9ec87780] [d00000001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
[c000003c9ec87880] [d00000001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
[c000003c9ec87950] [d00000001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
[c000003c9ec87a10] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
[c000003c9ec87a40] [c0000000006113e8] scsi_ioctl_reset+0x198/0x2c0
[c000003c9ec87bf0] [c00000000060fe5c] scsi_ioctl+0x13c/0x4b0
[c000003c9ec87c80] [c0000000006629b0] sd_ioctl+0xf0/0x120
[c000003c9ec87cd0] [c00000000046e4f8] blkdev_ioctl+0x248/0xb70
[c000003c9ec87d30] [c0000000002a1f60] block_ioctl+0x70/0x90
[c000003c9ec87d50] [c00000000026d334] do_vfs_ioctl+0xc4/0x890
[c000003c9ec87de0] [c00000000026db60] SyS_ioctl+0x60/0xc0
[c000003c9ec87e30] [c000000000009120] system_call+0x38/0x108
Instruction dump:
<...>
With fix:
# strace sg_reset --device /dev/sdm
<...>
open("/dev/sdm", O_RDWR|O_NONBLOCK) = 3
ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
# dmesg
[ 424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (1, 0) return x2002
Test-case 2) SCSI EH
Using this debug patch to wire an SCSI EH trigger, for lpfc_scsi_cmd_iocb_cmpl():
- cmd->scsi_done(cmd);
+ if ((phba->pport ? phba->pport->cfg_log_verbose : phba->cfg_log_verbose) == 0x32100000)
+ printk(KERN_ALERT "lpfc: skip scsi_done()\n");
+ else
+ cmd->scsi_done(cmd);
# echo 0x32100000 > /sys/class/scsi_host/host11/lpfc_log_verbose
# dd if=/dev/sdm of=/dev/null iflag=direct &
<...>
After a while:
# dmesg
lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
lpfc: skip scsi_done()
<...>
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xd0000000199e448c
Oops: Kernel access of bad area, sig: 11 [#1]
<...>
CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: G W 4.7.0-rc1-next-20160601-00004-g95b89dc #6
<...>
NIP [d0000000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
LR [d000000019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
Call Trace:
[c000000ff0d0b890] [c000000ff0d0b900] 0xc000000ff0d0b900 (unreliable)
[c000000ff0d0b8c0] [d00000001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
[c000000ff0d0b920] [d000000019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
[c000000ff0d0ba20] [d0000000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
[c000000ff0d0baf0] [d0000000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
[c000000ff0d0bbb0] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
[c000000ff0d0bbe0] [c0000000006126cc] scsi_eh_ready_devs+0x49c/0x9c0
[c000000ff0d0bcb0] [c000000000614160] scsi_error_handler+0x580/0x680
[c000000ff0d0bd80] [c0000000000ae848] kthread+0x108/0x130
[c000000ff0d0be30] [c0000000000094a8] ret_from_kernel_thread+0x5c/0xb4
Instruction dump:
<...>
With fix:
# dmesg
lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
lpfc: skip scsi_done()
<...>
lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (0, 0) return x2002
<...>
lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002
<...>
lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002
<...>
lpfc 0006:01:00.4: 4:(0):3172 SCSI layer issued Host Reset Data:
<...>
Fixes: 8b0dff14164d ("lpfc: Add support for using block multi-queue")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 221255aee67ec1c752001080aafec0c4e9390d95 upstream.
device handler initialisation might fail due to a number of
reasons. But as device_handlers are optional this shouldn't
cause us to disable the device entirely.
So just ignore errors from scsi_dh_add_device().
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 54e430bbd490e18ab116afa4cd90dcc45787b3df upstream.
If we fall back to using LSI on the Croc or Crocodile chip we need to
clear the interrupt so we don't hang the system.
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5e7ff2ca7f2da55fe777167849d0c93403bd0dc8 upstream.
Commit b704f70ce200 ("SCSI: fix bug in scsi_dev_info_list matching")
changed the way vendor- and model-string matching was carried out in the
routine that looks up entries in a SCSI devinfo list. The new matching
code failed to take into account the case of a maximum-length string; in
such cases it could end up testing for a terminating '\0' byte beyond
the end of the memory allocated to the string. This out-of-bounds bug
was detected by UBSAN.
I don't know if anybody has actually encountered this bug. The symptom
would be that a device entry in the blacklist might not be matched
properly if it contained an 8-character vendor name or a 16-character
model name. Such entries certainly exist in scsi_static_device_list.
This patch fixes the problem by adding a check for a maximum-length
string before the '\0' test.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: b704f70ce200 ("SCSI: fix bug in scsi_dev_info_list matching")
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8beb330044d0d1878c7b92290e91c0b889e92633 upstream.
The untagged command case in the 53c700 driver has been broken since
host wide tags were enabled because the replaced scsi_find_tag()
function had a special case for the tag value SCSI_NO_TAG to retrieve
sdev->current_cmnd. The replacement function scsi_host_find_tag() has
no such special case and returns NULL causing untagged commands to
trigger a BUG() in the driver. Inspection shows that the 53c700 is the
only driver using this SCSI_NO_TAG case, so a local fix in the driver
suffices to fix this problem globally.
Fixes: 64d513ac31b - "scsi: use host wide tags by default"
Reported-by: Helge Deller <deller@gmx.de>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 72d8c36ec364c82bf1bf0c64dfa1041cfaf139f7 upstream.
sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.
It will lead to permanently inequality between ->host_failed and
->host_busy, and scsi error handler thread won't start running. IO
errors after that won't be handled.
Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
the strategy handler to fix this race.
Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh")
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fbd83006e3e536fcb103228d2422ea63129ccb03 upstream.
Linux fails to boot as a guest with a QEMU CD-ROM:
[ 4.439488] ata2.00: ATAPI: QEMU CD-ROM, 0.8.2, max UDMA/100
[ 4.443649] ata2.00: configured for MWDMA2
[ 4.450267] scsi 1:0:0:0: CD-ROM QEMU QEMU CD-ROM 0.8. PQ: 0 ANSI: 5
[ 4.464317] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4.464319] ata2.00: BMDMA stat 0x5
[ 4.464339] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
[ 4.464339] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 4.464341] ata2.00: status: { DRDY DRQ }
[ 4.465864] ata2: soft resetting link
[ 4.625971] ata2.00: configured for MWDMA2
[ 4.628290] ata2: EH complete
[ 4.646670] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4.646671] ata2.00: BMDMA stat 0x5
[ 4.646683] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
[ 4.646683] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 4.646685] ata2.00: status: { DRDY DRQ }
[ 4.648193] ata2: soft resetting link
...
Fix this by suppressing VPD inquiry for this device.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream.
When SCSI was written, all commands coming from the filesystem
(REQ_TYPE_FS commands) had data. This meant that our signal for needing
to complete the command was the number of bytes completed being equal to
the number of bytes in the request. Unfortunately, with the advent of
flush barriers, we can now get zero length REQ_TYPE_FS commands, which
confuse this logic because they satisfy the condition every time. This
means they never get retried even for retryable conditions, like UNIT
ATTENTION because we complete them early assuming they're done. Fix
this by special casing the early completion condition to recognise zero
length commands with errors and let them drop through to the retry code.
Reported-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Tested-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 78cbccd3bd683c295a44af8050797dc4a41376ff upstream.
When KDUMP is triggered the driver first talks to the firmware in INTX
mode, but the adapter firmware is still in MSIX mode. Therefore the first
driver command hangs since the driver is waiting for an INTX response and
firmware gives a MSIX response. If when the OS is installed on a RAID
drive created by the adapter KDUMP will hang since the driver does not
receive a response in sync mode.
Fixed by: Change the firmware to INTX mode if it is in MSIX mode before
sending the first sync command.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc4bf75ea300a5e62a2419f89dd0e22189dd7ab7 upstream.
Typically under error conditions, it is possible for aac_command_thread()
to miss the wakeup from kthread_stop() and go back to sleep, causing it
to hang aac_shutdown.
In the observed scenario, the adapter is not functioning correctly and so
aac_fib_send() never completes (or time-outs depending on how it was
called). Shortly after aac_command_thread() starts it performs
aac_fib_send(SendHostTime) which hangs. When aac_probe_one
/aac_get_adapter_info send time outs, kthread_stop is called which breaks
the command thread out of it's hang.
The code will still go back to sleep in schedule_timeout() without
checking kthread_should_stop() so it causes aac_probe_one to hang until
the schedule_timeout() which is 30 minutes.
Fixed by: Adding another kthread_should_stop() before schedule_timeout()
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07beca2be24cc710461c0b131832524c9ee08910 upstream.
aac_fib_send has a special function case for initial commands during
driver initialization using wait < 0(pseudo sync mode). In this case,
the command does not sleep but rather spins checking for timeout.This
loop is calls cpu_relax() in an attempt to allow other processes/threads
to use the CPU, but this function does not relinquish the CPU and so the
command will hog the processor. This was observed in a KDUMP
"crashkernel" and that prevented the "command thread" (which is
responsible for completing the command from being timed out) from
starting because it could not get the CPU.
Fixed by replacing "cpu_relax()" call with "schedule()"
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 305c2e71b3d733ec065cb716c76af7d554bd5571 upstream.
Now that we've done a more comprehensive fix with the intermediate
target state we can remove the previous hack introduced with commit
90a88d6ef88e ("scsi: fix soft lockup in scsi_remove_target() on module
removal").
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|