Age | Commit message (Collapse) | Author |
|
commit aff132d95ffe14eca96cab90597cdd010b457af7 upstream.
The amount of memory required for tracking chain buffers is rather
large, and when the host credit count is big, memory allocation
failure occurs inside __get_free_pages.
The fix is to limit the number of chains to 100,000. In addition,
the number of host credits is limited to 30,000 IOs. However this
limitation can be overridden this using the command line option
max_queue_depth. The algorithm for calculating the
reply_post_queue_depth is changed so that it is equal to
(reply_free_queue_depth + 16), previously it was (reply_free_queue_depth * 2).
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 30c43282f3d347f47f9e05199d2b14f56f3f2837 upstream.
Added code to release the spinlock that is used to protect the
raid device list before calling a function that can block. The
blocking was causing a reschedule, and subsequently it is tried
to acquire the same lock, resulting in a panic (NMI Watchdog
detecting a CPU lockup).
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 9ae89b0296e275d5a556068b40b7c2557a556a85 upstream.
mpt2sas_base_detach() call was removed from _scsih_remove() while
doing some code shuffling. Mainly when we work on adding code for
scsih_shutdown(). I have added back mpt2sas_base_detach() which will
get callled from _scsih_remove().
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Upstrem commit: 911ae9434f83e7355d343f6c2be3ef5b00ea7aed
There's a bug in the MSIX backup and restore routines that cause a crash on
non-x86 (direct access to PCI space not via read/write). These routines are
unnecessary and were removed by the above commit, so also remove them from
stable to fix the crash.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7e1e7ead88dff75b11b86ee0d5232c4591be1326 upstream.
The error exit path leaks preempt count. Add the missing put_cpu().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f6a290b419a2675c4b77a6b0731cd2a64332365e upstream.
_scsih_smart_predicted_fault is called in an interrupt and therefore
must allocate memory using GFP_ATOMIC.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 745718132c3c7cac98a622b610e239dcd5217f71 upstream.
When we tear down a device we try to flush all outstanding
commands in scsi_free_queue(). However the check in
scsi_request_fn() is imperfect as it only signals that
we _might start_ aborting commands, not that we've actually
aborted some.
So move the printk inside the scsi_kill_request function,
this will also give us a hint about which commands are aborted.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cf16123c9c8e346ed1dd171295a678d77648d7f8 upstream.
Aacraid controller can hang on some nodes if kernel uses non-default
(powersave) ASPM policy. Controller hangs shortly after successful load and
hardware detection. Scsi error handler detects this hang and tries to restart
hardware but it does not help.
Initially it was noticed on RHEL6-based openVZ kernel after backporting
aacraid driver from mainline (RHEL6 kernel with original driver works well)
http://bugzilla.openvz.org/show_bug.cgi?id=2043
This issue happens because default ASPM policy was changed in Red Hat
kernels. Therefore guys from Red Hat have noticed this problem long time ago:
on Fedora 12
https://bugzilla.redhat.com/show_bug.cgi?id=540478
on Fedora 14
https://bugzilla.redhat.com/show_bug.cgi?id=679385
In RHEL6 kernel this issue was fixed, ASPM was disabled in aacraid driver. In
kernel changelog I've found that seems it was done by Matthew Garrett: -
[scsi] aacraid: Disable ASPM by default (Matthew Garrett) [599735]
However seems this patch was not submitted to mainline. I've reproduced this
issue on vanilla 3.1.0 kernel booted with "pcie_aspm.policy=powersave" option,
So I believe it makes sense to do it now.
Signed-off-by: Vasily Averin <vvs@sw.ru>
[mjg: Checking the Windows drivers indicates that they disable ASPM under all
circumstances, so:]
Acked-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Achim Leubner <Achim_Leubner@pmc-sierra.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e5a44df85e8d78e5c2d3d2e4f59b460905691e2f upstream.
The Windows driver .inf disables ASPM on hpsa devices. Do the same because the
selection of a non default ASPM policy can cause the device to hang.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4e6c82b3614a18740ef63109d58743a359266daf upstream.
On Mon, 2011-11-07 at 17:24 +1100, Stephen Rothwell wrote:
> Hi all,
>
> Starting some time last week I am getting the following during boot on
> our PPC970 blade:
>
> calling .ipr_init+0x0/0x68 @ 1
> ipr: IBM Power RAID SCSI Device Driver version: 2.5.2 (April 27, 2011)
> ipr 0000:01:01.0: Found IOA with IRQ: 26
> ipr 0000:01:01.0: Starting IOA initialization sequence.
> ipr 0000:01:01.0: Adapter firmware version: 06160039
> ipr 0000:01:01.0: IOA initialized.
> scsi0 : IBM 572E Storage Adapter
> ------------[ cut here ]------------
> WARNING: at drivers/scsi/scsi_lib.c:1704
> Modules linked in:
> NIP: c00000000053b3d4 LR: c00000000053e5b0 CTR: c000000000541d70
> REGS: c0000000783c2f60 TRAP: 0700 Not tainted (3.1.0-autokern1)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 24002024 XER: 20000002
> TASK = c0000000783b8000[1] 'swapper' THREAD: c0000000783c0000 CPU: 0
> GPR00: 0000000000000001 c0000000783c31e0 c000000000cf38b0 c00000000239a9d0
> GPR04: c000000000cbe8f8 0000000000000000 c0000000783c3040 0000000000000000
> GPR08: c000000075daf488 c000000078a3b7ff c000000000bcacc8 0000000000000000
> GPR12: 0000000044002028 c000000007ffb000 0000000002e40000 000000000099b800
> GPR16: 0000000000000000 c000000000bba5fc c000000000a61db8 0000000000000000
> GPR20: 0000000001b77200 0000000000000000 c000000078990000 0000000000000001
> GPR24: c000000002396828 0000000000000000 0000000000000000 c000000078a3b938
> GPR28: fffffffffffffffa c0000000008ad2c0 c000000000c7faa8 c00000000239a9d0
> NIP [c00000000053b3d4] .scsi_free_queue+0x24/0x90
> LR [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0
> Call Trace:
> [c0000000783c31e0] [c000000000c7faa8] wireless_seq_fops+0x278d0/0x2eb88 (unreliable)
> [c0000000783c3270] [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0
> [c0000000783c3330] [c00000000053eba0] .scsi_probe_and_add_lun+0x390/0xb40
> [c0000000783c34a0] [c00000000053f7ec] .__scsi_scan_target+0x16c/0x650
> [c0000000783c35f0] [c00000000053fd90] .scsi_scan_channel+0xc0/0x100
> [c0000000783c36a0] [c00000000053fefc] .scsi_scan_host_selected+0x12c/0x1c0
> [c0000000783c3750] [c00000000083dcb4] .ipr_probe+0x2c0/0x390
> [c0000000783c3830] [c0000000003f50b4] .local_pci_probe+0x34/0x50
> [c0000000783c38a0] [c0000000003f5f78] .pci_device_probe+0x148/0x150
> [c0000000783c3950] [c0000000004e1e8c] .driver_probe_device+0xdc/0x210
> [c0000000783c39f0] [c0000000004e20cc] .__driver_attach+0x10c/0x110
> [c0000000783c3a80] [c0000000004e1228] .bus_for_each_dev+0x98/0xf0
> [c0000000783c3b30] [c0000000004e1bf8] .driver_attach+0x28/0x40
> [c0000000783c3bb0] [c0000000004e07d8] .bus_add_driver+0x218/0x340
> [c0000000783c3c60] [c0000000004e2a2c] .driver_register+0x9c/0x1b0
> [c0000000783c3d00] [c0000000003f62d4] .__pci_register_driver+0x64/0x140
> [c0000000783c3da0] [c000000000b99f88] .ipr_init+0x4c/0x68
> [c0000000783c3e20] [c00000000000ad24] .do_one_initcall+0x1a4/0x1e0
> [c0000000783c3ee0] [c000000000b512d0] .kernel_init+0x14c/0x1fc
> [c0000000783c3f90] [c000000000022468] .kernel_thread+0x54/0x70
> Instruction dump:
> ebe1fff8 7c0803a6 4e800020 7c0802a6 fba1ffe8 fbe1fff8 7c7f1b78 f8010010
> f821ff71 e8030398 3120ffff 7c090110 <0b000000> e86303b0 482de065 60000000
> ---[ end trace 759bed76a85e8dec ]---
> scsi 0:0:1:0: Direct-Access IBM-ESXS MAY2036RC T106 PQ: 0 ANSI: 5
> ------------[ cut here ]------------
>
> I get lots more of these. The obvious commit to point the finger at
> is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI
> commands") but the root cause may be something different.
Caused by
commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
Author: Anton Blanchard <anton@samba.org>
Date: Thu Nov 3 08:56:22 2011 +1100
[SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev
Doesn't completely do the teardown. The true fix is to do a proper
teardown instead of hand rolling it
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c4853efec665134b2e6fc9c13447323240980351 upstream.
The P600 requires a small delay when changing states. Otherwise we may think
the board did not reset and we bail. This for kdump only and is particular
to the P600.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0167ac67ff6f35bf2364f7672c8012b0cd40277f upstream.
Fix for issue : While discovery is in progress, hot unplug and hot plug of
enclosure connected to the controller card is causing system to hang.
When a device is in the process of being detected at driver load time then
if it is removed, the device that is no longer present will not be added
to the list. So the code in _scsih_probe_sas() is rearranged as such so
the devices that failed to be detected are not added to the list.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f7c9c6bb14f3104608a3a83cadea10a6943d2804 upstream.
When looking at memory consumption issues I noticed quite a
lot of memory in the kmalloc-2048 bucket:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
6561 6471 98% 2.30K 243 27 15552K kmalloc-2048
Over 15MB. slub debug shows that cfq is responsible for almost
all of it:
# sort -nr /sys/kernel/slab/kmalloc-2048/alloc_calls
6402 .cfq_init_queue+0xec/0x460 age=43423/43564/43655 pid=1 cpus=4,11,13
In scsi_alloc_sdev we do scsi_alloc_queue but if slave_alloc
fails we don't free it with scsi_free_queue.
The patch below fixes the issue:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
135 72 53% 2.30K 5 27 320K kmalloc-2048
# cat /sys/kernel/slab/kmalloc-2048/alloc_calls
3 .cfq_init_queue+0xec/0x460 age=3811/3876/3925 pid=1 cpus=4,11,13
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3308511c93e6ad0d3c58984ecd6e5e57f96b12c8 upstream.
Make sure that SCSI device removal via scsi_remove_host() does finish
all pending SCSI commands. Currently that's not the case and hence
removal of a SCSI host during I/O can cause a deadlock. See also
"blkdev_issue_discard() hangs forever if underlying storage device is
removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also
http://lkml.org/lkml/2011/8/27/6.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a18a920c70d48a8e4a2b750d8a183b3c1a4be514 upstream.
This patch validates sdev pointer in scsi_dh_activate before proceeding further.
Without this check we might see the panic as below. I have seen this
panic multiple times..
Call trace:
#0 [ffff88007d647b50] machine_kexec at ffffffff81020902
#1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
#2 [ffff88007d647c70] oops_end at ffffffff8139c650
#3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
#4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
[exception RIP: scsi_dh_activate+0x82]
RIP: ffffffffa0041922 RSP: ffff88007d647e00 RFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000093c5
RDX: 00000000000093c5 RSI: ffffffffa02e6640 RDI: ffff88007cc88988
RBP: 000000000000000f R8: ffff88007d646000 R9: 0000000000000000
R10: ffff880082293790 R11: 00000000ffffffff R12: ffff88007cc88988
R13: 0000000000000000 R14: 0000000000000286 R15: ffff880037b845e0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#5 [ffff88007d647e38] run_workqueue at ffffffff81060268
#6 [ffff88007d647e78] worker_thread at ffffffff81060386
#7 [ffff88007d647ee8] kthread at ffffffff81064436
#8 [ffff88007d647f48] kernel_thread at ffffffff81003fba
Signed-off-by: Babu Moger <babu.moger@netapp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c68bf8eeaa57c852e74adcf597237be149eef830 upstream.
The call to complete() in st_scsi_execute_end() wakes up sleeping thread
in write_behind_check(), which frees the st_request, thus invalidating
the pointer to the associated bio structure, which is then passed to the
blk_rq_unmap_user(). Fix by storing pointer to bio structure into
temporary local variable.
This bug is present since at least linux-2.6.32.
Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
Reported-by: Juergen Groß <juergen.gross@ts.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 983d3fdd332742167d0482c06fd29cf4b8a687c0 upstream.
Needed to jump to scic_lock unlock.
Also spotted by coccicheck.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 54b5e3a4bfa3452bc10cd4da672099ccc46b8c09 upstream.
Kill the local smp response buffer.
Besides being unnecessary, it is too small (currently truncates
responses to 60 bytes). The mid-layer will have already allocated a
sufficiently sized buffer, just kmap and copy into it directly.
Reported-by: Derick Marks <derick.w.marks@intel.com>
Tested-by: Derick Marks <derick.w.marks@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit bb041a0e9c31229071b6e56e1d0d8374af0d2038 upstream.
Libsas forget to set the sas_address and device type of rphy lead to file
under /sys/class/sas_x show wrong value, fix that.
Signed-off-by: Jack Wang <jack_wang@usish.com>
Tested-by: Crystal Yu <crystal_yu@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5d7c20b7fa5c6ca19e871b4050e321c99d32bd43 upstream.
During kdump testing I noticed timeouts when initialising each IPR
adapter. While the driver has logic to detect an adapter in an
indeterminate state, it wasn't triggering and each adapter went
through a 5 minute timeout before finally going operational.
Some analysis showed the needs_hard_reset flag wasn't getting set.
We can check the reset_devices kernel parameter which is set by
kdump and force a full reset. This fixes the problem.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f575c5d3ebdca3b0482847d8fcba971767754a9e upstream.
The following patch for megaraid_sas will fix a potential bad pointer access
in megasas_reset_timer(), when a MegaRAID 9265/9285 or 9360/9380 gets a
timeout. megasas_build_io_fusion() sets SCp.ptr to be a struct
megasas_cmd_fusion *, but then megasas_reset_timer() was casting SCp.ptr to be
a struct megasas_cmd *, then trying to access cmd->instance, which is invalid.
Just loading instance from scmd->device->host->hostdata in
megasas_reset_timer() fixes the issue.
Signed-off-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
When a wide port is being utilized to a target, if one disables only one
of the
phys, we get an OS crash:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000238
IP: [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
PGD 4103f5067 PUD 41dba9067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/bus/pci/slots/5/address
CPU 0
Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4
ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl
auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt
llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom
dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3
jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix
libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]
Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4
ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl
auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt
llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom
dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3
jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix
libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]
Pid: 5146, comm: scsi_wq_5 Not tainted
2.6.32-71.29.1.el6.lustre.7.x86_64 #1 Storage Server
RIP: 0010:[<ffffffff814ca9b1>] [<ffffffff814ca9b1>]
mutex_lock+0x21/0x50
RSP: 0018:ffff8803e4e33d30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000238 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8803e664c800 RDI: 0000000000000238
RBP: ffff8803e4e33d40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000238 R14: ffff88041acb7200 R15: ffff88041c51ada0
FS: 0000000000000000(0000) GS:ffff880028200000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000238 CR3: 0000000410143000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_wq_5 (pid: 5146, threadinfo ffff8803e4e32000, task
ffff8803e4e294a0)
Stack:
ffff8803e664c800 0000000000000000 ffff8803e4e33d70 ffffffffa001f06e
<0> ffff8803e4e33d60 ffff88041c51ada0 ffff88041acb7200 ffff88041bc0aa00
<0> ffff8803e4e33d90 ffffffffa0032b6c 0000000000000014 ffff88041acb7200
Call Trace:
[<ffffffffa001f06e>] sas_port_delete_phy+0x2e/0xa0 [scsi_transport_sas]
[<ffffffffa0032b6c>] sas_unregister_devs_sas_addr+0xac/0xe0 [libsas]
[<ffffffffa0034914>] sas_ex_revalidate_domain+0x204/0x330 [libsas]
[<ffffffffa00307f0>] ? sas_revalidate_domain+0x0/0x90 [libsas]
[<ffffffffa0030855>] sas_revalidate_domain+0x65/0x90 [libsas]
[<ffffffff8108c7d0>] worker_thread+0x170/0x2a0
[<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c660>] ? worker_thread+0x0/0x2a0
[<ffffffff81091b36>] kthread+0x96/0xa0
[<ffffffff810141ca>] child_rip+0xa/0x20
[<ffffffff81091aa0>] ? kthread+0x0/0xa0
[<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24
4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 92 f4 ff ff 48 89 df <f0> ff
0f 79 05 e8 25 00 00 00 65 48 8b 04 25 08 cc 00 00 48 2d
RIP [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
RSP <ffff8803e4e33d30>
CR2: 0000000000000238
The following patch is admittedly a band-aid, and does not solve the
root cause, but it still is a good candidate for hardening as a pointer
check before reference.
Signed-off-by: Mark Salyzyn <mark_salyzyn@us.xyratex.com>
Tested-by: Jack Wang <jack_wang@usish.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
I hit a crash in qla2x00_abort_all_cmds() if the qla2xxx module is
unloaded right after it is loaded. I debugged this down to the abort
handling improperly treating a command of type SRB_ADISC_CMD as if it
had a bsg_job to complete when that command actually uses the iocb_cmd
part of the union. (I guess to hit this one has to unload the module
while the async FC initialization is still in progress)
It seems we should only look for a bsg_job if type is SRB_ELS_CMD_RPT,
SRB_ELS_CMD_HST or SRB_CT_CMD, so switch the test to make that explicit.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
[SCSI] 3w-9xxx: fix iommu_iova leak
[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference
[SCSI] scsi: qla4xxx needs libiscsi.o
[SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.
[SCSI] aacraid: reset should disable MSI interrupt
|
|
Following reports on the list, it looks like the 3e-9xxx driver will leak dma
mappings every time we get a transient queueing error back from the card.
This is because it maps the sg list in the routine that sends the command, but
doesn't unmap again in the transient failure path (even though the command is
sent back to the block layer). Fix by unmapping before returning the status.
Reported-by: Chris Boot <bootc@bootc.net>
Tested-by: Chris Boot <bootc@bootc.net>
Acked-by: Adam Radford <aradford@gmail.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
This oops was reported recently:
d:mon> e
cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
sp: c0000000fd4c73a0
msr: 8000000000009032
dar: 0
dsisr: 40000000
current = 0xc0000000fd640d40
paca = 0xc00000000054ff80
pid = 5085, comm = iscsid
d:mon> t
[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
[scsi_transport_iscsi2]
[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000080da560cfc
The root cause was an EEH error, which sent us down the offload_close path in
the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for
upper layer driver (like the cxgbi drivers) which might have execution contexts
in the middle of its use. The result is the oops above, when t3_l2t_get attempts
to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL.
The fix is to prevent the setting of the NULL pointer until after there are no
further users of it. The t3cdev->l2opt pointer is now converted to be an rcu
pointer and the L2DATA macro is now called under the protection of the
rcu_read_lock(). When the EEH error path:
t3_adapter_error->offload_close->cxgb3_offload_deactivate
Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu
quiescence point, preventing, allowing L2DATA callers to safely check for a NULL
pointer without concern that the underlying data will be freeded before the
pointer is dereferenced.
This has been tested by the reporter and shown to fix the reproted oops
[nhorman: fix up unitinialised variable reported by Dan Carpenter]
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Karen Xie <kxie@chelsio.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
sector_t can be different types, so cast it to its largest possible
type.
drivers/scsi/qla2xxx/qla_isr.c:1509:5: warning: format '%lx' expects type 'long unsigned int', but argument 5 has type 'sector_t'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
SCSI_ISCI needs to select SCSI_SAS_HOST_SMP to ensure that all
needed symbols are available to it.
Fixes this build error:
ERROR: "try_test_sas_gpio_gp_bit" [drivers/scsi/isci/isci.ko] undefined!
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
qla4xxx driver needs to be linked with libiscsi.o to fix
build errors. This happens when no other drivers that use
libiscsi.o are enabled.
ERROR: "iscsi_conn_stop" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_conn_get_addr_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_session_teardown" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_host_alloc" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_conn_start" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_conn_send_pdu" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_session_get_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_conn_get_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_set_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_session_failure" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_complete_pdu" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_session_setup" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_conn_bind" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_conn_setup" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
ERROR: "iscsi_itt_to_task" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined!
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
expander child.
In an enclosure model where there are chaining expanders to a large body
of storage, it was discovered that libsas, responding to a broadcast
event change, would only revalidate the domain of first child expander
in the list.
The issue is that the pointer value to the discovered source device was
used to break out of the loop, rather than the content of the pointer.
This still remains non-compliant as the revalidate domain code is
supposed to loop through all child expanders, and not stop at the first
one it finds that reports a change count. However, the design of this
routine does not allow multiple device discoveries and that would be a
more complicated set of patches reserved for another day. We are fixing
the glaring bug rather than refactoring the code.
Signed-off-by: Mark Salyzyn <msalyzyn@us.xyratex.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
scsi reset on hardware with enabled MSI interrupts generates WARNING message
[11027.798722] aacraid: Host adapter abort request (0,0,0,0)
[11027.798814] aacraid: Host adapter reset request. SCSI hang ?
[11087.762237] aacraid: SCSI bus appears hung
[11135.082543] ------------[ cut here ]------------
[11135.082646] WARNING: at drivers/pci/msi.c:658 pci_enable_msi_block+0x251/0x290()
Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Mark Salyzyn <mark_salyzyn@us.xyratex.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6: (25 commits)
[SCSI] bnx2i: Fixed the endian on TTT for NOP out transmission
[SCSI] libfc: fix referencing to fc_fcp_pkt from the frame pointer via fr_fsp()
[SCSI] libfc: block SCSI eh thread for blocked rports
[SCSI] libfc: fix fc_eh_host_reset
[SCSI] fcoe: Fix deadlock between fip's recv_work and rtnl
[SCSI] qla2xxx: Update version number to 8.03.07.07-k.
[SCSI] qla2xxx: Set the task attributes after memsetting fcp cmnd.
[SCSI] qla2xxx: Correct inadvertent loop state transitions during port-update handling.
[SCSI] qla2xxx: Save and restore irq in the response queue interrupt handler.
[SCSI] qla2xxx: Double check for command completion if abort mailbox command fails.
[SCSI] qla2xxx: Acquire hardware lock while manipulating dsd list.
[SCSI] qla2xxx: Fix qla24xx revision check while enabling interrupts.
[SCSI] qla2xxx: T10 DIF - Fix incorrect error reporting.
[SCSI] qla2xxx: T10 DIF - Handle uninitalized sectors.
[SCSI] hpsa: fix physical device lun and target numbering problem
[SCSI] hpsa: fix problem that OBDR devices are not detected
[SCSI] isci: add version number
[SCSI] isci: fix event-get pointer increment
[SCSI] isci: dynamic interrupt coalescing
[SCSI] isci: Leave requests alone if already terminating.
...
|
|
When CONFIG_NET is disabled, SCSI_QLA_ISCSI selects SCSI_ISCSI_ATTRS,
which uses network interfaces, so the build fails with multiple errors:
warning: (ISCSI_TCP && SCSI_CXGB3_ISCSI && SCSI_CXGB4_ISCSI && SCSI_QLA_ISCSI && INFINIBAND_ISER) selects SCSI_ISCSI_ATTRS which has unmet direct dependencies (SCSI && NET)
ERROR: "skb_trim" [drivers/scsi/scsi_transport_iscsi.ko] undefined!
ERROR: "netlink_kernel_create" [drivers/scsi/scsi_transport_iscsi.ko] undefined!
ERROR: "netlink_kernel_release" [drivers/scsi/scsi_transport_iscsi.ko] undefined!
...
so make SCSI_QLA_ISCSI also depend on NET to prevent the build errors.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: iscsi-driver@qlogic.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The iscsi_nopout task's TTT is defined as __be32 while the DMA
memory to the chip is CPU specific. This creates a problem for
unsolicited NOP-In responses where the TTT is not the RESERVED
tag of 0xFFs. This patch adds a call to be32_to_cpu for the TTT
specified.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
In commit 6a716a8, while releasing the DDP context in case frame_send() failed,
the frame may already be freed, so we should store the pointer to fc_fcp_pkt and
release the DDP context using the locally stored fsp instead of getting fsp from
the fr_fsp(fp) on a frame.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Reported-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Call fc_block_scsi_eh() in all fcoe eh to blocks
the scsi_eh thread for blocked rports.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Current fc_eh_host_reset leaves lport offline
permanently due to FLOGI response getting
handled by LOGO response from last reset as both
had same exchange id.
So fix this by having end to end exches clean-up
using exchange abort along exches reset
done from fc_eh_host_reset. This would avoid
exchanges collision between the sessions across
the reset. In this case implicit login should have
done that but no aborting support for FIP
frames, so just wait till lport->r_a_tov before
restarting next flogi to ensure all exchanges
are good to use again for next session.
Below is the trace of LOGO from older session
coming ahead of FLOGI response with same exche id
0x203:-
617 86.435165 4e.00.0b -> ff.ff.fc FC ELS LOGO 0x203
618 86.435195 4e.00.0b -> b6.02.00 FC ELS LOGO 0x213
619 86.435220 4e.00.0b -> 18.03.00 FC ELS LOGO 0x223
620 86.435244 4e.00.0b -> 18.02.00 FC ELS LOGO 0x233
621 86.435267 4e.00.0b -> 18.01.00 FC ELS LOGO 0x243
622 86.435349 00.00.00 -> ff.ff.fe FC ELS FLOGI 0x203
623 86.435549 ff.ff.fc -> 4e.00.0b FC ELS ACC (LOGO) 0x203
624 86.438721 ff.ff.fe -> 4e.00.0b FC ELS ACC (FLOGI) 0x203
625 86.442059 18.03.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x223
626 86.443683 b6.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x213
627 86.447693 18.01.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x243
628 86.453499 18.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x233
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
The rtnl cannot be held durrng the fcoe_interface_put.
If it is the last reference on the fcoe_interface the
fcoe_ctlr_destroy will be called as a part of the
cleanup, ultimately calling cancel_work_sync(&fip->recv_work);
If we are processing a flogi response we will be in
the recv_work context and we will lock the rtnl to
add a new unicast MAC address. This is how the deadlock
can occur.
The fix is simply to move the rtnl_lock/unlock into
fcoe_interface_cleanup so that it can be unlocked before
fcoe_interface_put is called.
Here is the lockdep report:
Jul 21 11:26:35 bubba [ 223.870702]
ul 21 11:26:35 bubba [ 223.870704] =======================================================
Jul 21 11:26:35 bubba [ 223.871255] [ INFO: possible circular locking dependency detected ]
Jul 21 11:26:35 bubba [ 223.871530] 3.0.0-rc7+ #1
Jul 21 11:26:35 bubba [ 223.871797] -------------------------------------------------------
Jul 21 11:26:35 bubba [ 223.872072] lockdeptest.sh/3464 is trying to acquire lock:
Jul 21 11:26:35 bubba [ 223.872345] ((&fip->recv_work)
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffff810531f1>] wait_on_work+0x0/0xbd
Jul 21 11:26:35 bubba [ 223.873022]
Jul 21 11:26:35 bubba [ 223.873023] but task is already holding lock:
Jul 21 11:26:35 bubba [ 223.873555] (rtnl_mutex
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffff813e8233>] rtnl_lock+0x12/0x14
Jul 21 11:26:35 bubba [ 223.874229]
Jul 21 11:26:35 bubba [ 223.874230] which lock already depends on the new lock.
Jul 21 11:26:35 bubba [ 223.874231]
Jul 21 11:26:35 bubba [ 223.875032]
Jul 21 11:26:35 bubba [ 223.875033] the existing dependency chain (in reverse order) is:
Jul 21 11:26:35 bubba [ 223.875573]
Jul 21 11:26:35 bubba [ 223.875573] -> #1
Jul 21 11:26:35 bubba (rtnl_mutex
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba :
Jul 21 11:26:35 bubba [ 223.876301]
Jul 21 11:26:35 bubba [<ffffffff8106c14a>] lock_acquire+0xd2/0xf7
Jul 21 11:26:35 bubba [ 223.876645]
Jul 21 11:26:35 bubba [<ffffffff8151d975>] __mutex_lock_common+0x47/0x30d
Jul 21 11:26:35 bubba [ 223.876991]
Jul 21 11:26:35 bubba [<ffffffff8151dd36>] mutex_lock_nested+0x3b/0x40
Jul 21 11:26:35 bubba [ 223.877334]
Jul 21 11:26:35 bubba [<ffffffff813e8233>] rtnl_lock+0x12/0x14
Jul 21 11:26:35 bubba [ 223.877675]
Jul 21 11:26:35 bubba [<ffffffffa003d5a0>] fcoe_update_src_mac+0x2b/0x80 [fcoe]
Jul 21 11:26:35 bubba [ 223.878022]
Jul 21 11:26:35 bubba [<ffffffffa003d698>] fcoe_flogi_resp+0x5e/0x79 [fcoe]
Jul 21 11:26:35 bubba [ 223.878366]
Jul 21 11:26:35 bubba [<ffffffffa001566f>] fc_exch_recv+0x7f5/0x9da [libfc]
Jul 21 11:26:35 bubba [ 223.878713]
Jul 21 11:26:35 bubba [<ffffffffa00327d8>] fcoe_ctlr_recv_work+0x71f/0x10dc [libfcoe]
Jul 21 11:26:35 bubba [ 223.879258]
Jul 21 11:26:35 bubba [<ffffffff81053761>] process_one_work+0x1d7/0x347
Jul 21 11:26:35 bubba [ 223.879601]
Jul 21 11:26:35 bubba [<ffffffff81054ade>] worker_thread+0xf8/0x17c
Jul 21 11:26:35 bubba [ 223.879944]
Jul 21 11:26:35 bubba [<ffffffff81058184>] kthread+0x7d/0x85
Jul 21 11:26:35 bubba [ 223.880287]
Jul 21 11:26:35 bubba [<ffffffff81526414>] kernel_thread_helper+0x4/0x10
Jul 21 11:26:35 bubba [ 223.880634]
Jul 21 11:26:35 bubba [ 223.880635] -> #0
Jul 21 11:26:35 bubba ((&fip->recv_work)
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba :
Jul 21 11:26:35 bubba [ 223.881357]
Jul 21 11:26:35 bubba [<ffffffff8106b93e>] __lock_acquire+0xb1d/0xe2c
Jul 21 11:26:35 bubba [ 223.881695]
Jul 21 11:26:35 bubba [<ffffffff8106c14a>] lock_acquire+0xd2/0xf7
Jul 21 11:26:35 bubba [ 223.882033]
Jul 21 11:26:35 bubba [<ffffffff81053241>] wait_on_work+0x50/0xbd
Jul 21 11:26:35 bubba [ 223.882378]
Jul 21 11:26:35 bubba [<ffffffff81053b32>] __cancel_work_timer+0xb6/0xf4
Jul 21 11:26:35 bubba [ 223.882718]
Jul 21 11:26:35 bubba [<ffffffff81053b8a>] cancel_work_sync+0xb/0xd
Jul 21 11:26:35 bubba [ 223.883057]
Jul 21 11:26:35 bubba [<ffffffffa00317e6>] fcoe_ctlr_destroy+0x1d/0x67 [libfcoe]
Jul 21 11:26:35 bubba [ 223.883399]
Jul 21 11:26:35 bubba [<ffffffffa003e51e>] fcoe_interface_release+0x21/0x45 [fcoe]
Jul 21 11:26:35 bubba [ 223.883940]
Jul 21 11:26:35 bubba [<ffffffff811fbbe6>] kref_put+0x43/0x4d
Jul 21 11:26:35 bubba [ 223.884280]
Jul 21 11:26:35 bubba [<ffffffffa003ebba>] fcoe_interface_put+0x17/0x19 [fcoe]
Jul 21 11:26:35 bubba [ 223.884624]
Jul 21 11:26:35 bubba [<ffffffffa003f2a6>] fcoe_interface_cleanup+0x188/0x193 [fcoe]
Jul 21 11:26:35 bubba [ 223.885163]
Jul 21 11:26:35 bubba [<ffffffffa003f303>] fcoe_destroy+0x52/0x72 [fcoe]
Jul 21 11:26:35 bubba [ 223.885502]
Jul 21 11:26:35 bubba [<ffffffffa00340a4>] fcoe_transport_destroy+0xab/0x110 [libfcoe]
Jul 21 11:26:35 bubba [ 223.886045]
Jul 21 11:26:35 bubba [<ffffffff81056153>] param_attr_store+0x43/0x62
Jul 21 11:26:35 bubba [ 223.886385]
Jul 21 11:26:35 bubba [<ffffffff8105602d>] module_attr_store+0x21/0x25
Jul 21 11:26:35 bubba [ 223.886728]
Jul 21 11:26:35 bubba [<ffffffff8114c23d>] sysfs_write_file+0x103/0x13f
Jul 21 11:26:35 bubba [ 223.887068]
Jul 21 11:26:35 bubba [<ffffffff810f3e7b>] vfs_write+0xa7/0xfa
Jul 21 11:26:35 bubba [ 223.887406]
Jul 21 11:26:35 bubba [<ffffffff810f4073>] sys_write+0x45/0x69
Jul 21 11:26:35 bubba [ 223.887742]
Jul 21 11:26:35 bubba [<ffffffff815252bb>] system_call_fastpath+0x16/0x1b
Jul 21 11:26:35 bubba [ 223.888083]
Jul 21 11:26:35 bubba [ 223.888084] other info that might help us debug this:
Jul 21 11:26:35 bubba [ 223.888085]
Jul 21 11:26:35 bubba [ 223.888879] Possible unsafe locking scenario:
Jul 21 11:26:35 bubba [ 223.888881]
Jul 21 11:26:35 bubba [ 223.889411] CPU0 CPU1
Jul 21 11:26:35 bubba [ 223.889683] ---- ----
Jul 21 11:26:35 bubba [ 223.889955] lock(
Jul 21 11:26:35 bubba rtnl_mutex
Jul 21 11:26:35 bubba );
Jul 21 11:26:35 bubba [ 223.890349] lock(
Jul 21 11:26:35 bubba (&fip->recv_work)
Jul 21 11:26:35 bubba );
Jul 21 11:26:35 bubba [ 223.890751] lock(
Jul 21 11:26:35 bubba rtnl_mutex
Jul 21 11:26:35 bubba );
Jul 21 11:26:35 bubba [ 223.891154] lock(
Jul 21 11:26:35 bubba (&fip->recv_work)
Jul 21 11:26:35 bubba );
Jul 21 11:26:35 bubba [ 223.891549]
Jul 21 11:26:35 bubba [ 223.891550] *** DEADLOCK ***
Jul 21 11:26:35 bubba [ 223.891551]
Jul 21 11:26:35 bubba [ 223.892347] 6 locks held by lockdeptest.sh/3464:
Jul 21 11:26:35 bubba [ 223.892621] #0:
Jul 21 11:26:35 bubba (&buffer->mutex
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffff8114c171>] sysfs_write_file+0x37/0x13f
Jul 21 11:26:35 bubba [ 223.893359] #1:
Jul 21 11:26:35 bubba (s_active
Jul 21 11:26:35 bubba ){++++.+}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffff8114c21c>] sysfs_write_file+0xe2/0x13f
Jul 21 11:26:35 bubba [ 223.894094] #2:
Jul 21 11:26:35 bubba (param_lock
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffff81056146>] param_attr_store+0x36/0x62
Jul 21 11:26:35 bubba [ 223.894835] #3:
Jul 21 11:26:35 bubba (ft_mutex
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffffa0034017>] fcoe_transport_destroy+0x1e/0x110 [libfcoe]
Jul 21 11:26:35 bubba [ 223.895574] #4:
Jul 21 11:26:35 bubba (fcoe_config_mutex
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffffa003f2c9>] fcoe_destroy+0x18/0x72 [fcoe]
Jul 21 11:26:35 bubba [ 223.896314] #5:
Jul 21 11:26:35 bubba (rtnl_mutex
Jul 21 11:26:35 bubba ){+.+.+.}
Jul 21 11:26:35 bubba , at:
Jul 21 11:26:35 bubba [<ffffffff813e8233>] rtnl_lock+0x12/0x14
Jul 21 11:26:35 bubba [ 223.897047]
Jul 21 11:26:35 bubba [ 223.897048] stack backtrace:
Jul 21 11:26:35 bubba [ 223.897578] Pid: 3464, comm: lockdeptest.sh Not tainted 3.0.0-rc7+ #1
Jul 21 11:26:35 bubba [ 223.897853] Call Trace:
Jul 21 11:26:35 bubba [ 223.898128] [<ffffffff81068e16>] print_circular_bug+0x1f8/0x209
Jul 21 11:26:35 bubba [ 223.898416] [<ffffffff8106b93e>] __lock_acquire+0xb1d/0xe2c
Jul 21 11:26:35 bubba [ 223.898699] [<ffffffff810531f1>] ? wait_on_cpu_work+0xe6/0xe6
Jul 21 11:26:35 bubba [ 223.898982] [<ffffffff8106c14a>] lock_acquire+0xd2/0xf7
Jul 21 11:26:35 bubba [ 223.899263] [<ffffffff810531f1>] ? wait_on_cpu_work+0xe6/0xe6
Jul 21 11:26:35 bubba [ 223.899547] [<ffffffff8104a097>] ? mod_timer+0x8f/0x98
Jul 21 11:26:35 bubba [ 223.899827] [<ffffffff81053241>] wait_on_work+0x50/0xbd
Jul 21 11:26:35 bubba [ 223.900108] [<ffffffff810531f1>] ? wait_on_cpu_work+0xe6/0xe6
Jul 21 11:26:35 bubba [ 223.900390] [<ffffffff81053b32>] __cancel_work_timer+0xb6/0xf4
Jul 21 11:26:35 bubba [ 223.900671] [<ffffffff81053b8a>] cancel_work_sync+0xb/0xd
Jul 21 11:26:35 bubba [ 223.900953] [<ffffffffa00317e6>] fcoe_ctlr_destroy+0x1d/0x67 [libfcoe]
Jul 21 11:26:35 bubba [ 223.901237] [<ffffffffa003e51e>] fcoe_interface_release+0x21/0x45 [fcoe]
Jul 21 11:26:35 bubba [ 223.901522] [<ffffffffa003e4fd>] ? fcoe_enable+0x6b/0x6b [fcoe]
Jul 21 11:26:35 bubba [ 223.901803] [<ffffffff811fbbe6>] kref_put+0x43/0x4d
Jul 21 11:26:35 bubba [ 223.902083] [<ffffffffa003ebba>] fcoe_interface_put+0x17/0x19 [fcoe]
Jul 21 11:26:35 bubba [ 223.902367] [<ffffffffa003f2a6>] fcoe_interface_cleanup+0x188/0x193 [fcoe]
Jul 21 11:26:35 bubba [ 223.902653] [<ffffffff8151dd36>] ? mutex_lock_nested+0x3b/0x40
Jul 21 11:26:35 bubba [ 223.902939] [<ffffffffa003f303>] fcoe_destroy+0x52/0x72 [fcoe]
Jul 21 11:26:35 bubba [ 223.903223] [<ffffffffa00340a4>] fcoe_transport_destroy+0xab/0x110 [libfcoe]
Jul 21 11:26:35 bubba [ 223.903508] [<ffffffff81056153>] param_attr_store+0x43/0x62
Jul 21 11:26:35 bubba [ 223.903792] [<ffffffff8105602d>] module_attr_store+0x21/0x25
Jul 21 11:26:35 bubba [ 223.904075] [<ffffffff8114c23d>] sysfs_write_file+0x103/0x13f
Jul 21 11:26:35 bubba [ 223.904357] [<ffffffff810f3e7b>] vfs_write+0xa7/0xfa
Jul 21 11:26:35 bubba [ 223.904642] [<ffffffff810f51d6>] ? fget_light+0x35/0x96
Jul 21 11:26:35 bubba [ 223.904923] [<ffffffff810f4073>] sys_write+0x45/0x69
Jul 21 11:26:35 bubba [ 223.905204] [<ffffffff815252bb>] system_call_fastpath+0x16/0x1b
Jul 21 11:26:36 bubba [ 223.964438] ixgbe 0000:05:00.0: eth3: detected SFP+: 5
Jul 21 11:26:37 bubba [ 225.196702] ixgbe 0000:05:00.0: eth3: NIC Link is Up 10 Gbps, Flow Control: None
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
The memset of the fcp_cmnd struct needs to be moved so that it will not
zero-out valid data.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
port-update handling.
Transitioning to a LOOP_UPDATE loop-state could cause the driver
to miss normal link/target processing. LOOP_UPDATE is a crufty
artifact leftover from at time the driver performed it's own
internal command-queuing. Safely remove this state.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
fails.
Close a small window where we could falsely fail an abort request if the mailbox
command fails but the command was returned during interrupt context.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
The dsd list shouldn't be manipulated without taking the per host hardware
lock to prevent multiple callers from trampling upon one another.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Since we enable interrupts before initializing the firmware, use the chip
revision from PCI config space directly to perform the chip revision check.
Also remove the unnecessary firmware attributes test.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
This fix:
- Disables app tag peeking; correct tag check will be added when the
SCSI API is available.
- Always derive ref_tag from scsi_get_lba()
- Removes incorrect swap of FCP_LUN in FCP_CMND
- Moves app-tag error check before ref-tag check. The reason being,
currently there is no interface in SCSI to retrieve the app-tag
for protection I/Os, so driver puts zero for app-tag in the
firmware interface, but requests not to validate it, but when a
ref-tag error is detected by firmware, it would put
expected/actual tags for all the protection tags (guard/app/ref).
As driver checks for app tag error first, a ref-tag error is
incorrectly flagged as app-tag error.
- Convert HBA specific checks to capability based.
Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Driver needs to update protection bytes for uninitialized sectors as they are
not DMA-d.
Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Reviewed-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
If a physical device exposed to the OS by hpsa
is replaced (e.g. one hot plug tape drive is replaced
by another, or a tape drive is placed into "OBDR" mode
in which it acts like a CD-ROM device) and a rescan is
initiated, the replaced device will be added to the
SCSI midlayer with target and lun numbers set to -1.
After that, a panic is likely to ensue. When a physical
device is replaced, the lun and target number should be
preserved.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
The test to detect OBDR ("One Button Disaster Recovery")
cd-rom devices was comparing against uninitialized data.
Fixed by moving the test for the device to where the
inquiry data is collected, and uninitialized variable
altogether as it wasn't really being used.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|