linux-toradex.git/arch/powerpc/kernel/eeh_driver.c, branch v4.2-rc8

powerpc/eeh: fix comment for wait_state()

2015-05-13T04:00:07+00:00

To retrieve the PCI slot state, EEH driver would set a timeout for that.
While current comment is not aligned to what the code does.

This patch fixes those comments according to the code.

Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Remove device_node dependency

2015-03-24T02:15:53+00:00

The patch removes struct eeh_dev::dn and the corresponding helper
functions: eeh_dev_to_of_node() and of_node_to_eeh_dev(). Instead,
eeh_dev_to_pdn() and pdn_to_eeh_dev() should be used to get the
pdn, which might contain device_node on PowerNV platform.

Signed-off-by: Gavin Shan 
Signed-off-by: Benjamin Herrenschmidt

powerpc/eeh: Allow to set maximal frozen times

2015-01-23T03:02:54+00:00

When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.

The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).

Requested-by: Ryan Grimm 
Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Introduce flag EEH_PE_REMOVED

2015-01-23T03:02:53+00:00

The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.

Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.

Requested-by: Ryan Grimm 
Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Set EEH_PE_RESET on PE reset

2014-12-02T00:03:26+00:00

The patch introduces additional flag EEH_PE_RESET to indicate the
corresponding PE is under reset. In turn, the PE retrieval bakcend
on PowerNV platform can return unfrozen state for the EEH core to
moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for
the purpose.

In PCI passthrou case, the problem is more worse: Guest doesn't
recover 6th EEH error. The PE is left in isolated (frozen) and
config blocked state on Broadcom adapters. We can't retrieve the
PE's state correctly any more, even from the host side via sysfs
/sys/bus/pci/devices/xxx/eeh_pe_state.

Reported-by: Rajeshkumar Subramanian 
Signed-off-by: Gavin Shan 
Signed-off-by: Benjamin Herrenschmidt

powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKED

2014-10-15T00:27:18+00:00

The flag EEH_PE_RESET indicates blocking config space of the PE
during reset time. We potentially need block PE's config space
other than reset time. So it's reasonable to replace it with
EEH_PE_CFG_BLOCKED to indicate its usage.

There are no substantial code or logic changes in this patch.

Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Emulate EEH recovery for VFIO devices

2014-09-30T07:15:18+00:00

When enabling EEH functionality on passed through devices (PE)
with VFIO, the devices in the PE would be removed permanently
from guest side. In that case, the PE remains frozen state.
When returning PE to host, or restarting the guest again, we
had mechanism unfreezing the PE by clearing PESTA/B frozen
bits. However, that's not enough for some adapters, which are
indicated as following "lspci" shows. Those adapters require
hot reset on the parent bus to bring their firmware back to
workable state. Otherwise, those adaptrs won't be operative
and the host (for returning case) or the guest will fail to
load the drivers for those adapters without exception.

0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \
             10Gb NIC (be3) (rev 02)
0000:01:00.0 0200: 19a2:0710 (rev 02)
0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \
             NIC (Lancer) (rev 10)
0001:03:00.0 0200: 10df:e220 (rev 10)

The patch adds mechanism to emulate EEH recovery (for hot reset
on parent PCI bus) on 3 gates to fix the issue: open/release one
adapter of the PE, enable EEH functionality on one adapter of the
PE.

Reported-by:  Murilo Fossa Vicentini 
Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Use eeh_unfreeze_pe()

2014-09-30T07:15:14+00:00

The patch uses eeh_unfreeze_pe() to replace the logic clearing
frozen IO and DMA, in order to simplify the code.

Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Replace pr_warning() with pr_warn()

2014-08-05T05:41:34+00:00

pr_warn() is equal to pr_warning(), but the former is a bit more
formal according to commit fc62f2f ("kernel.h: add pr_warn for
symmetry to dev_warn, netdev_warn").

The patch replaces pr_warning() with pr_warn().

Signed-off-by: Gavin Shan 
Signed-off-by: Benjamin Herrenschmidt

powerpc/powernv: Fix killed EEH event

2014-06-11T07:04:33+00:00

On PowerNV platform, EEH errors are reported by IO accessors or poller
driven by interrupt. After the PE is isolated, we won't produce EEH
event for the PE. The current implementation has possibility of EEH
event lost in this way:

The interrupt handler queues one "special" event, which drives the poller.
EEH thread doesn't pick the special event yet. IO accessors kicks in, the
frozen PE is marked as "isolated" and EEH event is queued to the list.
EEH thread runs because of special event and purge all existing EEH events.
However, we never produce an other EEH event for the frozen PE. Eventually,
the PE is marked as "isolated" and we don't have EEH event to recover it.

The patch fixes the issue to keep EEH events for PEs that have been
marked as "isolated" with the help of additional "force" help to
eeh_remove_event().

Reported-by: Rolf Brudeseth 
Signed-off-by: Gavin Shan 
Signed-off-by: Benjamin Herrenschmidt