linux-toradex.git/arch/powerpc/kernel/eeh.c, branch v4.2.7

powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()

2015-09-29T17:33:23+00:00

commit 259800135c654a098d9f0adfdd3d1f20eef1f231 upstream.

The config space of some PCI devices can't be accessed when their
PEs are in frozen state. Otherwise, fenced PHB might be seen.
Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaning
EEH_PE_CFG_BLOCKED is set automatically when the PE is put to
frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores
PCI device BARs with eeh_pe_restore_bars(), which then calls
eeh_ops->restore_config() to reinitialize the PCI device in
(OPAL) firmware. eeh_ops->restore_config() produces PCI config
access that causes fenced PHB. The problem was reported on the
adapter below:

   0001:01:00.0 0200: 14e4:168e (rev 10)
   0001:01:00.0 Ethernet controller: Broadcom Corporation \
                NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

This fixes the issue by skipping eeh_pe_restore_bars() in
eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE.
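
The guard described above can be illustrated with a small user-space
model. This is only a sketch, not the kernel code: the flag names are
taken from this commit message, but the bit values, struct pe_model and
the pe_model_*() helpers are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Flag names come from the commit message; the bit values are made up. */
#define EEH_PE_ISOLATED       (1u << 0)
#define EEH_PE_CFG_RESTRICTED (1u << 1)
#define EEH_PE_CFG_BLOCKED    (1u << 2)

/* Hypothetical stand-in for a PE; not the kernel's struct eeh_pe. */
struct pe_model {
	unsigned int state;
	int bars_restored;	/* counts simulated BAR restore calls */
};

/* Freezing a restricted PE also blocks its config space. */
static void pe_model_freeze(struct pe_model *pe)
{
	pe->state |= EEH_PE_ISOLATED;
	if (pe->state & EEH_PE_CFG_RESTRICTED)
		pe->state |= EEH_PE_CFG_BLOCKED;
}

/*
 * Models the error-detail path: the BAR restore step is skipped when
 * config access is blocked. Returns true if the restore step ran.
 */
static bool pe_model_collect_error_detail(struct pe_model *pe)
{
	if (pe->state & EEH_PE_CFG_BLOCKED)
		return false;	/* skip: config access could fence the PHB */

	pe->bars_restored++;
	return true;
}
```

In this model an unrestricted PE still gets its BARs restored after a
freeze; only a restricted PE takes the early return.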

Fixes: b6541db1 ("powerpc/eeh: Block PCI config access upon frozen PE")
Reported-by: Manvanthara B. Puttashankar 
Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/eeh: Probe after unbalanced kref check

2015-09-29T17:33:23+00:00

commit e642d11bdbfe8eb10116ab3959a2b5d75efda832 upstream.

In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().

However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explicitly destroy the PE in this case.

This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.

This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.

Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.
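
The ordering problem can be shown with a toy model. This is a sketch
under simplifying assumptions, not the kernel implementation: struct
dev_model and all hp_*() helpers are hypothetical, and "stale" simply
means a PE left over from before the hotplug.

```c
#include <stdbool.h>

/* Hypothetical device model; not the kernel's struct eeh_dev. */
struct dev_model {
	bool has_pe;	/* a PE (possibly stale) is attached */
	bool pe_fresh;	/* the attached PE was created by the latest probe */
};

/* Like the probe described above: exits early if a PE already exists. */
static void hp_probe(struct dev_model *d)
{
	if (d->has_pe)
		return;
	d->has_pe = true;
	d->pe_fresh = true;
}

/* Models the check that destroys a PE kept alive by an outstanding kref. */
static void hp_destroy_stale_pe(struct dev_model *d)
{
	if (d->has_pe && !d->pe_fresh)
		d->has_pe = false;
}

/* Old ordering: probe first, so the stale PE makes the probe a no-op. */
static void hp_add_late_old(struct dev_model *d)
{
	hp_probe(d);
	hp_destroy_stale_pe(d);
}

/* New ordering: destroy the stale PE first, then probe. */
static void hp_add_late_new(struct dev_model *d)
{
	hp_destroy_stale_pe(d);
	hp_probe(d);
}
```

With a stale PE attached, the old ordering ends with no PE at all, while
the new ordering ends with a freshly probed one.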

Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: Gavin Shan 
Signed-off-by: Daniel Axtens 
Acked-by: Gavin Shan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group

2015-06-11T05:14:54+00:00

This relies on the fact that a PCI device always has an IOMMU table
which may not be the case when we get dynamic DMA windows so
let's use a more reliable check for IOMMU group here.

As we do not rely on the table presence here, remove the workaround
from pnv_pci_ioda2_set_bypass(); also remove the @add_to_iommu_group
parameter from pnv_ioda_setup_bus_dma().

Signed-off-by: Alexey Kardashevskiy 
Acked-by: Gavin Shan 
Reviewed-by: David Gibson 
Signed-off-by: Michael Ellerman

powerpc/eeh: Fix trivial error in eeh_restore_dev_state()

2015-06-07T09:11:49+00:00

Commit 28158cd "powerpc/eeh: Enhance pcibios_set_pcie_reset_state()"
introduced a fix for a problem where certain configurations could lead to
pci_reset_function() destroying the state of PCI devices other than the one
specified.

Unfortunately, the fix has a trivial bug - it calls pci_save_state() again,
when it should be calling pci_restore_state().  This corrects the problem.
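
The effect of the mix-up can be sketched in user space. The model below
is an assumption-laden illustration only: struct cfg_model and the
cfg_*() helpers are hypothetical and stand in for the real
pci_save_state() / pci_restore_state() pair.

```c
/* Hypothetical model of a device's config state and its saved copy. */
struct cfg_model {
	unsigned int live;	/* current config contents */
	unsigned int saved;	/* copy taken before reset */
};

static void cfg_save_state(struct cfg_model *c)    { c->saved = c->live; }
static void cfg_restore_state(struct cfg_model *c) { c->live = c->saved; }
static void cfg_reset(struct cfg_model *c)         { c->live = 0; }

/* Buggy sequence: saving again after reset overwrites the good copy. */
static void cfg_reset_cycle_buggy(struct cfg_model *c)
{
	cfg_save_state(c);
	cfg_reset(c);
	cfg_save_state(c);	/* should have been a restore */
}

/* Fixed sequence: the saved copy is written back after reset. */
static void cfg_reset_cycle_fixed(struct cfg_model *c)
{
	cfg_save_state(c);
	cfg_reset(c);
	cfg_restore_state(c);
}
```

In the buggy variant both the live state and the saved copy end up
holding the post-reset value, so the original contents are lost.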

Cc: Gavin Shan 
Signed-off-by: David Gibson 
Acked-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: remove unused macro IS_BRIDGE

2015-05-13T04:00:07+00:00

Currently, the macro IS_BRIDGE is not used anywhere.
This patch just removes it.

Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Introduce eeh_pe_inject_err()

2015-05-12T10:33:35+00:00

The patch defines PCI error types and functions in uapi/asm/eeh.h
and exports function eeh_pe_inject_err(), which will be called by
VFIO driver to inject the specified PCI error to the indicated
PE for testing purposes.

Signed-off-by: Gavin Shan 
Reviewed-by: David Gibson 
Signed-off-by: Michael Ellerman

powerpc/eeh: Delay probing EEH device during hotplug

2015-05-01T03:52:32+00:00

Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
devices at an early stage, which is reasonable for the pSeries platform.
However, it's wrong for PowerNV platform because the PE# isn't
determined until the resources (IO and MMIO) are assigned to
PE in hotplug case. So we have to delay probing EEH devices
for PowerNV platform until the PE# is assigned.

Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan 
Signed-off-by: Michael Ellerman

powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()

2015-05-01T03:52:09+00:00

When asserting reset in pcibios_set_pcie_reset_state(), the PE
is enforced to (hardware) frozen state in order to drop unexpected
PCI transactions (except PCI config read/write) automatically by
hardware during reset, which would cause recursive EEH error.
However, the (software) frozen state EEH_PE_ISOLATED is missed.
When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED
is set in PE state retrieval backend. Unfortunately, nobody (the
reset handler or the EEH recovery functionality in host) will clear
EEH_PE_ISOLATED when the PE has been passed through to guest.

The patch sets and clears EEH_PE_ISOLATED properly during reset
in function pcibios_set_pcie_reset_state() to fix the issue.
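
The intended pairing of hardware and software state can be sketched as
follows. This is a user-space model with assumed names: EEH_PE_ISOLATED
is from this commit message, while the bit value, enum model_reset and
struct reset_pe_model are hypothetical.

```c
#include <stdbool.h>

#define EEH_PE_ISOLATED (1u << 0)	/* name from the commit; value made up */

/* Hypothetical reset operations; not the kernel's reset state enum. */
enum model_reset { MODEL_RESET_ASSERT, MODEL_RESET_DEASSERT };

struct reset_pe_model {
	unsigned int state;	/* software state flags */
	bool hw_frozen;		/* hardware frozen state */
};

/* Keeps the software flag in step with the hardware frozen state. */
static void model_set_reset_state(struct reset_pe_model *pe,
				  enum model_reset op)
{
	switch (op) {
	case MODEL_RESET_ASSERT:
		pe->hw_frozen = true;
		pe->state |= EEH_PE_ISOLATED;
		break;
	case MODEL_RESET_DEASSERT:
		pe->hw_frozen = false;
		pe->state &= ~EEH_PE_ISOLATED;
		break;
	}
}
```

Because the flag is set and cleared in the same function that changes
the hardware state, no other path has to remember to clear it.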

Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()")
Reported-by: Carol L. Soto 
Signed-off-by: Gavin Shan 
Tested-by: Carol L. Soto 
Signed-off-by: Michael Ellerman

powerpc/mm/thp: Make page table walk safe against thp split/collapse

2015-04-17T01:23:39+00:00

We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensures we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t. THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are a
few steps we can take to avoid a hugepage collapse. One way is to take a page
reference inside the irq disable region. Another option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifier update in between
using mmu_notifier_retry().
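
The retry idea mentioned last can be illustrated with a plain sequence
counter. This is a single-threaded user-space sketch, not the kernel's
mmu_notifier API: struct map_model and every map_*() helper are
hypothetical, and the "race" is simulated by a callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapping: seq is bumped on every simulated collapse. */
struct map_model {
	unsigned long seq;
	unsigned long pfn;
};

static void map_collapse(struct map_model *m, unsigned long new_pfn)
{
	m->seq++;
	m->pfn = new_pfn;
}

/* True if the mapping changed since seq_snapshot was taken. */
static bool map_retry(const struct map_model *m, unsigned long seq_snapshot)
{
	return m->seq != seq_snapshot;
}

/* Simulated racing collapse that moves the page to the next pfn. */
static void map_race_once(struct map_model *m)
{
	map_collapse(m, m->pfn + 1);
}

/*
 * Reads the pfn and retries if an update slipped in between the read
 * and the check. 'interfere' (may be NULL) runs once after the first
 * read to simulate the race; 'attempts' reports how many reads it took.
 */
static unsigned long map_lookup(struct map_model *m,
				void (*interfere)(struct map_model *),
				int *attempts)
{
	unsigned long seq, pfn;

	*attempts = 0;
	do {
		seq = m->seq;
		pfn = m->pfn;
		(*attempts)++;
		if (interfere) {
			interfere(m);
			interfere = NULL;
		}
	} while (map_retry(m, seq));

	return pfn;
}
```

A lookup with no interference finishes in one pass; a lookup that races
with a collapse discards the stale pfn and reads again.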

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman

powerpc/eeh: Fix crash in eeh_add_device_early() on Cell

2015-04-14T07:13:31+00:00

The recent change to the EEH probing causes a crash on Cell because
eeh_ops is NULL.

Check if EEH is enabled and if not bail out.
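
A minimal user-space sketch of the bail-out, assuming a simplified ops
table: struct early_eeh_ops, the early_*() names and the enabled flag
are hypothetical and only mirror the shape of the problem (a NULL ops
pointer on platforms without EEH).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table; not the kernel's struct eeh_ops. */
struct early_eeh_ops {
	void (*probe)(int dev_id);
};

static struct early_eeh_ops *early_ops;	/* NULL when no EEH backend exists */
static bool early_eeh_enabled;		/* set only once a backend is registered */
static int early_probe_count;

static void early_probe(int dev_id)
{
	(void)dev_id;
	early_probe_count++;
}

static struct early_eeh_ops early_backend = { early_probe };

/* Registers a backend and marks EEH as enabled. */
static void early_register_backend(void)
{
	early_ops = &early_backend;
	early_eeh_enabled = true;
}

/* Bails out before touching early_ops when EEH is not enabled. */
static bool early_add_device(int dev_id)
{
	if (!early_eeh_enabled)
		return false;

	early_ops->probe(dev_id);
	return true;
}
```

Without the enabled check, the first call on a platform with no backend
would dereference the NULL ops pointer.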

Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Michael Ellerman