linux-toradex.git/drivers/misc, branch toradex_vf_4.1-next

GenWQE: Fix bad page access during abort of resource allocation

2016-11-26T03:57:01+00:00

[ Upstream commit a7a7aeefbca2982586ba2c9fd7739b96416a6d1d ]

When interrupting an application which was allocating DMAable
memory, it was possible that the DMA memory was deallocated
twice, leading to the error symptoms below.

Thanks to Gerald, who analyzed the problem and provided this
patch.

I agree with his analysis of the problem:

  ddcb_cmd_fixups()
  -> genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL
     and f/lpage may also be != NULL)
  -> ddcb_cmd_cleanup()
  -> genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and
     f/lpage may also be != NULL)

In this scenario we would have exactly the kind of double free that
would explain the WARNING / Bad page state, and as expected it is
caused by broken error handling (cleanup).

Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce
the "Bad page state" issue, and with the patch on top he could not reproduce
it any more.
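
The failure mode can be illustrated with a small user-space sketch.
The names below (struct sync_sgl, alloc_sync_sgl(), free_sync_sgl(),
free_and_clear()) are hypothetical stand-ins loosely modeled on
genwqe_alloc_sync_sgl() / genwqe_free_sync_sgl(), with malloc()/free()
replacing the DMA-coherent allocator; this is not the upstream patch.
It shows the pattern that makes cleanup safe: the error path frees
only what was allocated and resets each pointer to NULL, so a later
cleanup call cannot free the same buffer twice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's SGL bookkeeping. */
struct sync_sgl {
	void *sgl;   /* scatter-gather list */
	void *fpage; /* bounce buffer for an unaligned first page */
	void *lpage; /* bounce buffer for an unaligned last page */
};

/* Free *p and clear it, so a repeated call is a no-op. */
static void free_and_clear(void **p)
{
	free(*p); /* free(NULL) does nothing */
	*p = NULL;
}

/* fail_lpage simulates an allocation failure for the last page. */
static int alloc_sync_sgl(struct sync_sgl *s, bool fail_lpage)
{
	assert(s != NULL);
	s->sgl = s->fpage = s->lpage = NULL;

	s->sgl = malloc(64);
	if (!s->sgl)
		goto err_out;
	s->fpage = malloc(64);
	if (!s->fpage)
		goto err_out;
	s->lpage = fail_lpage ? NULL : malloc(64);
	if (!s->lpage)
		goto err_out;
	return 0;

err_out:
	/* Undo partial allocations AND reset the pointers, so that a
	 * later free_sync_sgl() does not free these buffers again. */
	free_and_clear(&s->lpage);
	free_and_clear(&s->fpage);
	free_and_clear(&s->sgl);
	return -ENOMEM;
}

/* Cleanup: safe to call after a failed alloc, or more than once. */
static void free_sync_sgl(struct sync_sgl *s)
{
	assert(s != NULL);
	free_and_clear(&s->lpage);
	free_and_clear(&s->fpage);
	free_and_clear(&s->sgl);
}
```

With this pattern, calling free_sync_sgl() after a failed
alloc_sync_sgl() (the ddcb_cmd_cleanup() path in the analysis above)
is harmless instead of a double free.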

------------[ cut here ]------------
WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141
Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod
CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic #52-Ubuntu
task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000
Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400
           0000000000000400 0000000000001fff 0000000000000400 0000000116853000
           07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001
           0000000000001000 0000000116854000 0000000000156402 00000000324e7a38
Krnl Code: 000000000015633a: 95001000           cli     0(%r1),0
           000000000015633e: a774ffc3           brc     7,1562c4
          #0000000000156342: a7f40001           brc     15,156344
          >0000000000156346: 92011000           mvi     0(%r1),1
           000000000015634a: a7f4ffbd           brc     15,1562c4
           000000000015634e: 0707               bcr     0,%r7
           0000000000156350: c00400000000       brcl    0,156350
           0000000000156356: eb7ff0500024       stmg    %r7,%r15,80(%r15)
Call Trace:
([<00000000001563e0>] dma_update_trans+0x90/0x228)
 [<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160
 [<00000000001567c2>] s390_dma_free+0x62/0x98
 [<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card]
 [<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card]
 [<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card]
 [<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card]
 [<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
 [<000000000032513a>] do_vfs_ioctl+0x3b2/0x518
 [<0000000000325344>] SyS_ioctl+0xa4/0xb8
 [<00000000007b86c6>] system_call+0xd6/0x264
 [<000003ff9e8e520a>] 0x3ff9e8e520a
Last Breaking-Event-Address:
 [<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8
---[ end trace 35996336235145c8 ]---
BUG: Bad page state in process jbd2/dasdb1-8  pfn:3215b
page:000003d100c856c0 count:-1 mapcount:0 mapping:          (null) index:0x0
flags: 0x3fffc0000000000()
page dumped because: nonzero _count

Signed-off-by: Gerald Schaefer 
Signed-off-by: Frank Haverkamp 
Cc: stable 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Sasha Levin

mei: txe: don't clean an unprocessed interrupt cause.

2016-11-24T01:59:49+00:00

[ Upstream commit 43605e293eb13c07acb546c14f407a271837af17 ]

SEC registers are not accessible when the TXE device is in a low power
state, hence the SEC interrupt cannot be processed if the device is not
awake.

In some rare cases, the entry into the low power state (aliveness off)
and the input ready bit can be signaled at the same time, resulting in a
communication stall, as input ready won't be signaled again after waking
up. To resolve this, the IPC_HHIER_SEC bit in HHISR_REG should not be
cleared if the interrupt is not processed.

Cc: stable@vger.kernel.org
Signed-off-by: Alexander Usyskin 
Signed-off-by: Tomas Winkler 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Sasha Levin

drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors

2016-07-11T03:07:14+00:00

[ Upstream commit f3df53e4d70b5736368a8fe8aa1bb70c1cb1f577 ]

Fix RDAC read back errors caused by a typo: the value must be shifted by 2.
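
A minimal sketch of a symmetric shift-by-2 conversion, assuming (for
illustration only) that the AD5274's 8-bit wiper value occupies bits
9..2 of a 10-bit RDAC data field. The function names and the
SKETCH_RDAC_FIELD_MAX constant are hypothetical, not taken from
ad525x_dpot.c. Whatever is shifted left on write must be shifted right
by the same amount on read back.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of the assumed 10-bit RDAC data field. */
#define SKETCH_RDAC_FIELD_MAX 0x3FFu

/* Place the 8-bit wiper value into bits 9..2 of the data field. */
static uint16_t ad5274_rdac_to_field(uint8_t wiper)
{
	return (uint16_t)(wiper << 2);
}

/* Recover the 8-bit wiper value from a read-back data field. */
static uint8_t ad5274_field_to_rdac(uint16_t field)
{
	assert(field <= SKETCH_RDAC_FIELD_MAX);
	return (uint8_t)(field >> 2); /* drop the two low bits */
}
```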

Fixes: a4bd394956f2 ("drivers/misc/ad525x_dpot.c: new features")
Signed-off-by: Michael Hennerich 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Sasha Levin

misc/bmp085: Enable building as a module

2016-07-11T03:07:13+00:00

[ Upstream commit 50e6315dba721cbc24ccd6d7b299f1782f210a98 ]

Commit 985087dbcb02 'misc: add support for bmp18x chips to the bmp085
driver' changed the BMP085 config symbol to a boolean.  I see no
reason why the shared code cannot be built as a module, so change it
back to tristate.

Fixes: 985087dbcb02 ("misc: add support for bmp18x chips to the bmp085 driver")
Cc: Eric Andersson 
Signed-off-by: Ben Hutchings 
Acked-by: Arnd Bergmann 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Sasha Levin

cxl: Keep IRQ mappings on context teardown

2016-05-17T17:42:44+00:00

[ Upstream commit d6776bba44d9752f6cdf640046070e71ee4bba7b ]

Keep IRQ mappings on context teardown.  This won't leak IRQs: if we
allocate the mapping again, the generic code will return the same
mapping used last time.

Doing this works around a race in the generic code. Masking the
interrupt introduces a race which can crash the kernel or result in
an IRQ that is never EOIed. The loss of the EOI means that all
subsequent mappings to the same HW IRQ never receive an interrupt.

We've seen this race with cxl test cases which are doing heavy context
startup and teardown at the same time as heavy interrupt load.

A fix to the generic code is also being investigated.

Signed-off-by: Michael Neuling 
Cc: stable@vger.kernel.org # 3.8
Tested-by: Andrew Donnellan 
Acked-by: Ian Munsie 
Tested-by: Vaibhav Jain 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin

Revert "mei: bus: move driver api functions at the start of the file"

2016-05-08T12:08:56+00:00

This reverts commit 79b768dec5d354aeb143f51db11e0cbb758176fb.

Signed-off-by: Sasha Levin

mei: bus: move driver api functions at the start of the file

2016-04-18T12:50:35+00:00

[ Upstream commit 6238299774377b12c3e24507b100b2687eb5ea32 ]

To make the file more organized, move the mei client driver API
to the start of the file and add Kdoc.

There are no functional changes in this patch.

Signed-off-by: Tomas Winkler 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Sasha Levin

cxl: Don't remove AFUs/vPHBs in cxl_reset

2015-09-29T17:26:26+00:00

commit 4e1efb403c1c016ae831bd9988a7d2e5e0af41a0 upstream.

If the driver doesn't participate in EEH, the AFUs will be removed
by cxl_remove, which will be invoked by EEH.

If the driver does participate in EEH, the vPHB needs to stick around
so that it can participate.

In both cases, we shouldn't remove the AFU/vPHB.

Reviewed-by: Cyril Bur 
Signed-off-by: Daniel Axtens 
Signed-off-by: Michael Ellerman 
Reported-by: Guenter Roeck 
Signed-off-by: Sudip Mukherjee 
Signed-off-by: Greg Kroah-Hartman

cxl: Fix unbalanced pci_dev_get in cxl_probe

2015-09-29T17:25:58+00:00

commit 2925c2fdf1e0eb642482f5b30577e9435aaa8edb upstream.

Currently the first thing we do in cxl_probe is to grab a reference
on the pci device. Later on, we call device_register on our adapter.
In our remove path, we call device_unregister, but we never call
pci_dev_put. We therefore leak the device every time we do a
reflash.

device_register/unregister is sufficient to hold the reference.
Therefore, drop the call to pci_dev_get.

Here's why this is safe.
The proposed cxl_probe(pdev) calls cxl_adapter_init:
    a) init calls cxl_adapter_alloc, which creates a struct cxl,
       conventionally called adapter. This struct contains a
       device entry, adapter->dev.

    b) init calls cxl_configure_adapter, where we set
       adapter->dev.parent = &dev->dev (here dev is the pci dev)

So at this point, the cxl adapter's device's parent is the PCI
device that I want to be refcounted properly.

    c) init calls cxl_register_adapter
       *) cxl_register_adapter calls device_register(&adapter->dev)

So now we're in device_register, where dev is the adapter device, and
we want to know if the PCI device is safe after we return.

device_register(&adapter->dev) calls device_initialize() and then
device_add().

device_add() does a get_device(). device_add() also explicitly grabs
the device's parent, and calls get_device() on it:

         parent = get_device(dev->parent);

Therefore, device_register() takes a reference on the parent PCI dev,
which is what pci_dev_get() was guarding. pci_dev_get() can therefore
be safely removed.
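
The reference-count argument can be checked with a toy model. The
struct toy_dev type and the toy_get()/toy_put()/toy_register()/
toy_unregister() helpers are hypothetical and only loosely mirror
get_device()/put_device() and device_register()/device_unregister();
the assumption is simply that registering a child takes a reference
on its parent and unregistering drops it. An extra get in probe with
no matching put leaves the parent's count permanently raised.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted device (hypothetical, not the driver-core API). */
struct toy_dev {
	int refcount;
	struct toy_dev *parent;
};

static struct toy_dev *toy_get(struct toy_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void toy_put(struct toy_dev *d)
{
	if (d) {
		assert(d->refcount > 0);
		d->refcount--;
	}
}

/* Register: takes a reference on the device and on its parent. */
static void toy_register(struct toy_dev *dev)
{
	toy_get(dev);
	toy_get(dev->parent);
}

/* Unregister: drops both references taken by toy_register(). */
static void toy_unregister(struct toy_dev *dev)
{
	toy_put(dev->parent);
	toy_put(dev);
}
```

Registering and unregistering the adapter alone leaves the parent's
count balanced; adding an unmatched toy_get() on the parent before
registering (the old probe path) leaves it one higher after removal.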

Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace access")
Signed-off-by: Daniel Axtens 
Acked-by: Ian Munsie 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

cxl: Remove racy attempt to force EEH invocation in reset

2015-09-29T17:25:57+00:00

commit 9d8e27673c45927fee9e7d8992ffb325a6b0b0e4 upstream.

cxl_reset currently PERSTs the slot, and then repeatedly tries to
read MMIO space in order to kick off EEH.

There are 2 problems with this: it's unnecessary, and it's racy.

It's unnecessary because the PERST will bring down the PHB link.
That will be picked up by the CAPP, which will send out an HMI.
Skiboot, noticing an HMI from the CAPP, will send an OPAL
notification to the kernel, which will trigger EEH recovery.

It's also racy: the EEH recovery triggered by the CAPP will
eventually cause the MMIO space to have its mapping invalidated
and the pointer NULLed out. This races with our attempt to read
the MMIO space. This is causing OOPSes in testing.

Simply drop all the attempts to force EEH detection, and trust
that Skiboot will send the notification and that we'll act on it.
The Skiboot code to send the EEH notification has been in Skiboot
for as long as CAPP recovery has been supported, so we don't need
to worry about breaking obscure setups with ancient firmware.

Cc: Ryan Grimm 
Fixes: 62fa19d4b4fd ("cxl: Add ability to reset the card")
Signed-off-by: Daniel Axtens 
Acked-by: Ian Munsie 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman