linux-toradex.git/drivers/accel, branch v7.0-rc7

accel/qaic: Handle DBC deactivation if the owner went away

2026-03-27T16:48:30+00:00

When a DBC is released, the device sends a QAIC_TRANS_DEACTIVATE_FROM_DEV
transaction to the host over the QAIC_CONTROL MHI channel. QAIC handles
this by calling decode_deactivate() to release the resources allocated for
that DBC. Since that handling is done in the qaic_manage_ioctl() context,
if the user goes away before receiving and handling the deactivation, the
host will be out-of-sync with the DBCs available for use, and the DBC
resources will not be freed unless the device is removed. If another user
loads and requests to activate a network, then the device assigns the same
DBC to that network, QAIC will "indefinitely" wait for dbc->in_use = false,
leading the user process to hang.

As a solution to this, handle QAIC_TRANS_DEACTIVATE_FROM_DEV transactions
that are received after the user has gone away.

Fixes: 129776ac2e38 ("accel/qaic: Add control path")
Signed-off-by: Youssef Samir 
Reviewed-by: Lizhi Hou 
Reviewed-by: Jeff Hugo 
Signed-off-by: Jeff Hugo 
Link: https://patch.msgid.link/20260205123415.3870898-1-youssef.abdulrahman@oss.qualcomm.com

accel/ivpu: Add disable clock relinquish workaround for NVL-A0

2026-03-24T08:29:28+00:00

Turn on disable clock relinquish workaround for Nova Lake A0.
Without this workaround NPU may not power off correctly after
inference, leading to unexpected system behavior.

Fixes: 550f4dd2cedd ("accel/ivpu: Add support for Nova Lake's NPU")
Cc:  # v6.19+

Reviewed-by: Lizhi.hou 
Signed-off-by: Karol Wachowski 
Link: https://patch.msgid.link/20260323095029.64613-1-karol.wachowski@linux.intel.com

accel/amdxdna: Fix runtime suspend deadlock when there is pending job

2026-03-10T18:46:40+00:00

The runtime suspend callback drains the running job workqueue before
suspending the device. If a job is still executing and calls
pm_runtime_resume_and_get(), it can deadlock with the runtime suspend
path.

Fix this by moving pm_runtime_resume_and_get() from the job execution
routine to the job submission routine, ensuring the device is resumed
before the job is queued and avoiding the deadlock during runtime
suspend.

Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) 
Signed-off-by: Lizhi Hou 
Link: https://patch.msgid.link/20260310180058.336348-1-lizhi.hou@amd.com

accel/ivpu: Remove boot params address setting via MMIO register

2026-03-10T12:34:31+00:00

The NPU 60XX uses the default boot params location specified
in the firmware image header, consistent with earlier generations.
Remove the unnecessary MMIO register write, freeing the AON register
for future use.

Fixes: 44e4c88951fa ("accel/ivpu: Implement warm boot flow for NPU6 and unify boot handling")
Signed-off-by: Andrzej Kacprowski 
Reviewed-by: Karol Wachowski 
Signed-off-by: Karol Wachowski 
Link: https://patch.msgid.link/20260305142226.194995-1-andrzej.kacprowski@linux.intel.com
(cherry picked from commit 81e62e7bf8b9309bf0febdf00940818f98bc23d8)
Signed-off-by: Thomas Zimmermann

accel: ethosu: Handle possible underflow in IFM size calculations

2026-03-05T21:21:17+00:00

If the command stream has larger padding sizes than the IFM and OFM
diminsions, then the calculations will underflow to a negative value.
The result is a very large region bounds which is caught on submit, but
it's better to catch it earlier.

Current mesa ethosu driver has a signedness bug which resulted in
padding of 127 (the max) and triggers this issue.

Reviewed-and-Tested-by: Anders Roxell 
Link: https://patch.msgid.link/20260218-ethos-fixes-v1-3-be3fa3ea9a30@kernel.org
Signed-off-by: Rob Herring (Arm)

accel: ethosu: Fix NPU_OP_ELEMENTWISE validation with scalar

2026-03-05T21:21:17+00:00

The NPU_OP_ELEMENTWISE instruction uses a scalar value for IFM2 if the
IFM2_BROADCAST "scalar" mode is set. It is a bit (7) on the u65 and
part of a field (bits 3:0) on the u85. The driver was hardcoded to the
u85.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Reviewed-and-Tested-by: Anders Roxell 
Link: https://patch.msgid.link/20260218-ethos-fixes-v1-2-be3fa3ea9a30@kernel.org
Signed-off-by: Rob Herring (Arm)

accel: ethosu: Fix job submit error clean-up refcount underflows

2026-03-05T21:21:17+00:00

If the job submit fails before adding the job to the scheduler queue
such as when the GEM buffer bounds checks fail, then doing a
ethosu_job_put() results in a pm_runtime_put_autosuspend() without the
corresponding pm_runtime_resume_and_get(). The dma_fence_put()'s are
also unnecessary, but seem to be harmless.

Split the ethosu_job_cleanup() function into 2 parts for the before
and after the job is queued.

Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Reviewed-and-Tested-by: Anders Roxell 
Link: https://patch.msgid.link/20260218-ethos-fixes-v1-1-be3fa3ea9a30@kernel.org
Signed-off-by: Rob Herring (Arm)

accel/amdxdna: Split mailbox channel create function

2026-03-05T17:24:33+00:00

The management channel used for firmware control command submission is
currently created after the firmware is started. If channel creation
fails (for example, due to memory allocation failure or workqueue
creation interruption), the firmware remains in a pending state and is
unable to receive any control commands.

To avoid leaving the firmware in this inconsistent state, split
xdna_mailbox_create_channel() into two separate functions so that
resource allocation can be completed before interacting with the
hardware.
  xdna_mailbox_alloc_channel()
    Allocates memory and initializes the workqueue. This can be called
    earlier, before interacting with the hardware.
  xdna_mailbox_start_channel()
    Performs the hardware interaction required to start the channel.

Rename xdna_mailbox_destroy_channel() to xdna_mailbox_free_channel().
Ensure that xdna_mailbox_stop_channel() and xdna_mailbox_free_channel()
properly unwind the corresponding start and allocation steps, respectively.

Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Mario Limonciello (AMD) 
Signed-off-by: Lizhi Hou 
Link: https://patch.msgid.link/20260305062041.3954024-1-lizhi.hou@amd.com

accel/amdxdna: Fix major version check on NPU1 platform

2026-03-04T20:05:02+00:00

Add the missing major number in npu1_fw_feature_table.

Without the major version specified, the firmware feature check fails,
preventing new firmware commands from being enabled on the NPU1
platform.

With the correct major version populated, the driver properly detects
firmware support and enables the new command.

Fixes: f1eac46fe5f7 ("accel/amdxdna: Update firmware version check for latest firmware")
Reviewed-by: Mario Limonciello (AMD) 
Signed-off-by: Lizhi Hou 
Link: https://patch.msgid.link/20260304195012.3616908-1-lizhi.hou@amd.com

accel/amdxdna: Fix NULL pointer dereference of mgmt_chann

2026-03-02T17:43:22+00:00

mgmt_chann may be set to NULL if the firmware returns an unexpected
error in aie2_send_mgmt_msg_wait(). This can later lead to a NULL
pointer dereference in aie2_hw_stop().

Fix this by introducing a dedicated helper to destroy mgmt_chann
and by adding proper NULL checks before accessing it.

Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Mario Limonciello (AMD) 
Signed-off-by: Lizhi Hou 
Link: https://patch.msgid.link/20260226213857.3068474-1-lizhi.hou@amd.com