<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h, branch v7.0-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drm/amdgpu: Add virt command to send VF ras command</title>
<updated>2025-12-08T18:56:33+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2025-10-21T02:13:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=60a300780d8abb4c98f2837059db63f6211a54f0'/>
<id>60a300780d8abb4c98f2837059db63f6211a54f0</id>
<content type='text'>
Add virt command and interface to send VF ras command.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add virt command and interface to send VF ras command.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Introduce VF critical region check for RAS poison injection</title>
<updated>2025-09-18T13:43:02+00:00</updated>
<author>
<name>Xiang Liu</name>
<email>xiang.liu@amd.com</email>
</author>
<published>2025-08-19T04:51:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607'/>
<id>f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607</id>
<content type='text'>
The SRIOV guest send requet to host to check whether the poison
injection address is in VF critical region or not via mabox.

Signed-off-by: Xiang Liu &lt;xiang.liu@amd.com&gt;
Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The SRIOV guest send requet to host to check whether the poison
injection address is in VF critical region or not via mabox.

Signed-off-by: Xiang Liu &lt;xiang.liu@amd.com&gt;
Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement unrecoverable error message handling for VFs</title>
<updated>2025-05-07T21:43:13+00:00</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T21:18:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=086809c82c96116aed78f762e4f1d61a4e0210a7'/>
<id>086809c82c96116aed78f762e4f1d61a4e0210a7</id>
<content type='text'>
This notification may arrive in VF mailbox while polling for response from
another event.

This patches covers the following scenarios:

- If VF is already in RMA state, then do not attempt to contact the host.
  Host will ignore the VF after sending the notification.

- If the notification is detected during polling, then set the RMA status,
  and return error to caller.

- If the notification arrives by interrupt, then set the RMA status and
  queue a reset.  This reset will fail and VF will stop runtime services.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This notification may arrive in VF mailbox while polling for response from
another event.

This patches covers the following scenarios:

- If VF is already in RMA state, then do not attempt to contact the host.
  Host will ignore the VF after sending the notification.

- If the notification is detected during polling, then set the RMA status,
  and return error to caller.

- If the notification arrives by interrupt, then set the RMA status and
  queue a reset.  This reset will fail and VF will stop runtime services.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement Runtime Bad Page query for VFs</title>
<updated>2025-05-07T21:41:49+00:00</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T20:48:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5da3d8820dd3202d5ae871050e45facb2787235d'/>
<id>5da3d8820dd3202d5ae871050e45facb2787235d</id>
<content type='text'>
Host will send a notification when new bad pages are available.

Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Host will send a notification when new bad pages are available.

Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Update headers for CPER support on SRIOV</title>
<updated>2025-03-05T15:46:40+00:00</updated>
<author>
<name>Tony Yi</name>
<email>Tony.Yi@amd.com</email>
</author>
<published>2025-02-26T21:56:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d4c60219ac10242a1d5a621e7ba673d6128b7e13'/>
<id>d4c60219ac10242a1d5a621e7ba673d6128b7e13</id>
<content type='text'>
Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for
CPER support on VFs. PMFW ACA messages are not available on VFs,
and VFs must query CPERs from host.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for
CPER support on VFs. PMFW ACA messages are not available on VFs,
and VFs must query CPERs from host.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support</title>
<updated>2024-11-11T16:55:01+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T13:33:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=60c58d72afb81d2dc3f52f638eff5197511ac114'/>
<id>60c58d72afb81d2dc3f52f638eff5197511ac114</id>
<content type='text'>
The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data.
Add Host RAS Telemetry enable capabilities bitfields.
Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated
in the RAS Telemetry region.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data.
Add Host RAS Telemetry enable capabilities bitfields.
Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated
in the RAS Telemetry region.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT</title>
<updated>2024-08-13T16:12:51+00:00</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2024-08-07T09:32:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ef6c2cb349c708676b7820c36a5beb75868ad544'/>
<id>ef6c2cb349c708676b7820c36a5beb75868ad544</id>
<content type='text'>
on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to
wait all 8 gpus finish mode1 reset and then do re-init. As observed,
sometimes the gpu which triggered the reset need to wait 15s for all
gpus to finish.

If poll msg timeout, guest driver will send the reset message again, and
may mess up the following reinit sequence on other gpus.

So extend the time to cover the maximum time needed to recover.

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to
wait all 8 gpus finish mode1 reset and then do re-init. As observed,
sometimes the gpu which triggered the reset need to wait 15s for all
gpus to finish.

If poll msg timeout, guest driver will send the reset message again, and
may mess up the following reinit sequence on other gpus.

So extend the time to cover the maximum time needed to recover.

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: process RAS fatal error MB notification</title>
<updated>2024-06-27T21:31:37+00:00</updated>
<author>
<name>Vignesh Chander</name>
<email>Vignesh.Chander@amd.com</email>
</author>
<published>2024-06-24T21:44:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cbda2758d8bfae323b846210a3e52f0ad5fe7164'/>
<id>cbda2758d8bfae323b846210a3e52f0ad5fe7164</id>
<content type='text'>
For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset completion message first
to avoid accidentally spamming multiple reset requests to host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset completion message first
to avoid accidentally spamming multiple reset requests to host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Add RAS_POISON_READY host response message</title>
<updated>2024-01-25T19:58:03+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-01-21T15:25:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2474414c60b7ed1f90293facdc4d94ef7cf61a3b'/>
<id>2474414c60b7ed1f90293facdc4d94ef7cf61a3b</id>
<content type='text'>
In a non-FLR page avoidance scenario, the host driver will
provide the bad pages in the pf2vf exchange region.

Adding a new host response message to indicate when the
pf2vf exchange region has been updated.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a non-FLR page avoidance scenario, the host driver will
provide the bad pages in the pf2vf exchange region.

Adding a new host response message to indicate when the
pf2vf exchange region has been updated.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: add RAS poison consumption handler for NV SRIOV</title>
<updated>2022-12-15T17:18:19+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2022-12-06T09:04:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ae844dd79ffc60f419b32a8d6026128f18021650'/>
<id>ae844dd79ffc60f419b32a8d6026128f18021650</id>
<content type='text'>
Send handling request to host.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Send handling request to host.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
