<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h, branch v6.0-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drm/amdgpu: add dummy event6 for vega10</title>
<updated>2022-01-07T22:19:34+00:00</updated>
<author>
<name>James Yao</name>
<email>yiqing.yao@amd.com</email>
</author>
<published>2021-12-29T10:10:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=216a9873198bdc5c670a9f71d58fafd30227c9c8'/>
<id>216a9873198bdc5c670a9f71d58fafd30227c9c8</id>
<content type='text'>
[why]
A malicious mailbox event1 causes driver loading to fail on vega10.
A dummy event6 prevents the driver from taking the response to a malicious event1 as its own.

[how]
On vega10, send a mailbox event6 before sending event1.

Signed-off-by: James Yao &lt;yiqing.yao@amd.com&gt;
Reviewed-by: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[why]
A malicious mailbox event1 causes driver loading to fail on vega10.
A dummy event6 prevents the driver from taking the response to a malicious event1 as its own.

[how]
On vega10, send a mailbox event6 before sending event1.

Signed-off-by: James Yao &lt;yiqing.yao@amd.com&gt;
Reviewed-by: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: extend SRIOV VF reset completion wait timeout to 10s</title>
<updated>2021-12-13T21:32:34+00:00</updated>
<author>
<name>Zhigang Luo</name>
<email>zhigang.luo@amd.com</email>
</author>
<published>2021-12-06T21:40:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=85a774d9ada4eb9e83044cefecf2712a2d7171bf'/>
<id>85a774d9ada4eb9e83044cefecf2712a2d7171bf</id>
<content type='text'>
An ASIC with a large FB needs more time to clear the FB during reset.
This change extends the SRIOV VF reset completion wait timeout from 5s
to 10s.

Signed-off-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Acked-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An ASIC with a large FB needs more time to clear the FB during reset.
This change extends the SRIOV VF reset completion wait timeout from 5s
to 10s.

Signed-off-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Acked-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Add ready_to_reset resp for vega10</title>
<updated>2021-08-30T18:59:33+00:00</updated>
<author>
<name>YuBiao Wang</name>
<email>YuBiao.Wang@amd.com</email>
</author>
<published>2021-08-27T06:48:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=64261a0d0600ab335677073c54b1989565ceddad'/>
<id>64261a0d0600ab335677073c54b1989565ceddad</id>
<content type='text'>
Send a response to the host after receiving the FLR notification from the host.
Port the NV change to vega10.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Reviewed-by: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Send a response to the host after receiving the FLR notification from the host.
Port the NV change to vega10.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Reviewed-by: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/SRIOV: Extend VF reset request wait period</title>
<updated>2020-12-15T16:35:35+00:00</updated>
<author>
<name>Jiange Zhao</name>
<email>Jiange.Zhao@amd.com</email>
</author>
<published>2020-11-25T13:56:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3aa883ac8eea38281f97a7409d2922e6f343bf6c'/>
<id>3aa883ac8eea38281f97a7409d2922e6f343bf6c</id>
<content type='text'>
In the virtualization case, when one VF sends too many
FLR requests, the hypervisor stops responding to this
VF's requests for a long period of time. This is called
event guard. During this cooling period, the guest
driver should wait instead of doing other things. After
this period, the guest driver resumes the reset
process and returns to normal.

Currently, the guest driver waits 12 seconds and returns failure
if it doesn't get a response from the host.

Solution: extend this waiting time in the guest driver and poll
for the response periodically. Polling happens every 6 seconds and
lasts for 60 seconds.

v2: change the maximum repetition count from a literal number to a macro.

Signed-off-by: Jiange Zhao &lt;Jiange.Zhao@amd.com&gt;
Acked-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the virtualization case, when one VF sends too many
FLR requests, the hypervisor stops responding to this
VF's requests for a long period of time. This is called
event guard. During this cooling period, the guest
driver should wait instead of doing other things. After
this period, the guest driver resumes the reset
process and returns to normal.

Currently, the guest driver waits 12 seconds and returns failure
if it doesn't get a response from the host.

Solution: extend this waiting time in the guest driver and poll
for the response periodically. Polling happens every 6 seconds and
lasts for 60 seconds.

v2: change the maximum repetition count from a literal number to a macro.

Signed-off-by: Jiange Zhao &lt;Jiange.Zhao@amd.com&gt;
Acked-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: extend threshold of waiting FLR_COMPLETE</title>
<updated>2020-04-24T15:42:11+00:00</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2020-04-21T10:04:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=312a79b6eafe5c45e3e232506a4a6e97d7cdbba4'/>
<id>312a79b6eafe5c45e3e232506a4a6e97d7cdbba4</id>
<content type='text'>
to 5s to satisfy WHOLE GPU reset, which needs 3+ seconds to
finish

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Acked-by: Yintian Tao &lt;yttao@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
to 5s to satisfy WHOLE GPU reset, which needs 3+ seconds to
finish

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Acked-by: Yintian Tao &lt;yttao@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: cleanup idh event/req for NV headers</title>
<updated>2020-04-01T18:44:43+00:00</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2020-03-03T08:40:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4d130238a7600fbaa47fb07bf5ede5a2cb2c1622'/>
<id>4d130238a7600fbaa47fb07bf5ede5a2cb2c1622</id>
<content type='text'>
1) drop the AI headers from mxgpu_nv.c; it should refer to mxgpu_nv.h
2) IDH_EVENT_MAX is not used and not aligned with the host side,
   so drop it
3) IDH_TEXT_MESSAGE was provided in the host but not defined in the guest

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1) drop the AI headers from mxgpu_nv.c; it should refer to mxgpu_nv.h
2) IDH_EVENT_MAX is not used and not aligned with the host side,
   so drop it
3) IDH_TEXT_MESSAGE was provided in the host but not defined in the guest

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amd/powerplay: enable pp one vf mode for vega10</title>
<updated>2019-12-11T20:22:07+00:00</updated>
<author>
<name>Yintian Tao</name>
<email>yttao@amd.com</email>
</author>
<published>2019-10-30T09:16:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c9ffa427db34e6896523f0ef0c172a0bbb77c9ad'/>
<id>c9ffa427db34e6896523f0ef0c172a0bbb77c9ad</id>
<content type='text'>
Originally, due to restrictions from PSP and SMU, the VF had
to send a message to the hypervisor driver to handle powerplay
changes, which is complicated and redundant. Now, SMU
and PSP allow the VF to handle powerplay changes directly
by itself. Therefore, the old code for the handshake
between VF and PF to handle powerplay is removed, and the VF
uses the new registers below to handshake with SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response

v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip specified hwmon attribute
v6: change skip condition at vega10_copy_table_to_smc

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Acked-by: Evan Quan &lt;evan.quan@amd.com&gt;
Reviewed-by: Kenneth Feng &lt;kenneth.feng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Originally, due to restrictions from PSP and SMU, the VF had
to send a message to the hypervisor driver to handle powerplay
changes, which is complicated and redundant. Now, SMU
and PSP allow the VF to handle powerplay changes directly
by itself. Therefore, the old code for the handshake
between VF and PF to handle powerplay is removed, and the VF
uses the new registers below to handshake with SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response

v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip specified hwmon attribute
v6: change skip condition at vega10_copy_table_to_smc

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Acked-by: Evan Quan &lt;evan.quan@amd.com&gt;
Reviewed-by: Kenneth Feng &lt;kenneth.feng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Add IDH_QUERY_ALIVE event for SR-IOV</title>
<updated>2019-05-06T14:36:48+00:00</updated>
<author>
<name>Trigger Huang</name>
<email>Trigger.Huang@amd.com</email>
</author>
<published>2019-04-30T08:23:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b6818520edb0dc83d8de807cb40dff44995eab86'/>
<id>b6818520edb0dc83d8de807cb40dff44995eab86</id>
<content type='text'>
The SR-IOV host side sends IDH_QUERY_ALIVE to the guest VM to check
whether this guest VM is still alive (not destroyed). The only thing
the guest KMD needs to do is send an ACK back to the host.

Signed-off-by: Trigger Huang &lt;Trigger.Huang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The SR-IOV host side sends IDH_QUERY_ALIVE to the guest VM to check
whether this guest VM is still alive (not destroyed). The only thing
the guest KMD needs to do is send an ACK back to the host.

Signed-off-by: Trigger Huang &lt;Trigger.Huang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: support dpm level modification under virtualization v3</title>
<updated>2019-04-10T18:53:27+00:00</updated>
<author>
<name>Yintian Tao</name>
<email>yttao@amd.com</email>
</author>
<published>2019-04-09T12:33:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bb5a2bdf36a8df79e1437b443299b7fe1ea3abfc'/>
<id>bb5a2bdf36a8df79e1437b443299b7fe1ea3abfc</id>
<content type='text'>
Under vega10 virtualization, the smu ip block will not be added.
Therefore, we need to add pp clk query and force dpm level functions
to amdgpu_virt_ops to support the feature.

v2: add get_pp_clk existence check and use kzalloc to allocate buf

v3: return -ENOMEM for allocation failure and correct the coding style

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under vega10 virtualization, the smu ip block will not be added.
Therefore, we need to add pp clk query and force dpm level functions
to amdgpu_virt_ops to support the feature.

v2: add get_pp_clk existence check and use kzalloc to allocate buf

v3: return -ENOMEM for allocation failure and correct the coding style

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: refactoring mailbox to fix TDR handshake bugs(v2)</title>
<updated>2018-03-14T19:38:27+00:00</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2018-01-15T05:44:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=48527e5296edc7b952fb2c1c40fd8c388cc935ed'/>
<id>48527e5296edc7b952fb2c1c40fd8c388cc935ed</id>
<content type='text'>
This patch refactors the mailbox implementation. All of the changes
below are needed together to fix the mailbox handshake issues exposed
by heavy TDR testing.

1) Refactor all mailbox functions to use byte access for mb_control.
The reason is to avoid touching unrelated bits when writing the trn/rcv
part of mailbox_control. This way some incorrect INTRs sent to the
hypervisor side can be avoided, and it fixes a couple of handshake bugs.

2) The trans_msg function is re-implemented: add invalidation
logic before transmitting a message to make sure the ACK bit is in
a clear state; otherwise there is a chance that ACK is already asserted
before the message is transmitted, leading to a fake ACK when polling.
(The hypervisor side has some tricks to work around the ACK bit being
corrupted by VF FLR, which have a side effect that may make the guest
side ACK bit asserted wrongly.) Also clear the TRANS_MSG words after the
message is transferred.

3) mailbox_flr_work is also reworked: it takes the mutex lock
first when invoked, to block gpu recover from participating too early
while the hypervisor side is doing VF FLR. (The hypervisor sends
FLR_NOTIFY to the guest before doing VF FLR and sends FLR_COMPLETE after
VF FLR is done, and FLR_NOTIFY triggers an interrupt to the guest, which
leads to mailbox_flr_work being invoked.)

This avoids the issue of the mailbox trans msg being cleared by its VF FLR.

4) The mailbox_rcv_irq IRQ routine should only peek at the msg and schedule
mailbox_flr_work, instead of ACKing to the hypervisor itself, because the
FLR_NOTIFY msg sent from the hypervisor side doesn't need the VF's ACK
(the VF's ACK would lead the hypervisor to clear its trans_valid/msg, and
this would cause a handshake bug if trans_valid/msg is cleared not by a
correct VF ACK but by a wrong VF ACK like this "FLR_NOTIFY" one).

This fixes the handshake bug where the GUEST sometimes could not receive
the "READY_TO_ACCESS_GPU" msg from the hypervisor.

5) Separate the polling time limits accordingly:
POLL ACK costs no more than 500ms
POLL MSG costs no more than 12000ms
POLL FLR finish costs no more than 500ms

6) We still need to set adev into in_gpu_reset mode after we receive
FLR_NOTIFY from the host side; this prevents an innocent app from wrongly
succeeding in opening the amdgpu dri device.

FLR_NOTIFY is received due to an IDLE hang detected on the hypervisor side,
which indicates the GPU is already dead in this VF.

v2:
use MACRO as the offset of mailbox_control register
don't test for the NOTIFY_CMPL event in rcv_msg since it won't
receive that message anymore

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Pixel Ding &lt;Pixel.Ding@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch refactors the mailbox implementation. All of the changes
below are needed together to fix the mailbox handshake issues exposed
by heavy TDR testing.

1) Refactor all mailbox functions to use byte access for mb_control.
The reason is to avoid touching unrelated bits when writing the trn/rcv
part of mailbox_control. This way some incorrect INTRs sent to the
hypervisor side can be avoided, and it fixes a couple of handshake bugs.

2) The trans_msg function is re-implemented: add invalidation
logic before transmitting a message to make sure the ACK bit is in
a clear state; otherwise there is a chance that ACK is already asserted
before the message is transmitted, leading to a fake ACK when polling.
(The hypervisor side has some tricks to work around the ACK bit being
corrupted by VF FLR, which have a side effect that may make the guest
side ACK bit asserted wrongly.) Also clear the TRANS_MSG words after the
message is transferred.

3) mailbox_flr_work is also reworked: it takes the mutex lock
first when invoked, to block gpu recover from participating too early
while the hypervisor side is doing VF FLR. (The hypervisor sends
FLR_NOTIFY to the guest before doing VF FLR and sends FLR_COMPLETE after
VF FLR is done, and FLR_NOTIFY triggers an interrupt to the guest, which
leads to mailbox_flr_work being invoked.)

This avoids the issue of the mailbox trans msg being cleared by its VF FLR.

4) The mailbox_rcv_irq IRQ routine should only peek at the msg and schedule
mailbox_flr_work, instead of ACKing to the hypervisor itself, because the
FLR_NOTIFY msg sent from the hypervisor side doesn't need the VF's ACK
(the VF's ACK would lead the hypervisor to clear its trans_valid/msg, and
this would cause a handshake bug if trans_valid/msg is cleared not by a
correct VF ACK but by a wrong VF ACK like this "FLR_NOTIFY" one).

This fixes the handshake bug where the GUEST sometimes could not receive
the "READY_TO_ACCESS_GPU" msg from the hypervisor.

5) Separate the polling time limits accordingly:
POLL ACK costs no more than 500ms
POLL MSG costs no more than 12000ms
POLL FLR finish costs no more than 500ms

6) We still need to set adev into in_gpu_reset mode after we receive
FLR_NOTIFY from the host side; this prevents an innocent app from wrongly
succeeding in opening the amdgpu dri device.

FLR_NOTIFY is received due to an IDLE hang detected on the hypervisor side,
which indicates the GPU is already dead in this VF.

v2:
use MACRO as the offset of mailbox_control register
don't test for the NOTIFY_CMPL event in rcv_msg since it won't
receive that message anymore

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Pixel Ding &lt;Pixel.Ding@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
