<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch v4.14-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drm/amdgpu/SRIOV:implement guilty job TDR for(V2)</title>
<updated>2017-05-24T21:40:40+00:00</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2017-05-11T05:36:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=65781c78ad74e4260fbec92c0ecc05738044e177'/>
<id>65781c78ad74e4260fbec92c0ecc05738044e177</id>
<content type='text'>
1,TDR will kickout guilty job if it hang exceed the threshold
of the given one from kernel paramter "job_hang_limit", that
way a bad command stream will not infinitly cause GPU hang.

by default this threshold is 1 so a job will be kicked out
after it hang.

2,if a job timeout TDR routine will not reset all sched/ring,
instead if will only reset on the givn one which is indicated
by @job of amdgpu_sriov_gpu_reset, that way we don't need to
reset and recover each sched/ring if we already know which job
cause GPU hang.

3,unblock sriov_gpu_reset for AI family.

V2:
1:put kickout guilty job after sched parked.
2:since parking scheduler prior to kickout already occupies a
while, we can do last check on the in question job before
doing hw_reset.

TODO:
1:when a job is considered as guilty, we should mark some flag
in its fence status flag, and let UMD side aware that this
fence signaling is not due to job complete but job hang.

2:if gpu reset cause all video memory lost, we need introduce
a new policy to implement TDR, like drop all jobs not yet
signaled, and all IOCTL on this device will return ERROR
DEVICE_LOST.
this will be implemented later.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1,TDR will kickout guilty job if it hang exceed the threshold
of the given one from kernel paramter "job_hang_limit", that
way a bad command stream will not infinitly cause GPU hang.

by default this threshold is 1 so a job will be kicked out
after it hang.

2,if a job timeout TDR routine will not reset all sched/ring,
instead if will only reset on the givn one which is indicated
by @job of amdgpu_sriov_gpu_reset, that way we don't need to
reset and recover each sched/ring if we already know which job
cause GPU hang.

3,unblock sriov_gpu_reset for AI family.

V2:
1:put kickout guilty job after sched parked.
2:since parking scheduler prior to kickout already occupies a
while, we can do last check on the in question job before
doing hw_reset.

TODO:
1:when a job is considered as guilty, we should mark some flag
in its fence status flag, and let UMD side aware that this
fence signaling is not due to job complete but job hang.

2:if gpu reset cause all video memory lost, we need introduce
a new policy to implement TDR, like drop all jobs not yet
signaled, and all IOCTL on this device will return ERROR
DEVICE_LOST.
this will be implemented later.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset (v2)</title>
<updated>2017-05-24T21:40:37+00:00</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2017-05-05T07:09:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4fbf87e2fe472110d8d3f66ffcbfb7fff911c191'/>
<id>4fbf87e2fe472110d8d3f66ffcbfb7fff911c191</id>
<content type='text'>
because we don't want to do sriov-gpu-reset under certain
cases, so just split those two funtion and don't invoke
sr-iov one from bare-metal one.

V2:
remove debugfs_gpu_reset routine on SRIOV case.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
because we don't want to do sriov-gpu-reset under certain
cases, so just split those two funtion and don't invoke
sr-iov one from bare-metal one.

V2:
remove debugfs_gpu_reset routine on SRIOV case.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu:no gpu scheduler for KIQ</title>
<updated>2016-11-11T15:20:58+00:00</updated>
<author>
<name>Trigger Huang</name>
<email>trigger.huang@amd.com</email>
</author>
<published>2016-11-02T09:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e22504425f72cf8a6d43884e4080e534e4919e61'/>
<id>e22504425f72cf8a6d43884e4080e534e4919e61</id>
<content type='text'>
KIQ is used for interaction between driver and
CP, and not exposed to outside client, as such it
doesn't need to be handled by GPU scheduler.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Signed-off-by: Xiangliang Yu &lt;Xiangliang.Yu@amd.com&gt;
Signed-off-by: Trigger Huang &lt;trigger.huang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
KIQ is used for interaction between driver and
CP, and not exposed to outside client, as such it
doesn't need to be handled by GPU scheduler.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Signed-off-by: Xiangliang Yu &lt;Xiangliang.Yu@amd.com&gt;
Signed-off-by: Trigger Huang &lt;trigger.huang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Backmerge tag 'v4.9-rc4' into drm-next</title>
<updated>2016-11-06T23:37:09+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2016-11-06T23:37:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7b624ad8fea1be7ff4c22643e212191aa6a2a3c2'/>
<id>7b624ad8fea1be7ff4c22643e212191aa6a2a3c2</id>
<content type='text'>
Linux 4.9-rc4

This is needed for nouveau development.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Linux 4.9-rc4

This is needed for nouveau development.
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-buf: Rename struct fence to dma_fence</title>
<updated>2016-10-25T12:40:39+00:00</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2016-10-25T12:00:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f54d1867005c3323f5d8ad83eed823e84226c429'/>
<id>f54d1867005c3323f5d8ad83eed823e84226c429</id>
<content type='text'>
I plan to usurp the short name of struct fence for a core kernel struct,
and so I need to rename the specialised fence/timeline for DMA
operations to make room.

A consensus was reached in
https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
that making clear this fence applies to DMA operations was a good thing.
Since then the patch has grown a bit as usage increases, so hopefully it
remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:
@@

@@
- struct fence
+ struct dma_fence
@@

@@
- struct fence_ops
+ struct dma_fence_ops
@@

@@
- struct fence_cb
+ struct dma_fence_cb
@@

@@
- struct fence_array
+ struct dma_fence_array
@@

@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@

@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
 ...
 )

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Gustavo Padovan &lt;gustavo.padovan@collabora.co.uk&gt;
Acked-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I plan to usurp the short name of struct fence for a core kernel struct,
and so I need to rename the specialised fence/timeline for DMA
operations to make room.

A consensus was reached in
https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
that making clear this fence applies to DMA operations was a good thing.
Since then the patch has grown a bit as usage increases, so hopefully it
remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:
@@

@@
- struct fence
+ struct dma_fence
@@

@@
- struct fence_ops
+ struct dma_fence_ops
@@

@@
- struct fence_cb
+ struct dma_fence_cb
@@

@@
- struct fence_array
+ struct dma_fence_array
@@

@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@

@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
 ...
 )

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Gustavo Padovan &lt;gustavo.padovan@collabora.co.uk&gt;
Acked-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix fence slab teardown</title>
<updated>2016-10-24T21:01:22+00:00</updated>
<author>
<name>Grazvydas Ignotas</name>
<email>notasas@gmail.com</email>
</author>
<published>2016-10-23T18:31:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0f10425e811355986907c54f7d1d06703e406092'/>
<id>0f10425e811355986907c54f7d1d06703e406092</id>
<content type='text'>
To free fences, call_rcu() is used, which calls amdgpu_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so amdgpu_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings, if not, we get a crash in one of rcu threads because callback
is called after amdgpu has already been unloaded.

Fix it with a rcu_barrier().

Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
Acked-by: Chunming Zhou &lt;david1.zhou@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Grazvydas Ignotas &lt;notasas@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To free fences, call_rcu() is used, which calls amdgpu_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so amdgpu_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings, if not, we get a crash in one of rcu threads because callback
is called after amdgpu has already been unloaded.

Fix it with a rcu_barrier().

Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
Acked-by: Chunming Zhou &lt;david1.zhou@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Grazvydas Ignotas &lt;notasas@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: don't leave dangling pointers around</title>
<updated>2016-09-27T17:00:51+00:00</updated>
<author>
<name>Grazvydas Ignotas</name>
<email>notasas@gmail.com</email>
</author>
<published>2016-09-25T20:34:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=54ddf3a6af537cbfe038f9a2754f26de80c5818b'/>
<id>54ddf3a6af537cbfe038f9a2754f26de80c5818b</id>
<content type='text'>
Right now it's possible to trigger fence_drv.fences[] dereference after
the array has been freed. While the real problem is elsewhere, this still
results in confusing errors that depend on how the freed memory was
reused (I've seen "kernel tried to execute NX-protected page"), it's
better to clear them and get NULL dereference so that it's obvious what's
going wrong.

Signed-off-by: Grazvydas Ignotas &lt;notasas@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Right now it's possible to trigger fence_drv.fences[] dereference after
the array has been freed. While the real problem is elsewhere, this still
results in confusing errors that depend on how the freed memory was
reused (I've seen "kernel tried to execute NX-protected page"), it's
better to clear them and get NULL dereference so that it's obvious what's
going wrong.

Signed-off-by: Grazvydas Ignotas &lt;notasas@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: always signal all fences</title>
<updated>2016-07-14T20:46:07+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2016-07-12T11:57:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2ef004d984226fb6845481bc928ae37fb1dacc5e'/>
<id>2ef004d984226fb6845481bc928ae37fb1dacc5e</id>
<content type='text'>
A little fallout from "drm/amdgpu: sanitize fence numbers", we
sometimes need to signal all fences in the ring.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Chunming Zhou &lt;david1.zhou@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A little fallout from "drm/amdgpu: sanitize fence numbers", we
sometimes need to signal all fences in the ring.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Chunming Zhou &lt;david1.zhou@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: sanitize fence numbers</title>
<updated>2016-07-14T20:46:03+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2016-06-24T19:11:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4f399a082866e137cd3c3bdf329f5c7ce1dca5d7'/>
<id>4f399a082866e137cd3c3bdf329f5c7ce1dca5d7</id>
<content type='text'>
Looks like the VCE block sometimes still sends nonsense
fence numbers on startup.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Looks like the VCE block sometimes still sends nonsense
fence numbers on startup.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix bug in fence driver fini</title>
<updated>2016-05-25T13:42:53+00:00</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2016-05-18T08:15:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=48c60c03bcac36bb443f771b5e221a47d501b2ad'/>
<id>48c60c03bcac36bb443f771b5e221a47d501b2ad</id>
<content type='text'>
Using wrong counter for walking fences.  Fixes
a crash when unloading the driver.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using wrong counter for walking fences.  Fixes
a crash when unloading the driver.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
