=============
 Ring Buffer
=============

To handle communication between user space and kernel space, AMD GPUs use a
ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN, VPE,
etc.). The figure below illustrates how this communication works:

.. kernel-figure:: ring_buffers.svg

Ring buffers in the amdgpu driver follow a producer-consumer model, where user
space acts as the producer, constantly filling the ring buffer with GPU
commands to be executed. Meanwhile, the GPU retrieves the information from the
ring, parses it, and distributes the specific set of instructions among the
different amdgpu blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which
indicates where the engine is currently reading packets from the ring, and a
Write Pointer (wptr), which indicates how many packets software has added to
the ring. When the rptr and wptr are equal, the ring is idle. When software
adds packets to the ring, it updates the wptr, which causes the engine to
start fetching and processing packets. As the engine processes packets, the
rptr is updated until it catches up to the wptr and they are equal again.

Usually, ring buffers in the driver have a limited size (search for occurrences
of `amdgpu_ring_init()`). One of the reasons for the small ring buffer size is
that the CP (Command Processor) is capable of following addresses inserted into
the ring; this is illustrated in the image by the reference to the IB (Indirect
Buffer). The IB gives user space the possibility of having an area in memory
that the CP can read to feed the hardware with extra instructions.
All ASICs pre-GFX11 use what is called a kernel queue, which means the ring is
allocated in kernel space and has some restrictions, such as not being able to
be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11 and newer
still support kernel queues, but also provide a new mechanism named
:ref:`user queues<amdgpu-userq>`, where the queue is moved to user space and
can be mapped and unmapped via the scheduler. In practice, both types of queue
insert user-space-generated GPU commands from different jobs into the requested
component ring.

Enforce Isolation
=================

.. note:: After reading this section, you might want to check the
   :ref:`Process Isolation<amdgpu-process-isolation>` page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it
is helpful to briefly discuss how instructions from the ring buffer are
processed in the graphics pipeline. The diagram below illustrates the graphics
pipeline:

.. kernel-figure:: gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence:
Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan
Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
vary across ASICs). Another common way to describe the pipeline is to use
Pixel Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader
stages with rasterization between them. Now, with this pipeline in mind, let's
assume that Job B causes a hang, but Job C's instructions might already be
executing, leading developers to incorrectly identify Job C as the problematic
one. This problem can be mitigated on multiple levels; the diagram below
illustrates how to minimize part of it:

.. kernel-figure:: no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear separation
between instructions, which is not a problem most of the time and is also good
for performance. Furthermore, notice some circles between jobs in the diagram
that represent a **fence wait** used to avoid overlapping work in the ring. At
the end of the fence, a cache flush occurs, ensuring that when the next job
starts, it begins in a clean state; if issues arise, the developer can pinpoint
the problematic process more precisely.

To increase the level of isolation between jobs, there is the "Enforce
Isolation" method described in the picture below:

.. kernel-figure:: enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between
submissions, since access to GFX/Compute is serialized; think of it as a
single-process-at-a-time mode for GFX/Compute. Notice that this approach has a
significant performance impact, as it allows only one job to submit commands at
a time. However, this option can help pinpoint the job that caused the problem.
Although enforcing isolation improves the situation, it does not fully resolve
the issue of precisely pinpointing bad jobs, since isolation might mask the
problem. In summary, identifying which job caused the issue may not be precise,
but enforcing isolation can help with debugging.

Ring Operations
===============

.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
   :internal:
