=============
 Ring Buffer
=============

To handle communication between user space and kernel space, AMD GPUs use a
ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN, VPE,
etc.). The figure below illustrates how this communication works:

.. kernel-figure:: ring_buffers.svg

Ring buffers in the amdgpu driver follow a producer-consumer model, where user
space acts as the producer, constantly filling the ring buffer with GPU
commands to be executed. Meanwhile, the GPU retrieves the information from the
ring, parses it, and distributes the specific set of instructions among the
different amdgpu blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which
indicates where the engine is currently reading packets from the ring, and a
Write Pointer (wptr), which indicates how many packets software has added to
the ring. When the rptr and wptr are equal, the ring is idle. When software
adds packets to the ring, it updates the wptr, which causes the engine to
start fetching and processing packets. As the engine processes packets, the
rptr is updated until it catches up to the wptr and they are equal again.

Usually, ring buffers in the driver have a limited size (search for occurrences
of `amdgpu_ring_init()`). One of the reasons for the small ring buffer size is
that the CP (Command Processor) is capable of following addresses inserted into
the ring; this is illustrated in the image by the reference to the IB (Indirect
Buffer). The IB gives user space the possibility of having an area in memory
that the CP can read to feed the hardware with extra instructions.
All ASICs pre-GFX11 use what is called a kernel queue, which means the ring is
allocated in kernel space and has some restrictions, such as not being able to
be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11 and newer
still support kernel queues, but also provide a new mechanism named
:ref:`user queues<amdgpu-userq>`, where the queue is moved to user space and
can be mapped and unmapped via the scheduler. In practice, both types of queue
insert user-space-generated GPU commands from different jobs into the requested
component ring.

Enforce Isolation
=================

.. note:: After reading this section, you might want to check the
   :ref:`Process Isolation<amdgpu-process-isolation>` page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it
is helpful to briefly discuss how instructions from the ring buffer are
processed in the graphics pipeline. The diagram below illustrates the graphics
pipeline:

.. kernel-figure:: gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence:
Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan
Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
vary across ASICs). Another common way to describe the pipeline is to use
Pixel Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader
stages with rasterization between them. Now, with this pipeline in mind, let's
assume that Job B causes a hang, but Job C's instructions might already be
executing, leading developers to incorrectly identify Job C as the problematic
one. This problem can be mitigated on multiple levels; the diagram below
illustrates how to minimize part of it:

.. kernel-figure:: no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear separation
between instructions, which is not a problem most of the time and is also good
for performance. Furthermore, notice some circles between jobs in the diagram
that represent a **fence wait** used to avoid overlapping work in the ring. At
the end of the fence, a cache flush occurs, ensuring that when the next job
starts, it begins in a clean state; if issues arise, the developer can pinpoint
the problematic process more precisely.

To increase the level of isolation between jobs, there is the "Enforce
Isolation" method described in the picture below:

.. kernel-figure:: enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between
submissions, since access to GFX/Compute is serialized; think of it as a
single-process-at-a-time mode for GFX/Compute. Notice that this approach has a
significant performance impact, as it allows only one job to submit commands at
a time. However, this option can help pinpoint the job that caused the problem.
Although enforcing isolation improves the situation, it does not fully resolve
the issue of precisely pinpointing bad jobs, since isolation might mask the
problem. In summary, identifying which job caused the issue may not be precise,
but enforcing isolation can help with debugging.

Ring Operations
===============

.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
   :internal:
