diff options
author | Chris Wilson <chris@chris-wilson.co.uk> | 2016-07-01 17:23:21 +0100 |
---|---|---|
committer | Chris Wilson <chris@chris-wilson.co.uk> | 2016-07-01 21:00:53 +0100 |
commit | f8973c217f07903247d222ab92ad37e2529aff2e (patch) | |
tree | 1bd2deb36cc05c8f1194ee8cb1c4b18b089c72a6 /drivers/gpu/drm/i915/i915_irq.c | |
parent | 7d5ea80720a8e53ff4ea309708cef2a7a0e163c7 (diff) |
drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
On Ironlake, there is no command nor register to ensure that the write
from a MI_STORE command is completed (and coherent on the CPU) before the
command parser continues. This means that the ordering between the seqno
write and the subsequent user interrupt is undefined (like gen6+). So to
ensure that the seqno write is completed after the final user interrupt
we need to delay the read sufficiently to allow the write to complete.
This delay is undefined by the bspec, and empirically requires 75us even
though a register read combined with a clflush is less than 500ns. Hence,
the delay is due to an on-chip buffer rather than the latency of the write
to memory.
Note that the render ring controls this by filling the PIPE_CONTROL fifo
with stalling commands that force the earliest pipe-control with the
seqno to be completed before the command parser continues. Given that we
need a barrier operation for BSD, we may as well forgo the extra
per-batch latency by using a common per-interrupt barrier.
Studying the impact of adding the usleep shows that in both sequences of
and individual synchronous no-op batches is negligible for the media
engine (where the write now is unordered with the interrupt). Converting
the render engine over from the current glutton of pie-controls over to
the per-interrupt delays speeds up both the sequential and individual
synchronous no-ops by 20% and 60%, respectively. This speed up holds
even when looking at the throughput of small copies (4KiB->4MiB), both
serial and synchronous, by about 20%. This is because despite adding a
significant delay to the interrupt, in all likelihood we will see the
seqno write without having to apply the barrier (only in the rare corner
cases where the write is delayed on the last required is the delay
necessary).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307
Testcase: igt/gem_sync #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk
Diffstat (limited to 'drivers/gpu/drm/i915/i915_irq.c')
-rw-r--r-- | drivers/gpu/drm/i915/i915_irq.c | 10 |
1 files changed, 3 insertions, 7 deletions
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 7c379afcff2f..be7f0b9b27e0 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1264,8 +1264,7 @@ static void ivybridge_parity_error_irq_handler(struct drm_i915_private *dev_priv static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv, u32 gt_iir) { - if (gt_iir & - (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT)) + if (gt_iir & GT_RENDER_USER_INTERRUPT) notify_ring(&dev_priv->engine[RCS]); if (gt_iir & ILK_BSD_USER_INTERRUPT) notify_ring(&dev_priv->engine[VCS]); @@ -1274,9 +1273,7 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv, static void snb_gt_irq_handler(struct drm_i915_private *dev_priv, u32 gt_iir) { - - if (gt_iir & - (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT)) + if (gt_iir & GT_RENDER_USER_INTERRUPT) notify_ring(&dev_priv->engine[RCS]); if (gt_iir & GT_BSD_USER_INTERRUPT) notify_ring(&dev_priv->engine[VCS]); @@ -3601,8 +3598,7 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev) gt_irqs |= GT_RENDER_USER_INTERRUPT; if (IS_GEN5(dev)) { - gt_irqs |= GT_RENDER_PIPECTL_NOTIFY_INTERRUPT | - ILK_BSD_USER_INTERRUPT; + gt_irqs |= ILK_BSD_USER_INTERRUPT; } else { gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT; } |