| Age | Commit message (Collapse) | Author |
|
Commit 37d078e51b4c ("drm/xe/uapi: Split xe_sync types from flags") renamed some DRM_XE_SYNC_*
defines but later commits kept using the old names. Correct them with the new definition.
v2: correct fixes tag and update commit message to explain why (Lucas)
Fixes: 9329f0667215 ("drm/xe/uapi: Use LR abbrev for long-running vms")
Fixes: 4b437893a826 ("drm/xe/uapi: More uAPI documentation additions and cosmetic updates")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Zongyao Bai <zongyao.bai@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250608230133.1250849-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
- Add Task Information for the wedge API
Cross-subsystem Changes:
Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests
Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Expose media OA units (Ashutosh)
Merge:
- Restore GuC submit UAF fix around queue destruction
accidentally removed in a drm-xe-fixes merge (Auld)
Core Changes:
- drm/gpusvm: Introduce devmem_only flag for allocation (Himal)
- drm/gpusvm: Add timeslicing support to GPU SVM (Brost)
Driver Changes:
- Make gem shrinker drm managed (Thomas)
- SRIOV VF Post-migration recovery of GGTT nodes and CTB (Tomasz)
- Some W/A additions and updates (Aradhya, Shekhar, Vinay, Daniele)
- Prefetch Support for svm ranges (Himal, Brost)
- Don't allocate managed BO for each policy change (Michal)
- Simplify and fix diff calculation in GuC submit (Lucas)
- Track FAST_REQ GuC H2Gs to report where errors came from (John)
- SRIOV PF: Don't allow LMEM provisioning if LMTT isn't available (Piotr)
- Check if all domains awake for MOCS dump (Tejas)
- Make creation of SLPC debugfs files conditional (Aradhya)
- Default auto_link_downgrade status to false (Aradhya)
- Use xe_mmio_read32() to read mtcfg register (Shuicheng)
- Updates in PCI ID tables (Atwood, Shekhar)
- SRIOV VF: Fail migration recovery if fixups needed but not supported (Tomasz)
- Add missing documentation around freq and RPa (Rodrigo)
- Some other SVM related fixes (Himal, Auld, Brost, Maarten)
- Allow to trigger GT resets using debugfs writes (Michal)
- Optimise CCS case for WB pages (Auld)
- Create LRC BO without VM (Niranjana)
- Initialize MOCS index early (Bala)
- HWMON fixes for BMG (Karthik, Lucas)
- Drop redundant conversion to bool (Raag)
- Rework eviction rejection of bound external bos (Thomas)
- Stop re-submitting signalled jobs (Auld)
- Small fixes and cleanups for PXP (Daniele)
- Convert some print messages to GT-oriented ones (Michal)
- Resend potentially lost GuC H2G MMIO request (Michal)
- Add configfs to load with fewer engines (Lucas)
- Remove unmatched xe_vm_unlock from __xe_exec_queue_init (Maciej)
- SRIOV VF: Small updates around GGTT handling (Michal)
- Make VMA tile_present, tile_invalidated access rules clear (Brost)
- Xe3 Tuning: Disable NULL query for Anyhit Shader (Nitin)
- Fixes for VF GuC version (Daniele)
- Don't store the xe device pointer inside xe_ttm_tt (Dave)
- Small improvements in topology code (Michal)
- Stop relying on GGTT internals (Maarten)
- GSM size should be constant on most platforms (Roper)
- Reorder 'Get pages failed' message (Brost)
- WA BB related fixes and improvements (Lucas, Brost)
- Fix early wedge on GuC load failure (Daniele)
- Add helper function to inject fault into ct_dead_capture (Satyanarayana)
- Determine ATS / PTA programming during early sw init (Roper)
- Consolidate PAT programming logic for pre-Xe2 and post-Xe2 (Roper)
- Fix kconfig prompt (Lucas)
- Convert xe_pci tests to parametrized tests (Michal)
- Do not kill VM in PT code on -ENODATA (Brost)
- Move LRC_ENGINE_ID_PPHWSP_OFFSET outside of parallel offset (Brost)
- Enable media OA (Ashutosh)
- GuC log level tuning (Lucas)
- Add xe_vm_has_valid_gpu_mapping helper (Brost)
- Opportunistically skip TLB invalidaion on unbind (Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aFMb_NVF_oCW7UVl@intel.com
|
|
On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM)
units. One or more SCMI OAM units might be present on a platform. In
addition there is another OAM unit for global events, called
OAM-SAG. Performance metrics for media workloads can be obtained from these
OAM units, similar to OAG.
Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA
unit without any attached engines.
Bspec: 70819, 67103, 63844, 72572, 74476, 61284
v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for < 12.7 platforms
v3: Return XE_OA_UNIT_INVALID for < 12.7 to indicate no OAM units
v4: Move xe_oa_print_oa_units() to separate patch
v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG
v6: Introduce DRM_XE_OA_CAPS_OAM
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com
|
|
Introduce a new parameter to the DRM_IVPU_CMDQ_CREATE ioctl,
enabling turbo mode for jobs submitted via the command queue.
Turbo mode allows jobs to run at higher frequencies,
potentially improving performance for demanding workloads.
Also adds the IVPU_TEST_MODE_TURBO_DISABLE flag to allow test
mode to explicitly disable turbo mode requested by the application.
The IVPU_TEST_MODE_TURBO mode has been renamed to
IVPU_TEST_MODE_TURBO_ENABLE for clarity and consistency.
Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250605162001.1237789-1-maciej.falkowski@linux.intel.com
|
|
Backmerging to forward to v6.16-rc1
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Currently, we pick the MMIO offset based on the size of the pgoff_t
type seen by the process that manipulates the FD, such that a 32-bit
process can always map the user MMIO ranges. But this approach doesn't
work well for emulators like FEX, where the emulator is a 64-bit binary
which might be executing 32-bit code. In that case, the kernel thinks
it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT
is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT,
because it can't mmap() anything above the pgoff_t size.
In order to solve that, we need a way to explicitly set the user MMIO
offset from the UMD, such that the kernel doesn't have to guess it
from the TIF_32BIT flag set on user thread. We keep the old behavior
if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called.
Changes in v2:
- Drop the lock/immutable fields and allow SET_USER_MMIO_OFFSET
requests to race with mmap() requests
- Don't do the is_user_mmio_offset test twice in panthor_mmap()
- Improve the uAPI docs
Changes in v3:
- Bump to version 1.5 instead of 1.4 after rebasing
- Add R-bs
- Fix/rephrase comment as suggested by Liviu
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://lore.kernel.org/r/20250606080932.4140010-3-boris.brezillon@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
|
|
drm_panthor_gpu_info::shader_present is currently automatically offset
by 4 byte to meet Arm's 32-bit/64-bit field alignment rules, but those
constraints don't stand on 32-bit x86 and cause a mismatch when running
an x86 binary in a user emulated environment like FEX. It's also
generally agreed that uAPIs should explicitly pad their struct fields,
which we originally intended to do, but a mistake slipped through during
the submission process, leading drm_panthor_gpu_info::shader_present to
be misaligned.
This uAPI change doesn't break any of the existing users of panthor
which are either arm32 or arm64 where the 64-bit alignment of
u64 fields is already enforced a the compiler level.
Changes in v2:
- Rename the garbage field into pad0 and adjust the comment accordingly
- Add Liviu's A-b
Changes in v3:
- Add R-bs
Fixes: 0f25e493a246 ("drm/panthor: Add uAPI")
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250606080932.4140010-2-boris.brezillon@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Driver Changes:
- A couple of vm init fixes (Matt Auld)
- Hwmon fixes (Karthik)
- Drop reduntant conversion to bool (Raag)
- Fix CONFIG_INTEL_VSEC dependency (Arnd)
- Rework eviction rejection of bound external bos (Thomas)
- Stop re-submitting signalled jobs (Matt Auld)
- A couple of pxp fixes (Daniele)
- Add back a fix that got lost in a merge (Matt Auld)
- Create LRC bo without VM (Niranjana)
- Fix for the above fix (Maciej)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aEHq44uIAZwfK-mG@fedora
|
|
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
(cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Christian needs a recent drm-next branch to merge fence patches.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Allow UM to label a BO for which it possesses a DRM handle.
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250520174634.353267-4-adrian.larumbe@collabora.com
|
|
This adds FOURCCs for 3-plane 10/12/16bit YCbCr formats used by software
decoders like ffmpeg, dav1d and libvpx. The intended use-case is buffer
sharing between decoders and GPUs by allocating buffers with e.g. udmabuf
or dma-heaps, avoiding unnecessary copies and format conversions in
various scenarios.
Unlike formats typically used by hardware decoders the 10/12bit formats
use a LSB alignment. In order to allow fast implementations in GL
and Vulkan the padding must contain only zeros, so the float
representation can be calculated by multiplying with 2^6=64 or 2^4=16
respectively.
MRs or branches for Mesa, Vulkan, Gstreamer, Weston and Mutter can be found at:
- https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34303
- https://github.com/rmader/Vulkan-Docs/commits/ycbcr-16bit-lsb-formats/
- https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/8540
- https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1753
- https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4348
The naming scheme follows the 'P' and 'Q' formats. The 'S' stands for
'software' and was selected in order to make remembering easy.
The 'Sx16' formats could as well be 'Qx16'. We stick with 'S' as 16bit software
decoders are likely much more common than hardware ones for the foreseeable
future. Note that these formats already have Vulkan equivalents:
- VK_FORMAT_G16_B16_R16_3PLANE_420_UNORM
- VK_FORMAT_G16_B16_R16_3PLANE_422_UNORM
- VK_FORMAT_G16_B16_R16_3PLANE_444_UNORM
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Link: https://lore.kernel.org/r/20250509133535.60330-1-robert.mader@collabora.com
Signed-off-by: Daniel Stone <daniels@collabora.com>
|
|
https://gitlab.freedesktop.org/drm/nova into drm-next
Nova changes for v6.16
auxiliary:
- bus abstractions
- implementation for driver registration
- add sample driver
drm:
- implement __drm_dev_alloc()
- DRM core infrastructure Rust abstractions
- device, driver and registration
- DRM IOCTL
- DRM File
- GEM object
- IntoGEMObject rework
- generically implement AlwaysRefCounted through IntoGEMObject
- refactor unsound from_gem_obj() into as_ref()
- refactor into_gem_obj() into as_raw()
driver-core:
- merge topic/device-context-2025-04-17 from driver-core tree
- implement Devres::access()
- fix: doctest build under `!CONFIG_PCI`
- accessor for Device::parent()
- fix: conditionally expect `dead_code` for `parent()`
- impl TryFrom<&Device> bus devices (PCI, platform)
nova-core:
- remove completed Vec extentions from task list
- register auxiliary device for nova-drm
- derive useful traits for Chipset
- add missing GA100 chipset
- take &Device<Bound> in Gpu::new()
- infrastructure to generate register definitions
- fix register layout of NV_PMC_BOOT_0
- move Firmware into own (Rust) module
- fix: select AUXILIARY_BUS
nova-drm:
- initial driver skeleton (depends on drm and auxiliary bus
abstractions)
- fix: select AUXILIARY_BUS
Rust (dependencies):
- implement Opaque::zeroed()
- implement Revocable::try_access_with()
- implement Revocable::access()
From: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/aCxAf3RqQAXLDhAj@cassiopeiae
|
|
Support new version of HBM.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add the initial nova-drm driver skeleton.
nova-drm is connected to nova-core through the auxiliary bus and
implements the DRM parts of the nova driver stack.
For now, it implements the fundamental DRM abstractions, i.e. creates a
DRM device and registers it, exposing a three sample IOCTLs.
DRM_IOCTL_NOVA_GETPARAM
- provides the PCI bar size from the bar that maps the GPUs VRAM
from nova-core
DRM_IOCTL_NOVA_GEM_CREATE
- creates a new dummy DRM GEM object and returns a handle
DRM_IOCTL_NOVA_GEM_INFO
- provides metadata for the DRM GEM object behind a given handle
I implemented a small userspace test suite [1] that utilizes this
interface.
Link: https://gitlab.freedesktop.org/dakr/drm-test [1]
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250424160452.8070-3-dakr@kernel.org
[ Kconfig: depend on DRM=y rather than just DRM. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.16-2025-05-09:
amdgpu:
- IPS fixes
- DSC cleanup
- DC Scaling updates
- DC FP fixes
- Fused I2C-over-AUX updates
- SubVP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
- DCN 2.1 fixes
- DPIA fixes
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- Misc code cleanups
- SR-IOV updates
- RAS updates
- PSP 12 cleanups
amdkfd:
- Update error messages for SDMA
- Userptr updates
drm:
- Add drm_file_err function
dma-buf:
- Add a helper to sort and deduplicate dma_fence arrays
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Linux 6.15-rc5, requested by tzimmerman for fixes required in drm-next.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.16-rc1:
UAPI Changes:
- panthor now fails in mmap_offset call for a BO created with
DRM_PANTHOR_BO_NO_MMAP.
- Add DRM_PANTHOR_BO_SET_LABEL ioctl and label panthor kernel BOs.
Cross-subsystem Changes:
- Add kmap_local_page_try_from_panic for drm/panic.
- Add DT bindings for panels.
- Update DT bindings for imagination.
- Extend %p4cc in lib/vsprintf.c to support fourcc printing.
Core Changes:
- Remove the disgusting turds.
- Register definition updates for DP.
- DisplayID timing blocks refactor.
- Remove now unused mipi_dsi_dsc_write_seq.
- Convert panel drivers to not return error in prepare/enable and
unprepare/disable calls.
Driver Changes:
- Assorted small fixes and featuers for rockchip, panthor, accel/ivpu,
accel/amdxdna, hisilicon/hibmc, i915/backlight, sysfb, accel/qaic,
udl, etnaviv, virtio, xlnx, panel/boe-bf060y8m-aj0, bridge/synopsis,
panthor, panel/samsung/sofef00m, lontium/lt9611uxc, nouveau, panel/himax-hx8279,
panfrost, st7571-i2c.
- Improve hibmc interrupt handling and add HPD support.
- Add NLT NL13676BC25-03F, Tianma TM070JDHG34-00, Himax HX8279/HX8279-D
DDIC, Visionox G2647FB105, Sitronix ST7571 LCD Controller, panels.
- Add zpos, alpha and blend to renesas.
- Convert drivers to use drm_gem_is_imported, replacing gem->import_attach.
- Support TI AM68 GPU in imagination.
- Support panic handler in virtio.
- Add support to get the panel from DP AUX bus in rockchip and add
RK3588 support.
- Make sofef00 only support the sofef00 panel, not another unrelated
one.
- Add debugfs BO dumping support to panthor, and print associated labels.
- Implement heartbeat based hangcheck in ivpu.
- Mass convert drivers to devm_drm_bridge_alloc api.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/e2a958d9-e506-4962-8bae-0dbf2ecc000f@linux.intel.com
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Core Changes:
Fix drm_gpusvm kernel-doc (Lucas)
Driver Changes:
- Release guc ids before cancelling work (Tejas)
- Remove a duplicated pc_start_call (Rodrigo)
- Fix an incorrect assert in previous userptr fixes (Thomas)
- Remove gen11 assertions and prefixes (Lucas)
- Drop sentinels from arg to xe_rtp_process_to_src (Lucas)
- Temporarily disable D3Cold on BMG (Rodrigo)
- Fix MOCS debugfs LNCF readout (Tvrtko)
- Some ring flush cleanups (Tvrtko)
- Use unsigned int for alignment in fb pinning code (Tvrtko)
- Retry and wait longer for GuC PC start (Rodrigo)
- Recognize 3DSTATE_COARSE_PIXEL in LRC dumps (Matt Roper)
- Remove reduntant check in xe_vm_create_ioctl() (Xin)
- A bunch of SRIOV updates (Michal)
- Add stats for SVM page-faults (Francois)
- Fix an UAF (Harish)
- Expose fan speed (Raag)
- Fix exporting xe buffer objects multiple times (Tomasz)
- Apply a workaround (Vinay)
- Simplify pinned bo iteration (Thomas)
- Remove an incorrect "static" keywork (Lucas)
- Add support for separate firmware files on each GT (Lucas)
- Survivability handling fixes (Lucas)
- Allow to inject error in early probe (Lucas)
- Fix unmet direct dependencies warning (Yue Haibing)
- More error injection during probe (Francois)
- Coding style fix (Maarten)
- Additional stats support (Riana)
- Add fault injection for xe_oa_alloc_regs (Nakshrtra)
- Add a BMG PCI ID (Matt Roper)
- Some SVM fixes and preliminary SVM multi-device work (Thomas)
- Switch the migrate code from drm managed to dev managed (Aradhya)
- Fix an out-of-bounds shift when invalidating TLB (Thomas)
- Ensure fixed_slice_mode gets set after ccs_mode change (Niranjana)
- Use local fence in error path of xe_migrate_clear (Matthew Brost)
- More Workarounds (Julia)
- Define sysfs_ops on all directories (Tejas)
- Set power state to D3Cold during s2idle/s3 (Badal)
- Devcoredump output fix (John)
- Avoid plain 64-bit division (Arnd Bergmann)
- Reword a debug message (John)
- Don't print a hwconfig error message when forcing execlists (Stuart)
- Restore an error code to avoid a smatch warning (Rodrigo)
- Invalidate L3 read-only cachelines for geometry streams too (Kenneth)
- Make PPHWSP size explicit in xe_gt_lrc_size() (Gustavo)
- Add GT frequency events (Vinay)
- Fix xe_pt_stage_bind_walk kerneldoc (Thomas)
- Add a workaround (Aradhya)
- Rework pinned save/restore (Matthew Auld, Matthew Brost)
- Allow non-contig VRAM kernel BO (Matthew Auld)
- Support non-contig VRAM provisioning for SRIOV (Matthew Auld)
- Allow scratch-pages for unmapped parts of page-faulting VMs. (Oak)
- Ensure XE_BO_FLAG_CPU_ADDR_MIRROR had a unique value (Matt Roper)
- Fix taking an invalid lock on wedge (Lucas)
- Configs and documentation for survivability mode (Riana)
- Remove an unused macro (Shuicheng)
- Work around a page-fault full error (Matt Brost)
- Enable a SRIOV workaround (John)
- Bump the recommended GuC version (John)
- Allow to drop VRAM resizing (Lucas)
- Don't expose privileged debugfs files if VF (Michal)
- Don't show GGTT/LMEM debugfs files under media GT (Michal)
- Adjust ring-buffer emission for maximum possible size (Tvrtko)
- Fix notifier vs folio lock deadlock (Matthew Auld)
- Stop relying on placement for dma-buf unmap Matthew Auld)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aADWaEFKVmxSnDLo@fedora
|
|
Allow UM to label a BO for which it possesses a DRM handle.
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250423021238.1639175-3-adrian.larumbe@collabora.com
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
|
|
Add queue id support to the user queue wait IOCTL
drm_amdgpu_userq_wait structure.
This is required to retrieve the wait user queue and maintain
the fence driver references in it so that the user queue in
the same context releases their reference to the fence drivers
at some point before queue destruction.
Otherwise, we would gather those references until we
don't have any more space left and crash.
v2: Modify the UAPI comment as per the mesa and libdrm UAPI comment.
Libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/408
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34493
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If the queues needs to access TMZ surfaces, it must
be set up as secure.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Allow the user to set a queue priority levels:
0 - normal low - most apps (maps to MES AMD_PRIORITY_LEVEL_NORMAL)
1 - low - background jobs (maps to MES AMD_PRIORITY_LEVEL_LOW)
2 - normal high - apps that need relative high (maps to MES AMD_PRIORITY_LEVEL_MEDIUM)
3 - high (admin only - for compositors) (maps to MES AMD_PRIORITY_LEVEL_HIGH)
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reuse the _pad field for flags.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the frequency returned to the user space by
the DRM_IVPU_PARAM_CORE_CLOCK_RATE GET_PARAM IOCTL.
The kernel driver returned CPU frequency for MTL and bare
PLL frequency for LNL - this was inconsistent and incorrect
for both platforms. With this fix the driver returns maximum
frequency of the NPU data processing unit (DPU) for all HW
generations. This is what user space always expected.
Also do not set CPU frequency in boot params - the firmware
does not use frequency passed from the driver, it was only
used by the early pre-production firmware.
With that we can remove CPU frequency calculation code.
Show NPU frequency in FREQ_CHANGE interrupt when frequency
tracking is enabled.
Fixes: 8a27ad81f7d3 ("accel/ivpu: Split IP and buttress code")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250401155912.4049340-2-maciej.falkowski@linux.intel.com
|
|
This adds the UAPI for the Asahi driver targeting the GPU in the Apple
M1 and M2 series systems on chip. The UAPI design is based on other
modern Vulkan-capable drivers, including Xe and Panthor. Memory
management is based on explicit VM management. Synchronization is
exclusively explicit sync.
This UAPI is validated against our open source Mesa stack, which is
fully conformant to the OpenGL 4.6, OpenGL ES 3.2, OpenCL 3.0, and
Vulkan 1.4 standards. The Vulkan driver supports sparse, exercising the
VM_BIND mechanism.
This patch adds the standalone UAPI header. It is implemented by an open
source DRM driver written in Rust. We fully intend to upstream this
driver when possible. However, as a production graphics driver, it
depends on a significant number of Rust abstractions that will take a
long time to upstream. In the mean time, our userspace is upstream in
Mesa but is not allowed to probe with upstream Mesa as the UAPI is not
yet reviewed and merged in the upstream kernel. Although we ship a
patched Mesa in Fedora Asahi Remix, any containers shipping upstream
Mesa builds are broken for our users, including upstream Flatpak and
Waydroid runtimes. Additionally, it forces us to maintain forks of Mesa
and virglrenderer, which complicates bisects.
The intention in sending out this patch is for this UAPI to be
thoroughly reviewed. Once we as the DRM community are satisfied with the
UAPI, this header lands signifying that the UAPI is stable and must only
be evolved in backwards-compatible ways; it will be the UAPI implemented
in the DRM driver that eventually lands upstream. That promise lets us
enable upstream Mesa, solving all these issues while the upstream Rust
abstractions are developed.
https://github.com/alyssarosenzweig/linux/commits/agx-uapi-v7 contains
the DRM driver implementing this proposed UAPI.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33984 contains
the Mesa patches to implement this proposed UAPI.
That Linux and Mesa branch together give a complete graphics/compute
stack on top of this UAPI.
Co-developed-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Acked-by: Simona Vetter <simona.vetter@ffwll.ch>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: Janne Grunau <j@jannau.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Link: https://lore.kernel.org/r/20250408-agx-uapi-v7-1-ad122d4f7324@rosenzweig.io
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
|
Add an INFO query to check if user queues are supported.
v2: switch to a mask of IPs (Marek)
v3: move to drm_amdgpu_info_device (Marek)
Cc: marek.olsak@amd.com
Cc: prike.liang@amd.com
Cc: sunil.khatri@amd.com
Cc: yogesh.mohanmarimuthu@amd.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch adds a new subquery (AMDGPU_INFO_UQ_FW_AREAS) in
AMDGPU_INFO_IOCTL to get the size and alignment of shadow
and csa objects from the FW setup. This information is
required for the userqueue consumers.
V2: Added Alex's suggestions and addressed review comments:
- make this query IP specific (GFX/SDMA etc)
- give a better title (AMDGPU_INFO_UQ_METADATA)
- restructured the code as per sample code shared by Alex
V3: Split the UAPI patch from shadow_size_fn modifications
V4: Addressed review comments from UAPI review (Marek/Pierre-Eric)
- Change the query name to AMDGPU_INFO_UQ_FW_AREAS
- remove unused inpur parameter for AMDGPU_HW_IP*
UAPI link: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/400/
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Arvind Yadav <arvind.yadav@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Modify kernel UAPI userq signal/wait struct field names and
description corresponding to the libdrm UAPI review comments.
libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch fixes some of the pending UAPI review comments
from the libDRM/UAPI review process.
- It updates some outdated comments in the userqueue UAPI header
highlighted during the libdrm UAPI review.
- It removes the GDS BO support which was found unused.
- It also removes the unused flags parameter from the UAPI.
- It also adds a padding variables in userqueue in/out structures.
(Pierre-Eric and Marek)
- clarify comments on top of drm_amdgpu_userq_in
- clarify comment for queue_id (in)
- clarify comment for mqd
- clarify comment for compute MQD size
- clarify comment for queue_id (out)
- remove GDB object from BO object list
- remove the unused flags parameter
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch adds input fences to VM_IOCTL for buffer object.
The kernel will map/unmap the BO only when the fence is signaled.
The UAPI for the same has been approved here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392
V2: Bug fix (Arvind)
V3: Bug fix (Arvind)
V4: Rename UAPI objects as per UAPI review (Marek)
V5: Addressed review comemnts from Christian
- function should return error.
- Add 'TODO' comment
- The input fence should be independent of the operation.
V6: Addressed review comemnts from Christian
- Release the memory allocated by memdup_user().
V7: Addressed review comemnts from Christian
- Drop the debug print and add "return r;" for the error handling.
V11: Rebase
v12: Fix 32-bit holes issue in sturct drm_amdgpu_gem_va.
v13: Fix deadlock issue.
v14: Fix merge conflict.
v15: Fix review comment by renaming syncobj handles.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Drop AMDGPU_USERQ_BO_WRITE as this should not be a global option
of the IOCTL, It should be option per buffer. Hence adding separate
array for read and write BO handles.
v2(Marek):
- Internal kernel details shouldn't be here. This file should only
document the observed behavior, not the implementation .
v3:
- Fix DAL CI clang issue.
v4:
- Added Alex RB to merge the kernel UAPI changes since he has
already approved the amdgpu_drm.h changes.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch updates the VM_IOCTL to allow userspace to synchronize
the mapping/unmapping of a BO in the page table.
The major changes are:
- it adds a drm_timeline object as an input parameter to the VM IOCTL.
- this object is used by the kernel to sync the update of the BO in
the page table during the mapping of the object.
- the kernel also synchronizes the tlb flush of the page table entry of
this object during the unmapping (Added in this series:
https://patchwork.freedesktop.org/series/131276/ and
https://patchwork.freedesktop.org/patch/584182/)
- the userspace can wait on this timeline, and then the BO is ready to
be consumed by the GPU.
The UAPI for the same has been approved here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392
V2:
- remove the eviction fence coupling
V3:
- added the drm timeline support instead of input/output fence
(Christian)
V4:
- made timeline 64-bit (Christian)
- bug fix (Arvind)
V5: GLCTS bug fix (Arvind)
V6: Rename syncobj_handle -> timeline_syncobj_out
Rename point -> timeline_point_in (Marek)
V7: Addressed review comments from Christian:
- do not send last_update fence in case of vm_clear_freed, instead
return the fence from gen_va_update_vm
- move the functions to update bo_mapping to amdgpu_gem.c
- do not use amdgpu_userq_update_vm anymore in userq_create()
V8: Addressed review comments from Christian:
- Split amdgpu_gem_update_bo_mapping function.
- amdgpu_gem_va_update_vm should return stub for error.
V9: Addressed review comments from Christian:
- Rename the function amdgpu_gem_update_timeline_node.
- amdgpu_gem_update_timeline_node should be void function.
- when timeline_point is zero don't allocate a chain and
call drm_syncobj_replace_fence() instead of
drm_syncobj_add_point().
V11: rebase
V12: Fix 32-bit holes issue in sturct drm_amdgpu_gem_va.
V13: Fix the review comment by renaming timeline syncobj (Marek)
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add user fence wait IOCTL timeline syncobj support.
v2:(Christian)
- handle dma_fence_wait() return value.
- shorten the variable name syncobj_timeline_points a bit.
- move num_points up to avoid padding issues.
v3:(Christian)
- Handle timeline drm_syncobj_find_fence() call error
handling
- Use dma_fence_unwrap_for_each() in timeline fence as
there could be more than one fence.
v4:(Christian)
- Drop the first num_fences since fence is always included in
the dma_fence_unwrap_for_each() iteration, when fence != f
then fence is most likely just a container.
v5: Added Alex RB to merge the kernel UAPI changes since he has
already approved the amdgpu_drm.h changes.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add UAPI header support for userqueue Secure semaphore
v2: Worked on review comments from Christian for the following
modifications
- Add bo handles, bo flags and padding fields.
- Include value/va in a combined array.
v3: Worked on review comments from Christian
- Add num_fences field to obtain the number of objects required
to allocate memory for userq_fence_info.
- Replace obj_handle name with syncobj_handle.
- Replace point name with syncobj_point.
- Replace count_handles name with num_syncobj_handles.
- Fix structure padding related issues.
v4: Worked on review comments from Christian
- Modify the bo flags description.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch does the necessary changes required to
enable compute workload support using the existing
usermode queues infrastructure.
V9: Patch introduced
V10: Add custom IP specific mqd strcuture for compute (Alex)
V11: Rename drm_amdgpu_userq_mqd_compute_gfx_v11 to
drm_amdgpu_userq_mqd_compute_gfx11 (Marek)
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch does necessary modifications to enable the SDMA
usermode queues using the existing userqueue infrastructure.
V9: introduced this patch in the series
V10: use header file instead of extern (Alex)
V11: rename drm_amdgpu_userq_mqd_sdma_gfx_v11 to
drm_amdgpu_userq_mqd_sdma_gfx11 (Marek)
Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch enables GFX-v11 IP support in the usermode queue base
code. It typically:
- adds a GFX_v11 specific MQD structure
- sets IP functions to create and destroy MQDs
- sets MQD objects coming from userspace
V10: introduced this spearate patch for GFX V11 enabling (Alex).
V11: Addressed review comments:
- update the comments in GFX mqd structure informing user about using
the INFO IOCTL for object sizes (Alex)
- rename struct drm_amdgpu_userq_mqd_gfx_v11 to
drm_amdgpu_userq_mqd_gfx11 (Marek)
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch intorduces new UAPI/IOCTL for usermode graphics
queue. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.
This UAPI maps the queue into GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.
V2: Addressed review comments from Alex and Christian
- Make the doorbell offset's comment clearer
- Change the output parameter name to queue_id
V3: Integration with doorbell manager
V4:
- Updated the UAPI doc (Pierre-Eric)
- Created a Union for engine specific MQDs (Alex)
- Added Christian's R-B
V5:
- Add variables for GDS and CSA in MQD structure (Alex)
- Make MQD data a ptr-size pair instead of union (Alex)
V9:
- renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
drm_amdgpu_userq_mqd as its being used for SDMA and
compute queues as well
V10:
- keeping the drm_amdgpu_userq_mqd IP independent, moving the
_gfx_v11 objects in a separate structure in other patch.
(Alex)
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a
build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Normally scratch page is not allowed when a vm is operate under page
fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
is fault mode relies on recoverable page to work, while scratch page
can mute recoverable page fault.
On xe2 and xe3, out of bound prefetch can cause page fault and further
system hang because xekmd can't resolve such page fault. SYCL and OCL
language runtime requires out of bound prefetch to be silently dropped
without causing any functional problem, thus the existing behavior
doesn't meet language runtime requirement.
At the same time, HW prefetching can cause page fault interrupt. Due to
page fault interrupt overhead (i.e., need Guc and KMD involved to fix
the page fault), HW prefetching can be slowed by many orders of magnitude.
Fix those problems by allowing scratch page under fault mode for xe2 and
xe3. With scratch page in place, HW prefetching could always hit scratch
page instead of causing interrupt.
A side effect is, scratch page could hide application program error.
Application out of bound accesses are hided by scratch page mapping,
instead of get reported to user.
v2: Refine commit message (Thomas)
v3: Move the scratch page flag check to after scratch page wa (Thomas)
v4: drop NEEDS_SCRATCH macro (matt)
Add a comment to DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250403165328.2438690-4-oak.zeng@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
|
|
Add support for exporting a dma_fence fd for a specific point on a
timeline. This is needed for vtest/vpipe[1][2] to implement timeline
syncobj support, as it needs a way to turn a point on a timeline back
into a dma_fence fd. It also closes an odd omission from the syncobj
UAPI.
[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33433
[2] https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/805
v2: Add DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_TIMELINE
v3: Add unstaged uabi header hunk
v4: Also handle IMPORT_SYNC_FILE case
v5: Address comments from Dmitry
v6: checkpatch.pl nits
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250401155758.48855-1-robdclark@gmail.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Since the context-type additions to the virtio-gpu spec, these have been
defined locally in guest user-space, and virtio-gpu backend library code.
Now, these capsets have been stabilized, and should be defined in a
common space, in both the virtio_gpu header, and alongside the virtgpu_drm
interface that they apply to.
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Aaron Ruby <aruby@qnx.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
[dmitry.osipenko@collabora.com: edit commit title]
Link: https://patchwork.freedesktop.org/patch/msgid/YT3PR01MB5857E808EDF6949F2DF517FDAFA12@YT3PR01MB5857.CANPRD01.PROD.OUTLOOK.COM
|
|
Apple GPUs support non-linear "GPU-tiled" image layouts. Add modifiers
for these layouts. Mesa requires these modifiers to share non-linear
buffers across processes, but no other userspace or kernel support is
required/expected.
These layouts are notably not used for interchange across hardware
blocks (e.g. with the display controller). There are other layouts for
that but we don't support them either in userspace or kernelspace yet
(even downstream), so we don't add modifiers here.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://patchwork.freedesktop.org/patch/msgid/20250310-apple-twiddled-modifiers-v4-1-1ccac9544808@rosenzweig.io
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
|
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag,
which indicates whether the device supports CPU address mirroring. The
intent is for UMDs to use this query to determine if a VM can be set up
with CPU address mirroring. This flag is implemented by checking if the
device supports GPU faults.
v7:
- Only report enabled if CONFIG_DRM_GPUSVM is selected (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-20-matthew.brost@intel.com
|
|
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to
create unpopulated virtual memory areas (VMAs) without memory backing or
GPU page tables. These VMAs are referred to as CPU address mirror VMAs.
The idea is that upon a page fault or prefetch, the memory backing and
GPU page tables will be populated.
CPU address mirror VMAs only update GPUVM state; they do not have an
internal page table (PT) state, nor do they have GPU mappings.
It is expected that CPU address mirror VMAs will be mixed with buffer
object (BO) VMAs within a single VM. In other words, system allocations
and runtime allocations can be mixed within a single user-mode driver
(UMD) program.
Expected usage:
- Bind the entire virtual address (VA) space upon program load using the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- If a buffer object (BO) requires GPU mapping (runtime allocation),
allocate a CPU address using mmap(PROT_NONE), bind the BO to the
mmapped address using existing bind IOCTLs. If a CPU map of the BO is
needed, mmap it again to the same CPU address using mmap(MAP_FIXED)
- If a BO no longer requires GPU mapping, munmap it from the CPU address
space and them bind the mapping address with the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- Any malloc'd or mmapped CPU address accessed by the GPU will be
faulted in via the SVM implementation (system allocation).
- Upon freeing any mmapped or malloc'd data, the SVM implementation will
remove GPU mappings.
Only supporting 1 to 1 mapping between user address space and GPU
address space at the moment as that is the expected use case. uAPI
defines interface for non 1 to 1 but enforces 1 to 1, this restriction
can be lifted if use cases arrise for non 1 to 1 mappings.
This patch essentially short-circuits the code in the existing VM bind
paths to avoid populating page tables when the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.
v3:
- Call vm_bind_ioctl_ops_fini on -ENODATA
- Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs
- s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas)
- Rework commit message for expected usage (Thomas)
- Describe state of code after patch in commit message (Thomas)
v4:
- Fix alignment (Checkpatch)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
|
|
Allow user to provide a low latency hint. When set, KMD sends a hint
to GuC which results in special handling for that process. SLPC will
ramp the GT frequency aggressively every time it switches to this
process.
We need to enable the use of SLPC Compute strategy during init, but
it will apply only to processes that set this bit during process
creation.
Improvement with this approach as below:
Before,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Graphics [0xe20b]
Driver version : 24.52.0 (Linux x64)
Compute units : 160
Clock frequency : 2850 MHz
Kernel launch latency : 283.16 us
After,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Graphics [0xe20b]
Driver version : 24.52.0 (Linux x64)
Compute units : 160
Clock frequency : 2850 MHz
Kernel launch latency : 63.38 us
Compute PR: https://github.com/intel/compute-runtime/pull/794
Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214
IGT PR: https://patchwork.freedesktop.org/patch/639989/
V10(Lucas):
- Remove doc from drm-uapi.rst
v9(Vinay):
- remove extra line, align commit message
v8(Vinay):
- Add separate example for using low latency hint
v7(Jose):
- Update UMD PR
- applicable to all gpus
V6:
- init flags, remove redundant flags check (MAuld)
V5:
- Move uapi doc to documentation and GuC ABI specific change (Rodrigo)
- Modify logic to restrict exec queue flags (MAuld)
V4:
- To make it clear, dont use exec queue word (Vinay)
- Correct typo in description of flag (Jose/Vinay)
- rename set_strategy api and replace ctx with exec queue(Vinay)
- Start with 0th bit to indentify user flags (Jose)
V3:
- Conver user flag to kernel internal flag and use (Oak)
- Support query config for use to check kernel support (Jose)
- Dont need to take runtime pm (Vinay)
V2:
- DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon)
- Add motivation to description (Lucas)
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
Sync to fix conlicts between drm-xe-next and drm-intel-next.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Add mmap support for PCI memory barrier (Tejas, Matthew Auld)
- Enable integration with perf pmu, exposing event counters: for now, just
GT C6 residency (Vinay, Lucas)
- Add "survivability mode" to allow putting the driver in a state capable of
firmware upgrade on critical failures (Riana, Rodrigo)
- Add PXP HWDRM support and enable for compatible platforms:
Meteor Lake and Lunar Lake (Daniele, John Harrison)
- Expose package and vram temperature over hwmon subsystem (Raag, Badal, Rodrigo)
Cross-subsystem Changes:
- Backmege drm-next to synchronize with i915 display and other internal APIs
Display Changes (including i915):
- Device probe re-order to help with flicker-free boot (Maarten)
- Align watermark, hpd and dsm with i915 (Rodrigo)
- Better abstraction for d3cold (Rodrigo)
Driver Changes:
- Make sure changes to ccs_mode is with helper for gt sync reset (Maciej)
- Drop mmio_ext abstraction since it didn't prove useful in its current form
(Matt Roper)
- Reject BO eviction if BO is bound to current VM (Oak, Thomas Hellström)
- Add GuC Power Conservation debugfs (Rodrigo)
- L3 cache topology updates for Xe3 (Francois, Matt Atwood)
- Better logging about missing GuC logs (John Harrison)
- Better logging for hwconfig-related data availability (John Harrison)
- Tracepoint updates for xe_bo_create, xe_vm and xe_vma (Oak)
- Add missing SPDX licenses (Francois)
- Xe suballocator imporovements (Michal Wajdeczko)
- Improve logging for native vs SR-IOV driver mode (Satyanarayana)
- Make sure VF bootstrap is not attempted in execlist mode (Maarten)
- Add GuC Buffer Cache abstraction for some CTB H2G actions and use
during VF provisioning (Michal Wajdeczko)
- Better synchronization in gtidle for new users (Vinay)
- New workarounds for Panther Lake (Nirmoy, Vinay)
- PCI ID updates for Panther Lake (Matt Atwood)
- Enable SR-IOV for Panther Lake (Michal Wajdeczko)
- Update MAINTAINERS to stop directing xe changes to drm-misc (Lucas)
- New PCI IDs for Battle Mage (Shekhar)
- Better pagefault logging (Francois)
- SR-IOV fixes and refactors for past and new platforms (Michal Wajdeczko)
- Platform descriptor refactors and updates (Sai Teja)
- Add gt stats debugfs (Francois)
- Add guc_log debugfs to dump to dmesg (Lucas)
- Abstract per-platform LMTT availability (Piotr Piórkowski)
- Refactor VRAM manager location (Piotr Piórkowski)
- Add missing xe_pm_runtime_put when forcing wedged mode (Shuicheng)
- Fix possible lockup when forcing wedged mode (Xin Wang)
- Probe refactors to use cleanup actions with better error handling (Lucas)
- XE_IOCTL_DBG clarification for userspace (Maarten)
- Better xe_mmio initialization and abstraction (Ilia)
- Drop unnecessary GT lookup (Matt Roper)
- Skip client engine usage from fdinfo for VFs (Marcin Bernatowicz)
- Allow to test xe_sync_entry_parse with error injection (Priyanka)
- OA fix for polled read (Umesh)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/m3gbuh32wgiep43i4zxbyhxqbenvtgvtao5sczivlasj7tikwv@dmlba4bfg2ny
|