diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-11-01 06:28:35 -1000 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-11-01 06:28:35 -1000 |
| commit | 7d461b291e65938f15f56fe58da2303b07578a76 (patch) | |
| tree | 015dd7c2f1743dd70be52787dd9aff33822bc938 /drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | |
| parent | 8bc9e6515183935fa0cccaf67455c439afe4982b (diff) | |
| parent | 631808095a82e6b6f8410a95f8b12b8d0d38b161 (diff) | |
Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- AMD adds some more upcoming HW platforms
- Intel made Meteorlake stable and started adding Lunarlake
- nouveau has a bunch of display rework in prepartion for the NVIDIA
GSP firmware support
- msm adds a7xx support
- habanalabs has finished migration to accel subsystem
Detail summary:
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver"
* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
drm/amdgpu: Remove duplicate fdinfo fields
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
drm/amdgpu: Identify data parity error corrected in replay mode
drm/amdgpu: Fix typo in IP discovery parsing
drm/amd/display: fix S/G display enablement
drm/amdxcp: fix amdxcp unloads incompletely
drm/amd/amdgpu: fix the GPU power print error in pm info
drm/amdgpu: Use pcie domain of xcc acpi objects
drm/amd: check num of link levels when update pcie param
drm/amdgpu: Add a read to GFX v9.4.3 ring test
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
drm/amdgpu: get RAS poison status from DF v4_6_2
drm/amdgpu: Use discovery table's subrevision
drm/amd/display: 3.2.256
drm/amd/display: add interface to query SubVP status
drm/amd/display: Read before writing Backlight Mode Set Register
drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
...
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c')
| -rw-r--r-- | drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 52 |
1 files changed, 25 insertions, 27 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 25d5fda5b243..b8412202a1b0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -28,6 +28,7 @@ #include "amdgpu.h" #include "amdgpu_gfx.h" #include "amdgpu_dma_buf.h" +#include <drm/ttm/ttm_tt.h> #include <linux/module.h> #include <linux/dma-buf.h> #include "amdgpu_xgmi.h" @@ -164,7 +165,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) */ bitmap_complement(gpu_resources.cp_queue_bitmap, adev->gfx.mec_bitmap[0].queue_bitmap, - KGD_MAX_QUEUES); + AMDGPU_MAX_QUEUES); /* According to linux/bitmap.h we shouldn't use bitmap_clear if * nbits is not compile time constant @@ -172,7 +173,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) last_valid_bit = 1 /* only first MEC can have compute queues */ * adev->gfx.mec.num_pipe_per_mec * adev->gfx.mec.num_queue_per_pipe; - for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i) + for (i = last_valid_bit; i < AMDGPU_MAX_QUEUES; ++i) clear_bit(i, gpu_resources.cp_queue_bitmap); amdgpu_doorbell_get_kfd_info(adev, @@ -467,28 +468,6 @@ uint32_t amdgpu_amdkfd_get_max_engine_clock_in_mhz(struct amdgpu_device *adev) return 100; } -void amdgpu_amdkfd_get_cu_info(struct amdgpu_device *adev, struct kfd_cu_info *cu_info) -{ - struct amdgpu_cu_info acu_info = adev->gfx.cu_info; - - memset(cu_info, 0, sizeof(*cu_info)); - if (sizeof(cu_info->cu_bitmap) != sizeof(acu_info.bitmap)) - return; - - cu_info->cu_active_number = acu_info.number; - cu_info->cu_ao_mask = acu_info.ao_cu_mask; - memcpy(&cu_info->cu_bitmap[0], &acu_info.bitmap[0], - sizeof(cu_info->cu_bitmap)); - cu_info->num_shader_engines = adev->gfx.config.max_shader_engines; - cu_info->num_shader_arrays_per_engine = adev->gfx.config.max_sh_per_se; - cu_info->num_cu_per_sh = adev->gfx.config.max_cu_per_sh; - cu_info->simd_per_cu = acu_info.simd_per_cu; - cu_info->max_waves_per_simd = acu_info.max_waves_per_simd; - cu_info->wave_front_size = acu_info.wave_front_size; - cu_info->max_scratch_slots_per_cu = acu_info.max_scratch_slots_per_cu; - cu_info->lds_size = acu_info.lds_size; -} - int amdgpu_amdkfd_get_dmabuf_info(struct amdgpu_device *adev, int dma_buf_fd, struct amdgpu_device **dmabuf_adev, uint64_t *bo_size, void *metadata_buffer, @@ -704,12 +683,19 @@ err: void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle) { + enum amd_powergating_state state = idle ? AMD_PG_STATE_GATE : AMD_PG_STATE_UNGATE; /* Temporary workaround to fix issues observed in some * compute applications when GFXOFF is enabled on GFX11. */ - if (IP_VERSION_MAJ(adev->ip_versions[GC_HWIP][0]) == 11) { + if (IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 11) { pr_debug("GFXOFF is %s\n", idle ? "enabled" : "disabled"); amdgpu_gfx_off_ctrl(adev, idle); + } else if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 9) && + (adev->flags & AMD_IS_APU)) { + /* Disable GFXOFF and PG. Temporary workaround + * to fix some compute applications issue on GFX9. + */ + adev->ip_blocks[AMD_IP_BLOCK_TYPE_GFX].version->funcs->set_powergating_state((void *)adev, state); } amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_COMPUTE, @@ -805,11 +791,23 @@ void amdgpu_amdkfd_unlock_kfd(struct amdgpu_device *adev) u64 amdgpu_amdkfd_xcp_memory_size(struct amdgpu_device *adev, int xcp_id) { - u64 tmp; s8 mem_id = KFD_XCP_MEM_ID(adev, xcp_id); + u64 tmp; if (adev->gmc.num_mem_partitions && xcp_id >= 0 && mem_id >= 0) { - tmp = adev->gmc.mem_partitions[mem_id].size; + if (adev->gmc.is_app_apu && adev->gmc.num_mem_partitions == 1) { + /* In NPS1 mode, we should restrict the vram reporting + * tied to the ttm_pages_limit which is 1/2 of the system + * memory. For other partition modes, the HBM is uniformly + * divided already per numa node reported. If user wants to + * go beyond the default ttm limit and maximize the ROCm + * allocations, they can go up to max ttm and sysmem limits. + */ + + tmp = (ttm_tt_pages_limit() << PAGE_SHIFT) / num_online_nodes(); + } else { + tmp = adev->gmc.mem_partitions[mem_id].size; + } do_div(tmp, adev->xcp_mgr->num_xcp_per_mem_partition); return ALIGN_DOWN(tmp, PAGE_SIZE); } else { |
