| Age | Commit message (Collapse) | Author |
|
Pull drm updates from Dave Airlie:
"Highlights:
- new DRM RAS infrastructure using netlink
- amdgpu: enable DC on CIK APUs, and more IP enablement, and more
user queue work
- xe: purgeable BO support, and new hw enablement
- dma-buf : add revocable operations
Full summary:
mm:
- two-pass MMU interval notifiers
- add gpu active/reclaim per-node stat counters
math:
- provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI
- implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST()
rust:
- shared tag with driver-core: register macro and io infra
- core: rework DMA coherent API
- core: add interop::list to interop with C linked lists
- core: add more num::Bounded operations
- core: enable generic_arg_infer and add EMSGSIZE
- workqueue: add ARef<T> support for work and delayed work
- add GPU buddy allocator abstraction
- add DRM shmem GEM helper abstraction
- allow drm:::Device to dispatch work and delayed work items
to driver private data
- add dma_resv_lock helper and raw accessors
core:
- introduce DRM RAS infrastructure over netlink
- add connector panel_type property
- fourcc: add ARM interleaved 64k modifier
- colorop: add destroy helper
- suballoc: split into alloc and init helpers
- mode: provide DRM_ARGB_GET*() macros for reading color components
edid:
- provide drm_output_color_Format
dma-buf:
- provide revoke mechanism for shared buffers
- rename move_notify to invalidate_mappings
- always enable move_notify
- protect dma_fence_ops with RCU and improve locking
- clean pages with helpers
atomic:
- allocate drm_private_state via callback
- helper: use system_percpu_wq
buddy:
- make buddy allocator available to gpu level
- add kernel-doc for buddy allocator
- improve aligned allocation
ttm:
- fix fence signalling
- improve tests and docs
- improve handling of gfp_retry_mayfail
- use per-node stat counters to track memory allocations
- port pool to use list_lru
- drop NUMA specific pools
- make pool shrinker numa aware
- track allocated pages per numa node
coreboot:
- cleanup coreboot framebuffer support
sched:
- fix race condition in drm_sched_fini
pagemap:
- enable THP support
- pass pagemap_addr by reference
gem-shmem:
- Track page accessed/dirty status across mmap/vmap
gpusvm:
- reenable device to device migration
- fix unbalanced unclock
bridge:
- anx7625: Support USB-C plus DT bindings
- connector: Fix EDID detection
- dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve
others
- fsl-ldb: Fix visual artifacts plus related DT property
'enable-termination-resistor'
- imx8qxp-pixel-link: Improve bridge reference handling
- lt9611: Support Port-B-only input plus DT bindings
- tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up
- Support TH1520 HDMI plus DT bindings
- waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus
DT bindings
- anx7625: Fix USB Type-C handling
- cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check
- Support Lontium LT8713SX DP MST bridge plus DT bindings
- analogix_dp: Use DP helpers for link training
panel:
- panel-jdi-lt070me05000: Use mipi-dsi multi functions
- panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN
N116BCL-EAK (C2); Support FriendlyELEC plus DT changes
- panel-edp: Fix timings for BOE NV140WUM-N64
- ilitek-ili9882t: Allow GPIO calls to sleep
- jadard: Support TAIGUAN XTI05101-01A
- lxd: Support LXD M9189A plus DT bindings
- mantix: Fix pixel clock; Clean up
- motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings
- novatek: Support Novatek/Tianma NT37700F plus DT bindings
- simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip
PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3"
- novatek-nt36672a: Use mipi_dsi_*_multi() functions
- panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW
MNF307QS3-2
- support Himax HX83121A plus DT bindings
- support JuTouch JT070TM041 plus DT bindings
- support Samsung S6E8FC0 plus DT bindings
- himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support
backlight
- ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings
- simple: support Tianma TM050RDH03 plus DT bindings
amdgpu:
- enable DC by default on CIK APUs
- userq fence ioctl param size fixes
- set panel_type to OLED for eDP
- refactor DC i2c code
- FAMS2 update
- rework ttm handling to allow multiple engines
- DC DCE 6.x cleanup
- DC support for NUTMEG/TRAVIS DP bridge
- DCN 4.2 support
- GC12 idle power fix for compute
- use struct drm_edid in non-DC code
- enable NV12/P010 support on primary planes
- support newer IP discovery tables
- VCN/JPEG 5.0.2 support
- GC/MES 12.1 updates
- USERQ fixes
- add DC idle state manager
- eDP DSC seamless boot
amdkfd:
- GC 12.1 updates
- non 4K page fixes
xe:
- basic Xe3p_LPG and NVL-P enabling patches
- allow VM_BIND decompress support
- add purgeable buffer object support
- add xe_vm_get_property_ioctl
- restrict multi-lrc to VCS/VECS engines
- allow disabling VM overcommit in fault mode
- dGPU memory optimizations
- Workaround cleanups and simplification
- Allow VFs VRAM quote changes using sysfs
- convert GT stats to per-cpu counters
- pagefault refactors
- enable multi-queue on xe3p_xpc
- disable DCC on PTL
- make MMIO communication more robust
- disable D3Cold for BMG on specific platforms
- vfio: improve FLR sync for Xe VFIO
i915/display:
- C10/C20/LT PHY PLL divider verification
- use trans push mechanism to generate PSR frame change on LNL+
- refactor DP DSC slice config
- VGA decode refactoring
- refactor DPT, gen2-4 overlay, masked field register macro helpers
- refactor stolen memory allocation decisions
- prepare for UHBR DP tunnels
- refactor LT PHY PLL to use DPLL framework
- implement register polling/waiting in display code
- add shared stepping header between i915 and display
i915:
- fix potential overflow of shmem scatterlist length
nouveau:
- provide Z cull info to userspace
- initial GA100 support
- shutdown on PCI device shutdown
nova-core:
- harden GSP command queue
- add support for large RPCs
- simplify GSP sequencer and message handling
- refactor falcon firmware handling
- convert to new register macro
- conver to new DMA coherent API
- use checked arithmetic
- add debugfs support for gsp-rm log buffers
- fix aux device registration for multi-GPU
msm:
- CI:
- Uprev mesa
- Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices
- Core:
- Switched to of_get_available_child_by_name()
- DPU:
- Fixes for DSC panels
- Fixed brownout because of the frequency / OPP mismatch
- Quad pipe preparation (not enabled yet)
- Switched to virtual planes by default
- Dropped VBIF_NRT support
- Added support for Eliza platform
- Reworked alpha handling
- Switched to correct CWB definitions on Eliza
- Dropped dummy INTF_0 on MSM8953
- Corrected INTFs related to DP-MST
- DP:
- Removed debug prints looking into PHY internals
- DSI:
- Fixes for DSC panels
- RGB101010 support
- Support for SC8280XP
- Moved PHY bindings from display/ to phy/
- GPU:
- Preemption support for x2-85 and a840
- IFPC support for a840
- SKU detection support for x2-85 and a840
- Expose AQE support (VK ray-pipeline)
- Avoid locking in VM_BIND fence signaling path
- Fix to avoid reclaim in GPU snapshot path
- Disallow foreign mapping of _NO_SHARE BOs
- HDMI:
- Fixed infoframes programming
- MDP5:
- Dropped support for MSM8974v1
- Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998
panthor:
- add tracepoints for power and IRQs
- fix fence handling
- extend timestamp query with flags
- support various sources for timestamp queries
tyr:
- fix names and model/versions
rockchip:
- vop2: use drm logging function
- rk3576 displayport support
- support CRTC background color
atmel-hlcdc:
- support sana5d65 LCD controller
tilcdc:
- use DT bindings schema
- use managed DRM interfaces
- support DRM_BRIDGE_ATTACH_NO_CONNECTOR
verisilicon:
- support DC8200 + DT bindings
virtgpu:
- support PRIME import with 3D enabled
komeda:
- fix integer overflow in AFBC checks
mcde:
- improve bridge handling
gma500:
- use drm client buffer for fbdev framebuffer
amdxdna:
- add sensors ioctls
- provide NPU power estimate
- support column utilization sensor
- allow forcing DMA through IOMMU IOVA
- support per-BO mem usage queries
- refactor GEM implementation
ivpu:
- update boot API to v3.29.4
- limit per-user number of doorbells/contexts
- perform engine reset on TDR error
loongson:
- replace custom code with drm_gem_ttm_dumb_map_offset()
imx:
- support planes behind the primary plane
- fix bus-format selection
vkms:
- support CRTC background color
v3d:
- improve handling of struct v3d_stats
komeda:
- support Arm China Linlon D6 plus DT bindings
imagination:
- improve power-off sequence
- support context-reset notification from firmware
mediatek:
- mtk_dsi: enable hs clock during pre-enable
- Remove all conflicting aperture devices during probe
- Add support for mt8167 display blocks"
* tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel: (1735 commits)
drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc
drm/ttm/tests: fix lru_count ASSERT
drm/vram: remove DRM_VRAM_MM_FILE_OPERATIONS from docs
drm/fb-helper: Fix a locking bug in an error path
dma-fence: correct kernel-doc function parameter @flags
ttm/pool: track allocated_pages per numa node.
ttm/pool: make pool shrinker NUMA aware (v2)
ttm/pool: drop numa specific pools
ttm/pool: port to list_lru. (v2)
drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)
mm: add gpu active/reclaim per-node stat counters (v2)
gpu: nova-core: fix missing colon in SEC2 boot debug message
gpu: nova-core: vbios: use from_le_bytes() for PCI ROM header parsing
gpu: nova-core: bitfield: fix broken Default implementation
gpu: nova-core: falcon: pad firmware DMA object size to required block alignment
gpu: nova-core: gsp: fix undefined behavior in command queue code
drm/shmem_helper: Make sure PMD entries get the writeable upgrade
accel/ivpu: Trigger recovery on TDR with OS scheduling
drm/msm: Use of_get_available_child_by_name()
dt-bindings: display/msm: move DSI PHY bindings to phy/ subdir
...
|
|
https://github.com/Rust-for-Linux/linux into rust-next
Pull timekeeping updates from Andreas Hindborg:
- Expand the example section in the 'HrTimer' documentation.
- Mark the 'ClockSource' trait as unsafe to ensure valid values for
'ktime_get()'.
- Add 'Delta::from_nanos()'.
This is a back merge since the pull request has a newer base -- we will
avoid that in the future.
And, given it is a back merge, it happens to resolve the "subtle" conflict
around '--remap-path-{prefix,scope}' that I discussed in linux-next [1],
plus a few other common conflicts. The result matches what we did for
next-20260407.
The actual diffstat (i.e. using a temporary merge of upstream first) is:
rust/kernel/time.rs | 32 ++++-
rust/kernel/time/hrtimer.rs | 336 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 362 insertions(+), 6 deletions(-)
Link: https://lore.kernel.org/linux-next/CANiq72kdxB=W3_CV1U44oOK3SssztPo2wLDZt6LP94TEO+Kj4g@mail.gmail.com/ [1]
* tag 'rust-timekeeping-for-v7.1' of https://github.com/Rust-for-Linux/linux:
hrtimer: add usage examples to documentation
rust: time: make ClockSource unsafe trait
rust/time: Add Delta::from_nanos()
|
|
Following the Rust compiler bump, we can now update Clippy's MSRV we
set in the configuration, which will improve the diagnostics it generates.
Thus do so and clean a few of the `allow`s that are not needed anymore.
Reviewed-by: Tamir Duberstein <tamird@kernel.org>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260405235309.418950-7-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
These were likely copied from the `bindings` and `uapi` crates, but are
unneeded since there are no `cfg(test)`s in the bindings.
In addition, the issue that triggered the addition in those crates
originally is also fixed in `bindgen` (please see the previous commit).
Thus remove them.
Reviewed-by: Tamir Duberstein <tamird@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260405235309.418950-5-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
The SEC2 mailbox debug output formats MBOX1 without a colon separator,
producing "MBOX10xdead" instead of "MBOX1: 0xdead". The GSP debug
message a few lines above uses the correct format.
Fixes: 5949d419c193 ("gpu: nova-core: gsp: Boot GSP")
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260331103744.605683-1-devnexen@gmail.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Clippy fires two clippy::precedence warnings on the manual
byte-shifting expression:
warning: operator precedence can trip the unwary
--> drivers/gpu/nova-core/vbios.rs:511:17
|
511 | / u32::from(data[29]) << 24
512 | | | u32::from(data[28]) << 16
513 | | | u32::from(data[27]) << 8
| |______________________________________________^
Clear the warnings by replacing manual byte-shifting with
u32::from_le_bytes(). Using from_le_bytes() is also a tiny code
improvement, because it uses less code and is clearer about the intent.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260404212831.78971-2-jhubbard@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The current implementation does not actually set the default values for
the fields in the bitfield.
Fixes: 3fa145bef533 ("gpu: nova-core: register: generate correct `Default` implementation")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260401-fix-bitfield-v2-1-2fa68c98114a@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Commit a88831502c8f ("gpu: nova-core: falcon: use dma::Coherent")
dropped the nova-local `DmaObject` device memory type for the
kernel-global `Coherent` one.
This switch had a side-effect: `DmaObject` always aligned the requested
size to `PAGE_SIZE`, and also reported that adjusted size when queried.
`Coherent`, on the other hand, does page-align allocation sizes but only
allows CPU access on the exact size provided by the caller.
This change runs into a limitation of falcon DMA copies, namely that DMA
accesses are done on blocks of exactly 256 bytes. If the provided data
does not have a length that is a multiple of 256, `dma_wr` returns
an error.
It was expected that all firmwares would present the proper adjusted
size, but this is not the case at least on my GA107:
NovaCore 0000:08:00.0: DMA transfer goes beyond range of DMA object
NovaCore 0000:08:00.0: Failed to load FWSEC firmware: EINVAL
NovaCore 0000:08:00.0: probe with driver NovaCore failed with error -22
Fix this by padding the `Coherent`'s size to `MEM_BLOCK_ALIGNMENT` (i.e.
256) when allocating it and filling it with zeroes, before copying the
firmware on top of it.
Fixes: a88831502c8f ("gpu: nova-core: falcon: use dma::Coherent")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260405-falcon-dma-roundup-v2-1-4af5b2ff9c16@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
`driver_read_area` and `driver_write_area` are internal methods that
return slices containing the area of the command queue buffer that the
driver has exclusive read or write access, respectively.
While their returned value is correct and safe to use, internally they
temporarily create a reference to the whole command-buffer slice,
including GSP-owned regions. These regions can change without notice,
and thus creating a slice to them, even if never accessed, is undefined
behavior.
Fix this by making these methods create slices to valid regions only.
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Reported-by: Danilo Krummrich <dakr@kernel.org>
Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260404-cmdq-ub-fix-v5-1-53d21f4752f5@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Factor out a chunk of complexity into a new subroutine. This is an
incremental step in adding ELF32 support to the existing ELF64 section
support, for handling GPU firmware.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260326013902.588242-9-jhubbard@nvidia.com
[acourbot: use fuller prefix in commit message.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Up until now, only the GSP required parsing of its firmware headers.
However, upcoming support for Hopper/Blackwell+ adds another firmware
image (FMC), along with another format (ELF32).
Therefore, the current ELF64 section parsing support needs to be moved
up a level, so that both of the above can use it.
There are no functional changes. This is pure code movement.
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260326013902.588242-8-jhubbard@nvidia.com
[acourbot: use fuller prefix in commit message.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Replace the nova-core local `DmaObject` with a `CoherentBox` that can
fulfill the same role.
Since `CoherentBox` is more flexible than `DmaObject`, we can use the
native `u64` type for page table entries instead of messing with bytes.
The `dma` module becomes unused with that change, so remove it as well.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-7-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Replace the nova-core local `DmaObject` with a `Coherent` that can
fulfill the same role.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-6-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Replace the nova-core local `DmaObject` with a `CoherentHandle` that can
fulfill the same role.
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-5-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Replace the nova-core local `DmaObject` with a `Coherent` that can
fulfill the same role.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-4-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Replace the nova-core local `DmaObject` with a `Coherent` that can
fulfill the same role.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-3-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Replace the nova-core local `DmaObject` with a `Coherent` that can
fulfill the same role.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260327-b4-nova-dma-removal-v2-2-616e1d0b5cb3@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Switch LogBuffer from Coherent<[u8]> (unsized) to
Coherent<[u8; LOG_BUFFER_SIZE]> (sized). The buffer size is a
compile-time constant (RM_LOG_BUFFER_NUM_PAGES * GSP_PAGE_SIZE), so a
fixed-size array is more precise and avoids the need for the runtime
length parameter of zeroed_slice().
Acked-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260325003921.3420-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The kernel's `register` macro would clash with nova-core's own version
if it was imported directly, so it was accessed through its `io` module
during the conversion phase.
Now that nova-core's `register` macro doesn't exist anymore, we can
import and use it directly without risk of name collision.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-9-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all PFALCON, PFALCON2 and PRISCV registers to use the kernel's
register macro and update the code accordingly.
Because they rely on the same types to implement relative registers,
they need to be updated in lockstep.
nova-core's local register macro is now unused, so remove it.
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-8-bdf172f0f6ca@nvidia.com
[acourbot@nvidia.com: remove unused import.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all PDISP registers to use the kernel's register macro and
update the code accordingly.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-7-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all FUSE registers to use the kernel's register macro and update
the code accordingly.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-6-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all GC6 registers to use the kernel's register macro and update
the code accordingly.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-5-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all PFB registers to use the kernel's register macro and update
the code accordingly.
NV_PGSP_QUEUE_HEAD was somehow caught in the PFB section, so move it to
its own section and convert it as well.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-4-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all PBUS registers to use the kernel's register macro and update
the code accordingly.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-3-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Convert all PMC registers to use the kernel's register macro and update
the code accordingly.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-2-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Introduce a powered-up version of our ad-hoc `impl_from_enum_to_u8`
macro that allows the definition of an enum type associated to a
`Bounded` of a given width, and provides the `From` and `TryFrom`
implementations required to use that enum as a register field member.
This allows us to generate the required conversion implementations for
using the kernel register macro and skip some tedious boilerplate.
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260325-b4-nova-register-v4-1-bdf172f0f6ca@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Create read-only debugfs entries for LOGINIT, LOGRM, and LOGINTR, which
are the three primary printf logging buffers from GSP-RM. LOGPMU will
be added at a later date, as it requires support for its RPC message
first.
This patch uses the `pin_init_scope` feature to create the entries.
`pin_init_scope` solves the lifetime issue over the `DEBUGFS_ROOT`
reference by delaying its acquisition until the time the entry is
actually initialized.
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260319212658.2541610-7-ttabi@nvidia.com
[ Rebase onto Coherent<T> changes. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Create the 'nova_core' root debugfs entry when the driver loads.
Normally, non-const global variables need to be protected by a
mutex. Instead, we use unsafe code, as we know the entry is never
modified after the driver is loaded. This solves the lifetime
issue of the mutex guard, which would otherwise have required the
use of `pin_init_scope`.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260319212658.2541610-6-ttabi@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Replace the module_pci_driver! macro with an explicit module
initialization using the standard module! macro and InPlaceModule
trait implementation. No functional change intended, with the
exception that the driver now prints a message when loaded.
This change is necessary so that we can create a top-level "nova_core"
debugfs entry when the driver is loaded.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260319212658.2541610-5-ttabi@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The command-queue structure has a `dma_handle` method that returns the
DMA handle to the memory segment shared with the GSP. This works, but is
not ideal for the following reasons:
- That method is effectively only ever called once, and is technically
an accessor method since the handle doesn't change over time,
- It feels a bit out-of-place with the other methods of `Cmdq` which
only deal with the sending or receiving of messages,
- The method has `pub(crate)` visibility, allowing other driver code to
access this highly-sensitive handle.
Address all these issues by turning `dma_handle` into a struct member
with `pub(super)` visibility. This keeps the method space focused, and
also ensures the member is not visible outside of the modules that need
it.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260319-b4-cmdq-dma-handle-v1-1-57840b4a4f90@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Remove all usages of dma::CoherentAllocation and use the new
dma::Coherent type instead.
Signed-off-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260320194626.36263-9-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Convert libos (LibosMemoryRegionInitArgument) and rmargs
(GspArgumentsPadded) to use CoherentBox / Coherent::init() and simplify
the initialization. This also avoids separate initialization on the
stack.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260320194626.36263-8-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Convert wpr_meta to use Coherent::init() and simplify the
initialization. It also avoids a separate initialization of
GspFwWprMeta on the stack.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260320194626.36263-7-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Wrap `Cmdq`'s mutable state in a new struct `CmdqInner` and wrap that in
a Mutex. This lets `Cmdq` methods take &self instead of &mut self, which
lets required commands be sent e.g. while unloading the driver.
The mutex is held over both send and receive in `send_command` to make
sure that it doesn't get the reply of some other command that could have
been sent just beforehand.
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260318-cmdq-locking-v5-5-18b37e3f9069@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Make `Cmdq` a pinned type. This is needed to use Mutex, which is needed
to add locking to `Cmdq`.
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260318-cmdq-locking-v5-4-18b37e3f9069@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Add type infrastructure to know what reply is expected from each
`CommandToGsp`. Uses a marker type `NoReply` which does not implement
`MessageFromGsp` to mark commands which don't expect a response.
Update `send_command` to wait for a reply and add `send_command_no_wait`
which sends a command that has no reply, without blocking.
This prepares for adding locking to the queue.
Tested-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260318-cmdq-locking-v5-3-18b37e3f9069@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Remove magic numbers and add a default timeout for callers to use.
Tested-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260318-cmdq-locking-v5-2-18b37e3f9069@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Fix some inaccuracies / old doc comments.
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260318-cmdq-locking-v5-1-18b37e3f9069@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
We need the latest fixes from drm-rust-fixes in drm-rust-next as well to
build on top of.
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The DmaGspMem pointer accessor methods (gsp_write_ptr, gsp_read_ptr,
cpu_read_ptr, cpu_write_ptr, advance_cpu_read_ptr,
advance_cpu_write_ptr) dereference a raw pointer to DMA memory, creating
an intermediate reference before calling volatile read/write methods.
This is undefined behavior since DMA memory can be concurrently modified
by the device.
Fix this by moving the implementations into a gsp_mem module in fw.rs
that uses the dma_read!() / dma_write!() macros, making the original
methods on DmaGspMem thin forwarding wrappers.
An alternative approach would have been to wrap the shared memory in
Opaque, but that would have required even more unsafe code.
Since the gsp_mem module lives in fw.rs (to access firmware-specific
binding field names), GspMem, Msgq and their relevant fields are
temporarily widened to pub(super). This will be reverted once IoView
projections are available.
Cc: Gary Guo <gary@garyguo.net>
Closes: https://lore.kernel.org/nouveau/DGUT14ILG35P.1UMNRKU93JUM1@kernel.org/
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260309225408.27714-1-dakr@kernel.org
[ Use pub(super) where possible; replace bitwise-and with modulo
operator analogous to [1]. - Danilo ]
Link: https://lore.kernel.org/all/20260129-nova-core-cmdq1-v3-1-2ede85493a27@nvidia.com/ [1]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The `Cmdq::new` function was allocating a `PteArray` struct on the stack
and was causing a stack overflow with 8216 bytes.
Modify the `PteArray` to calculate and write the Page Table Entries
directly into the coherent DMA buffer one-by-one. This reduces the stack
usage quite a lot.
Reported-by: Gary Guo <gary@garyguo.net>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/509436-Nova/topic/.60Cmdq.3A.3Anew.60.20uses.20excessive.20stack.20size/near/570375549
Link: https://lore.kernel.org/rust-for-linux/CANiq72mAQxbRJZDnik3Qmd4phvFwPA01O2jwaaXRh_T+2=L-qA@mail.gmail.com/
Fixes: f38b4f105cfc ("gpu: nova-core: Create initial Gsp")
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
Link: https://patch.msgid.link/20260309-drm-rust-next-v4-4-4ef485b19a4c@proton.me
[ * Use PteArray::entry() in LogBuffer::new(),
* Add TODO comment to use IoView projections once available,
* Add PTE_ARRAY_SIZE constant to avoid duplication.
- Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
As per [1], we need one "use" item per line, in order to reduce merge
conflicts. Furthermore, we need a trailing ", //" in order to tell
rustfmt(1) to leave it alone.
This does that for commands.rs, which is the only file in nova-core that
has any remaining instances of the old style.
[1] https://docs.kernel.org/rust/coding-guidelines.html#imports
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260310021125.117855-7-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
A tiny simplification: now that FbLayout uses its own specific FbRange
type, add an FbRange.len() method, and use that to (very slightly)
simplify the calculation of Frts::frts_size initialization.
Suggested-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260310021125.117855-3-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
For convenience of the reader: now you can directly see the sizes of
each range. It is surprising just how much this helps.
Sample output (using an Ampere GA104):
NovaCore 0000:e1:00.0: FbLayout {
fb: 0x0..0x3ff800000 (16376 MiB),
vga_workspace: 0x3ff700000..0x3ff800000 (1 MiB),
frts: 0x3ff600000..0x3ff700000 (1 MiB),
boot: 0x3ff5fa000..0x3ff600000 (24 KiB),
elf: 0x3fb960000..0x3ff5f9000 (60 MiB),
wpr2_heap: 0x3f3900000..0x3fb900000 (128 MiB),
wpr2: 0x3f3800000..0x3ff700000 (191 MiB),
heap: 0x3f3700000..0x3f3800000 (1 MiB),
vf_partition_count: 0x0,
}
Cc: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Link: https://patch.msgid.link/20260310021125.117855-2-jhubbard@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Add tests for continuation record splitting. They cover boundary
conditions at the split points to make sure the right number of
continuation records are made. They also check that the data
concatenated is correct.
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260306-cmdq-continuation-v6-9-cc7b629200ee@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Splits large RPCs if necessary and sends the remaining parts using
continuation records. RPCs that do not need continuation records
continue to write directly into the command buffer. Ones that do write
into a staging buffer first, so there is one copy.
Continuation record for receive is not necessary to support at the
moment because those replies do not need to be read and are currently
drained by retrying `receive_msg` on `ERANGE`.
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260306-cmdq-continuation-v6-8-cc7b629200ee@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Add a default method to `CommandToGsp` which computes the size of a
command.
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260306-cmdq-continuation-v6-7-cc7b629200ee@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Unconditionally call the variable length payload code, which is a no-op
if there is no such payload but could defensively catch some coding
errors by e.g. checking that the allocated size is completely filled.
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260306-cmdq-continuation-v6-6-cc7b629200ee@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Clarify why using only the first returned slice from allocate_command
for the message headers is okay.
Tested-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260306-cmdq-continuation-v6-5-cc7b629200ee@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|