summaryrefslogtreecommitdiff
path: root/tools/testing/cxl
AgeCommit message (Collapse)Author
2025-11-18Merge branch 'for-6.19/cxl-misc' into cxl-for-nextDave Jiang
- Remove ret_limit race condition in mock_get_event() - Assign overflow_err_count from log->nr_overflow
2025-11-18cxl/test: Assign overflow_err_count from log->nr_overflowAlison Schofield
mock_get_event() uses an uninitialized local variable, nr_overflow, to populate the overflow_err_count field. That results in incorrect overflow_err_count values in mocked cxl_overflow trace events, such as this case where the records are reported as 0 and should be non-zero: [] cxl_overflow: memdev=mem7 host=cxl_mem.6 serial=7: log=Failure : 0 records from 1763228189130895685 to 1763228193130896180 Fix by using log->nr_overflow and remove the unused local variable. A follow-up change was considered in cxl_mem_get_records_log() to confirm that the overflow_err_count is non-zero when the overflow flag is set [1]. Since the driver has no functional dependency on this constraint, and a device that violates this specific requirement does not cause incorrect driver behavior, no validation check is added. [1] CXL 3.2, Table 8-65 Get Event Records Output Payload Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com>> --- Link: https://patch.msgid.link/20251116013036.1713313-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-18cxl/test: Remove ret_limit race condition in mock_get_event()Alison Schofield
Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the loop iterator in mock_get_event() from a static constant, CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The intent was to vary the number of events returned per call to simulate events occurring while logs are being read. However, ret_limit is modified without synchronization. When multiple threads call mock_get_event() concurrently, one thread may read ret_limit, another thread may increment it, and the first thread's loop condition and size calculation see and use the updated value. This is visible during cxl_test module load when all memdevs are initializing simultaneously, which includes getting event records. It is not tied to the cxl-events.sh unit test specifically, as that operates on a single memdev. While no actual harm results (the buffer is always large enough and the record count fields correctly reflect what was written), this is a correctness issue. The race creates an inconsistent state within mock_get_event() and adding variability based on a race appears unintended. Make ret_limit a local variable populated from an atomic counter. Each call gets a stable value that won't change during execution. That preserves the intended behavior of varying the return counts across calls while eliminating the race condition. This implementation uses "+ 1" to produce the full range of 1 to CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were produced. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com>> --- Link: https://patch.msgid.link/20251116013819.1713780-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-18Merge branch 'for-6.19/cxl-misc' into cxl-for-nextDave Jiang
- remove unused mock function for cxl_rcd_component_reg_phys()
2025-11-18cxl/test: remove unused mock function for cxl_rcd_component_reg_phys()Alejandro Lucero
Since commit 733b57f262b0 ("cxl/pci: Early setup RCH dport component registers from RCRB") is not necessary under mocking tests. [ dj: Fixup commit representation flagged by checkpatch. ] [ dj: Ammend subject line to indicate which function. ] Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com>> --- Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20251118182202.2083244-1-alejandro.lucero-palau@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-17Merge branch 'for-6.19/cxl-elc-test' into cxl-for-nextDave Jiang
Extended linear cache unit testing support - Standardize CXL auto region size - Add cxl_test CFMWS support for extended linear cache - Add support for acpi extended linear cache
2025-11-17cxl/test: Add support for acpi extended linear cacheDave Jiang
Add the mock wrappers for hmat_get_extended_linear_cache_size() in order to emulate the ACPI helper function for the regions that are mock'd by cxl_test. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Link: https://patch.msgid.link/20251117144611.903692-4-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-17cxl/test: Add cxl_test CFMWS support for extended linear cacheDave Jiang
Add a module parameter to allow activation of extended linear cache on the auto region for cxl_test. The current platform implementation for extended linear cache is 1:1 of DRAM and CXL memory. A CFMWS is created with the size of both memory together where DRAM takes the first part of the memory range and CXL covers the second part. The current CXL auto region on cxl_test consists of 2 256M devices that creates a 512M region. The new extended linear cache setup will have 512M DRAM and 512M CXL memory for a total of 1G CFMWS. The hardware decoders must have their starting offset moved to after the DRAM region to handle the CXL regions. [ dj: Fixup commenting style. (Jonathan) ] Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Link: https://patch.msgid.link/20251117144611.903692-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-17cxl/test: Standardize CXL auto region sizeDave Jiang
Create a global define for the size of the mock CXL auto region used in cxl_test. Remove the declared size in mock_init_hdm_decoder() function. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Link: https://patch.msgid.link/20251117144611.903692-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-13Merge branch 'for-6.19/cxl-addr-xlat' into cxl-for-nextDave Jiang
Enable unit testing for XOR address translation of SPA to DPA and vice versa.
2025-11-03cxl/test: Add cxl_translate module for address translation testingAlison Schofield
Add a loadable test module that validates CXL address translation calculations using parameterized test vectors. The module tests both host-to-device and device-to-host address translations for Modulo and XOR interleave arithmetic. Two types of testing are provided: 1. Parameterized test vectors: Test vectors are passed as module parameters in the format: "dpa pos r_eiw r_eig hb_ways math expected_spa". Round-trip validation is performed: - Translate a DPA and position to a SPA - Verify the result matches expected SPA - Translate that SPA back to a DPA and position - Verify round-trip consistency 2. Internal validation testing: When no test vectors are provided, the module performs validation of the translation functions by checking parameter boundaries and running 10,000 iterations of randomly generated valid parameters to exercise the core calculation functions. The module uses the CXL Driver translation functions through symbols exported exclusively for cxl_translate. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-03cxl/port: Remove devm_cxl_port_enumerate_dports()Li Ming
devm_cxl_port_enumerate_dports() is not longer used after below commit commit 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports") Delete it and the relevant interface implemented in cxl_test. Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18Merge branch 'for-6.18/cxl-delay-dport' into cxl-for-nextDave Jiang
Add changes to delay the allocation and setup of dports until when the endpoint device is being probed. At this point, the CXL link is established from endpoint to host bridge. Addresses issues seen on some platforms when dports are probed earlier. Link: https://lore.kernel.org/linux-cxl/20250829180928.842707-1-dave.jiang@intel.com/
2025-09-18cxl/test: Setup target_map for cxl_test decoder initializationDave Jiang
cxl_test uses mock functions for decoder enumaration. Add initialization of the cxld->target_map[] for cxl_test based decoders in the mock functions. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Robert Richter <rrichter@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18cxl/test: Adjust the mock version of devm_cxl_switch_port_decoders_setup()Dave Jiang
With devm_cxl_switch_port_decoders_setup() being called within cxl_core instead of by the port driver probe, adjustments are needed to deal with circular symbol dependency when this function is being mock'd. Add the appropriate changes to get around the circular dependency. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18cxl/test: Add mock version of devm_cxl_add_dport_by_dev()Dave Jiang
devm_cxl_add_dport_by_dev() outside of cxl_test is done through PCI hierarchy. However with cxl_test, it needs to be done through the platform device hierarchy. Add the mock function for devm_cxl_add_dport_by_dev(). When cxl_core calls a cxl_core exported function and that function is mocked by cxl_test, the call chain causes a circular dependency issue. Dan provided a workaround to avoid this issue. Apply the method to changes from the late dport allocation changes in order to enable cxl-test. In cxl_core they are defined with "__" added in front of the function. A macro is used to define the original function names for when non-test version of the kernel is built. A bit of macros and typedefs are used to allow mocking of those functions in cxl_test. Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Robert Richter <rrichter@amd.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-18cxl/test: Refactor decoder setup to reduce cxl_test burdenDave Jiang
Group the decoder setup code in switch and endpoint port probe into a single function for each to reduce the number of functions to be mocked in cxl_test. Introduce devm_cxl_switch_port_decoders_setup() and devm_cxl_endpoint_decoders_setup(). These two functions will be mocked instead with some functions optimized out since the mock version does not do anything. Remove devm_cxl_setup_hdm(), devm_cxl_add_passthrough_decoder(), and devm_cxl_enumerate_decoders() in cxl_test mock code. In turn, mock_cxl_add_passthrough_decoder() can be removed since cxl_test does not setup passthrough decoders. __wrap_cxl_hdm_decode_init() and __wrap_cxl_dvsec_rr_decode() can be removed as well since they only return 0 when called. [dj: drop 'struct cxl_port' forward declaration (Robert)] Suggested-by: Robert Richter <rrichter@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-17cxl: Add a cached copy of target_map to cxl_decoderDave Jiang
Add a cached copy of the hardware port-id list that is available at init before all @dport objects have been instantiated. Change is in preparation of delayed dport instantiation. Reviewed-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Robert Richter <rrichter@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-09-10cxl/acpi: Rename CFMW coherency restrictionsDavidlohr Bueso
ACPICA commit 710745713ad3a2543dbfb70e84764f31f0e46bdc This has been renamed in more recent CXL specs, as type3 (memory expanders) can also use HDM-DB for device coherent memory. Link: https://github.com/acpica/acpica/commit/710745713ad3a2543dbfb70e84764f31f0e46bdc Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20250908160034.86471-1-dave@stgolabs.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-08-01Merge tag 'cxl-for-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL updates from Dave Jiang: "The most significant changes in this pull request is the series that introduces ACQUIRE() and ACQUIRE_ERR() macros to replace conditional locking and ease the pain points of scoped_cond_guard(). The series also includes follow on changes that refactor the CXL sub-system to utilize the new macros. Detail summary: - Add documentation template for CXL conventions to document CXL platform quirks - Replace mutex_lock_io() with mutex_lock() for mailbox - Add location limit for fake CFMWS range for cxl_test, ARM platform enabling - CXL documentation typo and clarity fixes - Use correct format specifier for function cxl_set_ecs_threshold() - Make cxl_bus_type constant - Introduce new helper cxl_resource_contains_addr() to check address availability - Fix wrong DPA checking for PPR operation - Remove core/acpi.c and CXL core dependency on ACPI - Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks - Add CXL updates utilizing ACQUIRE() macro to remove gotos and improve readability - Add return for the dummy version of cxl_decoder_detach() without CONFIG_CXL_REGION - CXL events updates for spec r3.2 - Fix return of __cxl_decoder_detach() error path - CXL debugfs documentation fix" * tag 'cxl-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (28 commits) Documentation/ABI/testing/debugfs-cxl: Add 'cxl' to clear_poison path cxl/region: Fix an ERR_PTR() vs NULL bug cxl/events: Trace Memory Sparing Event Record cxl/events: Add extra validity checks for CVME count in DRAM Event Record cxl/events: Add extra validity checks for corrected memory error count in General Media Event Record cxl/events: Update Common Event Record to CXL spec rev 3.2 cxl: Fix -Werror=return-type in cxl_decoder_detach() cleanup: Fix documentation build error for ACQUIRE updates cxl: Convert to ACQUIRE() for conditional rwsem locking cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach() cxl/region: Move ready-to-probe state check to a helper cxl/region: Split commit_store() into __commit() and queue_reset() helpers cxl/decoder: Drop pointless locking cxl/decoder: Move decoder register programming to a helper cxl/mbox: Convert poison list mutex to ACQUIRE() cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks cxl: Remove core/acpi.c and cxl core dependency on ACPI cxl/core: Using cxl_resource_contains_addr() to check address availability cxl/edac: Fix wrong dpa checking for PPR operation cxl/core: Introduce a new helper cxl_resource_contains_addr() ...
2025-07-15cxl: Remove core/acpi.c and cxl core dependency on ACPIRobert Richter
From Dave [1]: """ It was a mistake to introduce core/acpi.c and putting ACPI dependency on cxl_core when adding the extended linear cache support. """ Current implementation calls hmat_get_extended_linear_cache_size() of the ACPI subsystem. That external reference causes issue running cxl_test as there is no way to "mock" that function and ignore it when using cxl test. Instead of working around that using cxlrd ops and extensively expanding cxl_test code [1], just move HMAT calls out of the core module to cxl_acpi. Implement this by adding a @cache_size member to struct cxl_root_decoder. During initialization the cache size is determined and added to the root decoder object in cxl_acpi. Later on in cxl_core the cache_size parameter is used to setup extended linear caching. [1] https://patch.msgid.link/20250610172938.139428-1-dave.jiang@intel.com [ dj: Remove core/acpi.o from tools/testing/cxl/Kbuild ] [ dj: Add kdoc for cxlrd->cache_size ] Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20250711151529.787470-1-rrichter@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-04cxl/test: Simplify fw_buf_checksum_show()Eric Biggers
First, just use sha256() instead of a sequence of sha256_init(), sha256_update(), and sha256_final(). The result is the same. Second, use *phN instead of open-coding the conversion of bytes to hex. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630160645.3198-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30cxl_test: Limit location for fake CFMWS to mappable rangeJonathan Cameron
Some architectures (e.g. arm64) only support memory hotplug operations on a restricted set of physical addresses. This applies even when we are faking some CXL fixed memory windows for the purposes of cxl_test. That range can be queried with mhp_get_pluggable_range(true). Use the minimum of that the top of that range and iomem_resource.end to establish the 64GiB region used by cxl_test. From thread #2 which was related to the issue in #1. [ dj: Add CONFIG_MEMORY_HOTPLUG config check, from Alison ] Link: https://lore.kernel.org/linux-cxl/20250522145622.00002633@huawei.com/ #2 Reported-by: Itaru Kitayama <itaru.kitayama@linux.dev> Closes: https://github.com/pmem/ndctl/issues/278 #1 Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com <mailto:itaru.kitayama@fujitsu.com> Tested-by: Marc Herbert <marc.herbert@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250527153451.82858-1-Jonathan.Cameron@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-23Merge branch 'for-6.16/cxl-features-ras' into cxl-for-nextDave Jiang
Add CXL RAS Features support. Features include "patrol scrub control", "error check scrub", "perform maintenance", and "memory sparing". This support connects the RAS Featurs to EDAC.
2025-05-23cxl/edac: Add CXL memory device patrol scrub control featureShiju Jose
CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub control feature. The device patrol scrub proactively locates and makes corrections to errors in regular cycle. Allow specifying the number of hours within which the patrol scrub must be completed, subject to minimum and maximum limits reported by the device. Also allow disabling scrub allowing trade-off error rates against performance. Add support for patrol scrub control on CXL memory devices. Register with the EDAC device driver, which retrieves the scrub attribute descriptors from EDAC scrub and exposes the sysfs scrub control attributes to userspace. For example, scrub control for the CXL memory device "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/. Additionally, add support for region-based CXL memory patrol scrub control. CXL memory regions may be interleaved across one or more CXL memory devices. For example, region-based scrub control for "cxl_region1" is exposed in /sys/bus/edac/devices/cxl_region1/scrubX/. [dj: A few formatting fixes from Jonathan] Reviewed-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-4-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-06cxl/test: Address missing MODULE_DESCRIPTION warnings for cxl_testDave Jiang
Add MODULE_DESCRIPTION() to address the following warnings: WARNING: modpost: missing MODULE_DESCRIPTION() in test/cxl_test.o WARNING: modpost: missing MODULE_DESCRIPTION() in test/cxl_mock.o WARNING: modpost: missing MODULE_DESCRIPTION() in test/cxl_mock_mem.o [dj: s/CXL test/cxl_test:/ per djbw's comment] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250429235953.4175408-1-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-04-18cxl: Fix devm host device for CXL fwctl initializationDave Jiang
Testing revealed the following error message for a CXL memdev that has Feature support: [ 56.690430] cxl mem0: Resources present before probing Attach the allocation of cxl_fwctl to the parent device of cxl_memdev. devm_add_* calls for cxl_memdev should not happen before the memdev probe function or outside the scope of the memdev driver. cxl_test missed this bug because cxl_test always arranges for the cxl_mem driver to be loaded before cxl_mock_mem runs. So the driver core always finds the devres list idle in that case. [DJ: Updated subject title and added commit log suggestion from djbw] Fixes: 858ce2f56b52 ("cxl: Add FWCTL support to CXL") Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/linux-cxl/6801aea053466_71fe2944c@dwillia2-xfh.jf.intel.com.notmuch/ Link: https://patch.msgid.link/20250418002933.406439-1-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-04-02Merge tag 'cxl-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) updates from Dave Jiang: - Add support for Global Persistent Flush (GPF) - Cleanup of DPA partition metadata handling: - Remove the CXL_DECODER_MIXED enum that's not needed anymore - Introduce helpers to access resource and perf meta data - Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' - Make cxl_dpa_alloc() DPA partition number agnostic - Remove cxl_decoder_mode - Cleanup partition size and perf helpers - Remove unused CXL partition values - Add logging support for CXL CPER endpoint and port protocol errors: - Prefix protocol error struct and function names with cxl_ - Move protocol error definitions and structures to a common location - Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h - Add support in GHES to process CXL CPER protocol errors - Process CXL CPER protocol errors - Add trace logging for CXL PCIe port RAS errors - Remove redundant gp_port init - Add validation of cxl device serial number - CXL ABI documentation updates/fixups - A series that uses guard() to clean up open coded mutex lockings and remove gotos for error handling. - Some followup patches to support dirty shutdown accounting: - Add helper to retrieve DVSEC offset for dirty shutdown registers - Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown() - Add support for dirty shutdown count via sysfs - cxl_test support for dirty shutdown - A series to support CXL mailbox Features commands. Mostly in preparation for CXL EDAC code to utilize the Features commands. It's also in preparation for CXL fwctl support to utilize the CXL Features. The commands include "Get Supported Features", "Get Feature", and "Set Feature". - A series to support extended linear cache support described by the ACPI HMAT table. The addition helps enumerate the cache and also provides additional RAS reporting support for configuration with extended linear cache. (and related fixes for the series). - An update to cxl_test to support a 3-way capable CFMWS - A documentation fix to remove unused "mixed mode" * tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits) cxl/region: Fix the first aliased address miscalculation cxl/region: Quiet some dev_warn()s in extended linear cache setup cxl/Documentation: Remove 'mixed' from sysfs mode doc cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems cxl/test: Define a CFMWS capable of a 3 way HB interleave cxl/mem: Do not return error if CONFIG_CXL_MCE unset tools/testing/cxl: Set Shutdown State support cxl/pmem: Export dirty shutdown count via sysfs cxl/pmem: Rename cxl_dirty_shutdown_state() cxl/pci: Introduce cxl_gpf_get_dvsec() cxl/pci: Support Global Persistent Flush (GPF) cxl: Document missing sysfs files cxl: Plug typos in ABI doc cxl/pmem: debug invalid serial number data cxl/cdat: Remove redundant gp_port initialization cxl/memdev: Remove unused partition values cxl/region: Drop goto pattern of construct_region() cxl/region: Drop goto pattern in cxl_dax_region_alloc() cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc() cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free() ...
2025-03-17cxl/test: Add Set Feature support to cxl_testDave Jiang
Add emulation to support Set Feature mailbox command to cxl_test. The only feature supported is the device patrol scrub feature. The set feature allows activation of patrol scrub for the cxl_test emulated device. The command does not support partial data transfer even though the spec allows it. This restriction is to reduce complexity of the emulation given the patrol scrub feature is very minimal. Link: https://patch.msgid.link/r/20250307205648.1021626-8-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-17cxl/test: Add Get Feature support to cxl_testDave Jiang
Add emulation of Get Feature command to cxl_test. The feature for device patrol scrub is returned by the emulation code. This is the only feature currently supported by cxl_test. It returns the information for the device patrol scrub feature. Link: https://patch.msgid.link/r/20250307205648.1021626-7-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-17cxl: Add FWCTL support to CXLDave Jiang
Add fwctl support code to allow sending of CXL feature commands from userspace through as ioctls via FWCTL. Provide initial setup bits. The CXL PCI probe function will call devm_cxl_setup_fwctl() after the cxl_memdev has been enumerated in order to setup FWCTL char device under the cxl_memdev like the existing memdev char device for issuing CXL raw mailbox commands from userspace via ioctls. Link: https://patch.msgid.link/r/20250307205648.1021626-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-17Merge branch 'for-6.15/features' into cxl-for-nextDave Jiang
Add CXL Features support. Setup code for enabling in kernel usage of CXL Features. Expecting EDAC/RAS to utilize CXL Features in kernel for things such as memory sparing. Also prepartion for enabling of CXL FWCTL support to issue allowed Features from user space.
2025-03-14cxl/test: Define a CFMWS capable of a 3 way HB interleaveAlison Schofield
The CXL unit test cxl-xor-region.sh is skipping a 1+1+1 region interleave test case because the window is not defined. Additionally, upcoming expansion of 3 way HB interleave test cases (like 2+2+2) require the same window. Replace an unused CFMWS with a 3-way capable CFMWS in the set of CFMWS's loaded when interleave_arithmetic=1. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20250226221931.2352061-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14Merge branch 'for-6.15/extended-linear-cache' into cxl-for-next2Dave Jiang
Add support for Extended Linear Cache for CXL. Add enumeration support of the cache. Add MCE notification of the aliased memory address.
2025-03-14Merge branch 'for-6.15/dirty-shutdown' into cxl-for-next2Dave Jiang
Add support for Global Persistent Flush (GPF) and dirty shutdown accounting.
2025-03-14tools/testing/cxl: Set Shutdown State supportDavidlohr Bueso
Add support to emulate the CXL Set Shutdown State operation. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250220220235.276831-5-dave@stgolabs.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14cxl/pmem: debug invalid serial number dataYuquan Wang
In a nvdimm interleave-set each device with an invalid or zero serial number may cause pmem region initialization to fail, but in cxl case such device could still set cookies of nd_interleave_set and create a nvdimm pmem region. This adds the validation of serial number in cxl pmem region creation. The event of no serial number would cause to fail to set the cookie and pmem region. For cxl-test to work properly, always +1 on mock device's serial number. Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250219040029.515451-2-wangyuquan1236@phytium.com.cn Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14Merge branch 'for-6.15/fw-first-error-logging' into cxl-for-next2Dave Jiang
Add logging support for CXL CPER endpoint and port protocol errors. Including the 2 patches that was completed later. Link: https://lore.kernel.org/linux-cxl/20250123084421.127697-1-Smita.KoralahalliChannabasappa@amd.com/ Link: https://lore.kernel.org/linux-cxl/20250310223839.31342-1-Smita.KoralahalliChannabasappa@amd.com/
2025-03-14acpi/ghes, cxl/pci: Process CXL CPER Protocol ErrorsSmita Koralahalli
When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors. The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors. Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing. Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations. Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors. [DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26cxl: Add mce notifier to emit aliased address for extended linear cacheDave Jiang
Below is a setup with extended linear cache configuration with an example layout of memory region shown below presented as a single memory region consists of 256G memory where there's 128G of DRAM and 128G of CXL memory. The kernel sees a region of total 256G of system memory. 128G DRAM 128G CXL memory |-----------------------------------|-------------------------------------| Data resides in either DRAM or far memory (FM) with no replication. Hot data is swapped into DRAM by the hardware behind the scenes. When error is detected in one location, it is possible that error also resides in the aliased location. Therefore when a memory location that is flagged by MCE is part of the special region, the aliased memory location needs to be offlined as well. Add an mce notify callback to identify if the MCE address location is part of an extended linear cache region and handle accordingly. Added symbol export to set_mce_nospec() in x86 code in order to call set_mce_nospec() from the CXL MCE notify callback. Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250226162224.3633792-5-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26acpi/hmat / cxl: Add extended linear cache support for CXLDave Jiang
The current cxl region size only indicates the size of the CXL memory region without accounting for the extended linear cache size. Retrieve the cache size from HMAT and append that to the cxl region size for the cxl region range that matches the SRAT range that has extended linear cache enabled. The SRAT defines the whole memory range that includes the extended linear cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side Cache Information Structure defines the size of the extended linear cache size and matches to the SRAT Memory Affinity Structure by the memory proxmity domain. Add a helper to match the cxl range to the SRAT memory range in order to retrieve the cache size. There are several places that checks the cxl region range against the decoder range. Use new helper to check between the two ranges and address the new cache size. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250226162224.3633792-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26cxl/test: Add Get Supported Features mailbox command supportDave Jiang
Add cxl-test emulation of Get Supported Features mailbox command. Currently only adding a test feature with feature identifier of all f's for testing. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250220194438.2281088-4-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26cxl: Add Get Supported Features command for kernel usageDave Jiang
CXL spec r3.2 8.2.9.6.1 Get Supported Features (Opcode 0500h) The command retrieve the list of supported device-specific features (identified by UUID) and general information about each Feature. The driver will retrieve the Feature entries in order to make checks and provide information for the Get Feature and Set Feature command. One of the main piece of information retrieved are the effects a Set Feature command would have for a particular feature. The retrieved Feature entries are stored in the cxl_mailbox context. The setup of Features is initiated via devm_cxl_setup_features() during the pci probe function before the cxl_memdev is enumerated. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-04cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'Dan Williams
The pending efforts to add CXL Accelerator (type-2) device [1], and Dynamic Capacity (DCD) support [2], tripped on the no-longer-fit-for-purpose design in the CXL subsystem for tracking device-physical-address (DPA) metadata. Trip hazards include: - CXL Memory Devices need to consider a PMEM partition, but Accelerator devices with CXL.mem likely do not in the common case. - CXL Memory Devices enumerate DPA through Memory Device mailbox commands like Partition Info, Accelerators devices do not. - CXL Memory Devices that support DCD support more than 2 partitions. Some of the driver algorithms are awkward to expand to > 2 partition cases. - DPA performance data is a general capability that can be shared with accelerators, so tracking it in 'struct cxl_memdev_state' is no longer suitable. - Hardcoded assumptions around the PMEM partition always being index-1 if RAM is zero-sized or PMEM is zero sized. - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a memory property, it should be phased in favor of a partition id and the memory property comes from the partition info. Towards cleaning up those issues and allowing a smoother landing for the aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared way for Memory Devices and Accelerators to initialize the DPA information in 'struct cxl_dev_state'. For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to get the new data structure initialized, and cleanup some qos_class init. Follow on patches will go further to use the new data structure to cleanup algorithms that are better suited to loop over all possible partitions. cxl_dpa_setup() follows the locking expectations of mutating the device DPA map, and is suitable for Accelerator drivers to use. Accelerators likely only have one hardcoded 'ram' partition to convey to the cxl_core. Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alejandro Lucero <alucerop@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Alejandro Lucero <alucerop@amd.com> Link: https://patch.msgid.link/173864305827.668823.13978794102080021276.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-04cxl: Introduce to_{ram,pmem}_{res,perf}() helpersDan Williams
In preparation for consolidating all DPA partition information into an array of DPA metadata, introduce helpers that hide the layout of the current data. I.e. make the eventual replacement of ->ram_res, ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a no-op for code paths that consume that information, and reduce the noise of follow-on patches. The end goal is to consolidate all DPA information in 'struct cxl_dev_state', but for now the helpers just make it appear that all DPA metadata is relative to @cxlds. As the conversion to generic partition metadata walking is completed, these helpers will naturally be eliminated, or reduced in scope. Cc: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Alejandro Lucero <alucerop@amd.com> Link: https://patch.msgid.link/173864305238.668823.16553986866633608541.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-29Merge tag 'cxl-for-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) updates from Dave Jiang: "A tweak to the HMAT output that was acked by Rafael, a prep patch for CXL type2 devices support that's coming soon, refactoring of the CXL regblock enumeration code, and a series of patches to update the event records to CXL spec r3.1: - Move HMAT printouts to pr_debug() - Add CXL type2 support to cxl_dvsec_rr_decode() in preparation for type2 support - A series that updates CXL event records to spec r3.1 and related changes - Refactoring of cxl_find_regblock_instance() to count regblocks" * tag 'cxl-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/core/regs: Refactor out functions to count regblocks of given type cxl/test: Update test code for event records to CXL spec rev 3.1 cxl/events: Update Memory Module Event Record to CXL spec rev 3.1 cxl/events: Update DRAM Event Record to CXL spec rev 3.1 cxl/events: Update General Media Event Record to CXL spec rev 3.1 cxl/events: Add Component Identifier formatting for CXL spec rev 3.1 cxl/events: Update Common Event Record to CXL spec rev 3.1 cxl/pci: Add CXL Type 1/2 support to cxl_dvsec_rr_decode() ACPI/HMAT: Move HMAT messages to pr_debug()
2025-01-13cxl/test: Update test code for event records to CXL spec rev 3.1Shiju Jose
Update test code for General Media, DRAM, Memory Module Event Records to CXL spec rev 3.1. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-7-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-03driver core: Constify API device_find_child() and adapt for various usagesZijun Hu
Constify the following API: struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); To : struct device *device_find_child(struct device *dev, const void *data, device_match_t match); typedef int (*device_match_t)(struct device *dev, const void *data); with the following reasons: - Protect caller's match data @*data which is for comparison and lookup and the API does not actually need to modify @*data. - Make the API's parameters (@match)() and @data have the same type as all of other device finding APIs (bus|class|driver)_find_device(). - All kinds of existing device match functions can be directly taken as the API's argument, they were exported by driver core. Constify the API and adapt for various existing usages. BTW, various subsystem changes are squashed into this commit to meet 'git bisect' requirement, and this commit has the minimal and simplest changes to complement squashing shortcoming, and that may bring extra code improvement. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for drivers/pwm Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-4-6623037414d4@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-02cxl/pci: Add CXL Type 1/2 support to cxl_dvsec_rr_decode()Alejandro Lucero
In cxl_dvsec_rr_decode() the pci driver expects to retrieve a cxlds, struct cxl_dev_state, from the driver_data field of struct device. While that works for Type 3, drivers for Type 1/2 devices may not put a cxlds in the driver_data field. In preparation for supporting Type 1/2 devices, replace parameter 'struct device' with 'struct cxl_dev_state' in cxl_dvsec_rr_decode(). Remove the unused parameter 'cxl_port' in cxl_dvsec_rr_decode(). Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20241203162112.5088-1-alucerop@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>