From 7cb735d7c0cb4307b2072aae6268b5b2069a8658 Mon Sep 17 00:00:00 2001 From: Yazen Ghannam Date: Tue, 4 Nov 2025 14:55:39 +0000 Subject: x86/mce: Unify AMD DFR handler with MCA Polling AMD systems optionally support a deferred error interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how other MCA interrupts are handled. Deferred errors do not require any special handling related to the interrupt, e.g. resetting or rearming the interrupt, etc. However, Scalable MCA systems include a pair of registers, MCA_DESTAT and MCA_DEADDR, that should be checked for valid errors. This check should be done whenever MCA registers are polled. Currently, the deferred error interrupt does this check, but the MCA polling function does not. Call the MCA polling function when handling the deferred error interrupt. This keeps all "polling" cases in a common function. Add an SMCA status check helper. This will do the same status check and register clearing that the interrupt handler has done. And it extends the common polling flow to find AMD deferred errors. Clear the MCA_DESTAT register at the end of the handler rather than the beginning. This maintains the procedure that the 'status' register must be cleared as the final step. Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com --- arch/x86/include/asm/mce.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 31e3cb550fb3..7d6588195d56 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -165,6 +165,12 @@ */ #define MCE_IN_KERNEL_COPYIN BIT_ULL(7) +/* + * Indicates that handler should check and clear Deferred error registers + * rather than common ones. + */ +#define MCE_CHECK_DFR_REGS BIT_ULL(8) + /* * This structure contains all data related to the MCE log. Also * carries a signature to make it easier to find from external -- cgit v1.2.3 From eeb3f76d73baed4c8ecc883e1eaafba3cb8aae1d Mon Sep 17 00:00:00 2001 From: Yazen Ghannam Date: Tue, 4 Nov 2025 14:55:45 +0000 Subject: x86/mce: Save and use APEI corrected threshold limit The MCA threshold limit generally is not something that needs to change during runtime. It is common for a system administrator to decide on a policy for their managed systems. If MCA thresholding is OS-managed, then the threshold limit must be set at every boot. However, many systems allow the user to set a value in their BIOS. And this is reported through an APEI HEST entry even if thresholding is not in FW-First mode. Use this value, if available, to set the OS-managed threshold limit. Users can still override it through sysfs if desired for testing or debug. APEI is parsed after MCE is initialized. So reset the thresholding blocks later to pick up the threshold limit. Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com --- arch/x86/include/asm/mce.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 7d6588195d56..1cfbfff0be3f 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -308,6 +308,12 @@ DECLARE_PER_CPU(struct mce, injectm); /* Disable CMCI/polling for MCA bank claimed by firmware */ extern void mce_disable_bank(int bank); +#ifdef CONFIG_X86_MCE_THRESHOLD +void mce_save_apei_thr_limit(u32 thr_limit); +#else +static inline void mce_save_apei_thr_limit(u32 thr_limit) { } +#endif /* CONFIG_X86_MCE_THRESHOLD */ + /* * Exception handler */ -- cgit v1.2.3 From 821f5fe4dbcb82357f0dfcfca3dd98f2b4097e53 Mon Sep 17 00:00:00 2001 From: Avadhut Naik Date: Tue, 18 Nov 2025 19:15:03 +0000 Subject: x86/mce: Add support for physical address valid bit Starting with Zen6, AMD's Scalable MCA systems will incorporate two new bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a valid System Physical Address (SPA) is present in MCA_ADDR. PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or if it indicates validity of SPA in MCA_ADDR. PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid SPA or if it is implementation specific. Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address. Signed-off-by: Avadhut Naik Signed-off-by: Borislav Petkov (AMD) Link: https://patch.msgid.link/20251118191731.181269-1-avadhut.naik@amd.com --- arch/x86/include/asm/mce.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/x86/include') diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 1cfbfff0be3f..2d98886de09a 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -48,6 +48,7 @@ /* AMD-specific bits */ #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */ +#define MCI_STATUS_PADDRV BIT_ULL(54) /* Valid System Physical Address */ #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */ #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */ #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */ @@ -62,6 +63,7 @@ */ #define MCI_CONFIG_MCAX 0x1 #define MCI_CONFIG_FRUTEXT BIT_ULL(9) +#define MCI_CONFIG_PADDRV BIT_ULL(11) #define MCI_IPID_MCATYPE 0xFFFF0000 #define MCI_IPID_HWID 0xFFF -- cgit v1.2.3