From c1f1fda141373d7253b4c1497043b0ef85f534ce Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Mon, 14 Jul 2025 19:42:12 +0800 Subject: ACPI: APEI: handle synchronous exceptions in task work The memory uncorrected error could be signaled by asynchronous interrupt (specifically, SPI in arm64 platform), e.g. when an error is detected by a background scrubber, or signaled by synchronous exception (specifically, data abort exception in arm64 platform), e.g. when a CPU tries to access a poisoned cache line. Currently, both synchronous and asynchronous errors use memory_failure_queue() to schedule memory_failure() to exectute in a kworker context. As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context, which: - will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop Issue 1: send wrong si_code in early_kill mode Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed a poisoned page. This is done by sending a SIGBUS signal with error code BUS_MCEERR_AR, indicating an action-required machine check error on read. However, when kill_proc() is called to terminate the processes who has the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered. To reproduce this problem: #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not factually correct. After this change: # STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR error as expected. Issue 2: a synchronous error infinite loop If a user-space process, e.g. devmem, accesses a poisoned page for which the HWPoison flag is set, kill_accessing_process() is called to send SIGBUS to current processs with error info. Since the memory_failure() is executed in the kworker context, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an exception again, resulting in a synchronous error infinite loop. Such exception loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. To reproduce this problem: # STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed # STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400 To fix above two issues, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data. Signed-off-by: Shuai Xue Tested-by: Ma Wupeng Reviewed-by: Kefeng Wang Reviewed-by: Xiaofei Tan Reviewed-by: Baolin Wang Reviewed-by: Jarkko Sakkinen Reviewed-by: Jonathan Cameron Reviewed-by: Jane Chu Reviewed-by: Yazen Ghannam Reviewed-by: Hanjun Guo Link: https://patch.msgid.link/20250714114212.31660-3-xueshuai@linux.alibaba.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0ef2ba0c667a..02dffff2508b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3896,7 +3896,6 @@ enum mf_flags { int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, unsigned long count, int mf_flags); extern int memory_failure(unsigned long pfn, int flags); -extern void memory_failure_queue_kick(int cpu); extern int unpoison_memory(unsigned long pfn); extern atomic_long_t num_poisoned_pages __read_mostly; extern int soft_offline_page(unsigned long pfn, int flags); -- cgit v1.2.3