From 9544f9e6947f6508d29f0d0cc2dacaa749fc1613 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Wed, 15 Oct 2025 14:36:15 +0800 Subject: hung_task: panic when there are more than N hung tasks at the same time The hung_task_panic sysctl is currently a blunt instrument: it's all or nothing. Panicking on a single hung task can be an overreaction to a transient glitch. A more reliable indicator of a systemic problem is when multiple tasks hang simultaneously. Extend hung_task_panic to accept an integer threshold, allowing the kernel to panic only when N hung tasks are detected in a single scan. This provides finer control to distinguish between isolated incidents and system-wide failures. The accepted values are: - 0: Don't panic (unchanged) - 1: Panic on the first hung task (unchanged) - N > 1: Panic after N hung tasks are detected in a single scan The original behavior is preserved for values 0 and 1, maintaining full backward compatibility. [lance.yang@linux.dev: new changelog] Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com Signed-off-by: Li RongQing Reviewed-by: Masami Hiramatsu (Google) Reviewed-by: Lance Yang Tested-by: Lance Yang Acked-by: Andrew Jeffery [aspeed_g5_defconfig] Cc: Anshuman Khandual Cc: Arnd Bergmann Cc: David Hildenbrand Cc: Florian Wesphal Cc: Jakub Kacinski Cc: Jason A. Donenfeld Cc: Joel Granados Cc: Joel Stanley Cc: Jonathan Corbet Cc: Kees Cook Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: "Paul E . McKenney" Cc: Pawan Gupta Cc: Petr Mladek Cc: Phil Auld Cc: Randy Dunlap Cc: Russell King Cc: Shuah Khan Cc: Simon Horman Cc: Stanislav Fomichev Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'Documentation/admin-guide/sysctl') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index f3ee807b5d8b..0065a55bc09e 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -397,13 +397,14 @@ a hung task is detected. hung_task_panic =============== -Controls the kernel's behavior when a hung task is detected. +When set to a non-zero value, a kernel panic will be triggered if the +number of hung tasks found during a single scan reaches this value. This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. -= ================================================= += ======================================================= 0 Continue operation. This is the default behavior. -1 Panic immediately. -= ================================================= +N Panic when N hung tasks are found during a single scan. += ======================================================= hung_task_check_count -- cgit v1.2.3 From 5f264c00b669b934300dff506d0aa9f6f8f8c53e Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:36 +0800 Subject: docs: panic: correct some sys_ifo names in sysctl doc Patch series "Enable hung_task and lockup cases to dump system info on demand", v2. When working on kernel stability issues: panic, task-hung and soft/hard lockup are frequently met. And to debug them, user may need lots of system information at that time, like task call stacks, lock info, memory info, ftrace dump, etc. panic case already uses sys_info() for this purpose, and has a 'panic_sys_info' sysctl(also support cmdline setup) interface to take human readable string like "tasks,mem,timers,locks,ftrace,..." to control what kinds of information is needed. Which is also helpful to debug task-hung and lockup cases. So this patchset introduces the similar sys_info sysctl interface for task-hung and lockup cases. his is mainly for debugging and the info dumping could be intrusive, like dumping call stack for all tasks when system has huge number of tasks, similarly for ftrace dump (we may add tracing_stop() and tracing_start() around it) Locally these have been used in our bug chasing for stability issues and were helpful. As Andrew suggested, add a configurable global 'kernel_sys_info' knob. When error scenarios like panic/hung-task/lockup etc doesn't setup their own sys_info knob and calls sys_info() with parameter "0", this global knob will take effect. It could be used for other kernel cases like OOM, which may not need one dedicated sys_info knob. This patch (of 4): Some sys_info names wered forgotten to change in patch iterations, while the right names are defined in kernel/sys_info.c. Link: https://lkml.kernel.org/r/20251113111039.22701-1-feng.tang@linux.alibaba.com Link: https://lkml.kernel.org/r/20251113111039.22701-2-feng.tang@linux.alibaba.com Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter") Signed-off-by: Feng Tang Reviewed-by: Petr Mladek Cc: Jonathan Corbet Cc: Lance Yang Cc: "Paul E . McKenney" Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/admin-guide/sysctl') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 0065a55bc09e..a397eeccaea7 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -911,8 +911,8 @@ to 'panic_print'. Possible values are: ============= =================================================== tasks print all tasks info mem print system memory info -timer print timers info -lock print locks info if CONFIG_LOCKDEP is on +timers print timers info +locks print locks info if CONFIG_LOCKDEP is on ftrace print ftrace buffer all_bt print all CPUs backtrace (if available in the arch) blocked_tasks print only tasks in uninterruptible (blocked) state -- cgit v1.2.3 From 8b2b9b4f6f4f7a61b7e323479ed7d9faa21d6287 Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:37 +0800 Subject: hung_task: add hung_task_sys_info sysctl to dump sys info on task-hung When task-hung happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'hung_task_sys_info' sysctl knob to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all requested information will be dumped. (refer kernel/sys_info.c for more details). Meanwhile, the newly introduced sys_info() call is used to unify some existing info-dumping knobs. [feng.tang@linux.alibaba.com: maintain consistecy established behavior, per Lance and Petr] Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local Link: https://lkml.kernel.org/r/20251113111039.22701-3-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Reviewed-by: Lance Yang Cc: Jonathan Corbet Cc: "Paul E . McKenney" Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation/admin-guide/sysctl') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index a397eeccaea7..45b4408dad31 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -422,6 +422,11 @@ the system boot. This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. +hung_task_sys_info +================== +A comma separated list of extra system information to be dumped when +hung task is detected, for example, "tasks,mem,timers,locks,...". +Refer 'panic_sys_info' section below for more details. hung_task_timeout_secs ====================== -- cgit v1.2.3 From a9af76a78760717361cccc884dc649e30db61c8b Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:38 +0800 Subject: watchdog: add sys_info sysctls to dump sys info on system lockup When soft/hard lockup happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when system lockup happens, all requested information will be printed out. (refer kernel/sys_info.c for more details). Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang Reviewed-by: Petr Mladek Cc: Jonathan Corbet Cc: Lance Yang Cc: "Paul E . McKenney" Cc: Petr Mladek Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation/admin-guide/sysctl') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 45b4408dad31..176520283f1a 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -582,6 +582,11 @@ if leaking kernel pointer values to unprivileged users is a concern. When ``kptr_restrict`` is set to 2, kernel pointers printed using %pK will be replaced with 0s regardless of privileges. +softlockup_sys_info & hardlockup_sys_info +========================================= +A comma separated list of extra system information to be dumped when +soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...". +Refer 'panic_sys_info' section below for more details. modprobe ======== -- cgit v1.2.3 From 03ef32d665e8a23d7ce5965b8b035666cfb47866 Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:39 +0800 Subject: sys_info: add a default kernel sys_info mask Which serves as a global default sys_info mask. When users want the same system information for many error cases (panic, hung, lockup ...), they can chose to set this global knob only once, while not setting up each individual sys_info knobs. This just adds a 'lazy' option, and doesn't change existing kernel behavior as the mask is 0 by default. Link: https://lkml.kernel.org/r/20251113111039.22701-5-feng.tang@linux.alibaba.com Suggested-by: Andrew Morton Signed-off-by: Feng Tang Cc: Jonathan Corbet Cc: Lance Yang Cc: "Paul E . McKenney" Cc: Petr Mladek Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation/admin-guide/sysctl') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 176520283f1a..239da22c4e28 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -521,6 +521,15 @@ default), only processes with the CAP_SYS_ADMIN capability may create io_uring instances. +kernel_sys_info +=============== +A comma separated list of extra system information to be dumped when +soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...". +Refer 'panic_sys_info' section below for more details. + +It serves as the default kernel control knob, which will take effect +when a kernel module calls sys_info() with parameter==0. + kexec_load_disabled =================== -- cgit v1.2.3