| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in ib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improves the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"The usual trickle of EFI contributions:
- Parse SMBIOS tables in memory directly on Macbooks that do not
implement the EFI SMBIOS protocol
- Obtain EDID information from the primary display while running in
the EFI stub, and expose it via bootparams on x86 (generic method
is in the works, and will likely land during the next cycle)
- Bring CPER handling for ARM systems up to data with the latest EFI
spec changes
- Various cosmetic changes"
* tag 'efi-next-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
docs: efi: add CPER functions to driver-api
efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
efi/cper: Add a new helper function to print bitmasks
efi/cper: Adjust infopfx size to accept an extra space
RAS: Report all ARM processor CPER information to userspace
efi/libstub: x86: Store EDID in boot_params
efi/libstub: gop: Add support for reading EDID
efi/libstub: gop: Initialize screen_info in helper function
efi/libstub: gop: Find GOP handle instead of GOP data
efi: Fix trailing whitespace in header file
efi/memattr: Convert efi_memattr_init() return type to void
efi: stmm: fix kernel-doc "bad line" warnings
efi/riscv: Remove the useless failure return message print
efistub/x86: Add fallback for SMBIOS record lookup
|
|
Introduce a generic infrastructure for tracking recoverable hardware
errors (HW errors that are visible to the OS but does not cause a panic)
and record them for vmcore consumption. This aids post-mortem crash
analysis tools by preserving a count and timestamp for the last occurrence
of such errors. On the other side, correctable errors, which the OS
typically remains unaware of because the underlying hardware handles them
transparently, are less relevant for crash dump and therefore are NOT
tracked in this infrastructure.
Add centralized logging for sources of recoverable hardware errors based
on the subsystem it has been notified.
hwerror_data is write-only at kernel runtime, and it is meant to be read
from vmcore using tools like crash/drgn. For example, this is how it
looks like when opening the crashdump from drgn.
>>> prog['hwerror_data']
(struct hwerror_info[1]){
{
.count = (int)844,
.timestamp = (time64_t)1752852018,
},
...
This helps fleet operators quickly triage whether a crash may be
influenced by hardware recoverable errors (which executes a uncommon code
path in the kernel), especially when recoverable errors occurred shortly
before a panic, such as the bug fixed by commit ee62ce7a1d90 ("page_pool:
Track DMA-mapped pages and unmap them when destroying the pool")
This is not intended to replace full hardware diagnostics but provides a
fast way to correlate hardware events with kernel panics quickly.
Rare machine check exceptions—like those indicated by mce_flags.p5 or
mce_flags.winchip—are not accounted for in this method, as they fall
outside the intended usage scope for this feature's user base.
[leitao@debian.org: add hw-recoverable-errors to toctree]
Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org
Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [APEI]
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor
was defined simply as:
Type at byte offset 4:
- Cache error
- TLB Error
- Bus Error
- Micro-architectural Error
All other values are reserved
Yet, there was no information about how this would be encoded.
Spec 2.9A errata corrected it by defining:
- Bit 1 - Cache Error
- Bit 2 - TLB Error
- Bit 3 - Bus Error
- Bit 4 - Micro-architectural Error
All other values are reserved
That actually aligns with the values already defined on older
versions at N.2.4.1. Generic Processor Error Section.
Spec 2.10 also preserve the same encoding as 2.9A.
Adjust CPER and GHES handling code for both generic and ARM
processors to properly handle UEFI 2.9A and 2.10 encoding.
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The ARM processor CPER record was added in UEFI v2.6 and remained
unchanged up to v2.10.
Yet, the original arm_event trace code added by
e9279e83ad1f ("trace, ras: add ARM processor error trace event")
is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
exporting any information from tables N.17 to N.29 of the record.
This is not enough for the user to be able to figure out what has
exactly happened or to take appropriate action.
According to the UEFI v2.9 specification chapter N2.4.4, the ARM
processor error section includes:
- several (ERR_INFO_NUM) ARM processor error information structures
(Tables N.17 to N.20);
- several (CONTEXT_INFO_NUM) ARM processor context information
structures (Tables N.21 to N.29);
- several vendor specific error information structures. The
size is given by Section Length minus the size of the other
fields.
In addition, it also exports two fields that are parsed by the GHES
driver when firmware reports it, e.g.:
- error severity
- CPU logical index
Report all of these information to userspace via a the ARM tracepoint so
that userspace can properly record the error and take decisions related
to CPU core isolation according to error severity and other info.
The updated ARM trace event now contains the following fields:
====================================== =============================
UEFI field on table N.16 ARM Processor trace fields
====================================== =============================
Validation handled when filling data for
affinity MPIDR and running
state.
ERR_INFO_NUM pei_len
CONTEXT_INFO_NUM ctx_len
Section Length indirectly reported by
pei_len, ctx_len and oem_len
Error affinity level affinity
MPIDR_EL1 mpidr
MIDR_EL1 midr
Running State running_state
PSCI State psci_state
Processor Error Information Structure pei_err - count at pei_len
Processor Context ctx_err- count at ctx_len
Vendor Specific Error Info oem - count at oem_len
====================================== =============================
It should be noted that decoding of tables N.17 to N.29, if needed, will
be handled in userspace. That gives more flexibility, as there won't be
any need to flood the kernel with micro-architecture specific error
decoding.
Also, decoding the other fields require a complex logic, and should be
done for each of the several values inside the record field. So, let
userspace daemons like rasdaemon decode them, parsing such tables and
having vendor-specific micro-architecture-specific decoders.
[mchehab: modified description, solved merge conflicts and fixed coding style]
Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com> # rebased
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
ACPI 6.6 specification for EINJV2 appends an extra structure to
the end of the existing struct set_error_type_with_address.
Several issues showed up in testing.
1) Initialization was broken by an earlier fix [1] since is_v2 is only
set while performing an injection, not during initialization.
2) A buggy BIOS provided invalid "revision" and "length" for the
extension structure. Add several sanity checks.
3) When injecting legacy error types on an EINJV2 capable system,
don't copy the component arrays.
Fixes: 6c7058514991 ("ACPI: APEI: EINJ: Check if user asked for EINJV2 injection") # [1]
Fixes: b47610296d17 ("ACPI: APEI: EINJ: Enable EINJv2 error injections")
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ rjw: Changelog edits ]
Cc: 6.17+ <stable@vger.kernel.org> # 6.17+
Link: https://patch.msgid.link/20251119012712.178715-1-tony.luck@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Poison (or ECC) errors can be very common on a large size cluster. The
kernel MM currently handles ECC errors / poison only on memory page backed
by struct page. The handling is currently missing for the PFNMAP memory
that does not have struct pages. The series adds such support.
Implement a new ECC handling for memory without struct pages. Kernel MM
expose registration APIs to allow modules that are managing the device to
register its device memory region. MM then tracks such regions using
interval tree.
The mechanism is largely similar to that of ECC on pfn with struct pages.
If there is an ECC error on a pfn, all the mapping to it are identified
and a SIGBUS is sent to the user space processes owning those mappings.
Note that there is one primary difference versus the handling of the
poison on struct pages, which is to skip unmapping to the faulty PFN.
This is done to handle the huge PFNMAP support added recently [1] that
enables VM_PFNMAP vmas to map at PMD or PUD level. A poison to a PFN
mapped in such as way would need breaking the PMD/PUD mapping into PTEs
that will get mirrored into the S2. This can greatly increase the cost of
table walks and have a major performance impact.
nvgrace-gpu-vfio-pci module maps the device memory to user VA (Qemu) using
remap_pfn_range without being added to the kernel [2]. These device
memory PFNs are not backed by struct page. So make nvgrace-gpu-vfio-pci
module make use of the mechanism to get poison handling support on the
device memory.
This patch (of 3):
The GHES code allows calling of memory_failure() on the PFNs that pass the
pfn_valid() check. This contract is broken for the remapped PFNs which
fails the check and ghes_do_memory_failure() returns without triggering
memory_failure().
Update code to allow memory_failure() call on PFNs failing pfn_valid().
Link: https://lkml.kernel.org/r/20251102184434.2406-1-ankita@nvidia.com
Link: https://lkml.kernel.org/r/20251102184434.2406-2-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Ankit Agrawal <ankita@nvidia.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since commit a8bb74acd8efe ("rcu: Consolidate RCU-sched update-side
function definitions") there is no difference between rcu_read_lock(),
rcu_read_lock_bh() and rcu_read_lock_sched() in terms of RCU read
section and the relevant grace period. That means that spin_lock(),
which implies rcu_read_lock_sched(), also implies rcu_read_lock().
There is no need no explicitly start a RCU read section if one has
already been started implicitly by spin_lock().
Simplify the code and remove the inner rcu_read_lock() invocation.
Signed-off-by: pengdonglin <pengdonglin@xiaomi.com>
Signed-off-by: pengdonglin <dolinux.peng@gmail.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250916044735.2316171-2-dolinux.peng@gmail.com
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
EINJ driver today only allows injection request to go through for two
kinds of IORESOURCE_MEM: IORES_DESC_PERSISTENT_MEMORY and
IORES_DESC_SOFT_RESERVED. This check prevents user of EINJ to test
memory corrupted in many interesting areas:
- Legacy persistent memory
- Memory claimed to be used by ACPI tables or NV storage
- Kernel crash memory and others
There is need to test how kernel behaves when something consumes memory
errors in these memory regions. For example, if certain ACPI table is
corrupted, does kernel crash gracefully to prevent "silent data
corruption". For another example, legacy persistent memory, when managed
by Device DAX, does support recovering from Machine Check Exception
raised by memory failure, hence worth to be tested.
However, attempt to inject memory error via EINJ to legacy persistent
memory or ACPI owned memory fails with -EINVAL.
Allow EINJ to inject at address except it is MMIO. Leave it to the BIOS
or firmware to decide what is a legitimate injection target.
In addition to the test done in [1], on a machine having the following
iomem resources:
...
01000000-08ffffff : Crash kernel
768f0098-768f00a7 : APEI EINJ
...
768f4000-77323fff : ACPI Non-volatile Storage
77324000-777fefff : ACPI Tables
777ff000-777fffff : System RAM
77800000-7fffffff : Reserved
80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
90040000-957fffff : PCI Bus 0000:00
...
300000000-3ffffffff : Persistent Memory (legacy)
...
I commented __einj_error_inject during the test and just tested when
injecting a memory error at each start address shown above:
- 0x80000000 and 0x90040000 both failed with EINVAL
- request passed through for all other addresses
Link: https://lore.kernel.org/linux-acpi/20250825223348.3780279-1-jiaqiyan@google.com [1]
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250910044531.264043-1-jiaqiyan@google.com
[ rjw: Adding a link tag for the [1] reference ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Use the result of copy_from_user() directly instead of assigning it to
the local variable 'rc' and then overwriting it in erst_dbg_write() or
immediately returning from erst_dbg_ioctl().
No intentional functional changes.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://patch.msgid.link/20250903224913.242928-2-thorsten.blum@linux.dev
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The .remove() callback is also used during error handling in
faux_probe(). As einj_remove() was marked with __exit it's not linked
into the kernel if the driver is built-in, potentially resulting in
resource leaks.
Also remove the comment justifying the __exit annotation which doesn't
apply any more since the driver was converted to the faux device
interface.
Fixes: 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device interface")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Link: https://patch.msgid.link/20250814051157.35867-2-u.kleine-koenig@baylibre.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The __einj_error_inject() function allocates memory via kmalloc()
without checking for allocation failure, which could lead to a
NULL pointer dereference.
Return -ENOMEM in case allocation fails.
Fixes: b47610296d17 ("ACPI: APEI: EINJ: Enable EINJv2 error injections")
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250815024207.3038-1-hanchunchao@inspur.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On an EINJV2 capable system, users may still use the old injection
interface but einj_get_parameter_address() takes the EINJV2 path to map
the parameter structure. This results in the address the user supplied
being stored to the wrong location and the BIOS injecting based on an
uninitialized field (0x0 in the reported case).
Check the version of the request when mapping the EINJ parameter
structure in BIOS reserved memory.
Fixes: 691a0f0a557b ("ACPI: APEI: EINJ: Discover EINJv2 parameters")
Reported-by: Lai, Yi1 <yi1.lai@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Zaid Alali <zaidal@os.amperecomputing.com>
Reviewed-by: Hanjun Guo <gouhanjun@huawei.com>
Link: https://patch.msgid.link/20250814161706.4489-1-tony.luck@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The memory uncorrected error could be signaled by asynchronous interrupt
(specifically, SPI in arm64 platform), e.g. when an error is detected by
a background scrubber, or signaled by synchronous exception
(specifically, data abort exception in arm64 platform), e.g. when a CPU
tries to access a poisoned cache line. Currently, both synchronous and
asynchronous errors use memory_failure_queue() to schedule
memory_failure() to exectute in a kworker context.
As a result, when a user-space process is accessing a poisoned data, a
data abort is taken and the memory_failure() is executed in the kworker
context, which:
- will send wrong si_code by SIGBUS signal in early_kill mode, and
- can not kill the user-space in some cases resulting a synchronous
error infinite loop
Issue 1: send wrong si_code in early_kill mode
Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as
MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED
could be used to determine whether a synchronous exception occurs on
ARM64 platform. When a synchronous exception is detected, the kernel is
expected to terminate the current process which has accessed a poisoned
page. This is done by sending a SIGBUS signal with error code
BUS_MCEERR_AR, indicating an action-required machine check error on
read.
However, when kill_proc() is called to terminate the processes who has
the poisoned page mapped, it sends the incorrect SIGBUS error code
BUS_MCEERR_AO because the context in which it operates is not the one
where the error was triggered.
To reproduce this problem:
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 5 addr 0xffffb0d75000
page not present
Test passed
The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO
error and it is not factually correct.
After this change:
# STEP1: enable early kill mode
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1
# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed
The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR
error as expected.
Issue 2: a synchronous error infinite loop
If a user-space process, e.g. devmem, accesses a poisoned page for which
the HWPoison flag is set, kill_accessing_process() is called to send
SIGBUS to current processs with error info. Since the memory_failure()
is executed in the kworker context, it will just do nothing but return
EFAULT. So, devmem will access the posioned page and trigger an
exception again, resulting in a synchronous error infinite loop. Such
exception loop may cause platform firmware to exceed some threshold and
reboot when Linux could have recovered from this error.
To reproduce this problem:
# STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed
# STEP 2: access the same page and it will trigger a synchronous error infinite loop
devmem 0x4092d55b400
To fix above two issues, queue memory_failure() as a task_work so that
it runs in the context of the process that is actually consuming the
poisoned data.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250714114212.31660-3-xueshuai@linux.alibaba.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
recovered
If a synchronous error is detected as a result of user-space process
triggering a 2-bit uncorrected error, the CPU will take a synchronous
error exception such as Synchronous External Abort (SEA) on Arm64. The
kernel will queue a memory_failure() work which poisons the related
page, unmaps the page, and then sends a SIGBUS to the process, so that
a system wide panic can be avoided.
However, no memory_failure() work will be queued when abnormal
synchronous errors occur. These errors can include situations like
invalid PA, unexpected severity, no memory failure config support,
invalid GUID section, etc. In such a case, the user-space process will
trigger SEA again. This loop can potentially exceed the platform
firmware threshold or even trigger a kernel hard lockup, leading to a
system reboot.
Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250714114212.31660-2-xueshuai@linux.alibaba.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The trigger events are in BIOS memory immediately following the
acpi_einj_trigger structure. These were not copied to regular
kernel memory for use by apei_exec_ctx_init() so injections in
"notrigger=0" mode failed with a message like this:
APEI: Invalid action table, unknown instruction type: 123
Fix by allocating a "table_size" block of memory and copying the whole
table for use in the rest of the trigger flow.
Fixes: 1a35c88302a3 ("ACPI: APEI: EINJ: Fix kernel test sparse warnings")
Reported-by: Yi1 Lai <yi1.lai@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20250703200421.28012-1-tony.luck@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When a GHES (Generic Hardware Error Source) triggers a panic, add the
TAINT_MACHINE_CHECK taint flag to the kernel. This explicitly marks the
kernel as tainted due to a machine check event, improving diagnostics
and post-mortem analysis. The taint is set with LOCKDEP_STILL_OK to
indicate lockdep remains valid.
At large scale deployment, this helps to quickly determine panics that
are coming due to hardware failures.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20250702-add_tain-v1-1-9187b10914b9@debian.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In the case where a request_mem_region call fails and pointer r is null
the error exit path via label 'out' will check for a non-null pointer
p and try to iounmap it. However, pointer p has not been assigned a
value at this point, so it may potentially contain any garbage value.
Fix this by ensuring pointer p is initialized to NULL.
Fixes: 1a35c88302a3 ("ACPI: APEI: EINJ: Fix kernel test sparse warnings")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250624202937.523013-1-colin.i.king@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The check for c < 0 is always false because variable c is a size_t which
is not a signed type. Fix this by making c a ssize_t.
Fixes: 90711f7bdf76 ("ACPI: APEI: EINJ: Create debugfs files to enter device id and syndrome")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20250624201032.522168-1-colin.i.king@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The "einj_buf" buffer is 32 chars. If "count" is larger than that it
results in memory corruption. Cap it at 31 so that we leave the last
character as a NUL terminator. By the way, the highest reasonable value
for "count" is 24.
Fixes: 0c6176e1e186 ("ACPI: APEI: EINJ: Enable the discovery of EINJv2 capabilities")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/ae6286cf-4d73-4b97-8c0f-0782a65b8f51@sabinyo.mountain
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Enable injection using EINJv2 mode of operation.
[Tony: Mostly Zaid's original code. I just changed how the error ID
and syndrome bits are implemented. Also swapped out some camelcase
variable names]
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250617193026.637510-7-zaidal@os.amperecomputing.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
EINJv2 allows users to inject multiple errors at the same time by
specifying the device id and syndrome bits for each error in a flex
array.
Create files in the einj debugfs directory to enter data for each
device id and syndrome value. Note that the specification says these
are 128-bit little-endian values. Linux doesn't have a handy helper
to manage objects of this type.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250617193026.637510-6-zaidal@os.amperecomputing.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The EINJv2 set_error_type_with_address structure has a flex array
to hold the component IDs and syndrome values used when injecting
multiple errors at once.
Discover the size of this array by taking the address from the
ACPI_EINJ_SET_ERROR_TYPE_WITH_ADDRESS entry in the EINJ table
and reading the BIOS copy of the structure.
Derive the maximum number of components from the length field
in the einjv2_extension_struct at the end of the BIOS copy.
Map the whole of the structure into kernel memory (and unmap
on module unload).
[Tony: Code unchanged from Zaid's original. New commit message]
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250617193026.637510-5-zaidal@os.amperecomputing.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add einjv2 extension struct and EINJv2 error types to prepare
the driver for EINJv2 support. ACPI specifications[1] enables
EINJv2 by extending set_error_type_with_address struct.
Link: https://uefi.org/specs/ACPI/6.6/18_Platform_Error_Interfaces.html#einjv2-extension-structure [1]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250617193026.637510-4-zaidal@os.amperecomputing.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Enable the driver to show all supported error injections for EINJ
and EINJv2 at the same time. EINJv2 capabilities can be discovered
by checking the return value of get_error_type, where bit 30 set
indicates EINJv2 support.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250617193026.637510-3-zaidal@os.amperecomputing.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This patch fixes the kernel test robot warning reported here:
Link: https://lore.kernel.org/all/202410241620.oApALow5-lkp@intel.com/
Use pointers annotated with the __iomem marker for all iomem map calls,
and creates a local copy of the mapped IO memory for future access in
the code. memcpy_fromio() and memcpy_toio() are used to read/write data
from/to mapped IO memory.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250617193026.637510-2-zaidal@os.amperecomputing.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
CXL has a symbol dependency on einj_core.ko, so if einj_init() fails then
cxl_core.ko fails to load. Prior to the faux_device_create() conversion,
einj_probe() failures were tracked by the einj_initialized flag without
failing einj_init().
Revert to that behavior and always succeed einj_init() given there is no
way, and no pressing need, to discern faux device-create vs device-probe
failures.
This situation arose because CXL knows proper kernel named objects to
trigger errors against, but acpi-einj knows how to perform the error
injection. The injection mechanism is shared with non-CXL use cases. The
result is CXL now has a module dependency on einj-core.ko, and init/probe
failures are handled at runtime.
Fixes: 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device interface")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250607033228.1475625-4-dan.j.williams@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These address issues introduced by recent ACPI changes merged
previously:
- Unbreak acpi_ut_safe_strncpy() by restoring its previous behavior
changed incorrectly by a recent update (Ahmed Salem)
- Make a new static checker warning in the recently introduced ACPI
MRRM table parser go away (Dan Carpenter)
- Fix ACPI table referece leak in error path of einj_probe() (Dan
Carpenter)"
* tag 'acpi-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Switch back to using strncpy() in acpi_ut_safe_strncpy()
ACPI: MRRM: Silence error code static checker warning
ACPI: APEI: EINJ: Clean up on error in einj_probe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The headline feature is the re-enablement of support for Arm's
Scalable Matrix Extension (SME) thanks to a bumper crop of fixes
from Mark Rutland.
If matrices aren't your thing, then Ryan's page-table optimisation
work is much more interesting.
Summary:
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI)
support from the ACPI GHES code so that it can be used by platforms
booted with device-tree
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
runtime calls
- Fix a node refcount imbalance in the PSCI device-tree code
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
guests can reliably query the underlying CPU types from the VMM
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
to our context-switching, signal handling and ptrace code
Entry code:
- Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be
selected
Memory management:
- Prevent BSS exports from being used by the early PI code
- Propagate level and stride information to the low-level TLB
invalidation routines when operating on hugetlb entries
- Use the page-table contiguous hint for vmap() mappings with
VM_ALLOW_HUGE_VMAP where possible
- Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode"
and hook this up on arm64 so that the trailing DSB (used to publish
the updates to the hardware walker) can be deferred until the end
of the mapping operation
- Extend mmap() randomisation for 52-bit virtual addresses (on par
with 48-bit addressing) and remove limited support for
randomisation of the linear map
Perf and PMUs:
- Add support for probing the CMN-S3 driver using ACPI
- Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers
Selftests:
- Fix FPSIMD and SME tests to align with the freshly re-enabled SME
support
- Fix default setting of the OUTPUT variable so that tests are
installed in the right location
vDSO:
- Replace raw counter access from inline assembly code with a call to
the the __arch_counter_get_cntvct() helper function
Miscellaneous:
- Add some missing header inclusions to the CCA headers
- Rework rendering of /proc/cpuinfo to follow the x86-approach and
avoid repeated buffer expansion (the user-visible format remains
identical)
- Remove redundant selection of CONFIG_CRC32
- Extend early error message when failing to map the device-tree
blob"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
arm64: cputype: Add cputype definition for HIP12
arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again
perf/arm-cmn: Add CMN S3 ACPI binding
arm64/boot: Disallow BSS exports to startup code
arm64/boot: Move global CPU override variables out of BSS
arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace
perf/arm-cmn: Initialise cmn->cpu earlier
kselftest/arm64: Set default OUTPUT path when undefined
arm64: Update comment regarding values in __boot_cpu_mode
arm64: mm: Drop redundant check in pmd_trans_huge()
arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1
arm64/mm: Permit lazy_mmu_mode to be nested
arm64/mm: Disable barrier batching in interrupt contexts
arm64/cpuinfo: only show one cpu's info in c_show()
arm64/mm: Batch barriers when updating kernel mappings
mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
arm64/mm: Support huge pte-mapped pages in vmap
mm/vmalloc: Gracefully unmap huge ptes
mm/vmalloc: Warn on improper use of vunmap_range()
arm64/mm: Hoist barriers out of set_ptes_anysz() loop
...
|
|
Call acpi_put_table() before returning the error code.
Fixes: e54b1dc1c4f0 ("ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aDVRBok33LZhXcId@stanley.mountain
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
A single call to einj_get_available_error_type() in init function is
sufficient to save the return value in a global variable to be used
later in various places in the code.
This change has no functional impact, but only removes unnecessary
redundant function calls.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
Link: https://patch.msgid.link/20250506213814.2365788-5-zaidal@os.amperecomputing.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
SDEI usually initialize with the ACPI table, but on platforms where
ACPI is not used, the SDEI feature can still be used to handle
specific firmware calls or other customized purposes. Therefore, it
is not necessary for ARM_SDE_INTERFACE to depend on ACPI_APEI_GHES.
In commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES
in acpi_init()"), to make APEI ready earlier, sdei_init was moved
into acpi_ghes_init instead of being a standalone initcall, adding
ACPI_APEI_GHES dependency to ARM_SDE_INTERFACE. This restricts the
flexibility and usability of SDEI.
This patch corrects the dependency in Kconfig and splits sdei_init()
into two separate functions: sdei_init() and acpi_sdei_init().
sdei_init() will be called by arch_initcall and will only initialize
the platform driver, while acpi_sdei_init() will initialize the
device from acpi_ghes_init() when ACPI is ready. This allows the
initialization of SDEI without ACPI_APEI_GHES enabled.
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250507045757.2658795-1-quic_hyiwei@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device
interface") updated the APEI error injection driver to use the faux
device interface and now for devices that don't support ACPI, the
following error message is seen on boot:
ERR KERN faux acpi-einj: probe did not succeed, tearing down the device
The APEI error injection driver returns -ENODEV in the probe function
if ACPI is not supported and so after transitioning the driver to the
faux device interface, the error returned from the probe now causes the
above error message to be displayed.
Fix this by moving the code that detects if ACPI is supported to the
einj_init() function to fix the false error message displayed for
devices that don't support ACPI.
Fixes: 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device interface")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250501124621.1251450-1-jonathanh@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The APEI error injection driver does not require the creation of a
platform device. Originally, this approach was chosen for simplicity
when the driver was first implemented.
With the introduction of the lightweight faux device interface, we now
have a more appropriate alternative. Migrate the driver to utilize the
faux bus, given that the platform device it previously created was not
a real one anyway. This will simplify the code, reducing its footprint
while maintaining functionality.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250317-plat2faux_dev-v1-8-5fe67c085ad5@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When PCIe AER is in FW-First, OS should process CXL Protocol errors from
CPER records. Introduce support for handling and logging CXL Protocol
errors.
The defined trace events cxl_aer_uncorrectable_error and
cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them
to trace FW-First Protocol errors.
Since the CXL code is required to be called from process context and
GHES is in interrupt context, use workqueues for processing.
Similar to CXL CPER event handling, use kfifo to handle errors as it
simplifies queue processing by providing lock free fifo operations.
Add the ability for the CXL sub-system to register a workqueue to
process CXL CPER protocol errors.
[DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()]
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add support in GHES to detect and process CXL CPER Protocol errors, as
defined in UEFI v2.10, section N.2.13.
Define struct cxl_cper_prot_err_work_data to cache CXL protocol error
information, including RAS capabilities and severity, for further
handling.
These cached CXL CPER records will later be processed by workqueues
within the CXL subsystem.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250123084421.127697-5-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The GHES driver overrides the panic= setting by force-rebooting the
system after a fatal hw error has been reported. The intent being that
such an error would be reported earlier.
However, this is not optimal when a hard-to-debug issue requires long
time to reproduce and when that happens, the box will get rebooted after
30 seconds and thus destroy the whole hw context of when the error
happened.
So rip out the default GHES panic timeout and honor the global one.
In the panic disabled (panic=0) case, the error will still be logged to
dmesg for later inspection and if panic after a hw error is really
required, then that can be controlled the usual way - use panic= on the
cmdline or set it in the kernel .config's CONFIG_PANIC_TIMEOUT.
Reported-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20250113125224.GFZ4UMiNtWIJvgpveU@fat_crate.local
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Clean up the existing export namespace code along the same lines of
commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.
Scripted using
git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
do
awk -i inplace '
/^#define EXPORT_SYMBOL_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/^#define MODULE_IMPORT_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/MODULE_IMPORT_NS/ {
$0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
}
/EXPORT_SYMBOL_NS/ {
if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
$0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
$0 !~ /^my/) {
getline line;
gsub(/[[:space:]]*\\$/, "");
gsub(/[[:space:]]/, "", line);
$0 = $0 " " line;
}
$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
"\\1(\\2, \"\\3\")", "g");
}
}
{ print }' $file;
done
Requested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/acpi to use .remove(), with
the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/9ee1a9813f53698be62aab9d810b2d97a2a9f186.1731397722.git.u.kleine-koenig@baylibre.com
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fix from Ira Weiny:
- Fix calculation for SBDF in error injection
* tag 'cxl-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
EINJ, CXL: Fix CXL device SBDF calculation
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
The SBDF of the target CXL 2.0 compliant root port is required to inject a CXL
protocol error as per ACPI 6.5. The SBDF given has to be in the
following format:
31 24 23 16 15 11 10 8 7 0
+-------------------------------------------------+
| segment | bus | device | function | reserved |
+-------------------------------------------------+
The SBDF calculated in cxl_dport_get_sbdf() doesn't account for
the reserved bits currently, causing the wrong SBDF to be used.
Fix said calculation to properly shift the SBDF.
Without this fix, error injection into CXL 2.0 root ports through the
CXL debugfs interface (<debugfs>/cxl) is broken. Injection
through the legacy interface (<debugfs>/apei/einj/) will still work
because the SBDF is manually provided by the user.
Fixes: 12fb28ea6b1cf ("EINJ: Add CXL error type support")
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Reviewed-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Link: https://patch.msgid.link/20240927163428.366557-1-Benjamin.Cheatham@amd.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (cxl) updates from Dave Jiang:
"Major changes address HDM decoder initialization from DVSEC ranges,
refactoring the code related to cxl mailboxes to be independent of the
memory devices, and adding support for shared upstream link
access_coordinate calculation, as well as a change to remove locking
from memory notifier callback.
In addition, a number of misc cleanups and refactoring of the code are
also included.
Address HDM decoder initialization from DVSEC ranges:
- Only register non-zero DVSEC ranges
- Remove duplicate implementation of waiting for memory_info_valid
- Simplify the checking of mem_enabled in cxl_hdm_decode_init()
Refactor the code related to cxl mailboxes to be independent of the memory devices:
- Move cxl headers in include/linux/ to include/cxl
- Move all mailbox related data to 'struct cxl_mailbox'
- Refactor mailbox APIs with 'struct cxl_mailbox' as input instead of
memory device state
Add support for shared upstream link access_coordinate calculation for
configurations that have multiple targets under a switch or a root
port where the aggregated bandwidth can be greater than the upstream
link of the switch/RP upstream link:
- Preserve the CDAT access_coordinate from an endpoint
- Add the support for shared upstream link access_coordinate calculation
- Add documentation to explain how the calculations are done
Remove locking from memory notifier callback.
Misc cleanups:
- Convert devm_cxl_add_root() to return using ERR_CAST()
- cxl_test use dev_is_platform() instead of open coding
- Remove duplicate include of header core.h in core/cdat.c
- use scoped resource management to drop put_device() for cxl_port
- Use scoped_guard to drop device_lock() for cxl_port
- Refactor __devm_cxl_add_port() to drop gotos
- Rename cxl_setup_parent_dport to cxl_dport_init_aer and
cxl_dport_map_regs() to cxl_dport_map_ras()
- Refactor cxl_dport_init_aer() to be more concise
- Remove duplicate host_bridge->native_aer checking in
cxl_dport_init_ras_reporting()
- Fix comment for cxl_query_cmd()"
* tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
cxl: Add documentation to explain the shared link bandwidth calculation
cxl: Calculate region bandwidth of targets with shared upstream link
cxl: Preserve the CDAT access_coordinate for an endpoint
cxl: Fix comment regarding cxl_query_cmd() return data
cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input
cxl: Move mailbox related bits to the same context
cxl: move cxl headers to new include/cxl/ directory
cxl/region: Remove lock from memory notifier callback
cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init()
cxl/pci: Check Mem_info_valid bit for each applicable DVSEC
cxl/pci: Remove duplicated implementation of waiting for memory_info_valid
cxl/pci: Fix to record only non-zero ranges
cxl/pci: Remove duplicate host_bridge->native_aer checking
cxl/pci: cxl_dport_map_rch_aer() cleanup
cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs()
cxl/port: Refactor __devm_cxl_add_port() to drop goto pattern
cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
cxl/port: Use __free() to drop put_device() for cxl_port
cxl: Remove duplicate included header file core.h
tools/testing/cxl: Use dev_is_platform()
...
|
|
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Group all cxl related kernel headers into include/cxl/ directory.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20240905223711.1990186-2-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Merge ACPI EC driver fixes, an ACPI APEI fix and PNP fixes for
6.10-rc3:
- Fix error handling during EC operation region accesses in the ACPI EC
driver (Armin Wolf).
- Fix a memory leak in the APEI error injection driver introduced
during its converion to a platform driver (Dan Williams).
- Fix build failures related to the dev_is_pnp() macro by redefining it
as a proper function and exporting it to modules as appropriate and
unexport pnp_bus_type which need not be exported any more (Andy
Shevchenko).
* acpi-ec:
ACPI: EC: Avoid returning AE_OK on errors in address space handler
ACPI: EC: Abort address space access upon error
* acpi-apei:
ACPI: APEI: EINJ: Fix einj_dev release leak
* pnp:
PNP: Hide pnp_bus_type from the non-PNP code
PNP: Make dev_is_pnp() to be a function and export it for modules
|
|
The platform driver conversion of EINJ mistakenly used
platform_device_del() to unwind platform_device_register_full() at
module exit. This leads to a small leak of one 'struct platform_device'
instance per module load/unload cycle. Switch to
platform_device_unregister() which performs both device_del() and final
put_device().
Fixes: 5621fafaac00 ("EINJ: Migrate to a platform driver")
Cc: 6.9+ <stable@vger.kernel.org> # 6.9+
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
- Three CXL mailbox passthrough commands are added to support the
populating and clearing of vendor debug logs:
- Get Log Capabilities
- Get Supported Log Sub-List Commands
- Clear Log
- Add support of Device Phyiscal Address (DPA) to Host Physical Address
(HPA) translation for CXL events of cxl_dram and cxl_general media.
This allows user space to figure out which CXL region the event
occured via trace event.
- Connect CXL to CPER reporting.
If a device is configured for firmware first, CXL event records are
not sent directly to the host. Those records are reported through EFI
Common Platform Error Records (CPER). Add support to route the CPER
records through the CXL sub-system in order to provide DPA to HPA
translation and also event decoding and tracing. This is useful for
users to determine which system issues may correspond to specific
hardware events.
- A number of misc cleanups and fixes:
- Fix for compile warning of cxl_security_ops
- Add debug message for invalid interleave granularity
- Enhancement to cxl-test event testing
- Add dev_warn() on unsupported mixed mode decoder
- Fix use of phys_to_target_node() for x86
- Use helper function for decoder enum instead of open coding
- Include missing headers for cxl-event
- Fix MAINTAINERS file entry
- Fix cxlr_pmem memory leak
- Cleanup __cxl_parse_cfmws via scope-based resource menagement
- Convert cxl_pmem_region_alloc() to scope-based resource management
* tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
cxl/cper: Remove duplicated GUID defines
cxl/cper: Fix non-ACPI-APEI-GHES build
cxl/pci: Process CPER events
acpi/ghes: Process CXL Component Events
cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management
cxl/acpi: Cleanup __cxl_parse_cfmws()
cxl/region: Fix cxlr_pmem leaks
cxl/core: Add region info to cxl_general_media and cxl_dram events
cxl/region: Move cxl_trace_hpa() work to the region driver
cxl/region: Move cxl_dpa_to_region() work to the region driver
cxl/trace: Correct DPA field masks for general_media & dram events
MAINTAINERS: repair file entry in COMPUTE EXPRESS LINK
cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h>
cxl/hdm: Debug, use decoder name function
cxl: Fix use of phys_to_target_node() for x86
cxl/hdm: dev_warn() on unsupported mixed mode decoder
cxl/test: Enhance event testing
cxl/hdm: Add debug message for invalid interleave granularity
cxl: Fix compile warning for cxl_security_ops extern
cxl/mbox: Add Clear Log mailbox command
...
|