summaryrefslogtreecommitdiff
path: root/arch/x86/boot/compressed
AgeCommit message (Collapse)Author
2025-09-05Merge branch 'x86/apic' into x86/sev, to resolve conflictIngo Molnar
Conflicts: arch/x86/include/asm/sev-internal.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-09-03x86/boot: Move startup code out of __head sectionArd Biesheuvel
Move startup code out of the __head section, now that this no longer has a special significance. Move everything into .text or .init.text as appropriate, so that startup code is not kept around unnecessarily. [ bp: Fold in hunk to fix 32-bit CPU hotplug: Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509022207.56fd97f4-lkp@intel.com ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-45-ardb+git@google.com
2025-09-03efistub/x86: Remap inittext read-execute when neededArd Biesheuvel
Recent EFI x86 systems are more strict when it comes to mapping boot images, and require that mappings are either read-write or read-execute. Now that the boot code is being cleaned up and refactored, most of it is being moved into .init.text [where it arguably belongs] but that implies that when booting on such strict EFI firmware, we need to take care to map .init.text (and the .altinstr_aux section that follows it) read-execute as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-44-ardb+git@google.com
2025-09-03x86/sev: Provide PIC aliases for SEV related data objectsArd Biesheuvel
Provide PIC aliases for data objects that are shared between the SEV startup code and the SEV code that executes later. This is needed so that the confined startup code is permitted to access them. This requires some of these variables to be moved into a source file that is not part of the startup code, as the PIC alias is already implied, and exporting variables in the opposite direction is not supported. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-36-ardb+git@google.com
2025-09-03x86/boot: Drop redundant RMPADJUST in SEV SVSM presence checkArd Biesheuvel
snp_vmpl will be assigned a non-zero value when executing at a VMPL other than 0, and this is inferred from a call to RMPADJUST, which only works when running at VMPL0. This means that testing snp_vmpl is sufficient, and there is no need to perform the same check again. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-34-ardb+git@google.com
2025-09-03x86/sev: Use boot SVSM CA for all startup and init codeArd Biesheuvel
To avoid having to reason about whether or not to use the per-CPU SVSM calling area when running startup and init code on the boot CPU, reuse the boot SVSM calling area as the per-CPU area for the BSP. Thus, remove the need to make the per-CPU variables and associated state in sev_cfg accessible to the startup code once confined. [ bp: Massage commit message. ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-33-ardb+git@google.com
2025-09-03x86/sev: Pass SVSM calling area down to early page state change APIArd Biesheuvel
The early page state change API is mostly only used very early, when only the boot time SVSM calling area is in use. However, this API is also called by the kexec finishing code, which runs very late, and potentially from a different CPU (which uses a different calling area). To avoid pulling the per-CPU SVSM calling area pointers and related SEV state into the startup code, refactor the page state change API so the SVSM calling area virtual and physical addresses can be provided by the caller. No functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-32-ardb+git@google.com
2025-09-03x86/sev: Share implementation of MSR-based page state changeArd Biesheuvel
Both the decompressor and the SEV startup code implement the exact same sequence for invoking the MSR based communication protocol to effectuate a page state change. Before tweaking the internal APIs used in both versions, merge them and share them so those tweaks are only needed in a single place. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-31-ardb+git@google.com
2025-09-03x86/sev: Avoid global variable to store virtual address of SVSM areaArd Biesheuvel
The boottime SVSM calling area is used both by the startup code running from a 1:1 mapping, and potentially later on running from the ordinary kernel mapping. This SVSM calling area is statically allocated, and so its physical address doesn't change. However, its virtual address depends on the calling context (1:1 mapping or kernel virtual mapping), and even though the variable that holds the virtual address of this calling area gets updated from 1:1 address to kernel address during the boot, it is hard to reason about why this is guaranteed to be safe. So instead, take the RIP-relative address of the boottime SVSM calling area whenever its virtual address is required, and only use a global variable for the physical address. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/20250828102202.1849035-30-ardb+git@google.com
2025-09-03x86/sev: Move GHCB page based HV communication out of startup codeArd Biesheuvel
Both the decompressor and the core kernel implement an early #VC handler, which only deals with CPUID instructions, and full featured one, which can handle any #VC exception. The former communicates with the hypervisor using the MSR based protocol, whereas the latter uses a shared GHCB page, which is configured a bit later during the boot, when the kernel runs from its ordinary virtual mapping, rather than the 1:1 mapping that the startup code uses. Accessing this shared GHCB page from the core kernel's startup code is problematic, because it involves converting the GHCB address provided by the caller to a physical address. In the startup code, virtual to physical address translations are problematic, given that the virtual address might be a 1:1 mapped address, and such translations should therefore be avoided. This means that exposing startup code dealing with the GHCB to callers that execute from the ordinary kernel virtual mapping should be avoided too. So move all GHCB page based communication out of the startup code, now that all communication occurring before the kernel virtual mapping is up relies on the MSR protocol only. As an exception, add a flag representing the need to apply the coherency fix in order to avoid exporting CPUID* helpers because of the code running too early for the *cpu_has* infrastructure. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-29-ardb+git@google.com
2025-09-01x86/sev: Indicate the SEV-SNP guest supports Secure AVICNeeraj Upadhyay
Now that Secure AVIC support is complete, make it part of to the SNP present features. Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828113225.209174-1-Neeraj.Upadhyay@amd.com
2025-08-31x86/sev: Run RMPADJUST on SVSM calling area page to test VMPLArd Biesheuvel
Determining the VMPL at which the kernel runs involves performing a RMPADJUST operation on an arbitrary page of memory, and observing whether it succeeds. The use of boot_ghcb_page in the core kernel in this case is completely arbitrary, but results in the need to provide a PIC alias for it. So use boot_svsm_ca_page instead, which already needs this alias for other reasons. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/20250828102202.1849035-28-ardb+git@google.com
2025-08-31x86/sev: Use MSR protocol only for early SVSM PVALIDATE callArd Biesheuvel
The early page state change API performs an SVSM call to PVALIDATE each page when running under a SVSM, and this involves either a GHCB page based call or a call based on the MSR protocol. The GHCB page based variant involves VA to PA translation of the GHCB address, and this is best avoided in the startup code, where virtual addresses are ambiguous (1:1 or kernel virtual). As this is the last remaining occurrence of svsm_perform_call_protocol() in the startup code, switch to the MSR protocol exclusively in this particular case, so that the GHCB based plumbing can be moved out of the startup code entirely in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/20250828102202.1849035-27-ardb+git@google.com
2025-08-28x86/apic: Add new driver for Secure AVICNeeraj Upadhyay
The Secure AVIC feature provides SEV-SNP guests hardware acceleration for performance sensitive APIC accesses while securely managing the guest-owned APIC state through the use of a private APIC backing page. This helps prevent the hypervisor from generating unexpected interrupts for a vCPU or otherwise violate architectural assumptions around the APIC behavior. Add a new x2APIC driver that will serve as the base of the Secure AVIC support. It is initially the same as the x2APIC physical driver (without IPI callbacks), but will be modified as features are implemented. As the new driver does not implement Secure AVIC features yet, if the hypervisor sets the Secure AVIC bit in SEV_STATUS, maintain the existing behavior to enforce the guest termination. [ bp: Massage commit message. ] Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828070334.208401-2-Neeraj.Upadhyay@amd.com
2025-06-21x86/efi: Implement support for embedding SBAT data for x86Vitaly Kuznetsov
Similar to zboot architectures, implement support for embedding SBAT data for x86. Put '.sbat' section in between '.data' and '.text' as the former also covers '.bss' and '.pgtable' and thus must be the last one in the file. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/20250603091951.57775-1-vkuznets@redhat.com
2025-05-31Merge tag 'mm-stable-2025-05-31-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
2025-05-17x86/mm/64: Make 5-level paging support unconditionalKirill A. Shutemov
Both Intel and AMD CPUs support 5-level paging, which is expected to become more widely adopted in the future. All major x86 Linux distributions have the feature enabled. Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
2025-05-15x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID headerAhmed S. Darwish
The main CPUID header <asm/cpuid.h> was originally a storefront for the headers: <asm/cpuid/api.h> <asm/cpuid/leaf_0x2_api.h> Now that the latter CPUID(0x2) header has been merged into the former, there is no practical difference between <asm/cpuid.h> and <asm/cpuid/api.h>. Migrate all users to the <asm/cpuid/api.h> header, in preparation of the removal of <asm/cpuid.h>. Don't remove <asm/cpuid.h> just yet, in case some new code in -next started using it. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
2025-05-12x86/boot: make sure KASLR does not step over KHO preserved memoryAlexander Graf
During kexec handover (KHO) memory contains data that should be preserved and this data would be consumed by kexec'ed kernel. To make sure that the preserved memory is not overwritten, KHO uses "scratch regions" to bootstrap kexec'ed kernel. These regions are guaranteed to not have any memory that KHO would preserve and are used as the only memory the kernel sees during the early boot. The scratch regions are passed in the setup_data by the first kernel with other KHO parameters. If the setup_data contains the KHO parameters, limit randomization to scratch areas only to make sure preserved memory won't get overwritten. Since all the pointers in setup_data are represented by u64, they require double casting (first to unsigned long and then to the actual pointer type) to compile on 32-bits. This looks goofy out of context, but it is unfortunately the way that this is handled across the tree. There are at least a dozen instances of casting like this. Link: https://lkml.kernel.org/r/20250509074635.3187114-14-changyuanl@google.com Signed-off-by: Alexander Graf <graf@amazon.com> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Co-developed-by: Changyuan Lyu <changyuanl@google.com> Signed-off-by: Changyuan Lyu <changyuanl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: Ben Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Gowans <jgowans@amazon.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Rob Herring <robh@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-05x86/sev: Disentangle #VC handling code from startup codeArd Biesheuvel
Most of the SEV support code used to reside in a single C source file that was included in two places: the core kernel, and the decompressor. The code that is actually shared with the decompressor was moved into a separate, shared source file under startup/, on the basis that the decompressor also executes from the early 1:1 mapping of memory. However, while the elaborate #VC handling and instruction decoding that it involves is also performed by the decompressor, it does not actually occur in the core kernel at early boot, and therefore, does not need to be part of the confined early startup code. So split off the #VC handling code and move it back into arch/x86/coco where it came from, into another C source file that is included from both the decompressor and the core kernel. Code movement only - no functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-31-ardb+git@google.com
2025-05-04x86/sev: Move instruction decoder into separate source fileArd Biesheuvel
As a first step towards disentangling the SEV #VC handling code -which is shared between the decompressor and the core kernel- from the SEV startup code, move the decompressor's copy of the instruction decoder into a separate source file. Code movement only - no functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-30-ardb+git@google.com
2025-05-04x86/sev: Make sev_snp_enabled() a static functionArd Biesheuvel
sev_snp_enabled() is no longer used outside of the source file that defines it, so make it static and drop the extern declarations. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-29-ardb+git@google.com
2025-05-04Merge branch 'x86/urgent' into x86/boot, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-04x86/boot/sev: Support memory acceptance in the EFI stub under SVSMArd Biesheuvel
Commit: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") provided a fix for SEV-SNP memory acceptance from the EFI stub when running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP guests running at VMPL >0, as those rely on a SVSM calling area, which is a shared buffer whose address is programmed into a SEV-SNP MSR, and the SEV init code that sets up this calling area executes much later during the boot. Given that booting via the EFI stub at VMPL >0 implies that the firmware has configured this calling area already, reuse it for performing memory acceptance in the EFI stub. Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com
2025-04-22x86/boot: Move SEV startup code into startup/Ard Biesheuvel
Move the SEV startup code into arch/x86/boot/startup/, where it will reside along with other code that executes extremely early, and therefore needs to be built in a special manner. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-12-ardb+git@google.com
2025-04-22x86/sev: Split off startup code from core codeArd Biesheuvel
Disentangle the SEV core code and the SEV code that is called during early boot. The latter piece will be moved into startup/ in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-11-ardb+git@google.com
2025-04-22Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and ↵Ingo Molnar
upstream fixes In particular we need this fix before applying subsequent changes: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-18x86/boot/sev: Avoid shared GHCB page for early memory acceptanceArd Biesheuvel
Communicating with the hypervisor using the shared GHCB page requires clearing the C bit in the mapping of that page. When executing in the context of the EFI boot services, the page tables are owned by the firmware, and this manipulation is not possible. So switch to a different API for accepting memory in SEV-SNP guests, one which is actually supported at the point during boot where the EFI stub may need to accept memory, but the SEV-SNP init code has not executed yet. For simplicity, also switch the memory acceptance carried out by the decompressor when not booting via EFI - this only involves the allocation for the decompressed kernel, and is generally only called after kexec, as normal boot will jump straight into the kernel from the EFI stub. Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1 Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2 Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission
2025-04-18x86/boot: Remove semicolon from "rep" prefixesUros Bizjak
Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "rep" prefixes, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "rep; insn" (or its alternate "rep\n\tinsn" form) as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving "rep" prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in: Error: invalid instruction `add' after `rep' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Mares <mj@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250418071437.4144391-1-ubizjak@gmail.com
2025-04-12x86/sev: Prepare for splitting off early SEV codeArd Biesheuvel
Prepare for splitting off parts of the SEV core.c source file into a file that carries code that must tolerate being called from the early 1:1 mapping. This will allow special build-time handling of thise code, to ensure that it gets generated in a way that is compatible with the early execution context. So create a de-facto internal SEV API and put the definitions into sev-internal.h. No attempt is made to allow this header file to be included in arbitrary other sources - this is explicitly not the intent. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-20-ardb+git@google.com
2025-04-12x86/boot: Move the early GDT/IDT setup code into startup/Ard Biesheuvel
Move the early GDT/IDT setup code that runs long before the kernel virtual mapping is up into arch/x86/boot/startup/, and build it in a way that ensures that the code tolerates being called from the 1:1 mapping of memory. The code itself is left unchanged by this patch. Also tweak the sed symbol matching pattern in the decompressor to match on lower case 't' or 'b', as these will be emitted by Clang for symbols with hidden linkage. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-15-ardb+git@google.com
2025-04-06x86/boot: Move the 5-level paging trampoline into /startupArd Biesheuvel
The 5-level paging trampoline is used by both the EFI stub and the traditional decompressor. Move it out of the decompressor sources into the newly minted arch/x86/boot/startup/ sub-directory which will hold startup code that may be shared between the decompressor, the EFI stub and the kernel proper, and needs to tolerate being called during early boot, before the kernel virtual mapping has been created. This will allow the 5-level paging trampoline to be used by EFI boot images such as zboot that omit the traditional decompressor entirely. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401133416.1436741-10-ardb+git@google.com
2025-04-06x86/boot/compressed: Merge the local pgtable.h include into <asm/boot.h>Ard Biesheuvel
Merge the local include "pgtable.h" -which declares the API of the 5-level paging trampoline- into <asm/boot.h> so that its implementation in la57toggle.S as well as the calling code can be decoupled from the traditional decompressor. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401133416.1436741-9-ardb+git@google.com
2025-04-05Merge tag 'kbuild-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Improve performance in gendwarfksyms - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS - Support CONFIG_HEADERS_INSTALL for ARCH=um - Use more relative paths to sources files for better reproducibility - Support the loong64 Debian architecture - Add Kbuild bash completion - Introduce intermediate vmlinux.unstripped for architectures that need static relocations to be stripped from the final vmlinux - Fix versioning in Debian packages for -rc releases - Treat missing MODULE_DESCRIPTION() as an error - Convert Nios2 Makefiles to use the generic rule for built-in DTB - Add debuginfo support to the RPM package * tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits) kbuild: rpm-pkg: build a debuginfo RPM kconfig: merge_config: use an empty file as initfile nios2: migrate to the generic rule for built-in DTB rust: kbuild: skip `--remap-path-prefix` for `rustdoc` kbuild: pacman-pkg: hardcode module installation path kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally modpost: require a MODULE_DESCRIPTION() kbuild: make all file references relative to source root x86: drop unnecessary prefix map configuration kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS kbuild: Add a help message for "headers" kbuild: deb-pkg: remove "version" variable in mkdebian kbuild: deb-pkg: fix versioning for -rc releases Documentation/kbuild: Fix indentation in modules.rst example x86: Get rid of Makefile.postlink kbuild: Create intermediate vmlinux build with relocations preserved kbuild: Introduce Kconfig symbol for linking vmlinux with relocations kbuild: link-vmlinux.sh: Make output file name configurable kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y Revert "kheaders: Ignore silly-rename files" ...
2025-03-29Merge tag 'efi-next-for-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Decouple mixed mode startup code from the traditional x86 decompressor - Revert zero-length file hack in efivarfs - Prevent EFI zboot from using the CopyMem/SetMem boot services after ExitBootServices() - Update EFI zboot to use the ZLIB/ZSTD library interfaces directly * tag 'efi-next-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/libstub: Avoid legacy decompressor zlib/zstd wrappers efi/libstub: Avoid CopyMem/SetMem EFI services after ExitBootServices efi: efibc: change kmalloc(size * count, ...) to kmalloc_array() efivarfs: Revert "allow creation of zero length files" x86/efi/mixed: Move mixed mode startup code into libstub x86/efi/mixed: Simplify and document thunking logic x86/efi/mixed: Remove dependency on legacy startup_32 code x86/efi/mixed: Set up 1:1 mapping of lower 4GiB in the stub x86/efi/mixed: Factor out and clean up long mode entry x86/efi/mixed: Check CPU compatibility without relying on verify_cpu() x86/efistub: Merge PE and handover entrypoints
2025-03-24Merge tag 'x86-boot-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot code updates from Ingo Molnar: - Memblock setup and other early boot code cleanups (Mike Rapoport) - Export e820_table_kexec[] to sysfs (Dave Young) - Baby steps of adding relocate_kernel() debugging support (David Woodhouse) - Replace open-coded parity calculation with parity8() (Kuan-Wei Chiu) - Move the LA57 trampoline to separate source file (Ard Biesheuvel) - Misc micro-optimizations (Uros Bizjak) - Drop obsolete E820_TYPE_RESERVED_KERN and related code (Mike Rapoport) * tag 'x86-boot-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Add relocate_kernel() debugging support: Load a GDT x86/boot: Move the LA57 trampoline to separate source file x86/boot: Do not test if AC and ID eflags are changeable on x86_64 x86/bootflag: Replace open-coded parity calculation with parity8() x86/bootflag: Micro-optimize sbf_write() x86/boot: Add missing has_cpuflag() prototype x86/kexec: Export e820_table_kexec[] to sysfs x86/boot: Change some static bootflag functions to bool x86/e820: Drop obsolete E820_TYPE_RESERVED_KERN and related code x86/boot: Split parsing of boot_params into the parse_boot_params() helper function x86/boot: Split kernel resources setup into the setup_kernel_resources() helper function x86/boot: Move setting of memblock parameters to e820__memblock_setup()
2025-03-24Merge tag 'x86-build-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: - Drop CRC-32 checksum and the build tool that generates it (Ard Biesheuvel) - Fix broken copy command in genimage.sh when making isoimage (Nir Lichtman) * tag 'x86-build-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Add back some padding for the CRC-32 checksum x86/boot: Drop CRC-32 checksum and the build tool that generates it x86/build: Fix broken copy command in genimage.sh when making isoimage
2025-03-22x86: drop unnecessary prefix map configurationThomas Weißschuh
The toplevel Makefile already provides -fmacro-prefix-map as part of KBUILD_CPPFLAGS. In contrast to the KBUILD_CFLAGS and KBUILD_AFLAGS variables, KBUILD_CPPFLAGS is not redefined in the architecture specific Makefiles. Therefore the toplevel KBUILD_CPPFLAGS do apply just fine, to both C and ASM sources. The custom configuration was necessary when it was added in commit 9e2276fa6eb3 ("arch/x86/boot: Use prefix map to avoid embedded paths") but has since become unnecessary in commit a716bd743210 ("kbuild: use -fmacro-prefix-map for .S sources"). Drop the now unnecessary custom prefix map configuration. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-03-19Merge tag 'v6.14-rc7' into x86/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-17x86: Get rid of Makefile.postlinkArd Biesheuvel
Instead of generating the vmlinux.relocs file (needed by the decompressor build to construct the KASLR relocation tables) as a vmlinux postlink step, which is dubious because it depends on data that is stripped from vmlinux before the build completes, generate it from vmlinux.unstripped, which has been introduced specifically for this purpose. This ensures that each artifact is rebuilt as needed, rather than as a side effect of another build rule. This effectively reverts commit 9d9173e9ceb6 ("x86/build: Avoid relocation information in final vmlinux") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-03-13x86/boot: Move the LA57 trampoline to separate source fileArd Biesheuvel
To permit the EFI stub to call this code even when building the kernel without the legacy decompressor, move the trampoline out of the latter's startup code. This is part of an ongoing WIP effort on my part to make the existing, generic EFI zboot format work on x86 as well. No functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250313120324.1095968-2-ardb+git@google.com
2025-03-12x86/boot: Add back some padding for the CRC-32 checksumArd Biesheuvel
Even though no uses of the bzImage CRC-32 checksum are known, ensure that the last 4 bytes of the image are unused zero bytes, so that the checksum can be generated post-build if needed. [ mingo: Added the 'obsolete' qualifier to the comment. ] Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250312081204.521411-2-ardb+git@google.com
2025-03-07x86/boot: Drop CRC-32 checksum and the build tool that generates itArd Biesheuvel
Apart from some sanity checks on the size of setup.bin, the only remaining task carried out by the arch/x86/boot/tools/build.c build tool is generating the CRC-32 checksum of the bzImage. This feature was added in commit 7d6e737c8d2698b6 ("x86: add a crc32 checksum to the kernel image.") without any motivation (or any commit log text, for that matter). This checksum is not verified by any known bootloader, and given that a) the checksum of the entire bzImage is reported by most tools (zlib, rhash) as 0xffffffff and not 0x0 as documented, b) the checksum is corrupted when the image is signed for secure boot, which means that no distro ships x86 images with valid CRCs, it seems quite unlikely that this checksum is being used, so let's just drop it, along with the tool that generates it. Instead, use simple file concatenation and truncation to combine the two pieces into bzImage, and replace the checks on the size of the setup block with a couple of ASSERT()s in the linker script. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250307164801.885261-2-ardb+git@google.com
2025-03-07Merge branch 'x86-mixed-mode' into efi/nextArd Biesheuvel
2025-03-06x86/boot: Sanitize boot params before parsing command lineArd Biesheuvel
The 5-level paging code parses the command line to look for the 'no5lvl' string, and does so very early, before sanitize_boot_params() has been called and has been given the opportunity to wipe bogus data from the fields in boot_params that are not covered by struct setup_header, and are therefore supposed to be initialized to zero by the bootloader. This triggers an early boot crash when using syslinux-efi to boot a recent kernel built with CONFIG_X86_5LEVEL=y and CONFIG_EFI_STUB=n, as the 0xff padding that now fills the unused PE/COFF header is copied into boot_params by the bootloader, and interpreted as the top half of the command line pointer. Fix this by sanitizing the boot_params before use. Note that there is no harm in calling this more than once; subsequent invocations are able to spot that the boot_params have already been cleaned up. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> # v6.1+ Link: https://lore.kernel.org/r/20250306155915.342465-2-ardb+git@google.com Closes: https://lore.kernel.org/all/202503041549.35913.ulrich.gemkow@ikr.uni-stuttgart.de
2025-02-21x86/efi/mixed: Move mixed mode startup code into libstubArd Biesheuvel
The EFI mixed mode code has been decoupled from the legacy decompressor, in order to be able to reuse it with generic EFI zboot images for x86. Move the source file into the libstub source directory to facilitate this. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21x86/efi/mixed: Simplify and document thunking logicArd Biesheuvel
Now that the GDT/IDT and data segment selector preserve/restore logic has been removed from the boot-time EFI mixed mode thunking routines, the remaining logic to handle the function arguments can be simplified: the setup of the arguments on the stack can be moved into the 32-bit callee, which is able to use a more idiomatic sequence of PUSH instructions. This, in turn, allows the far call and far return to be issued using plain LCALL and LRET instructions, removing the need to set up the return explicitly. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21x86/efi/mixed: Remove dependency on legacy startup_32 codeArd Biesheuvel
The EFI mixed mode startup code calls into startup_32 in the legacy decompressor with a mocked up boot_params struct, only to get it to set up the 1:1 mapping of the lower 4 GiB of memory and switch to a GDT that supports 64-bit mode. In order to be able to reuse the EFI mixed mode startup code in EFI zboot images, which do not incorporate the legacy decompressor code, decouple it, by dealing with the GDT and IDT directly. Doing so makes it possible to construct a GDT that is compatible with the one the firmware uses, with one additional entry for a 64-bit mode code segment appended. This removes the need entirely to switch between GDTs and IDTs or data segment selector values and all of this code can be removed. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21x86/efi/mixed: Set up 1:1 mapping of lower 4GiB in the stubArd Biesheuvel
In preparation for dropping the dependency on startup_32 entirely in the next patch, add the code that sets up the 1:1 mapping of the lower 4 GiB of system RAM to the mixed mode stub. The reload of CR3 after the long mode switch will be removed in a subsequent patch, when it is no longer needed. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-21x86/efi/mixed: Factor out and clean up long mode entryArd Biesheuvel
Entering long mode involves setting the EFER_LME and CR4.PAE bits before enabling paging by setting CR0.PG bit. It also involves disabling interrupts, given that the firmware's 32-bit IDT becomes invalid as soon as the CPU transitions into long mode. Reloading the CR3 register is not necessary at boot time, given that the EFI firmware as well as the kernel's EFI stub use a 1:1 mapping of the 32-bit addressable memory in the system. Break out this code into a separate helper for clarity, and so that it can be reused in a subsequent patch. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>