| Age | Commit message (Collapse) | Author |
|
Conflicts:
arch/x86/include/asm/sev-internal.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Move startup code out of the __head section, now that this no longer has
a special significance. Move everything into .text or .init.text as
appropriate, so that startup code is not kept around unnecessarily.
[ bp: Fold in hunk to fix 32-bit CPU hotplug:
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509022207.56fd97f4-lkp@intel.com ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-45-ardb+git@google.com
|
|
Recent EFI x86 systems are more strict when it comes to mapping boot
images, and require that mappings are either read-write or read-execute.
Now that the boot code is being cleaned up and refactored, most of it is
being moved into .init.text [where it arguably belongs] but that implies
that when booting on such strict EFI firmware, we need to take care to
map .init.text (and the .altinstr_aux section that follows it)
read-execute as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-44-ardb+git@google.com
|
|
Provide PIC aliases for data objects that are shared between the SEV startup
code and the SEV code that executes later. This is needed so that the confined
startup code is permitted to access them.
This requires some of these variables to be moved into a source file that is
not part of the startup code, as the PIC alias is already implied, and
exporting variables in the opposite direction is not supported.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-36-ardb+git@google.com
|
|
snp_vmpl will be assigned a non-zero value when executing at a VMPL other than
0, and this is inferred from a call to RMPADJUST, which only works when
running at VMPL0.
This means that testing snp_vmpl is sufficient, and there is no need to
perform the same check again.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-34-ardb+git@google.com
|
|
To avoid having to reason about whether or not to use the per-CPU SVSM calling
area when running startup and init code on the boot CPU, reuse the boot SVSM
calling area as the per-CPU area for the BSP.
Thus, remove the need to make the per-CPU variables and associated state in
sev_cfg accessible to the startup code once confined.
[ bp: Massage commit message. ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-33-ardb+git@google.com
|
|
The early page state change API is mostly only used very early, when only the
boot time SVSM calling area is in use. However, this API is also called by the
kexec finishing code, which runs very late, and potentially from a different
CPU (which uses a different calling area).
To avoid pulling the per-CPU SVSM calling area pointers and related SEV state
into the startup code, refactor the page state change API so the SVSM calling
area virtual and physical addresses can be provided by the caller.
No functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-32-ardb+git@google.com
|
|
Both the decompressor and the SEV startup code implement the exact same
sequence for invoking the MSR based communication protocol to effectuate
a page state change.
Before tweaking the internal APIs used in both versions, merge them and
share them so those tweaks are only needed in a single place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-31-ardb+git@google.com
|
|
The boottime SVSM calling area is used both by the startup code running from
a 1:1 mapping, and potentially later on running from the ordinary kernel
mapping.
This SVSM calling area is statically allocated, and so its physical address
doesn't change. However, its virtual address depends on the calling context
(1:1 mapping or kernel virtual mapping), and even though the variable that
holds the virtual address of this calling area gets updated from 1:1 address
to kernel address during the boot, it is hard to reason about why this is
guaranteed to be safe.
So instead, take the RIP-relative address of the boottime SVSM calling area
whenever its virtual address is required, and only use a global variable for
the physical address.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-30-ardb+git@google.com
|
|
Both the decompressor and the core kernel implement an early #VC handler,
which only deals with CPUID instructions, and full featured one, which can
handle any #VC exception.
The former communicates with the hypervisor using the MSR based protocol,
whereas the latter uses a shared GHCB page, which is configured a bit later
during the boot, when the kernel runs from its ordinary virtual mapping,
rather than the 1:1 mapping that the startup code uses.
Accessing this shared GHCB page from the core kernel's startup code is
problematic, because it involves converting the GHCB address provided by the
caller to a physical address. In the startup code, virtual to physical address
translations are problematic, given that the virtual address might be a 1:1
mapped address, and such translations should therefore be avoided.
This means that exposing startup code dealing with the GHCB to callers that
execute from the ordinary kernel virtual mapping should be avoided too. So
move all GHCB page based communication out of the startup code, now that all
communication occurring before the kernel virtual mapping is up relies on the
MSR protocol only.
As an exception, add a flag representing the need to apply the coherency
fix in order to avoid exporting CPUID* helpers because of the code
running too early for the *cpu_has* infrastructure.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-29-ardb+git@google.com
|
|
Now that Secure AVIC support is complete, make it part of to the SNP present
features.
Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/20250828113225.209174-1-Neeraj.Upadhyay@amd.com
|
|
Determining the VMPL at which the kernel runs involves performing a RMPADJUST
operation on an arbitrary page of memory, and observing whether it succeeds.
The use of boot_ghcb_page in the core kernel in this case is completely
arbitrary, but results in the need to provide a PIC alias for it. So use
boot_svsm_ca_page instead, which already needs this alias for other reasons.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-28-ardb+git@google.com
|
|
The early page state change API performs an SVSM call to PVALIDATE each page
when running under a SVSM, and this involves either a GHCB page based call or
a call based on the MSR protocol.
The GHCB page based variant involves VA to PA translation of the GHCB address,
and this is best avoided in the startup code, where virtual addresses are
ambiguous (1:1 or kernel virtual).
As this is the last remaining occurrence of svsm_perform_call_protocol() in
the startup code, switch to the MSR protocol exclusively in this particular
case, so that the GHCB based plumbing can be moved out of the startup code
entirely in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-27-ardb+git@google.com
|
|
The Secure AVIC feature provides SEV-SNP guests hardware acceleration for
performance sensitive APIC accesses while securely managing the guest-owned
APIC state through the use of a private APIC backing page.
This helps prevent the hypervisor from generating unexpected interrupts for
a vCPU or otherwise violate architectural assumptions around the APIC
behavior.
Add a new x2APIC driver that will serve as the base of the Secure AVIC
support. It is initially the same as the x2APIC physical driver (without IPI
callbacks), but will be modified as features are implemented.
As the new driver does not implement Secure AVIC features yet, if the
hypervisor sets the Secure AVIC bit in SEV_STATUS, maintain the existing
behavior to enforce the guest termination.
[ bp: Massage commit message. ]
Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/20250828070334.208401-2-Neeraj.Upadhyay@amd.com
|
|
Similar to zboot architectures, implement support for embedding SBAT data
for x86. Put '.sbat' section in between '.data' and '.text' as the former
also covers '.bss' and '.pgtable' and thus must be the last one in the
file.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/20250603091951.57775-1-vkuznets@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invokation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleaups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscalls arguments. At this
time we can alter only "system call information that are used by
strace system call tampering, namely, syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecesary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramaticelly
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned a single NUMA mode.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
|
|
Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future. All major x86 Linux
distributions have the feature enabled.
Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
|
|
The main CPUID header <asm/cpuid.h> was originally a storefront for the
headers:
<asm/cpuid/api.h>
<asm/cpuid/leaf_0x2_api.h>
Now that the latter CPUID(0x2) header has been merged into the former,
there is no practical difference between <asm/cpuid.h> and
<asm/cpuid/api.h>.
Migrate all users to the <asm/cpuid/api.h> header, in preparation of
the removal of <asm/cpuid.h>.
Don't remove <asm/cpuid.h> just yet, in case some new code in -next
started using it.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
|
|
During kexec handover (KHO) memory contains data that should be preserved
and this data would be consumed by kexec'ed kernel.
To make sure that the preserved memory is not overwritten, KHO uses
"scratch regions" to bootstrap kexec'ed kernel. These regions are
guaranteed to not have any memory that KHO would preserve and are used as
the only memory the kernel sees during the early boot.
The scratch regions are passed in the setup_data by the first kernel with
other KHO parameters. If the setup_data contains the KHO parameters,
limit randomization to scratch areas only to make sure preserved memory
won't get overwritten.
Since all the pointers in setup_data are represented by u64, they require
double casting (first to unsigned long and then to the actual pointer
type) to compile on 32-bits. This looks goofy out of context, but it is
unfortunately the way that this is handled across the tree. There are at
least a dozen instances of casting like this.
Link: https://lkml.kernel.org/r/20250509074635.3187114-14-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Most of the SEV support code used to reside in a single C source file
that was included in two places: the core kernel, and the decompressor.
The code that is actually shared with the decompressor was moved into a
separate, shared source file under startup/, on the basis that the
decompressor also executes from the early 1:1 mapping of memory.
However, while the elaborate #VC handling and instruction decoding that
it involves is also performed by the decompressor, it does not actually
occur in the core kernel at early boot, and therefore, does not need to
be part of the confined early startup code.
So split off the #VC handling code and move it back into arch/x86/coco
where it came from, into another C source file that is included from
both the decompressor and the core kernel.
Code movement only - no functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-31-ardb+git@google.com
|
|
As a first step towards disentangling the SEV #VC handling code -which
is shared between the decompressor and the core kernel- from the SEV
startup code, move the decompressor's copy of the instruction decoder
into a separate source file.
Code movement only - no functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-30-ardb+git@google.com
|
|
sev_snp_enabled() is no longer used outside of the source file that
defines it, so make it static and drop the extern declarations.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-29-ardb+git@google.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit:
d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
provided a fix for SEV-SNP memory acceptance from the EFI stub when
running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP
guests running at VMPL >0, as those rely on a SVSM calling area, which
is a shared buffer whose address is programmed into a SEV-SNP MSR, and
the SEV init code that sets up this calling area executes much later
during the boot.
Given that booting via the EFI stub at VMPL >0 implies that the firmware
has configured this calling area already, reuse it for performing memory
acceptance in the EFI stub.
Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com
|
|
Move the SEV startup code into arch/x86/boot/startup/, where it will
reside along with other code that executes extremely early, and
therefore needs to be built in a special manner.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-12-ardb+git@google.com
|
|
Disentangle the SEV core code and the SEV code that is called during
early boot. The latter piece will be moved into startup/ in a subsequent
patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-11-ardb+git@google.com
|
|
upstream fixes
In particular we need this fix before applying subsequent changes:
d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.
So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.
For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.
Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1
Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2
Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission
|
|
Minimum version of binutils required to compile the kernel is 2.25.
This version correctly handles the "rep" prefixes, so it is possible
to remove the semicolon, which was used to support ancient versions
of GNU as.
Due to the semicolon, the compiler considers "rep; insn" (or its
alternate "rep\n\tinsn" form) as two separate instructions. Removing
the semicolon makes asm length calculations more accurate, consequently
making scheduling and inlining decisions of the compiler more accurate.
Removing the semicolon also enables assembler checks involving "rep"
prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in:
Error: invalid instruction `add' after `rep'
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Mares <mj@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250418071437.4144391-1-ubizjak@gmail.com
|
|
Prepare for splitting off parts of the SEV core.c source file into a
file that carries code that must tolerate being called from the early
1:1 mapping. This will allow special build-time handling of thise code,
to ensure that it gets generated in a way that is compatible with the
early execution context.
So create a de-facto internal SEV API and put the definitions into
sev-internal.h. No attempt is made to allow this header file to be
included in arbitrary other sources - this is explicitly not the intent.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250410134117.3713574-20-ardb+git@google.com
|
|
Move the early GDT/IDT setup code that runs long before the kernel
virtual mapping is up into arch/x86/boot/startup/, and build it in a way
that ensures that the code tolerates being called from the 1:1 mapping
of memory. The code itself is left unchanged by this patch.
Also tweak the sed symbol matching pattern in the decompressor to match
on lower case 't' or 'b', as these will be emitted by Clang for symbols
with hidden linkage.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250410134117.3713574-15-ardb+git@google.com
|
|
The 5-level paging trampoline is used by both the EFI stub and the
traditional decompressor. Move it out of the decompressor sources into
the newly minted arch/x86/boot/startup/ sub-directory which will hold
startup code that may be shared between the decompressor, the EFI stub
and the kernel proper, and needs to tolerate being called during early
boot, before the kernel virtual mapping has been created.
This will allow the 5-level paging trampoline to be used by EFI boot
images such as zboot that omit the traditional decompressor entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250401133416.1436741-10-ardb+git@google.com
|
|
Merge the local include "pgtable.h" -which declares the API of the
5-level paging trampoline- into <asm/boot.h> so that its implementation
in la57toggle.S as well as the calling code can be decoupled from the
traditional decompressor.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250401133416.1436741-9-ardb+git@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Improve performance in gendwarfksyms
- Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
- Support CONFIG_HEADERS_INSTALL for ARCH=um
- Use more relative paths to sources files for better reproducibility
- Support the loong64 Debian architecture
- Add Kbuild bash completion
- Introduce intermediate vmlinux.unstripped for architectures that need
static relocations to be stripped from the final vmlinux
- Fix versioning in Debian packages for -rc releases
- Treat missing MODULE_DESCRIPTION() as an error
- Convert Nios2 Makefiles to use the generic rule for built-in DTB
- Add debuginfo support to the RPM package
* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
kbuild: rpm-pkg: build a debuginfo RPM
kconfig: merge_config: use an empty file as initfile
nios2: migrate to the generic rule for built-in DTB
rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
kbuild: pacman-pkg: hardcode module installation path
kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
modpost: require a MODULE_DESCRIPTION()
kbuild: make all file references relative to source root
x86: drop unnecessary prefix map configuration
kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
kbuild: Add a help message for "headers"
kbuild: deb-pkg: remove "version" variable in mkdebian
kbuild: deb-pkg: fix versioning for -rc releases
Documentation/kbuild: Fix indentation in modules.rst example
x86: Get rid of Makefile.postlink
kbuild: Create intermediate vmlinux build with relocations preserved
kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
kbuild: link-vmlinux.sh: Make output file name configurable
kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
Revert "kheaders: Ignore silly-rename files"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Decouple mixed mode startup code from the traditional x86
decompressor
- Revert zero-length file hack in efivarfs
- Prevent EFI zboot from using the CopyMem/SetMem boot services after
ExitBootServices()
- Update EFI zboot to use the ZLIB/ZSTD library interfaces directly
* tag 'efi-next-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/libstub: Avoid legacy decompressor zlib/zstd wrappers
efi/libstub: Avoid CopyMem/SetMem EFI services after ExitBootServices
efi: efibc: change kmalloc(size * count, ...) to kmalloc_array()
efivarfs: Revert "allow creation of zero length files"
x86/efi/mixed: Move mixed mode startup code into libstub
x86/efi/mixed: Simplify and document thunking logic
x86/efi/mixed: Remove dependency on legacy startup_32 code
x86/efi/mixed: Set up 1:1 mapping of lower 4GiB in the stub
x86/efi/mixed: Factor out and clean up long mode entry
x86/efi/mixed: Check CPU compatibility without relying on verify_cpu()
x86/efistub: Merge PE and handover entrypoints
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot code updates from Ingo Molnar:
- Memblock setup and other early boot code cleanups (Mike Rapoport)
- Export e820_table_kexec[] to sysfs (Dave Young)
- Baby steps of adding relocate_kernel() debugging support (David
Woodhouse)
- Replace open-coded parity calculation with parity8() (Kuan-Wei Chiu)
- Move the LA57 trampoline to separate source file (Ard Biesheuvel)
- Misc micro-optimizations (Uros Bizjak)
- Drop obsolete E820_TYPE_RESERVED_KERN and related code (Mike
Rapoport)
* tag 'x86-boot-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kexec: Add relocate_kernel() debugging support: Load a GDT
x86/boot: Move the LA57 trampoline to separate source file
x86/boot: Do not test if AC and ID eflags are changeable on x86_64
x86/bootflag: Replace open-coded parity calculation with parity8()
x86/bootflag: Micro-optimize sbf_write()
x86/boot: Add missing has_cpuflag() prototype
x86/kexec: Export e820_table_kexec[] to sysfs
x86/boot: Change some static bootflag functions to bool
x86/e820: Drop obsolete E820_TYPE_RESERVED_KERN and related code
x86/boot: Split parsing of boot_params into the parse_boot_params() helper function
x86/boot: Split kernel resources setup into the setup_kernel_resources() helper function
x86/boot: Move setting of memblock parameters to e820__memblock_setup()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
- Drop CRC-32 checksum and the build tool that generates it (Ard
Biesheuvel)
- Fix broken copy command in genimage.sh when making isoimage (Nir
Lichtman)
* tag 'x86-build-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Add back some padding for the CRC-32 checksum
x86/boot: Drop CRC-32 checksum and the build tool that generates it
x86/build: Fix broken copy command in genimage.sh when making isoimage
|
|
The toplevel Makefile already provides -fmacro-prefix-map as part of
KBUILD_CPPFLAGS. In contrast to the KBUILD_CFLAGS and KBUILD_AFLAGS
variables, KBUILD_CPPFLAGS is not redefined in the architecture specific
Makefiles. Therefore the toplevel KBUILD_CPPFLAGS do apply just fine, to
both C and ASM sources.
The custom configuration was necessary when it was added in
commit 9e2276fa6eb3 ("arch/x86/boot: Use prefix map to avoid embedded
paths") but has since become unnecessary in commit a716bd743210
("kbuild: use -fmacro-prefix-map for .S sources").
Drop the now unnecessary custom prefix map configuration.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Instead of generating the vmlinux.relocs file (needed by the
decompressor build to construct the KASLR relocation tables) as a
vmlinux postlink step, which is dubious because it depends on data that
is stripped from vmlinux before the build completes, generate it from
vmlinux.unstripped, which has been introduced specifically for this
purpose.
This ensures that each artifact is rebuilt as needed, rather than as a
side effect of another build rule.
This effectively reverts commit
9d9173e9ceb6 ("x86/build: Avoid relocation information in final vmlinux")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
To permit the EFI stub to call this code even when building the kernel
without the legacy decompressor, move the trampoline out of the latter's
startup code.
This is part of an ongoing WIP effort on my part to make the existing,
generic EFI zboot format work on x86 as well.
No functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250313120324.1095968-2-ardb+git@google.com
|
|
Even though no uses of the bzImage CRC-32 checksum are known, ensure
that the last 4 bytes of the image are unused zero bytes, so that the
checksum can be generated post-build if needed.
[ mingo: Added the 'obsolete' qualifier to the comment. ]
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ian Campbell <ijc@hellion.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250312081204.521411-2-ardb+git@google.com
|
|
Apart from some sanity checks on the size of setup.bin, the only
remaining task carried out by the arch/x86/boot/tools/build.c build tool
is generating the CRC-32 checksum of the bzImage. This feature was added
in commit
7d6e737c8d2698b6 ("x86: add a crc32 checksum to the kernel image.")
without any motivation (or any commit log text, for that matter). This
checksum is not verified by any known bootloader, and given that
a) the checksum of the entire bzImage is reported by most tools (zlib,
rhash) as 0xffffffff and not 0x0 as documented,
b) the checksum is corrupted when the image is signed for secure boot,
which means that no distro ships x86 images with valid CRCs,
it seems quite unlikely that this checksum is being used, so let's just
drop it, along with the tool that generates it.
Instead, use simple file concatenation and truncation to combine the two
pieces into bzImage, and replace the checks on the size of the setup
block with a couple of ASSERT()s in the linker script.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ian Campbell <ijc@hellion.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250307164801.885261-2-ardb+git@google.com
|
|
|
|
The 5-level paging code parses the command line to look for the 'no5lvl'
string, and does so very early, before sanitize_boot_params() has been
called and has been given the opportunity to wipe bogus data from the
fields in boot_params that are not covered by struct setup_header, and
are therefore supposed to be initialized to zero by the bootloader.
This triggers an early boot crash when using syslinux-efi to boot a
recent kernel built with CONFIG_X86_5LEVEL=y and CONFIG_EFI_STUB=n, as
the 0xff padding that now fills the unused PE/COFF header is copied into
boot_params by the bootloader, and interpreted as the top half of the
command line pointer.
Fix this by sanitizing the boot_params before use. Note that there is no
harm in calling this more than once; subsequent invocations are able to
spot that the boot_params have already been cleaned up.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # v6.1+
Link: https://lore.kernel.org/r/20250306155915.342465-2-ardb+git@google.com
Closes: https://lore.kernel.org/all/202503041549.35913.ulrich.gemkow@ikr.uni-stuttgart.de
|
|
The EFI mixed mode code has been decoupled from the legacy decompressor,
in order to be able to reuse it with generic EFI zboot images for x86.
Move the source file into the libstub source directory to facilitate
this.
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Now that the GDT/IDT and data segment selector preserve/restore logic
has been removed from the boot-time EFI mixed mode thunking routines,
the remaining logic to handle the function arguments can be simplified:
the setup of the arguments on the stack can be moved into the 32-bit
callee, which is able to use a more idiomatic sequence of PUSH
instructions.
This, in turn, allows the far call and far return to be issued using
plain LCALL and LRET instructions, removing the need to set up the
return explicitly.
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The EFI mixed mode startup code calls into startup_32 in the legacy
decompressor with a mocked up boot_params struct, only to get it to set
up the 1:1 mapping of the lower 4 GiB of memory and switch to a GDT that
supports 64-bit mode.
In order to be able to reuse the EFI mixed mode startup code in EFI
zboot images, which do not incorporate the legacy decompressor code,
decouple it, by dealing with the GDT and IDT directly.
Doing so makes it possible to construct a GDT that is compatible with
the one the firmware uses, with one additional entry for a 64-bit mode
code segment appended. This removes the need entirely to switch between
GDTs and IDTs or data segment selector values and all of this code can
be removed.
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In preparation for dropping the dependency on startup_32 entirely in the
next patch, add the code that sets up the 1:1 mapping of the lower 4 GiB
of system RAM to the mixed mode stub.
The reload of CR3 after the long mode switch will be removed in a
subsequent patch, when it is no longer needed.
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Entering long mode involves setting the EFER_LME and CR4.PAE bits before
enabling paging by setting CR0.PG bit.
It also involves disabling interrupts, given that the firmware's 32-bit
IDT becomes invalid as soon as the CPU transitions into long mode.
Reloading the CR3 register is not necessary at boot time, given that the
EFI firmware as well as the kernel's EFI stub use a 1:1 mapping of the
32-bit addressable memory in the system.
Break out this code into a separate helper for clarity, and so that it
can be reused in a subsequent patch.
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|