From fe3e5e65c06edb1c56e64e567f053e243142001f Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:00 -0800 Subject: efi: Enumerate EFI_MEMORY_SP UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The intent of this bit is to allow the OS to identify precious or scarce memory resources and optionally manage it separately from EfiConventionalMemory. As defined older OSes that do not know about this attribute are permitted to ignore it and the memory will be handled according to the OS default policy for the given memory type. In other words, this "specific purpose" hint is deliberately weaker than EfiReservedMemoryType in that the system continues to operate if the OS takes no action on the attribute. The risk of taking no action is potentially unwanted / unmovable kernel allocations from the designated resource that prevent the full realization of the "specific purpose". For example, consider a system with a high-bandwidth memory pool. Older kernels are permitted to boot and consume that memory as conventional "System-RAM" newer kernels may arrange for that memory to be set aside (soft reserved) by the system administrator for a dedicated high-bandwidth memory aware application to consume. Specifically, this mechanism allows for the elimination of scenarios where platform firmware tries to game OS policy by lying about ACPI SLIT values, i.e. claiming that a precious memory resource has a high distance to trigger the OS to avoid it by default. This reservation hint allows platform-firmware to instead tell the truth about performance characteristics by indicate to OS memory management to put immovable allocations elsewhere. Implement simple detection of the bit for EFI memory table dumps and save the kernel policy for a follow-on change. Reviewed-by: Ard Biesheuvel Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/efi.h b/include/linux/efi.h index d87acf62958e..78c75992b313 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -112,6 +112,7 @@ typedef struct { #define EFI_MEMORY_MORE_RELIABLE \ ((u64)0x0000000000010000ULL) /* higher reliability */ #define EFI_MEMORY_RO ((u64)0x0000000000020000ULL) /* read-only */ +#define EFI_MEMORY_SP ((u64)0x0000000000040000ULL) /* soft reserved */ #define EFI_MEMORY_RUNTIME ((u64)0x8000000000000000ULL) /* range requires runtime mapping */ #define EFI_MEMORY_DESCRIPTOR_VERSION 1 -- cgit v1.2.3 From 6950e31b35fdf4588cbbdec1813091bb02cf8871 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:05 -0800 Subject: x86/efi: Push EFI_MEMMAP check into leaf routines In preparation for adding another EFI_MEMMAP dependent call that needs to occur before e820__memblock_setup() fixup the existing efi calls to check for EFI_MEMMAP internally. This ends up being cleaner than the alternative of checking EFI_MEMMAP multiple times in setup_arch(). Reviewed-by: Dave Hansen Reviewed-by: Ard Biesheuvel Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- include/linux/efi.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/efi.h b/include/linux/efi.h index 78c75992b313..44c85b559e15 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1045,7 +1045,6 @@ extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if pos extern efi_status_t efi_query_variable_store(u32 attributes, unsigned long size, bool nonblocking); -extern void efi_find_mirror(void); #else static inline efi_status_t efi_query_variable_store(u32 attributes, -- cgit v1.2.3 From b617c5266eedbef2ccbb90931bb9175faa4ae0bc Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:11 -0800 Subject: efi: Common enable/disable infrastructure for EFI soft reservation UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. As for this patch, define the common helpers to determine if the EFI_MEMORY_SP attribute should be honored. The determination needs to be made early to prevent the kernel from being loaded into soft-reserved memory, or otherwise allowing early allocations to land there. Follow-on changes are needed per architecture to leverage these helpers in their respective mem-init paths. Reviewed-by: Ard Biesheuvel Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- include/linux/efi.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include') diff --git a/include/linux/efi.h b/include/linux/efi.h index 44c85b559e15..88654910ce29 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1202,6 +1202,7 @@ extern int __init efi_setup_pcdp_console(char *); #define EFI_DBG 8 /* Print additional debug info at runtime */ #define EFI_NX_PE_DATA 9 /* Can runtime data regions be mapped non-executable? */ #define EFI_MEM_ATTR 10 /* Did firmware publish an EFI_MEMORY_ATTRIBUTES table? */ +#define EFI_MEM_NO_SOFT_RESERVE 11 /* Is the kernel configured to ignore soft reservations? */ #ifdef CONFIG_EFI /* @@ -1212,6 +1213,14 @@ static inline bool efi_enabled(int feature) return test_bit(feature, &efi.flags) != 0; } extern void efi_reboot(enum reboot_mode reboot_mode, const char *__unused); + +bool __pure __efi_soft_reserve_enabled(void); + +static inline bool __pure efi_soft_reserve_enabled(void) +{ + return IS_ENABLED(CONFIG_EFI_SOFT_RESERVE) + && __efi_soft_reserve_enabled(); +} #else static inline bool efi_enabled(int feature) { @@ -1225,6 +1234,11 @@ efi_capsule_pending(int *reset_type) { return false; } + +static inline bool efi_soft_reserve_enabled(void) +{ + return false; +} #endif extern int efi_status_to_err(efi_status_t status); -- cgit v1.2.3 From 262b45ae3ab4bf8e2caf1fcfd0d8307897519630 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:16 -0800 Subject: x86/efi: EFI soft reservation to E820 enumeration UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. This patch introduces 2 new concepts at once given the entanglement between early boot enumeration relative to memory that can optionally be reserved from the kernel page allocator by default. The new concepts are: - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP attribute on EFI_CONVENTIONAL memory, update the E820 map with this new type. Only perform this classification if the CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as typical ram. - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for a device driver to search iomem resources for application specific memory. Teach the iomem code to identify such ranges as "Soft Reserved". Note that the comment for do_add_efi_memmap() needed refreshing since it seemed to imply that the efi map might overflow the e820 table, but that is not an issue as of commit 7b6e4ba3cb1f "x86/boot/e820: Clean up the E820_X_MAX definition" that removed the 128 entry limit for e820__range_add(). A follow-on change integrates parsing of the ACPI HMAT to identify the node and sub-range boundaries of EFI_MEMORY_SP designated memory. For now, just identify and reserve memory of this type. Acked-by: Ard Biesheuvel Reported-by: kbuild test robot Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- include/linux/ioport.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 7bddddfc76d6..a9b9170b5dd2 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -134,6 +134,7 @@ enum { IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5, IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_RESERVED = 7, + IORES_DESC_SOFT_RESERVED = 8, }; /* -- cgit v1.2.3 From 33dd70752cd76f4d883a165a674f13121a4155ed Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:31 -0800 Subject: lib: Uplevel the pmem "region" ida to a global allocator In preparation for handling platform differentiated memory types beyond persistent memory, uplevel the "region" identifier to a global number space. This enables a device-dax instance to be registered to any memory type with guaranteed unique names. Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- include/linux/memregion.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 include/linux/memregion.h (limited to 'include') diff --git a/include/linux/memregion.h b/include/linux/memregion.h new file mode 100644 index 000000000000..7de7c0a1444e --- /dev/null +++ b/include/linux/memregion.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _MEMREGION_H_ +#define _MEMREGION_H_ +#include +#include + +#ifdef CONFIG_MEMREGION +int memregion_alloc(gfp_t gfp); +void memregion_free(int id); +#else +static inline int memregion_alloc(gfp_t gfp) +{ + return -ENOMEM; +} +void memregion_free(int id) +{ +} +#endif +#endif /* _MEMREGION_H_ */ -- cgit v1.2.3 From a6c7f4c6aea5f4ca6056b06cec7ebd79f8c23e33 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 6 Nov 2019 17:43:43 -0800 Subject: device-dax: Add a driver for "hmem" devices Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance differentiated memory range likely reserved for an application specific use case. The driver gives access to 100% of the capacity via a device-dax mmap instance by default. However, if over-subscription and other kernel memory management is desired the resulting dax device can be assigned to the core-mm via the kmem driver. This consumes "hmem" devices the producer of "hmem" devices is saved for a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM symbol to gate performing the enumeration work. Reported-by: kbuild test robot Reviewed-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- include/linux/memregion.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/memregion.h b/include/linux/memregion.h index 7de7c0a1444e..e11595256cac 100644 --- a/include/linux/memregion.h +++ b/include/linux/memregion.h @@ -4,6 +4,10 @@ #include #include +struct memregion_info { + int target_node; +}; + #ifdef CONFIG_MEMREGION int memregion_alloc(gfp_t gfp); void memregion_free(int id); -- cgit v1.2.3