summaryrefslogtreecommitdiff
path: root/Documentation/arm64
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/arm64')
-rw-r--r--Documentation/arm64/index.rst1
-rw-r--r--Documentation/arm64/kasan-offsets.sh27
-rw-r--r--Documentation/arm64/memory.rst123
-rw-r--r--Documentation/arm64/tagged-address-abi.rst156
-rw-r--r--Documentation/arm64/tagged-pointers.rst21
5 files changed, 293 insertions, 35 deletions
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 96b696ba4e6c..5c0c69dc58aa 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -16,6 +16,7 @@ ARM64 Architecture
pointer-authentication
silicon-errata
sve
+ tagged-address-abi
tagged-pointers
.. only:: subproject and html
diff --git a/Documentation/arm64/kasan-offsets.sh b/Documentation/arm64/kasan-offsets.sh
new file mode 100644
index 000000000000..2b7a021db363
--- /dev/null
+++ b/Documentation/arm64/kasan-offsets.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW
+# start address at the mid-point of the kernel VA space
+
+print_kasan_offset () {
+ printf "%02d\t" $1
+ printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
+ + (1 << ($1 - 32 - $2)) \
+ - (1 << (64 - 32 - $2)) ))
+}
+
+echo KASAN_SHADOW_SCALE_SHIFT = 3
+printf "VABITS\tKASAN_SHADOW_OFFSET\n"
+print_kasan_offset 48 3
+print_kasan_offset 47 3
+print_kasan_offset 42 3
+print_kasan_offset 39 3
+print_kasan_offset 36 3
+echo
+echo KASAN_SHADOW_SCALE_SHIFT = 4
+printf "VABITS\tKASAN_SHADOW_OFFSET\n"
+print_kasan_offset 48 4
+print_kasan_offset 47 4
+print_kasan_offset 42 4
+print_kasan_offset 39 4
+print_kasan_offset 36 4
diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
index 464b880fc4b7..b040909e45f8 100644
--- a/Documentation/arm64/memory.rst
+++ b/Documentation/arm64/memory.rst
@@ -14,6 +14,10 @@ with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
virtual address, are used but the memory layout is the same.
+ARMv8.2 adds optional support for Large Virtual Address space. This is
+only available when running with a 64KB page size and expands the
+number of descriptors in the first level of translation.
+
User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
@@ -22,40 +26,43 @@ The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.
-AArch64 Linux memory layout with 4KB pages + 3 levels::
-
- Start End Size Use
- -----------------------------------------------------------------------
- 0000000000000000 0000007fffffffff 512GB user
- ffffff8000000000 ffffffffffffffff 512GB kernel
-
-
-AArch64 Linux memory layout with 4KB pages + 4 levels::
+AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 0000ffffffffffff 256TB user
- ffff000000000000 ffffffffffffffff 256TB kernel
-
-
-AArch64 Linux memory layout with 64KB pages + 2 levels::
+ ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
+ ffff800000000000 ffff9fffffffffff 32TB kasan shadow region
+ ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
+ ffffa00008000000 ffffa0000fffffff 128MB modules
+ ffffa00010000000 fffffdffbffeffff ~93TB vmalloc
+ fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region]
+ fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings
+ fffffdfffea00000 fffffdfffebfffff 2MB [guard region]
+ fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space
+ fffffdffffc00000 fffffdffffdfffff 2MB [guard region]
+ fffffdffffe00000 ffffffffffdfffff 2TB vmemmap
+ ffffffffffe00000 ffffffffffffffff 2MB [guard region]
+
+
+AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::
Start End Size Use
-----------------------------------------------------------------------
- 0000000000000000 000003ffffffffff 4TB user
- fffffc0000000000 ffffffffffffffff 4TB kernel
-
-
-AArch64 Linux memory layout with 64KB pages + 3 levels::
-
- Start End Size Use
- -----------------------------------------------------------------------
- 0000000000000000 0000ffffffffffff 256TB user
- ffff000000000000 ffffffffffffffff 256TB kernel
-
-
-For details of the virtual kernel memory layout please see the kernel
-booting log.
+ 0000000000000000 000fffffffffffff 4PB user
+ fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map
+ fff8000000000000 fffd9fffffffffff 1440TB [gap]
+ fffda00000000000 ffff9fffffffffff 512TB kasan shadow region
+ ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
+ ffffa00008000000 ffffa0000fffffff 128MB modules
+ ffffa00010000000 fffff81ffffeffff ~88TB vmalloc
+ fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region]
+ fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings
+ fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region]
+ fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space
+ fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region]
+ fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap
+ ffffffffffe00000 ffffffffffffffff 2MB [guard region]
Translation table lookup with 4KB pages::
@@ -83,7 +90,8 @@ Translation table lookup with 64KB pages::
| | | | [15:0] in-page offset
| | | +----------> [28:16] L3 index
| | +--------------------------> [41:29] L2 index
- | +-------------------------------> [47:42] L1 index
+ | +-------------------------------> [47:42] L1 index (48-bit)
+ | [51:42] L1 index (52-bit)
+-------------------------------------------------> [63] TTBR0/1
@@ -96,3 +104,62 @@ ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.
+
+52-bit VA support in the kernel
+-------------------------------
+If the ARMv8.2-LVA optional feature is present, and we are running
+with a 64KB page size; then it is possible to use 52-bits of address
+space for both userspace and kernel addresses. However, any kernel
+binary that supports 52-bit must also be able to fall back to 48-bit
+at early boot time if the hardware feature is not present.
+
+This fallback mechanism necessitates the kernel .text to be in the
+higher addresses such that they are invariant to 48/52-bit VAs. Due
+to the kasan shadow being a fraction of the entire kernel VA space,
+the end of the kasan shadow must also be in the higher half of the
+kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit,
+the end of the kasan shadow is invariant and dependent on ~0UL,
+whilst the start address will "grow" towards the lower addresses).
+
+In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
+is kept constant at 0xFFF0000000000000 (corresponding to 52-bit),
+this obviates the need for an extra variable read. The physvirt
+offset and vmemmap offsets are computed at early boot to enable
+this logic.
+
+As a single binary will need to support both 48-bit and 52-bit VA
+spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
+also must be sized large enought to accommodate a fixed PAGE_OFFSET.
+
+Most code in the kernel should not need to consider the VA_BITS, for
+code that does need to know the VA size the variables are
+defined as follows:
+
+VA_BITS constant the *maximum* VA space size
+
+VA_BITS_MIN constant the *minimum* VA space size
+
+vabits_actual variable the *actual* VA space size
+
+
+Maximum and minimum sizes can be useful to ensure that buffers are
+sized large enough or that addresses are positioned close enough for
+the "worst" case.
+
+52-bit userspace VAs
+--------------------
+To maintain compatibility with software that relies on the ARMv8.0
+VA space maximum size of 48-bits, the kernel will, by default,
+return virtual addresses to userspace from a 48-bit range.
+
+Software can "opt-in" to receiving VAs from a 52-bit space by
+specifying an mmap hint parameter that is larger than 48-bit.
+For example:
+ maybe_high_address = mmap(~0UL, size, prot, flags,...);
+
+It is also possible to build a debug kernel that returns addresses
+from a 52-bit space by enabling the following kernel config options:
+ CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y
+
+Note that this option is only intended for debugging applications
+and should not be used in production.
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
new file mode 100644
index 000000000000..d4a85d535bf9
--- /dev/null
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -0,0 +1,156 @@
+==========================
+AArch64 TAGGED ADDRESS ABI
+==========================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+ Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 21 August 2019
+
+This document describes the usage and semantics of the Tagged Address
+ABI on AArch64 Linux.
+
+1. Introduction
+---------------
+
+On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing
+userspace (EL0) to perform memory accesses through 64-bit pointers with
+a non-zero top byte. This document describes the relaxation of the
+syscall ABI that allows userspace to pass certain tagged pointers to
+kernel syscalls.
+
+2. AArch64 Tagged Address ABI
+-----------------------------
+
+From the kernel syscall interface perspective and for the purposes of
+this document, a "valid tagged pointer" is a pointer with a potentially
+non-zero top-byte that references an address in the user process address
+space obtained in one of the following ways:
+
+- ``mmap()`` syscall where either:
+
+ - flags have the ``MAP_ANONYMOUS`` bit set or
+ - the file descriptor refers to a regular file (including those
+ returned by ``memfd_create()``) or ``/dev/zero``
+
+- ``brk()`` syscall (i.e. the heap area between the initial location of
+ the program break at process creation and its current location).
+
+- any memory mapped by the kernel in the address space of the process
+ during creation and with the same restrictions as for ``mmap()`` above
+ (e.g. data, bss, stack).
+
+The AArch64 Tagged Address ABI has two stages of relaxation depending
+how the user addresses are used by the kernel:
+
+1. User addresses not accessed by the kernel but used for address space
+ management (e.g. ``mmap()``, ``mprotect()``, ``madvise()``). The use
+ of valid tagged pointers in this context is always allowed.
+
+2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
+ relaxation is disabled by default and the application thread needs to
+ explicitly enable it via ``prctl()`` as follows:
+
+ - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged
+ Address ABI for the calling thread.
+
+ The ``(unsigned int) arg2`` argument is a bit mask describing the
+ control mode used:
+
+ - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI.
+ Default status is disabled.
+
+ Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0.
+
+ - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged
+ Address ABI for the calling thread.
+
+ Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0.
+
+ The ABI properties described above are thread-scoped, inherited on
+ clone() and fork() and cleared on exec().
+
+ Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)``
+ returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally
+ disabled by ``sysctl abi.tagged_addr_disabled=1``. The default
+ ``sysctl abi.tagged_addr_disabled`` configuration is 0.
+
+When the AArch64 Tagged Address ABI is enabled for a thread, the
+following behaviours are guaranteed:
+
+- All syscalls except the cases mentioned in section 3 can accept any
+ valid tagged pointer.
+
+- The syscall behaviour is undefined for invalid tagged pointers: it may
+ result in an error code being returned, a (fatal) signal being raised,
+ or other modes of failure.
+
+- The syscall behaviour for a valid tagged pointer is the same as for
+ the corresponding untagged pointer.
+
+
+A definition of the meaning of tagged pointers on AArch64 can be found
+in Documentation/arm64/tagged-pointers.rst.
+
+3. AArch64 Tagged Address ABI Exceptions
+-----------------------------------------
+
+The following system call parameters must be untagged regardless of the
+ABI relaxation:
+
+- ``prctl()`` other than pointers to user data either passed directly or
+ indirectly as arguments to be accessed by the kernel.
+
+- ``ioctl()`` other than pointers to user data either passed directly or
+ indirectly as arguments to be accessed by the kernel.
+
+- ``shmat()`` and ``shmdt()``.
+
+Any attempt to use non-zero tagged pointers may result in an error code
+being returned, a (fatal) signal being raised, or other modes of
+failure.
+
+4. Example of correct usage
+---------------------------
+.. code-block:: c
+
+ #include <stdlib.h>
+ #include <string.h>
+ #include <unistd.h>
+ #include <sys/mman.h>
+ #include <sys/prctl.h>
+
+ #define PR_SET_TAGGED_ADDR_CTRL 55
+ #define PR_TAGGED_ADDR_ENABLE (1UL << 0)
+
+ #define TAG_SHIFT 56
+
+ int main(void)
+ {
+ int tbi_enabled = 0;
+ unsigned long tag = 0;
+ char *ptr;
+
+ /* check/enable the tagged address ABI */
+ if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
+ tbi_enabled = 1;
+
+ /* memory allocation */
+ ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (ptr == MAP_FAILED)
+ return 1;
+
+ /* set a non-zero tag if the ABI is available */
+ if (tbi_enabled)
+ tag = rand() & 0xff;
+ ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+ /* memory access to a tagged address */
+ strcpy(ptr, "tagged pointer\n");
+
+ /* syscall with a tagged pointer */
+ write(1, ptr, strlen(ptr));
+
+ return 0;
+ }
diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst
index 2acdec3ebbeb..eab4323609b9 100644
--- a/Documentation/arm64/tagged-pointers.rst
+++ b/Documentation/arm64/tagged-pointers.rst
@@ -20,7 +20,9 @@ Passing tagged addresses to the kernel
--------------------------------------
All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+an address tag of 0x00, unless the application enables the AArch64
+Tagged Address ABI explicitly
+(Documentation/arm64/tagged-address-abi.rst).
This includes, but is not limited to, addresses found in:
@@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in:
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
-Using non-zero address tags in any of these locations may result in an
-error code being returned, a (fatal) signal being raised, or other modes
-of failure.
+Using non-zero address tags in any of these locations when the
+userspace application did not enable the AArch64 Tagged Address ABI may
+result in an error code being returned, a (fatal) signal being raised,
+or other modes of failure.
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+For these reasons, when the AArch64 Tagged Address ABI is disabled,
+passing non-zero address tags to the kernel via system calls is
+forbidden, and using a non-zero address tag for sp is strongly
+discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
@@ -59,6 +63,9 @@ be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
+This behaviour is maintained when the AArch64 Tagged Address ABI is
+enabled.
+
Other considerations
--------------------