diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-09 12:34:26 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-09 12:34:26 -0700 |
commit | e9a83bd2322035ed9d7dcf35753d3f984d76c6a5 (patch) | |
tree | 66dc466ff9aec0f9bb7f39cba50a47eab6585559 /Documentation/x86 | |
parent | 7011b7e1b702cc76f9e969b41d9a95969f2aecaa (diff) | |
parent | 454f96f2b738374da4b0a703b1e2e7aed82c4486 (diff) |
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/index.rst | 1 | ||||
-rw-r--r-- | Documentation/x86/protection-keys.rst | 99 | ||||
-rw-r--r-- | Documentation/x86/resctrl_ui.rst | 30 | ||||
-rw-r--r-- | Documentation/x86/x86_64/5level-paging.rst | 2 | ||||
-rw-r--r-- | Documentation/x86/x86_64/boot-options.rst | 4 | ||||
-rw-r--r-- | Documentation/x86/x86_64/fake-numa-for-cpusets.rst | 2 |
6 files changed, 22 insertions, 116 deletions
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index ae36fc5fc649..f2de1b2d3ac7 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -19,7 +19,6 @@ x86-specific Documentation tlb mtrr pat - protection-keys intel_mpx amd-memory-encryption pti diff --git a/Documentation/x86/protection-keys.rst b/Documentation/x86/protection-keys.rst deleted file mode 100644 index 49d9833af871..000000000000 --- a/Documentation/x86/protection-keys.rst +++ /dev/null @@ -1,99 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -====================== -Memory Protection Keys -====================== - -Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature -which is found on Intel's Skylake "Scalable Processor" Server CPUs. -It will be avalable in future non-server parts. - -For anyone wishing to test or use this feature, it is available in -Amazon's EC2 C5 instances and is known to work there using an Ubuntu -17.04 image. - -Memory Protection Keys provides a mechanism for enforcing page-based -protections, but without requiring modification of the page tables -when an application changes protection domains. It works by -dedicating 4 previously ignored bits in each page table entry to a -"protection key", giving 16 possible keys. - -There is also a new user-accessible register (PKRU) with two separate -bits (Access Disable and Write Disable) for each key. Being a CPU -register, PKRU is inherently thread-local, potentially giving each -thread a different set of protections from every other thread. - -There are two new instructions (RDPKRU/WRPKRU) for reading and writing -to the new register. The feature is only available in 64-bit mode, -even though there is theoretically space in the PAE PTEs. These -permissions are enforced on data access only and have no effect on -instruction fetches. - -Syscalls -======== - -There are 3 system calls which directly interact with pkeys:: - - int pkey_alloc(unsigned long flags, unsigned long init_access_rights) - int pkey_free(int pkey); - int pkey_mprotect(unsigned long start, size_t len, - unsigned long prot, int pkey); - -Before a pkey can be used, it must first be allocated with -pkey_alloc(). An application calls the WRPKRU instruction -directly in order to change access permissions to memory covered -with a key. In this example WRPKRU is wrapped by a C function -called pkey_set(). -:: - - int real_prot = PROT_READ|PROT_WRITE; - pkey = pkey_alloc(0, PKEY_DISABLE_WRITE); - ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); - ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey); - ... application runs here - -Now, if the application needs to update the data at 'ptr', it can -gain access, do the update, then remove its write access:: - - pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE - *ptr = foo; // assign something - pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again - -Now when it frees the memory, it will also free the pkey since it -is no longer in use:: - - munmap(ptr, PAGE_SIZE); - pkey_free(pkey); - -.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions. - An example implementation can be found in - tools/testing/selftests/x86/protection_keys.c. - -Behavior -======== - -The kernel attempts to make protection keys consistent with the -behavior of a plain mprotect(). For instance if you do this:: - - mprotect(ptr, size, PROT_NONE); - something(ptr); - -you can expect the same effects with protection keys when doing this:: - - pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ); - pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey); - something(ptr); - -That should be true whether something() is a direct access to 'ptr' -like:: - - *ptr = foo; - -or when the kernel does the access on the application's behalf like -with a read():: - - read(fd, ptr, 1); - -The kernel will send a SIGSEGV in both cases, but si_code will be set -to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when -the plain mprotect() permissions are violated. diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst index 225cfd4daaee..5368cedfb530 100644 --- a/Documentation/x86/resctrl_ui.rst +++ b/Documentation/x86/resctrl_ui.rst @@ -40,7 +40,7 @@ mount options are: Enable the MBA Software Controller(mba_sc) to specify MBA bandwidth in MBps -L2 and L3 CDP are controlled seperately. +L2 and L3 CDP are controlled separately. RDT features are orthogonal. A particular system may support only monitoring, only control, or both monitoring and control. Cache @@ -118,7 +118,7 @@ related to allocation: Corresponding region is pseudo-locked. No sharing allowed. -Memory bandwitdh(MB) subdirectory contains the following files +Memory bandwidth(MB) subdirectory contains the following files with respect to allocation: "min_bandwidth": @@ -209,7 +209,7 @@ All groups contain the following files: CPUs to/from this group. As with the tasks file a hierarchy is maintained where MON groups may only include CPUs owned by the parent CTRL_MON group. - When the resouce group is in pseudo-locked mode this file will + When the resource group is in pseudo-locked mode this file will only be readable, reflecting the CPUs associated with the pseudo-locked region. @@ -342,7 +342,7 @@ For cache resources we describe the portion of the cache that is available for allocation using a bitmask. The maximum value of the mask is defined by each cpu model (and may be different for different cache levels). It is found using CPUID, but is also provided in the "info" directory of -the resctrl file system in "info/{resource}/cbm_mask". X86 hardware +the resctrl file system in "info/{resource}/cbm_mask". Intel hardware requires that these masks have all the '1' bits in a contiguous block. So 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9 and 0xA are not. On a system with a 20-bit mask each bit represents 5% @@ -380,7 +380,7 @@ where L2 external is 10GBps (hence aggregate L2 external bandwidth is 240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20 threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3 bandwidth of 100GBps although the percentage value specified is only 50% -<< 100%. Hence increasing the bandwidth percentage will not yeild any +<< 100%. Hence increasing the bandwidth percentage will not yield any more bandwidth. This is because although the L2 external bandwidth still has capacity, the L3 external bandwidth is fully used. Also note that this would be dependent on number of cores the benchmark is run on. @@ -398,7 +398,7 @@ In order to mitigate this and make the interface more user friendly, resctrl added support for specifying the bandwidth in MBps as well. The kernel underneath would use a software feedback mechanism or a "Software Controller(mba_sc)" which reads the actual bandwidth using MBM counters -and adjust the memowy bandwidth percentages to ensure:: +and adjust the memory bandwidth percentages to ensure:: "actual bandwidth < user specified bandwidth". @@ -418,16 +418,22 @@ L3 schemata file details (CDP enabled via mount option to resctrl) When CDP is enabled L3 control is split into two separate resources so you can specify independent masks for code and data like this:: - L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... - L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... + L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... + L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... L2 schemata file details ------------------------ -L2 cache does not support code and data prioritization, so the -schemata format is always:: +CDP is supported at L2 using the 'cdpl2' mount option. The schemata +format is either:: L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... +or + + L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... + L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... + + Memory bandwidth Allocation (default mode) ------------------------------------------ @@ -671,8 +677,8 @@ allocations can overlap or not. The allocations specifies the maximum b/w that the group may be able to use and the system admin can configure the b/w accordingly. -If the MBA is specified in MB(megabytes) then user can enter the max b/w in MB -rather than the percentage values. +If resctrl is using the software controller (mba_sc) then user can enter the +max b/w in MB rather than the percentage values. :: # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata diff --git a/Documentation/x86/x86_64/5level-paging.rst b/Documentation/x86/x86_64/5level-paging.rst index ab88a4514163..44856417e6a5 100644 --- a/Documentation/x86/x86_64/5level-paging.rst +++ b/Documentation/x86/x86_64/5level-paging.rst @@ -20,7 +20,7 @@ physical address space. This "ought to be enough for anybody" ©. QEMU 2.9 and later support 5-level paging. Virtual memory layout for 5-level paging is described in -Documentation/x86/x86_64/mm.txt +Documentation/x86/x86_64/mm.rst Enabling 5-level paging diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst index 2f69836b8445..6a4285a3c7a4 100644 --- a/Documentation/x86/x86_64/boot-options.rst +++ b/Documentation/x86/x86_64/boot-options.rst @@ -9,7 +9,7 @@ only the AMD64 specific ones are listed here. Machine check ============= -Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables. +Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables. mce=off Disable machine check @@ -89,7 +89,7 @@ APICs Don't use the local APIC (alias for i386 compatibility) pirq=... - See Documentation/x86/i386/IO-APIC.txt + See Documentation/x86/i386/IO-APIC.rst noapictimer Don't set up the APIC timer diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst index a6926cd40f70..30108684ae87 100644 --- a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst +++ b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst @@ -18,7 +18,7 @@ For more information on the features of cpusets, see Documentation/cgroup-v1/cpusets.rst. There are a number of different configurations you can use for your needs. For more information on the numa=fake command line option and its various ways of -configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt. +configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst. For the purposes of this introduction, we'll assume a very primitive NUMA emulation setup of "numa=fake=4*512,". This will split our system memory into |