From f6e5aedf470bd4522732595a85688052e3cbc03f Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Thu, 25 Feb 2021 14:54:17 +0800 Subject: riscv: Add support for memtest The riscv [rv32_]defconfig enabled CONFIG_MEMTEST, but memtest feature is not supported in RISCV. Add early_memtest() to support for memtest. Signed-off-by: Kefeng Wang Signed-off-by: Palmer Dabbelt --- Documentation/admin-guide/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 04545725f187..20447b531630 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2794,7 +2794,7 @@ seconds. Use this parameter to check at some other rate. 0 disables periodic checking. - memtest= [KNL,X86,ARM,PPC] Enable memtest + memtest= [KNL,X86,ARM,PPC,RISCV] Enable memtest Format: default : 0 Specifies the number of memtest passes to be -- cgit v1.2.3 From 81dd500b1c8612a42979ee3ea788d2d9f19aa9f9 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 15 Mar 2021 18:59:40 +0200 Subject: gpio: mockup: Adjust documentation to the code First of all one of the parameter missed 'mockup' in its name, Second, the semantics of the integer pairs depends on the sign of the base (the first value in the pair). Update documentation to reflect the real code behaviour. Fixes: 2fd1abe99e5f ("Documentation: gpio: add documentation for gpio-mockup") Signed-off-by: Andy Shevchenko Signed-off-by: Bartosz Golaszewski --- Documentation/admin-guide/gpio/gpio-mockup.rst | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/gpio/gpio-mockup.rst b/Documentation/admin-guide/gpio/gpio-mockup.rst index 9fa1618b3adc..493071da1738 100644 --- a/Documentation/admin-guide/gpio/gpio-mockup.rst +++ b/Documentation/admin-guide/gpio/gpio-mockup.rst @@ -17,17 +17,18 @@ module. gpio_mockup_ranges This parameter takes an argument in the form of an array of integer - pairs. Each pair defines the base GPIO number (if any) and the number - of lines exposed by the chip. If the base GPIO is -1, the gpiolib - will assign it automatically. + pairs. Each pair defines the base GPIO number (non-negative integer) + and the first number after the last of this chip. If the base GPIO + is -1, the gpiolib will assign it automatically. while the following + parameter is the number of lines exposed by the chip. - Example: gpio_mockup_ranges=-1,8,-1,16,405,4 + Example: gpio_mockup_ranges=-1,8,-1,16,405,409 The line above creates three chips. The first one will expose 8 lines, the second 16 and the third 4. The base GPIO for the third chip is set to 405 while for two first chips it will be assigned automatically. - gpio_named_lines + gpio_mockup_named_lines This parameter doesn't take any arguments. It lets the driver know that GPIO lines exposed by it should be named. -- cgit v1.2.3 From 0043f0b27a0406730caef61068703fcacd9c2166 Mon Sep 17 00:00:00 2001 From: Thorsten Leemhuis Date: Thu, 15 Apr 2021 12:29:14 +0200 Subject: docs: reporting-issues.rst: CC subsystem and maintainers on regressions When reporting a regression, users ideally should CC the subsystem and its maintainers, as that will ensure they get aware of the regression quickly. And if the culprit is known, they should also CC everyone who signed if off; the text mentioned the latter in once place already, but forgot to do so in two other areas where it's relevant. While fixing this also remind readers to check the mailing list archives for issues that need to be reported to a bug tracker, as someone might have reported it by mail only. All of this got triggered by a recent report where these changes would have made a difference. Signed-off-by: Thorsten Leemhuis Link: https://lore.kernel.org/lkml/dff6badf-58f5-98c8-871c-94d901ac6919@leemhuis.info/ Link: https://lore.kernel.org/lkml/CAJZ5v0hX2StQVttAciHYH-urUH+Hi92z9z2ZbcNgQPt0E2Jpwg@mail.gmail.com/ Link: https://lore.kernel.org/r/dd13f10c30e79e550215e53a8103406daec4e593.1618482489.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/reporting-issues.rst | 49 ++++++++++++++++---------- 1 file changed, 30 insertions(+), 19 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst index 48b4d0ef2b09..18d8e25ba9df 100644 --- a/Documentation/admin-guide/reporting-issues.rst +++ b/Documentation/admin-guide/reporting-issues.rst @@ -24,7 +24,8 @@ longterm series? One still supported? Then search the `LKML you don't find any, install `the latest release from that series `_. If it still shows the issue, report it to the stable mailing list (stable@vger.kernel.org) and CC the regressions list -(regressions@lists.linux.dev). +(regressions@lists.linux.dev); ideally also CC the maintainer and the mailing +list for the subsystem in question. In all other cases try your best guess which kernel part might be causing the issue. Check the :ref:`MAINTAINERS ` file for how its developers @@ -48,8 +49,9 @@ before the issue occurs. If you are facing multiple issues with the Linux kernel at once, report each separately. While writing your report, include all information relevant to the issue, like the kernel and the distro used. In case of a regression, CC the -regressions mailing list (regressions@lists.linux.dev) to your report; also try -to include the commit-id of the change causing it, which a bisection can find. +regressions mailing list (regressions@lists.linux.dev) to your report. Also try +to pin-point the culprit with a bisection; if you succeed, include its +commit-id and CC everyone in the sign-off-by chain. Once the report is out, answer any questions that come up and help where you can. That includes keeping the ball rolling by occasionally retesting with newer @@ -198,10 +200,11 @@ report them: * Send a short problem report to the Linux stable mailing list (stable@vger.kernel.org) and CC the Linux regressions mailing list - (regressions@lists.linux.dev). Roughly describe the issue and ideally - explain how to reproduce it. Mention the first version that shows the - problem and the last version that's working fine. Then wait for further - instructions. + (regressions@lists.linux.dev); if you suspect the cause in a particular + subsystem, CC its maintainer and its mailing list. Roughly describe the + issue and ideally explain how to reproduce it. Mention the first version + that shows the problem and the last version that's working fine. Then + wait for further instructions. The reference section below explains each of these steps in more detail. @@ -768,7 +771,9 @@ regular internet search engine and add something like the results to the archives at that URL. It's also wise to check the internet, LKML and maybe bugzilla.kernel.org again -at this point. +at this point. If your report needs to be filed in a bug tracker, you may want +to check the mailing list archives for the subsystem as well, as someone might +have reported it only there. For details how to search and what to do if you find matching reports see "Search for existing reports, first run" above. @@ -1249,9 +1254,10 @@ and the oldest where the issue occurs (say 5.8-rc1). When sending the report by mail, CC the Linux regressions mailing list (regressions@lists.linux.dev). In case the report needs to be filed to some web -tracker, proceed to do so; once filed, forward the report by mail to the -regressions list. Make sure to inline the forwarded report, hence do not attach -it. Also add a short note at the top where you mention the URL to the ticket. +tracker, proceed to do so. Once filed, forward the report by mail to the +regressions list; CC the maintainer and the mailing list for the subsystem in +question. Make sure to inline the forwarded report, hence do not attach it. +Also add a short note at the top where you mention the URL to the ticket. When mailing or forwarding the report, in case of a successful bisection add the author of the culprit to the recipients; also CC everyone in the signed-off-by @@ -1536,17 +1542,20 @@ Report the regression *Send a short problem report to the Linux stable mailing list (stable@vger.kernel.org) and CC the Linux regressions mailing list - (regressions@lists.linux.dev). Roughly describe the issue and ideally - explain how to reproduce it. Mention the first version that shows the - problem and the last version that's working fine. Then wait for further - instructions.* + (regressions@lists.linux.dev); if you suspect the cause in a particular + subsystem, CC its maintainer and its mailing list. Roughly describe the + issue and ideally explain how to reproduce it. Mention the first version + that shows the problem and the last version that's working fine. Then + wait for further instructions.* When reporting a regression that happens within a stable or longterm kernel line (say when updating from 5.10.4 to 5.10.5) a brief report is enough for -the start to get the issue reported quickly. Hence a rough description is all -it takes. +the start to get the issue reported quickly. Hence a rough description to the +stable and regressions mailing list is all it takes; but in case you suspect +the cause in a particular subsystem, CC its maintainers and its mailing list +as well, because that will speed things up. -But note, it helps developers a great deal if you can specify the exact version +And note, it helps developers a great deal if you can specify the exact version that introduced the problem. Hence if possible within a reasonable time frame, try to find that version using vanilla kernels. Lets assume something broke when your distributor released a update from Linux kernel 5.10.5 to 5.10.8. Then as @@ -1563,7 +1572,9 @@ pinpoint the exact change that causes the issue (which then can easily get reverted to fix the issue quickly). Hence consider to do a proper bisection right away if time permits. See the section 'Special care for regressions' and the document 'Documentation/admin-guide/bug-bisect.rst' for details how to -perform one. +perform one. In case of a successful bisection add the author of the culprit to +the recipients; also CC everyone in the signed-off-by chain, which you find at +the end of its commit message. Reference for "Reporting issues only occurring in older kernel version lines" -- cgit v1.2.3 From 3eb52226de6f14d9409fd5485e7bdb8430bf8449 Mon Sep 17 00:00:00 2001 From: Alexander Dahl Date: Mon, 29 Mar 2021 13:16:47 +0200 Subject: docs: kernel-parameters: Move gpio-mockup for alphabetic order All other sections are ordered alphabetically so do the same for gpio-mockup. Fixes: 0f98dd1b27d2 ("gpio/mockup: add virtual gpio device") Signed-off-by: Alexander Dahl Signed-off-by: Bartosz Golaszewski --- Documentation/admin-guide/kernel-parameters.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 04545725f187..782dc6d9b7fb 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1461,6 +1461,10 @@ Don't use this when you are not running on the android emulator + gpio-mockup.gpio_mockup_ranges + [HW] Sets the ranges of gpiochip of for this device. + Format: ,,,... + gpt [EFI] Forces disk with valid GPT signature but invalid Protective MBR to be treated as GPT. If the primary GPT is corrupted, it enables the backup/alternate @@ -1484,10 +1488,6 @@ Format: such that (rxsize & ~0x1fffc0) == 0. Default: 1024 - gpio-mockup.gpio_mockup_ranges - [HW] Sets the ranges of gpiochip of for this device. - Format: ,,,... - hardlockup_all_cpu_backtrace= [KNL] Should the hard-lockup detector generate backtraces on all cpus. -- cgit v1.2.3 From 6984a320349d61e6bcf3aa03d750a78d70ca98ad Mon Sep 17 00:00:00 2001 From: Alexander Dahl Date: Mon, 29 Mar 2021 13:16:48 +0200 Subject: docs: kernel-parameters: Add gpio_mockup_named_lines Missing since introduced in the driver. Fixes: 8a68ea00a62e ("gpio: mockup: implement naming the lines") Signed-off-by: Alexander Dahl Signed-off-by: Bartosz Golaszewski --- Documentation/admin-guide/kernel-parameters.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 782dc6d9b7fb..4b12f944ca44 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1464,6 +1464,8 @@ gpio-mockup.gpio_mockup_ranges [HW] Sets the ranges of gpiochip of for this device. Format: ,,,... + gpio-mockup.gpio_mockup_named_lines + [HW] Let the driver know GPIO lines should be named. gpt [EFI] Forces disk with valid GPT signature but invalid Protective MBR to be treated as GPT. If the -- cgit v1.2.3 From b8da5cd4e5f1ce1274140e200a9116b7fe61dd87 Mon Sep 17 00:00:00 2001 From: Axel Rasmussen Date: Tue, 4 May 2021 18:35:53 -0700 Subject: userfaultfd: update documentation to describe minor fault handling Reword / reorganize things a little bit into "lists", so new features / modes / ioctls can sort of just be appended. Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR faults. Link: https://lkml.kernel.org/r/20210301222728.176417-6-axelrasmussen@google.com Signed-off-by: Axel Rasmussen Reviewed-by: Peter Xu Cc: Adam Ruprecht Cc: Alexander Viro Cc: Alexey Dobriyan Cc: Andrea Arcangeli Cc: Anshuman Khandual Cc: Cannon Matthews Cc: Catalin Marinas Cc: Chinwen Chang Cc: David Rientjes Cc: "Dr . David Alan Gilbert" Cc: Huang Ying Cc: Ingo Molnar Cc: Jann Horn Cc: Jerome Glisse Cc: Kirill A. Shutemov Cc: Lokesh Gidra Cc: "Matthew Wilcox (Oracle)" Cc: Michael Ellerman Cc: "Michal Koutn" Cc: Michel Lespinasse Cc: Mike Kravetz Cc: Mike Rapoport Cc: Mina Almasry Cc: Nicholas Piggin Cc: Oliver Upton Cc: Shaohua Li Cc: Shawn Anastasio Cc: Steven Price Cc: Steven Rostedt Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/mm/userfaultfd.rst | 107 +++++++++++++++++---------- 1 file changed, 66 insertions(+), 41 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 65eefa66c0ba..3aa38e8b8361 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -63,36 +63,36 @@ the generic ioctl available. The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl defines what memory types are supported by the ``userfaultfd`` and what -events, except page fault notifications, may be generated. - -If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs -virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in -``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be -set if the kernel supports registering ``userfaultfd`` ranges on shared -memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, -``MAP_SHARED``, ``memfd_create``, etc). - -The userland application that wants to use ``userfaultfd`` with hugetlbfs -or shared memory need to set the corresponding flag in -``uffdio_api.features`` to enable those features. - -If the userland desires to receive notifications for events other than -page faults, it has to verify that ``uffdio_api.features`` has appropriate -``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more -detail below in `Non-cooperative userfaultfd`_ section. - -Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should -be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to -register a memory range in the ``userfaultfd`` by setting the +events, except page fault notifications, may be generated: + +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events + other than page faults are supported. These events are described in more + detail below in the `Non-cooperative userfaultfd`_ section. + +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM`` + indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING`` + registrations for hugetlbfs and shared memory (covering all shmem APIs, + i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``, + etc) virtual memory areas, respectively. + +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports + ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory + areas. + +The userland application should set the feature flags it intends to use +when invoking the ``UFFDIO_API`` ioctl, to request that those features be +enabled if supported. + +Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER`` +ioctl should be invoked (if present in the returned ``uffdio_api.ioctls`` +bitmask) to register a memory range in the ``userfaultfd`` by setting the uffdio_register structure accordingly. The ``uffdio_register.mode`` bitmask will specify to the kernel which kind of faults to track for -the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing -pages). The ``UFFDIO_REGISTER`` ioctl will return the +the range. The ``UFFDIO_REGISTER`` ioctl will return the ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve userfaults on the range registered. Not all ioctls will necessarily be -supported for all memory types depending on the underlying virtual -memory backend (anonymous memory vs tmpfs vs real filebacked -mappings). +supported for all memory types (e.g. anonymous memory vs. shmem vs. +hugetlbfs), or all types of intercepted faults. Userland can use the ``uffdio_register.ioctls`` to manage the virtual address space in the background (to add or potentially also remove @@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault could be triggering just before userland maps in the background the user-faulted page. -The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That -atomically copies a page into the userfault registered range and wakes -up the blocked userfaults -(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set). -Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in -guaranteeing that nothing can see an half copied page since it'll -keep userfaulting until the copy has finished. +Resolving Userfaults +-------------------- + +There are three basic ways to resolve userfaults: + +- ``UFFDIO_COPY`` atomically copies some existing page contents from + userspace. + +- ``UFFDIO_ZEROPAGE`` atomically zeros the new page. + +- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page. + +These operations are atomic in the sense that they guarantee nothing can +see a half-populated page, since readers will keep userfaulting until the +operation has finished. + +By default, these wake up userfaults blocked on the range in question. +They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates +that waking will be done separately at some later time. + +Which ioctl to choose depends on the kind of page fault, and what we'd +like to do to resolve it: + +- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be + resolved by either providing a new page (``UFFDIO_COPY``), or mapping + the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map + the zero page for a missing fault. With userfaultfd, userspace can + decide what content to provide before the faulting thread continues. + +- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in + the page cache). Userspace has the option of modifying the page's + contents before resolving the fault. Once the contents are correct + (modified or not), userspace asks the kernel to map the page and let the + faulting thread continue with ``UFFDIO_CONTINUE``. Notes: -- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then - you must provide some kind of page in your thread after reading from - the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``. - The normal behavior of the OS automatically providing a zero page on - an anonymous mmaping is not in place. +- You can tell which kind of fault occurred by examining + ``pagefault.flags`` within the ``uffd_msg``, checking for the + ``UFFD_PAGEFAULT_FLAG_*`` flags. - None of the page-delivering ioctls default to the range that you registered with. You must fill in all fields for the appropriate @@ -122,9 +147,9 @@ Notes: - You get the address of the access that triggered the missing page event out of a struct uffd_msg that you read in the thread from the - uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or - ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then - the first of any of those IOCTLs wakes up the faulting thread. + uffd. You can supply as many pages as you want with these IOCTLs. + Keep in mind that unless you used DONTWAKE then the first of any of + those IOCTLs wakes up the faulting thread. - Be sure to test for all errors including (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges -- cgit v1.2.3 From fa965fd54827a6b6967602051736da9c163b79b7 Mon Sep 17 00:00:00 2001 From: Pavel Tatashin Date: Tue, 4 May 2021 18:39:12 -0700 Subject: memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning Document the special handling of page pinning when ZONE_MOVABLE present. Link: https://lkml.kernel.org/r/20210215161349.246722-11-pasha.tatashin@soleen.com Signed-off-by: Pavel Tatashin Suggested-by: David Hildenbrand Acked-by: Michal Hocko Cc: Dan Williams Cc: David Rientjes Cc: Ingo Molnar Cc: Ira Weiny Cc: James Morris Cc: Jason Gunthorpe Cc: Jason Gunthorpe Cc: John Hubbard Cc: Joonsoo Kim Cc: Matthew Wilcox Cc: Mel Gorman Cc: Michal Hocko Cc: Mike Kravetz Cc: Oscar Salvador Cc: Peter Zijlstra Cc: Sasha Levin Cc: Steven Rostedt (VMware) Cc: Tyler Hicks Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/mm/memory-hotplug.rst | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 5307f90738aa..05d51d2d8beb 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -357,6 +357,15 @@ creates ZONE_MOVABLE as following. Unfortunately, there is no information to show which memory block belongs to ZONE_MOVABLE. This is TBD. +.. note:: + Techniques that rely on long-term pinnings of memory (especially, RDMA and + vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory + hot remove. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that + memory can still get hot removed - be aware that pinning can fail even if + there is plenty of free memory in ZONE_MOVABLE. In addition, using + ZONE_MOVABLE might make page pinning more expensive, because pages have to be + migrated off that zone first. + .. _memory_hotplug_how_to_offline_memory: How to offline memory -- cgit v1.2.3 From e3a9d9fcc3315993de2e9fcd7ea82fab84433815 Mon Sep 17 00:00:00 2001 From: Oscar Salvador Date: Tue, 4 May 2021 18:39:48 -0700 Subject: mm,memory_hotplug: add kernel boot option to enable memmap_on_memory Self stored memmap leads to a sparse memory situation which is unsuitable for workloads that requires large contiguous memory chunks, so make this an opt-in which needs to be explicitly enabled. To control this, let memory_hotplug have its own memory space, as suggested by David, so we can add memory_hotplug.memmap_on_memory parameter. Link: https://lkml.kernel.org/r/20210421102701.25051-7-osalvador@suse.de Signed-off-by: Oscar Salvador Reviewed-by: David Hildenbrand Acked-by: Michal Hocko Cc: Anshuman Khandual Cc: Pavel Tatashin Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/kernel-parameters.txt | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1c0a3cf6fcc9..d93fbc1c1917 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2801,6 +2801,23 @@ seconds. Use this parameter to check at some other rate. 0 disables periodic checking. + memory_hotplug.memmap_on_memory + [KNL,X86,ARM] Boolean flag to enable this feature. + Format: {on | off (default)} + When enabled, runtime hotplugged memory will + allocate its internal metadata (struct pages) + from the hotadded memory which will allow to + hotadd a lot of memory without requiring + additional memory to do so. + This feature is disabled by default because it + has some implication on large (e.g. GB) + allocations in some configurations (e.g. small + memory blocks). + The state of the flag can be read in + /sys/module/memory_hotplug/parameters/memmap_on_memory. + Note that even when enabled, there are a few cases where + the feature is not effective. + memtest= [KNL,X86,ARM,PPC] Enable memtest Format: default : 0 -- cgit v1.2.3 From e7cb072eb988e46295512617c39d004f9e1c26f8 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Thu, 6 May 2021 18:05:42 -0700 Subject: init/initramfs.c: do unpacking asynchronously Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3. These two patches are independent, but better-together. The second is a rather trivial patch that simply allows the developer to change "/sbin/modprobe" to something else - e.g. the empty string, so that all request_module() during early boot return -ENOENT early, without even spawning a usermode helper, needlessly synchronizing with the initramfs unpacking. The first patch delegates decompressing the initramfs to a worker thread, allowing do_initcalls() in main.c to proceed to the device_ and late_ initcalls without waiting for that decompression (and populating of rootfs) to finish. Obviously, some of those later calls may rely on the initramfs being available, so I've added synchronization points in the firmware loader and usermodehelper paths - there might be other places that would need this, but so far no one has been able to think of any places I have missed. There's not much to win if most of the functionality needed during boot is only available as modules. But systems with a custom-made .config and initramfs can boot faster, partly due to utilizing more than one cpu earlier, partly by avoiding known-futile modprobe calls (which would still trigger synchronization with the initramfs unpacking, thus eliminating most of the first benefit). This patch (of 2): Most of the boot process doesn't actually need anything from the initramfs, until of course PID1 is to be executed. So instead of doing the decompressing and populating of the initramfs synchronously in populate_rootfs() itself, push that off to a worker thread. This is primarily motivated by an embedded ppc target, where unpacking even the rather modest sized initramfs takes 0.6 seconds, which is long enough that the external watchdog becomes unhappy that it doesn't get attention soon enough. By doing the initramfs decompression in a worker thread, we get to do the device_initcalls and hence start petting the watchdog much sooner. Normal desktops might benefit as well. On my mostly stock Ubuntu kernel, my initramfs is a 26M xz-compressed blob, decompressing to around 126M. That takes almost two seconds: [ 0.201454] Trying to unpack rootfs image as initramfs... [ 1.976633] Freeing initrd memory: 29416K Before this patch, these lines occur consecutively in dmesg. With this patch, the timestamps on these two lines is roughly the same as above, but with 172 lines inbetween - so more than one cpu has been kept busy doing work that would otherwise only happen after the populate_rootfs() finished. Should one of the initcalls done after rootfs_initcall time (i.e., device_ and late_ initcalls) need something from the initramfs (say, a kernel module or a firmware blob), it will simply wait for the initramfs unpacking to be done before proceeding, which should in theory make this completely safe. But if some driver pokes around in the filesystem directly and not via one of the official kernel interfaces (i.e. request_firmware*(), call_usermodehelper*) that theory may not hold - also, I certainly might have missed a spot when sprinkling wait_for_initramfs(). So there is an escape hatch in the form of an initramfs_async= command line parameter. Link: https://lkml.kernel.org/r/20210313212528.2956377-1-linux@rasmusvillemoes.dk Link: https://lkml.kernel.org/r/20210313212528.2956377-2-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes Reviewed-by: Luis Chamberlain Cc: Jessica Yu Cc: Borislav Petkov Cc: Jonathan Corbet Cc: Greg Kroah-Hartman Cc: Nick Desaulniers Cc: Takashi Iwai Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index d93fbc1c1917..7866cc1bd4a9 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1833,6 +1833,18 @@ initcall functions. Useful for debugging built-in modules and initcalls. + initramfs_async= [KNL] + Format: + Default: 1 + This parameter controls whether the initramfs + image is unpacked asynchronously, concurrently + with devices being probed and + initialized. This should normally just work, + but as a debugging aid, one can get the + historical behaviour of the initramfs + unpacking being completed before device_ and + late_ initcalls. + initrd= [BOOT] Specify the location of the initial ramdisk initrdmem= [KNL] Specify a physical address and size from which to -- cgit v1.2.3 From bbcd53c960713507ae764bf81970651b5577b95a Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 6 May 2021 18:05:55 -0700 Subject: drivers/char: remove /dev/kmem for good Patch series "drivers/char: remove /dev/kmem for good". Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and memory ballooning, I started questioning the existence of /dev/kmem. Comparing it with the /proc/kcore implementation, it does not seem to be able to deal with things like a) Pages unmapped from the direct mapping (e.g., to be used by secretmem) -> kern_addr_valid(). virt_addr_valid() is not sufficient. b) Special cases like gart aperture memory that is not to be touched -> mem_pfn_is_ram() Unless I am missing something, it's at least broken in some cases and might fault/crash the machine. Looks like its existence has been questioned before in 2005 and 2010 [1], after ~11 additional years, it might make sense to revive the discussion. CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by mistake?). All distributions disable it: in Ubuntu it has been disabled for more than 10 years, in Debian since 2.6.31, in Fedora at least starting with FC3, in RHEL starting with RHEL4, in SUSE starting from 15sp2, and OpenSUSE has it disabled as well. 1) /dev/kmem was popular for rootkits [2] before it got disabled basically everywhere. Ubuntu documents [3] "There is no modern user of /dev/kmem any more beyond attackers using it to load kernel rootkits.". RHEL documents in a BZ [5] "it served no practical purpose other than to serve as a potential security problem or to enable binary module drivers to access structures/functions they shouldn't be touching" 2) /proc/kcore is a decent interface to have a controlled way to read kernel memory for debugging puposes. (will need some extensions to deal with memory offlining/unplug, memory ballooning, and poisoned pages, though) 3) It might be useful for corner case debugging [1]. KDB/KGDB might be a better fit, especially, to write random memory; harder to shoot yourself into the foot. 4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems to be incompatible with 64bit [1]. For educational purposes, /proc/kcore might be used to monitor value updates -- or older kernels can be used. 5) It's broken on arm64, and therefore, completely disabled there. Looks like it's essentially unused and has been replaced by better suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's just remove it. [1] https://lwn.net/Articles/147901/ [2] https://www.linuxjournal.com/article/10505 [3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled [4] https://sourceforge.net/projects/kme/ [5] https://bugzilla.redhat.com/show_bug.cgi?id=154796 Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Michal Hocko Acked-by: Kees Cook Cc: Linus Torvalds Cc: Greg Kroah-Hartman Cc: "Alexander A. Klimov" Cc: Alexander Viro Cc: Alexandre Belloni Cc: Andrew Lunn Cc: Andrey Zhizhikin Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Brian Cain Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Chris Zankel Cc: Corentin Labbe Cc: "David S. Miller" Cc: "Eric W. Biederman" Cc: Geert Uytterhoeven Cc: Gerald Schaefer Cc: Greentime Hu Cc: Gregory Clement Cc: Heiko Carstens Cc: Helge Deller Cc: Hillf Danton Cc: huang ying Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: "James E.J. Bottomley" Cc: James Troup Cc: Jiaxun Yang Cc: Jonas Bonn Cc: Jonathan Corbet Cc: Kairui Song Cc: Krzysztof Kozlowski Cc: Kuninori Morimoto Cc: Liviu Dudau Cc: Lorenzo Pieralisi Cc: Luc Van Oostenryck Cc: Luis Chamberlain Cc: Matthew Wilcox Cc: Matt Turner Cc: Max Filippov Cc: Michael Ellerman Cc: Mike Rapoport Cc: Mikulas Patocka Cc: Minchan Kim Cc: Niklas Schnelle Cc: Oleksiy Avramchenko Cc: openrisc@lists.librecores.org Cc: Palmer Dabbelt Cc: Paul Mackerras Cc: "Pavel Machek (CIP)" Cc: Pavel Machek Cc: "Peter Zijlstra (Intel)" Cc: Pierre Morel Cc: Randy Dunlap Cc: Richard Henderson Cc: Rich Felker Cc: Robert Richter Cc: Rob Herring Cc: Russell King Cc: Sam Ravnborg Cc: Sebastian Andrzej Siewior Cc: Sebastian Hesselbarth Cc: sparclinux@vger.kernel.org Cc: Stafford Horne Cc: Stefan Kristiansson Cc: Steven Rostedt Cc: Sudeep Holla Cc: Theodore Dubois Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Viresh Kumar Cc: William Cohen Cc: Xiaoming Ni Cc: Yoshinori Sato Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/devices.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt index ef41f77cb979..9c2be821c225 100644 --- a/Documentation/admin-guide/devices.txt +++ b/Documentation/admin-guide/devices.txt @@ -4,7 +4,7 @@ 1 char Memory devices 1 = /dev/mem Physical memory access - 2 = /dev/kmem Kernel virtual memory access + 2 = /dev/kmem OBSOLETE - replaced by /proc/kcore 3 = /dev/null Null device 4 = /dev/port I/O port access 5 = /dev/zero Null byte source -- cgit v1.2.3