From 4c1d61389e8e4307449eb2ebad997241cbf08fef Mon Sep 17 00:00:00 2001 From: Richard Chang Date: Mon, 1 Dec 2025 18:47:49 +0900 Subject: zram: introduce writeback_compressed device attribute Introduce witeback_compressed device attribute to toggle compressed writeback (decompression on demand) feature. [senozhatsky@chromium.org: rewrote original patch, added documentation] Link: https://lkml.kernel.org/r/20251201094754.4149975-3-senozhatsky@chromium.org Signed-off-by: Richard Chang Co-developed-by: Sergey Senozhatsky Signed-off-by: Sergey Senozhatsky Cc: Brian Geffon Cc: David Stevens Cc: Minchan Kim Signed-off-by: Andrew Morton --- Documentation/admin-guide/blockdev/zram.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst index 3e273c1bb749..9547e4e95979 100644 --- a/Documentation/admin-guide/blockdev/zram.rst +++ b/Documentation/admin-guide/blockdev/zram.rst @@ -214,6 +214,7 @@ mem_limit WO specifies the maximum amount of memory ZRAM can writeback_limit WO specifies the maximum amount of write IO zram can write out to backing device as 4KB unit writeback_limit_enable RW show and set writeback_limit feature +writeback_compressed RW show and set compressed writeback feature comp_algorithm RW show and change the compression algorithm algorithm_params WO setup compression algorithm parameters compact WO trigger memory compaction @@ -434,6 +435,18 @@ system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of writeback happened until you reset the zram to allocate extra writeback budget in next setting is user's job. +By default zram stores written back pages in decompressed (raw) form, which +means that writeback operation involves decompression of the page before +writing it to the backing device. This behavior can be changed by enabling +`writeback_compressed` feature, which causes zram to write compressed pages +to the backing device, thus avoiding decompression overhead. To enable +this feature, execute:: + + $ echo yes > /sys/block/zramX/writeback_compressed + +Note that this feature should be configured before the `zramX` device is +initialized. + If admin wants to measure writeback count in a certain period, they could know it via /sys/block/zram0/bd_stat's 3rd column. -- cgit v1.2.3 From 2502673aed6c66befc7efc2dc008e2a8a50508cd Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 1 Dec 2025 18:47:50 +0900 Subject: zram: document writeback_batch_size Add missing writeback_batch_size documentation. Link: https://lkml.kernel.org/r/20251201094754.4149975-4-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Brian Geffon Cc: David Stevens Cc: Minchan Kim Cc: Richard Chang Signed-off-by: Andrew Morton --- Documentation/admin-guide/blockdev/zram.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst index 9547e4e95979..94bb7f2245ee 100644 --- a/Documentation/admin-guide/blockdev/zram.rst +++ b/Documentation/admin-guide/blockdev/zram.rst @@ -214,6 +214,8 @@ mem_limit WO specifies the maximum amount of memory ZRAM can writeback_limit WO specifies the maximum amount of write IO zram can write out to backing device as 4KB unit writeback_limit_enable RW show and set writeback_limit feature +writeback_batch_size RW show and set maximum number of in-flight + writeback operations writeback_compressed RW show and set compressed writeback feature comp_algorithm RW show and change the compression algorithm algorithm_params WO setup compression algorithm parameters @@ -223,7 +225,6 @@ backing_dev RW set up backend storage for zram to write out idle WO mark allocated slot as idle ====================== ====== =============================================== - User space is advised to use the following files to read the device statistics. File /sys/block/zram/stat @@ -447,6 +448,14 @@ this feature, execute:: Note that this feature should be configured before the `zramX` device is initialized. +Depending on backing device storage type, writeback operation may benefit +from a higher number of in-flight write requests (batched writes). The +number of maximum in-flight writeback operations can be configured via +`writeback_batch_size` attribute. To change the default value (which is 32), +execute:: + + $ echo 64 > /sys/block/zramX/writeback_batch_size + If admin wants to measure writeback count in a certain period, they could know it via /sys/block/zram0/bd_stat's 3rd column. -- cgit v1.2.3 From 0b43f89e2d9a18d1a2373064f70bc730180b70f2 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 16 Dec 2025 00:01:17 -0800 Subject: Docs/admin-guide/mm/damon/usage: update for nr_snapshots damos stat Update DAMON usage document for the newly added damos stat, nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-5-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: "Masami Hiramatsu (Google)" Cc: Mathieu Desnoyers Cc: Michal Hocko Cc: Mike Rapoport Cc: Steven Rostedt Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 9991dad60fcf..d0944bd78964 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -87,7 +87,7 @@ comma (","). │ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max │ │ │ │ │ │ │ :ref:`dests `/nr_dests │ │ │ │ │ │ │ │ 0/id,weight - │ │ │ │ │ │ │ :ref:`stats `/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds + │ │ │ │ │ │ │ :ref:`stats `/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds,nr_snapshots │ │ │ │ │ │ │ :ref:`tried_regions `/total_bytes │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age,sz_filter_passed │ │ │ │ │ │ │ │ ... @@ -543,9 +543,9 @@ online analysis or tuning of the schemes. Refer to :ref:`design doc The statistics can be retrieved by reading the files under ``stats`` directory (``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, -``sz_ops_filter_passed``, and ``qt_exceeds``), respectively. The files are not -updated in real time, so you should ask DAMON sysfs interface to update the -content of the files for the stats by writing a special keyword, +``sz_ops_filter_passed``, ``qt_exceeds`` and ``nr_snapshots``), respectively. +The files are not updated in real time, so you should ask DAMON sysfs interface +to update the content of the files for the stats by writing a special keyword, ``update_schemes_stats`` to the relevant ``kdamonds//state`` file. .. _sysfs_schemes_tried_regions: -- cgit v1.2.3 From 2584dd7496c53135287d3a4b2e0699fe386df015 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 16 Dec 2025 00:01:23 -0800 Subject: Docs/admin-guide/mm/damon/usage: update for max_nr_snapshots Update DAMON usage document for the newly added DAMON sysfs interface file, max_nr_snapshots. Link: https://lkml.kernel.org/r/20251216080128.42991-11-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: "Masami Hiramatsu (Google)" Cc: Mathieu Desnoyers Cc: Michal Hocko Cc: Mike Rapoport Cc: Steven Rostedt Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index d0944bd78964..7da4c002cb39 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -87,7 +87,7 @@ comma (","). │ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max │ │ │ │ │ │ │ :ref:`dests `/nr_dests │ │ │ │ │ │ │ │ 0/id,weight - │ │ │ │ │ │ │ :ref:`stats `/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds,nr_snapshots + │ │ │ │ │ │ │ :ref:`stats `/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds,nr_snapshots,max_nr_snapshots │ │ │ │ │ │ │ :ref:`tried_regions `/total_bytes │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age,sz_filter_passed │ │ │ │ │ │ │ │ ... @@ -543,10 +543,11 @@ online analysis or tuning of the schemes. Refer to :ref:`design doc The statistics can be retrieved by reading the files under ``stats`` directory (``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, -``sz_ops_filter_passed``, ``qt_exceeds`` and ``nr_snapshots``), respectively. -The files are not updated in real time, so you should ask DAMON sysfs interface -to update the content of the files for the stats by writing a special keyword, -``update_schemes_stats`` to the relevant ``kdamonds//state`` file. +``sz_ops_filter_passed``, ``qt_exceeds``, ``nr_snapshots`` and +``max_nr_snapshots``), respectively. The files are not updated in real time, +so you should ask DAMON sysfs interface to update the content of the files for +the stats by writing a special keyword, ``update_schemes_stats`` to the +relevant ``kdamonds//state`` file. .. _sysfs_schemes_tried_regions: -- cgit v1.2.3 From 64dd89ae01f2708a508e028c28b7906e4702a9a7 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 15 Dec 2025 12:57:53 -0500 Subject: mm/block/fs: remove laptop_mode Laptop mode was introduced to save battery, by delaying and consolidating writes and thereby maximize the time rotating hard drives wouldn't have to spin. Luckily, rotating hard drives, with their high spin-up times and power draw, are a thing of the past for battery-powered devices. Reclaim has also since changed to not write single filesystem pages anymore, and regular filesystem writeback is lumpy by design. The juice doesn't appear worth the squeeze anymore. The footprint of the feature is small, but nevertheless it's a complicating factor in mm, block, filesystems. Developers don't think about it, and it likely hasn't been tested with new reclaim and writeback changes in years. Let's sunset it. Keep the sysctl with a deprecation warning around for a few more cycles, but remove all functionality behind it. [akpm@linux-foundation.org: fix Documentation/admin-guide/laptops/index.rst] Link: https://lkml.kernel.org/r/20251216185201.GH905277@cmpxchg.org Signed-off-by: Johannes Weiner Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Acked-by: Jens Axboe Reviewed-by: Shakeel Butt Acked-by: Michal Hocko Cc: Deepanshu Kartikey Signed-off-by: Andrew Morton --- Documentation/admin-guide/laptops/index.rst | 1 - Documentation/admin-guide/laptops/laptop-mode.rst | 770 ---------------------- Documentation/admin-guide/sysctl/vm.rst | 8 - 3 files changed, 779 deletions(-) delete mode 100644 Documentation/admin-guide/laptops/laptop-mode.rst (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/laptops/index.rst b/Documentation/admin-guide/laptops/index.rst index 6432c251dc95..c0b911d05c59 100644 --- a/Documentation/admin-guide/laptops/index.rst +++ b/Documentation/admin-guide/laptops/index.rst @@ -10,7 +10,6 @@ Laptop Drivers alienware-wmi asus-laptop disk-shock-protection - laptop-mode lg-laptop samsung-galaxybook sony-laptop diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst deleted file mode 100644 index 66eb9cd918b5..000000000000 --- a/Documentation/admin-guide/laptops/laptop-mode.rst +++ /dev/null @@ -1,770 +0,0 @@ -=============================================== -How to conserve battery power using laptop-mode -=============================================== - -Document Author: Bart Samwel (bart@samwel.tk) - -Date created: January 2, 2004 - -Last modified: December 06, 2004 - -Introduction ------------- - -Laptop mode is used to minimize the time that the hard disk needs to be spun up, -to conserve battery power on laptops. It has been reported to cause significant -power savings. - -.. Contents - - * Introduction - * Installation - * Caveats - * The Details - * Tips & Tricks - * Control script - * ACPI integration - * Monitoring tool - - -Installation ------------- - -To use laptop mode, you don't need to set any kernel configuration options -or anything. Simply install all the files included in this document, and -laptop mode will automatically be started when you're on battery. For -your convenience, a tarball containing an installer can be downloaded at: - - http://www.samwel.tk/laptop_mode/laptop_mode/ - -To configure laptop mode, you need to edit the configuration file, which is -located in /etc/default/laptop-mode on Debian-based systems, or in -/etc/sysconfig/laptop-mode on other systems. - -Unfortunately, automatic enabling of laptop mode does not work for -laptops that don't have ACPI. On those laptops, you need to start laptop -mode manually. To start laptop mode, run "laptop_mode start", and to -stop it, run "laptop_mode stop". (Note: The laptop mode tools package now -has experimental support for APM, you might want to try that first.) - - -Caveats -------- - -* The downside of laptop mode is that you have a chance of losing up to 10 - minutes of work. If you cannot afford this, don't use it! The supplied ACPI - scripts automatically turn off laptop mode when the battery almost runs out, - so that you won't lose any data at the end of your battery life. - -* Most desktop hard drives have a very limited lifetime measured in spindown - cycles, typically about 50.000 times (it's usually listed on the spec sheet). - Check your drive's rating, and don't wear down your drive's lifetime if you - don't need to. - -* If you mount some of your ext3 filesystems with the -n option, then - the control script will not be able to remount them correctly. You must set - DO_REMOUNTS=0 in the control script, otherwise it will remount them with the - wrong options -- or it will fail because it cannot write to /etc/mtab. - -* If you have your filesystems listed as type "auto" in fstab, like I did, then - the control script will not recognize them as filesystems that need remounting. - You must list the filesystems with their true type instead. - -* It has been reported that some versions of the mutt mail client use file access - times to determine whether a folder contains new mail. If you use mutt and - experience this, you must disable the noatime remounting by setting the option - DO_REMOUNT_NOATIME to 0 in the configuration file. - - -The Details ------------ - -Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is -present for all kernels that have the laptop mode patch, regardless of any -configuration options. When the knob is set, any physical disk I/O (that might -have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The -result of this is that after a disk has spun down, it will not be spun up -anymore to write dirty blocks, because those blocks had already been written -immediately after the most recent read operation. The value of the laptop_mode -knob determines the time between the occurrence of disk I/O and when the flush -is triggered. A sensible value for the knob is 5 seconds. Setting the knob to -0 disables laptop mode. - -To increase the effectiveness of the laptop_mode strategy, the laptop_mode -control script increases dirty_expire_centisecs and dirty_writeback_centisecs in -/proc/sys/vm to about 10 minutes (by default), which means that pages that are -dirtied are not forced to be written to disk as often. The control script also -changes the dirty background ratio, so that background writeback of dirty pages -is not done anymore. Combined with a higher commit value (also 10 minutes) for -ext3 filesystem (also done automatically by the control script), -this results in concentration of disk activity in a small time interval which -occurs only once every 10 minutes, or whenever the disk is forced to spin up by -a cache miss. The disk can then be spun down in the periods of inactivity. - - -Configuration -------------- - -The laptop mode configuration file is located in /etc/default/laptop-mode on -Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It -contains the following options: - -MAX_AGE: - -Maximum time, in seconds, of hard drive spindown time that you are -comfortable with. Worst case, it's possible that you could lose this -amount of work if your battery fails while you're in laptop mode. - -MINIMUM_BATTERY_MINUTES: - -Automatically disable laptop mode if the remaining number of minutes of -battery power is less than this value. Default is 10 minutes. - -AC_HD/BATT_HD: - -The idle timeout that should be set on your hard drive when laptop mode -is active (BATT_HD) and when it is not active (AC_HD). The defaults are -20 seconds (value 4) for BATT_HD and 2 hours (value 244) for AC_HD. The -possible values are those listed in the manual page for "hdparm" for the -"-S" option. - -HD: - -The devices for which the spindown timeout should be adjusted by laptop mode. -Default is /dev/hda. If you specify multiple devices, separate them by a space. - -READAHEAD: - -Disk readahead, in 512-byte sectors, while laptop mode is active. A large -readahead can prevent disk accesses for things like executable pages (which are -loaded on demand while the application executes) and sequentially accessed data -(MP3s). - -DO_REMOUNTS: - -The control script automatically remounts any mounted journaled filesystems -with appropriate commit interval options. When this option is set to 0, this -feature is disabled. - -DO_REMOUNT_NOATIME: - -When remounting, should the filesystems be remounted with the noatime option? -Normally, this is set to "1" (enabled), but there may be programs that require -access time recording. - -DIRTY_RATIO: - -The percentage of memory that is allowed to contain "dirty" or unsaved data -before a writeback is forced, while laptop mode is active. Corresponds to -the /proc/sys/vm/dirty_ratio sysctl. - -DIRTY_BACKGROUND_RATIO: - -The percentage of memory that is allowed to contain "dirty" or unsaved data -after a forced writeback is done due to an exceeding of DIRTY_RATIO. Set -this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio -sysctl. - -Note that the behaviour of dirty_background_ratio is quite different -when laptop mode is active and when it isn't. When laptop mode is inactive, -dirty_background_ratio is the threshold percentage at which background writeouts -start taking place. When laptop mode is active, however, background writeouts -are disabled, and the dirty_background_ratio only determines how much writeback -is done when dirty_ratio is reached. - -DO_CPU: - -Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be setup. -See Documentation/admin-guide/pm/cpufreq.rst for more info. Disabled by default.) - -CPU_MAXFREQ: - -When on battery, what is the maximum CPU speed that the system should use? Legal -values are "slowest" for the slowest speed that your CPU is able to operate at, -or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies. - - -Tips & Tricks -------------- - -* Bartek Kania reports getting up to 50 minutes of extra battery life (on top - of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1). - -* You can spin down the disk while playing MP3, by setting disk readahead - to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at - once, and will then spin down while the MP3 is playing. (Thanks to Bartek - Kania.) - -* Drew Scott Daniels observed: "I don't know why, but when I decrease the number - of colours that my display uses it consumes less battery power. I've seen - this on powerbooks too. I hope that this is a piece of information that - might be useful to the Laptop Mode patch or its users." - -* In syslog.conf, you can prefix entries with a dash `-` to omit syncing the - file after every logging. When you're using laptop-mode and your disk doesn't - spin down, this is a likely culprit. - -* Richard Atterer observed that laptop mode does not work well with noflushd - (http://noflushd.sourceforge.net/), it seems that noflushd prevents laptop-mode - from doing its thing. - -* If you're worried about your data, you might want to consider using a USB - memory stick or something like that as a "working area". (Be aware though - that flash memory can only handle a limited number of writes, and overuse - may wear out your memory stick pretty quickly. Do _not_ use journalling - filesystems on flash memory sticks.) - - -Configuration file for control and ACPI battery scripts -------------------------------------------------------- - -This allows the tunables to be changed for the scripts via an external -configuration file - -It should be installed as /etc/default/laptop-mode on Debian, and as -/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes. - -Config file:: - - # Maximum time, in seconds, of hard drive spindown time that you are - # comfortable with. Worst case, it's possible that you could lose this - # amount of work if your battery fails you while in laptop mode. - #MAX_AGE=600 - - # Automatically disable laptop mode when the number of minutes of battery - # that you have left goes below this threshold. - MINIMUM_BATTERY_MINUTES=10 - - # Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG - # by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk - # will read a complete MP3 at once, and will then spin down while the MP3/OGG is - # playing. - #READAHEAD=4096 - - # Shall we remount journaled fs. with appropriate commit interval? (1=yes) - #DO_REMOUNTS=1 - - # And shall we add the "noatime" option to that as well? (1=yes) - #DO_REMOUNT_NOATIME=1 - - # Dirty synchronous ratio. At this percentage of dirty pages the process - # which - # calls write() does its own writeback - #DIRTY_RATIO=40 - - # - # Allowed dirty background ratio, in percent. Once DIRTY_RATIO has been - # exceeded, the kernel will wake flusher threads which will then reduce the - # amount of dirty memory to dirty_background_ratio. Set this nice and low, - # so once some writeout has commenced, we do a lot of it. - # - #DIRTY_BACKGROUND_RATIO=5 - - # kernel default dirty buffer age - #DEF_AGE=30 - #DEF_UPDATE=5 - #DEF_DIRTY_BACKGROUND_RATIO=10 - #DEF_DIRTY_RATIO=40 - #DEF_XFS_AGE_BUFFER=15 - #DEF_XFS_SYNC_INTERVAL=30 - #DEF_XFS_BUFD_INTERVAL=1 - - # This must be adjusted manually to the value of HZ in the running kernel - # on 2.4, until the XFS people change their 2.4 external interfaces to work in - # centisecs. This can be automated, but it's a work in progress that still - # needs# some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for - # external interfaces, and that is currently always set to 100. So you don't - # need to change this on 2.6. - #XFS_HZ=100 - - # Should the maximum CPU frequency be adjusted down while on battery? - # Requires CPUFreq to be setup. - # See Documentation/admin-guide/pm/cpufreq.rst for more info - #DO_CPU=0 - - # When on battery what is the maximum CPU speed that the system should - # use? Legal values are "slowest" for the slowest speed that your - # CPU is able to operate at, or a value listed in: - # /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies - # Only applicable if DO_CPU=1. - #CPU_MAXFREQ=slowest - - # Idle timeout for your hard drive (man hdparm for valid values, -S option) - # Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4). - #AC_HD=244 - #BATT_HD=4 - - # The drives for which to adjust the idle timeout. Separate them by a space, - # e.g. HD="/dev/hda /dev/hdb". - #HD="/dev/hda" - - # Set the spindown timeout on a hard drive? - #DO_HD=1 - - -Control script --------------- - -Please note that this control script works for the Linux 2.4 and 2.6 series (thanks -to Kiko Piris). - -Control script:: - - #!/bin/bash - - # start or stop laptop_mode, best run by a power management daemon when - # ac gets connected/disconnected from a laptop - # - # install as /sbin/laptop_mode - # - # Contributors to this script: Kiko Piris - # Bart Samwel - # Micha Feigin - # Andrew Morton - # Herve Eychenne - # Dax Kelson - # - # Original Linux 2.4 version by: Jens Axboe - - ############################################################################# - - # Source config - if [ -f /etc/default/laptop-mode ] ; then - # Debian - . /etc/default/laptop-mode - elif [ -f /etc/sysconfig/laptop-mode ] ; then - # Others - . /etc/sysconfig/laptop-mode - fi - - # Don't raise an error if the config file is incomplete - # set defaults instead: - - # Maximum time, in seconds, of hard drive spindown time that you are - # comfortable with. Worst case, it's possible that you could lose this - # amount of work if your battery fails you while in laptop mode. - MAX_AGE=${MAX_AGE:-'600'} - - # Read-ahead, in kilobytes - READAHEAD=${READAHEAD:-'4096'} - - # Shall we remount journaled fs. with appropriate commit interval? (1=yes) - DO_REMOUNTS=${DO_REMOUNTS:-'1'} - - # And shall we add the "noatime" option to that as well? (1=yes) - DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'} - - # Shall we adjust the idle timeout on a hard drive? - DO_HD=${DO_HD:-'1'} - - # Adjust idle timeout on which hard drive? - HD="${HD:-'/dev/hda'}" - - # spindown time for HD (hdparm -S values) - AC_HD=${AC_HD:-'244'} - BATT_HD=${BATT_HD:-'4'} - - # Dirty synchronous ratio. At this percentage of dirty pages the process which - # calls write() does its own writeback - DIRTY_RATIO=${DIRTY_RATIO:-'40'} - - # cpu frequency scaling - # See Documentation/admin-guide/pm/cpufreq.rst for more info - DO_CPU=${CPU_MANAGE:-'0'} - CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'} - - # - # Allowed dirty background ratio, in percent. Once DIRTY_RATIO has been - # exceeded, the kernel will wake flusher threads which will then reduce the - # amount of dirty memory to dirty_background_ratio. Set this nice and low, - # so once some writeout has commenced, we do a lot of it. - # - DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'} - - # kernel default dirty buffer age - DEF_AGE=${DEF_AGE:-'30'} - DEF_UPDATE=${DEF_UPDATE:-'5'} - DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'} - DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'} - DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'} - DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'} - DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'} - - # This must be adjusted manually to the value of HZ in the running kernel - # on 2.4, until the XFS people change their 2.4 external interfaces to work in - # centisecs. This can be automated, but it's a work in progress that still needs - # some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external - # interfaces, and that is currently always set to 100. So you don't need to - # change this on 2.6. - XFS_HZ=${XFS_HZ:-'100'} - - ############################################################################# - - KLEVEL="$(uname -r | - { - IFS='.' read a b c - echo $a.$b - } - )" - case "$KLEVEL" in - "2.4"|"2.6") - ;; - *) - echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2 - exit 1 - ;; - esac - - if [ ! -e /proc/sys/vm/laptop_mode ] ; then - echo "Kernel is not patched with laptop_mode patch." >&2 - exit 1 - fi - - if [ ! -w /proc/sys/vm/laptop_mode ] ; then - echo "You do not have enough privileges to enable laptop_mode." >&2 - exit 1 - fi - - # Remove an option (the first parameter) of the form option= from - # a mount options string (the rest of the parameters). - parse_mount_opts () { - OPT="$1" - shift - echo ",$*," | sed \ - -e 's/,'"$OPT"'=[0-9]*,/,/g' \ - -e 's/,,*/,/g' \ - -e 's/^,//' \ - -e 's/,$//' - } - - # Remove an option (the first parameter) without any arguments from - # a mount option string (the rest of the parameters). - parse_nonumber_mount_opts () { - OPT="$1" - shift - echo ",$*," | sed \ - -e 's/,'"$OPT"',/,/g' \ - -e 's/,,*/,/g' \ - -e 's/^,//' \ - -e 's/,$//' - } - - # Find out the state of a yes/no option (e.g. "atime"/"noatime") in - # fstab for a given filesystem, and use this state to replace the - # value of the option in another mount options string. The device - # is the first argument, the option name the second, and the default - # value the third. The remainder is the mount options string. - # - # Example: - # parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime - # - # If fstab contains, say, "rw" for this filesystem, then the result - # will be "defaults,atime". - parse_yesno_opts_wfstab () { - L_DEV="$1" - OPT="$2" - DEF_OPT="$3" - shift 3 - L_OPTS="$*" - PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)" - PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)" - # Watch for a default atime in fstab - FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)" - if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then - # option specified in fstab: extract the value and use it - if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then - echo "$PARSEDOPTS1,no$OPT" - else - # no$OPT not found -- so we must have $OPT. - echo "$PARSEDOPTS1,$OPT" - fi - else - # option not specified in fstab -- choose the default. - echo "$PARSEDOPTS1,$DEF_OPT" - fi - } - - # Find out the state of a numbered option (e.g. "commit=NNN") in - # fstab for a given filesystem, and use this state to replace the - # value of the option in another mount options string. The device - # is the first argument, and the option name the second. The - # remainder is the mount options string in which the replacement - # must be done. - # - # Example: - # parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7 - # - # If fstab contains, say, "commit=3,rw" for this filesystem, then the - # result will be "rw,commit=3". - parse_mount_opts_wfstab () { - L_DEV="$1" - OPT="$2" - shift 2 - L_OPTS="$*" - PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)" - # Watch for a default commit in fstab - FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)" - if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then - # option specified in fstab: extract the value, and use it - echo -n "$PARSEDOPTS1,$OPT=" - echo ",$FSTAB_OPTS," | sed \ - -e 's/.*,'"$OPT"'=//' \ - -e 's/,.*//' - else - # option not specified in fstab: set it to 0 - echo "$PARSEDOPTS1,$OPT=0" - fi - } - - deduce_fstype () { - MP="$1" - # My root filesystem unfortunately has - # type "unknown" in /etc/mtab. If we encounter - # "unknown", we try to get the type from fstab. - cat /etc/fstab | - grep -v '^#' | - while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_DUMP ; do - if [ "$FSTAB_MP" = "$MP" ]; then - echo $FSTAB_FST - exit 0 - fi - done - } - - if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then - NOATIME_OPT=",noatime" - fi - - case "$1" in - start) - AGE=$((100*$MAX_AGE)) - XFS_AGE=$(($XFS_HZ*$MAX_AGE)) - echo -n "Starting laptop_mode" - - if [ -d /proc/sys/vm/pagebuf ] ; then - # (For 2.4 and early 2.6.) - # This only needs to be set, not reset -- it is only used when - # laptop mode is enabled. - echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age - echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval - elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then - # (A couple of early 2.6 laptop mode patches had these.) - # The same goes for these. - echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer - echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval - elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then - # (2.6.6) - # But not for these -- they are also used in normal - # operation. - echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer - echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval - elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then - # (2.6.7 upwards) - # And not for these either. These are in centisecs, - # not USER_HZ, so we have to use $AGE, not $XFS_AGE. - echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs - echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs - echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs - fi - - case "$KLEVEL" in - "2.4") - echo 1 > /proc/sys/vm/laptop_mode - echo "30 500 0 0 $AGE $AGE 60 20 0" > /proc/sys/vm/bdflush - ;; - "2.6") - echo 5 > /proc/sys/vm/laptop_mode - echo "$AGE" > /proc/sys/vm/dirty_writeback_centisecs - echo "$AGE" > /proc/sys/vm/dirty_expire_centisecs - echo "$DIRTY_RATIO" > /proc/sys/vm/dirty_ratio - echo "$DIRTY_BACKGROUND_RATIO" > /proc/sys/vm/dirty_background_ratio - ;; - esac - if [ $DO_REMOUNTS -eq 1 ]; then - cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do - PARSEDOPTS="$(parse_mount_opts "$OPTS")" - if [ "$FST" = 'unknown' ]; then - FST=$(deduce_fstype $MP) - fi - case "$FST" in - "ext3") - PARSEDOPTS="$(parse_mount_opts commit "$OPTS")" - mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT - ;; - "xfs") - mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT - ;; - esac - if [ -b $DEV ] ; then - blockdev --setra $(($READAHEAD * 2)) $DEV - fi - done - fi - if [ $DO_HD -eq 1 ] ; then - for THISHD in $HD ; do - /sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1 - /sbin/hdparm -B 1 $THISHD > /dev/null 2>&1 - done - fi - if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then - if [ $CPU_MAXFREQ = 'slowest' ]; then - CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq` - fi - echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq - fi - echo "." - ;; - stop) - U_AGE=$((100*$DEF_UPDATE)) - B_AGE=$((100*$DEF_AGE)) - echo -n "Stopping laptop_mode" - echo 0 > /proc/sys/vm/laptop_mode - if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then - # These need to be restored, if there are no lm_*. - echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER)) > /proc/sys/fs/xfs/age_buffer - echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) > /proc/sys/fs/xfs/sync_interval - elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then - # These need to be restored as well. - echo $((100*$DEF_XFS_AGE_BUFFER)) > /proc/sys/fs/xfs/age_buffer_centisecs - echo $((100*$DEF_XFS_SYNC_INTERVAL)) > /proc/sys/fs/xfs/xfssyncd_centisecs - echo $((100*$DEF_XFS_BUFD_INTERVAL)) > /proc/sys/fs/xfs/xfsbufd_centisecs - fi - case "$KLEVEL" in - "2.4") - echo "30 500 0 0 $U_AGE $B_AGE 60 20 0" > /proc/sys/vm/bdflush - ;; - "2.6") - echo "$U_AGE" > /proc/sys/vm/dirty_writeback_centisecs - echo "$B_AGE" > /proc/sys/vm/dirty_expire_centisecs - echo "$DEF_DIRTY_RATIO" > /proc/sys/vm/dirty_ratio - echo "$DEF_DIRTY_BACKGROUND_RATIO" > /proc/sys/vm/dirty_background_ratio - ;; - esac - if [ $DO_REMOUNTS -eq 1 ] ; then - cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do - # Reset commit and atime options to defaults. - if [ "$FST" = 'unknown' ]; then - FST=$(deduce_fstype $MP) - fi - case "$FST" in - "ext3") - PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)" - PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)" - mount $DEV -t $FST $MP -o remount,$PARSEDOPTS - ;; - "xfs") - PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)" - mount $DEV -t $FST $MP -o remount,$PARSEDOPTS - ;; - esac - if [ -b $DEV ] ; then - blockdev --setra 256 $DEV - fi - done - fi - if [ $DO_HD -eq 1 ] ; then - for THISHD in $HD ; do - /sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1 - /sbin/hdparm -B 255 $THISHD > /dev/null 2>&1 - done - fi - if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then - echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq - fi - echo "." - ;; - *) - echo "Usage: $0 {start|stop}" 2>&1 - exit 1 - ;; - - esac - - exit 0 - - -ACPI integration ----------------- - -Dax Kelson submitted this so that the ACPI acpid daemon will -kick off the laptop_mode script and run hdparm. The part that -automatically disables laptop mode when the battery is low was -written by Jan Topinski. - -/etc/acpi/events/ac_adapter:: - - event=ac_adapter - action=/etc/acpi/actions/ac.sh %e - -/etc/acpi/events/battery:: - - event=battery.* - action=/etc/acpi/actions/battery.sh %e - -/etc/acpi/actions/ac.sh:: - - #!/bin/bash - - # ac on/offline event handler - - status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state` - - case $status in - "on-line") - /sbin/laptop_mode stop - exit 0 - ;; - "off-line") - /sbin/laptop_mode start - exit 0 - ;; - esac - - -/etc/acpi/actions/battery.sh:: - - #! /bin/bash - - # Automatically disable laptop mode when the battery almost runs out. - - BATT_INFO=/proc/acpi/battery/$2/state - - if [[ -f /proc/sys/vm/laptop_mode ]] - then - LM=`cat /proc/sys/vm/laptop_mode` - if [[ $LM -gt 0 ]] - then - if [[ -f $BATT_INFO ]] - then - # Source the config file only now that we know we need - if [ -f /etc/default/laptop-mode ] ; then - # Debian - . /etc/default/laptop-mode - elif [ -f /etc/sysconfig/laptop-mode ] ; then - # Others - . /etc/sysconfig/laptop-mode - fi - MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'} - - ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`" - if [[ ACTION -eq "discharging" ]] - then - PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed "s/.* \([0-9][0-9]* \).*/\1/" ` - REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed "s/.* \([0-9][0-9]* \).*/\1/" ` - fi - if (($REMAINING * 60 / $PRESENT_RATE < $MINIMUM_BATTERY_MINUTES)) - then - /sbin/laptop_mode stop - fi - else - logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/battery.sh to set BATT_INFO to the correct path." - fi - fi - fi - - -Monitoring tool ---------------- - -Bartek Kania submitted this, it can be used to measure how much time your disk -spends spun up/down. See tools/laptop/dslm/dslm.c diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 245bf6394935..ca6ebeb5171c 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -41,7 +41,6 @@ Currently, these files are in /proc/sys/vm: - extfrag_threshold - highmem_is_dirtyable - hugetlb_shm_group -- laptop_mode - legacy_va_layout - lowmem_reserve_ratio - max_map_count @@ -363,13 +362,6 @@ hugetlb_shm_group contains group id that is allowed to create SysV shared memory segment using hugetlb page. -laptop_mode -=========== - -laptop_mode is a knob that controls "laptop mode". All the things that are -controlled by this knob are discussed in Documentation/admin-guide/laptops/laptop-mode.rst. - - legacy_va_layout ================ -- cgit v1.2.3 From 9e80e66ddaf736e5ca80cba8adf8d497bd53092f Mon Sep 17 00:00:00 2001 From: Gregory Price Date: Sun, 21 Dec 2025 07:56:03 -0500 Subject: mm, hugetlb: implement movable_gigantic_pages sysctl This reintroduces a concept removed by: commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl") This sysctl provides flexibility between ZONE_MOVABLE use cases: 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility 2) onlining memory in ZONE_MOVABLE to make hugepage allocate reliable When ZONE_MOVABLE is used to make huge page allocation more reliable, disallowing gigantic pages memory in this region is pointless. If hotplug is not a requirement, we can loosen the restrictions to allow 1GB gigantic pages in ZONE_MOVABLE. Since 1GB can be difficult to migrate / has impacts on compaction / defragmentation, we don't enable this by default. Notably, 1GB pages can only be migrated if another 1GB page is available - so hot-unplug will fail if such a page cannot be found. However, since there are scenarios where gigantic pages are migratable, we should allow use of these on movable regions. When not valid 1GB is available for migration, hot-unplug will retry indefinitely (or until interrupted). For example: echo 0 > node0/hugepages/..-1GB/nr_hugepages # clear node0 1GB pages echo 1 > node1/hugepages/..-1GB/nr_hugepages # reserve node1 1GB page ./alloc_huge_node1 & # Allocate a 1GB page on node1 ./node1_offline & # attempt to offline all node1 memory echo 1 > node0/hugepages/..-1GB/nr_hugepages # reserve node0 1GB page In this example, node1_offline will block indefinitely until the final step, when a node0 1GB page is made available. Note: Boot-time CMA is not possible for driver-managed hotplug memory, as CMA requires the memory to be registered as SystemRAM at boot time. Additionally, 1GB huge pages are not supported by THP. Link: https://lkml.kernel.org/r/20251221125603.2364174-1-gourry@gourry.net Signed-off-by: Gregory Price Suggested-by: David Rientjes Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/ Acked-by: David Hildenbrand (Red Hat) Acked-by: David Rientjes Cc: Mel Gorman Cc: Michal Hocko Cc: "David Hildenbrand (Red Hat)" Cc: Gregory Price Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Mike Rapoport Cc: Muchun Song Cc: Oscar Salvador Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/memory-hotplug.rst | 14 +++++++++++-- Documentation/admin-guide/sysctl/vm.rst | 28 +++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 2 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 33c886f3d198..6581558fd0d7 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -612,8 +612,9 @@ ZONE_MOVABLE, especially when fine-tuning zone ratios: allocations and silently create a zone imbalance, usually triggered by inflation requests from the hypervisor. -- Gigantic pages are unmovable, resulting in user space consuming a - lot of unmovable memory. +- Gigantic pages are unmovable when an architecture does not support + huge page migration and/or the ``movable_gigantic_pages`` sysctl is false. + See Documentation/admin-guide/sysctl/vm.rst for more info on this sysctl. - Huge pages are unmovable when an architectures does not support huge page migration, resulting in a similar issue as with gigantic pages. @@ -672,6 +673,15 @@ block might fail: - Concurrent activity that operates on the same physical memory area, such as allocating gigantic pages, can result in temporary offlining failures. +- When an admin sets the ``movable_gigantic_pages`` sysctl to true, gigantic + pages are allowed in ZONE_MOVABLE. This only allows migratable gigantic + pages to be allocated; however, if there are no eligible destination gigantic + pages at offline, the offlining operation will fail. + + Users leveraging ``movable_gigantic_pages`` should weigh the value of + ZONE_MOVABLE for increasing the reliability of gigantic page allocation + against the potential loss of hot-unplug reliability. + - Out of memory when dissolving huge pages, especially when HugeTLB Vmemmap Optimization (HVO) is enabled. diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index ca6ebeb5171c..b98ccb5cb210 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -53,6 +53,7 @@ Currently, these files are in /proc/sys/vm: - mmap_min_addr - mmap_rnd_bits - mmap_rnd_compat_bits +- movable_gigantic_pages - nr_hugepages - nr_hugepages_mempolicy - nr_overcommit_hugepages @@ -620,6 +621,33 @@ This value can be changed after boot using the /proc/sys/vm/mmap_rnd_compat_bits tunable +movable_gigantic_pages +====================== + +This parameter controls whether gigantic pages may be allocated from +ZONE_MOVABLE. If set to non-zero, gigantic pages can be allocated +from ZONE_MOVABLE. ZONE_MOVABLE memory may be created via the kernel +boot parameter `kernelcore` or via memory hotplug as discussed in +Documentation/admin-guide/mm/memory-hotplug.rst. + +Support may depend on specific architecture. + +Note that using ZONE_MOVABLE gigantic pages make memory hotremove unreliable. + +Memory hot-remove operations will block indefinitely until the admin reserves +sufficient gigantic pages to service migration requests associated with the +memory offlining process. As HugeTLB gigantic page reservation is a manual +process (via `nodeN/hugepages/.../nr_hugepages` interfaces) this may not be +obvious when just attempting to offline a block of memory. + +Additionally, as multiple gigantic pages may be reserved on a single block, +it may appear that gigantic pages are available for migration when in reality +they are in the process of being removed. For example if `memoryN` contains +two gigantic pages, one reserved and one allocated, and an admin attempts to +offline that block, this operations may hang indefinitely unless another +reserved gigantic page is available on another block `memoryM`. + + nr_hugepages ============ -- cgit v1.2.3 From 0bec75167d9c491a5a01c6ca85303a58c5b95165 Mon Sep 17 00:00:00 2001 From: Greg Thelen Date: Thu, 1 Jan 2026 14:55:51 -0800 Subject: memcg-v1: remove folio_memcg_lock() doc reference MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit a29c0e4b2e86 ("memcg-v1: remove memcg move locking code") removed folio_memcg_lock(). Delete the final lingering documentation reference. Link: https://lkml.kernel.org/r/20260101225552.3423108-1-gthelen@google.com Fixes: a29c0e4b2e86 ("memcg-v1: remove memcg move locking code") Signed-off-by: Greg Thelen Acked-by: Shakeel Butt Acked-by: SeongJae Park Cc: Tejun Heo Cc: Johannes Weiner Cc: "Michal Koutný" Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/cgroup-v1/memory.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index d6b1db8cc7eb..7db63c002922 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -311,9 +311,8 @@ Lock order is as follows:: folio_lock mm->page_table_lock or split pte_lock - folio_memcg_lock (memcg->move_lock) - mapping->i_pages lock - lruvec->lru_lock. + mapping->i_pages lock + lruvec->lru_lock. Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by lruvec->lru_lock; the folio LRU flag is cleared before -- cgit v1.2.3 From b36aefb866a12e2fbdc76f3cf0be4025b85dcb2c Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 13 Jan 2026 07:27:12 -0800 Subject: Docs/admin-guide/mm/damon/lru_sort: document filter_young_pages Document the new DAMON_LRU_SORT parameter, filter_young_pages. It can be used to use page level access re-check for the LRU sorting. Link: https://lkml.kernel.org/r/20260113152717.70459-8-sj@kernel.org Signed-off-by: SeongJae Park Acked-by: wang lian Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/lru_sort.rst | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst index 72a943202676..bb222a32aefd 100644 --- a/Documentation/admin-guide/mm/damon/lru_sort.rst +++ b/Documentation/admin-guide/mm/damon/lru_sort.rst @@ -79,6 +79,20 @@ of parametrs except ``enabled`` again. Once the re-reading is done, this parameter is set as ``N``. If invalid parameters are found while the re-reading, DAMON_LRU_SORT will be disabled. +filter_young_pages +------------------ + +Filter [non-]young pages accordingly for LRU [de]prioritizations. + +If this is set, check page level access (youngness) once again before each +LRU [de]prioritization operation. LRU prioritization operation is skipped +if the page has not accessed since the last check (not young). LRU +deprioritization operation is skipped if the page has accessed since the +last check (young). The feature is enabled or disabled if this parameter is +set as ``Y`` or ``N``, respectively. + +Disabled by default. + hot_thres_access_freq --------------------- -- cgit v1.2.3 From cdfca22d15ca5f0f6b3ff33a23e1672dccc74eda Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 13 Jan 2026 07:27:14 -0800 Subject: Docs/admin-guide/mm/damon/lru_sort: document active_mem_bp parameter Document a newly added DAMON_LRU_SORT parameter for doing auto-tuning aiming an active to inactive memory size ratio. Link: https://lkml.kernel.org/r/20260113152717.70459-10-sj@kernel.org Signed-off-by: SeongJae Park Acked-by: wang lian Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/lru_sort.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst index bb222a32aefd..6af3ab5579a3 100644 --- a/Documentation/admin-guide/mm/damon/lru_sort.rst +++ b/Documentation/admin-guide/mm/damon/lru_sort.rst @@ -79,6 +79,18 @@ of parametrs except ``enabled`` again. Once the re-reading is done, this parameter is set as ``N``. If invalid parameters are found while the re-reading, DAMON_LRU_SORT will be disabled. +active_mem_bp +------------- + +Desired active to [in]active memory ratio in bp (1/10,000). + +While keeping the caps that set by other quotas, DAMON_LRU_SORT automatically +increases and decreases the effective level of the quota aiming the LRU +[de]prioritizations of the hot and cold memory resulting in this active to +[in]active memory ratio. Value zero means disabling this auto-tuning feature. + +Disabled by default. + filter_young_pages ------------------ -- cgit v1.2.3 From ed581147a417940857eeea609229de0f5de5617f Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 13 Jan 2026 07:27:16 -0800 Subject: Docs/admin-guide/mm/damon/lru_sort: document intervals autotuning Document a newly added DAMON_LRU_SORT module parameter for using monitoring intervals auto-tuning feature of DAMON. Link: https://lkml.kernel.org/r/20260113152717.70459-12-sj@kernel.org Signed-off-by: SeongJae Park Acked-by: wang lian Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/lru_sort.rst | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst index 6af3ab5579a3..20a8378d5a94 100644 --- a/Documentation/admin-guide/mm/damon/lru_sort.rst +++ b/Documentation/admin-guide/mm/damon/lru_sort.rst @@ -91,6 +91,17 @@ increases and decreases the effective level of the quota aiming the LRU Disabled by default. +Auto-tune monitoring intervals +------------------------------ + +If this parameter is set as ``Y``, DAMON_LRU_SORT automatically tunes DAMON's +sampling and aggregation intervals. The auto-tuning aims to capture meaningful +amount of access events in each DAMON-snapshot, while keeping the sampling +interval 5 milliseconds in minimum, and 10 seconds in maximum. Setting this as +``N`` disables the auto-tuning. + +Disabled by default. + filter_young_pages ------------------ -- cgit v1.2.3 From cd8e95d80bc29b3c72288bd31e845b11755ef6a5 Mon Sep 17 00:00:00 2001 From: "David Hildenbrand (Red Hat)" Date: Tue, 20 Jan 2026 00:01:30 +0100 Subject: mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATION MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While compaction depends on migration, the other direction is not the case. So let's make it clearer that this is all about migration of balloon pages. Adjust all comments/docs in the core to talk about "migration" instead of "compaction". While at it add some "/* CONFIG_BALLOON_MIGRATION */". Link: https://lkml.kernel.org/r/20260119230133.3551867-23-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) Reviewed-by: Lorenzo Stoakes Acked-by: Michael S. Tsirkin Cc: Arnd Bergmann Cc: Christophe Leroy Cc: Eugenio Pérez Cc: Greg Kroah-Hartman Cc: Jason Wang Cc: Jerrin Shaji George Cc: Jonathan Corbet Cc: Liam Howlett Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Michal Hocko Cc: Mike Rapoport Cc: Nicholas Piggin Cc: Oscar Salvador Cc: SeongJae Park Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Xuan Zhuo Cc: Zi Yan Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/memory-hotplug.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 6581558fd0d7..0207f8725142 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -603,11 +603,11 @@ ZONE_MOVABLE, especially when fine-tuning zone ratios: memory for metadata and page tables in the direct map; having a lot of offline memory blocks is not a typical case, though. -- Memory ballooning without balloon compaction is incompatible with - ZONE_MOVABLE. Only some implementations, such as virtio-balloon and - pseries CMM, fully support balloon compaction. +- Memory ballooning without support for balloon memory migration is incompatible + with ZONE_MOVABLE. Only some implementations, such as virtio-balloon and + pseries CMM, fully support balloon memory migration. - Further, the CONFIG_BALLOON_COMPACTION kernel configuration option might be + Further, the CONFIG_BALLOON_MIGRATION kernel configuration option might be disabled. In that case, balloon inflation will only perform unmovable allocations and silently create a zone imbalance, usually triggered by inflation requests from the hypervisor. -- cgit v1.2.3 From e7df7a0bfc9090f83e9a2f40905bdfc58097330d Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Sun, 18 Jan 2026 10:02:56 -0800 Subject: Docs/admin-guide/mm/damon/usage: introduce DAMON modules at the beginning DAMON usage document provides a list of available DAMON interfaces with brief introduction at the beginning of the doc. The list is missing DAMON modules for special purposes, while it is one of the major suggested interfaces. Add an item for those to the list. Link: https://lkml.kernel.org/r/20260118180305.70023-6-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 7da4c002cb39..bfaee977c37c 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -6,6 +6,11 @@ Detailed Usages DAMON provides below interfaces for different users. +- *Special-purpose DAMON modules.* + :ref:`This ` is for people who are building, + distributing, and/or administrating the kernel with special-purpose DAMON + usages. Using this, users can use DAMON's major features for the given + purposes in build, boot, or runtime in simple ways. - *DAMON user space tool.* `This `_ is for privileged people such as system administrators who want a just-working human-friendly interface. -- cgit v1.2.3 From 652fd06d20da688d6c37cb33efc38a249fce11a3 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Sun, 18 Jan 2026 10:02:57 -0800 Subject: Docs/admin-guide/mm/damon/usage: update stats update process for refresh_ms DAMOS stats on sysfs was only manually updated. Recent addition of 'refresh_ms' knob enabled periodic and automated updates of the stats. The document for stats update process is not updated for the change, however. Update. Link: https://lkml.kernel.org/r/20260118180305.70023-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index bfaee977c37c..b0f3969b6b3b 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -549,10 +549,13 @@ online analysis or tuning of the schemes. Refer to :ref:`design doc The statistics can be retrieved by reading the files under ``stats`` directory (``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, ``sz_ops_filter_passed``, ``qt_exceeds``, ``nr_snapshots`` and -``max_nr_snapshots``), respectively. The files are not updated in real time, -so you should ask DAMON sysfs interface to update the content of the files for -the stats by writing a special keyword, ``update_schemes_stats`` to the -relevant ``kdamonds//state`` file. +``max_nr_snapshots``), respectively. + +The files are not updated in real time by default. Users should ask DAMON +sysfs interface to periodically update those using ``refresh_ms``, or do a one +time update by writing a special keyword, ``update_schemes_stats`` to the +relevant ``kdamonds//state`` file. Refer to :ref:`kdamond directory +` for more details. .. _sysfs_schemes_tried_regions: -- cgit v1.2.3