From cd67e10ac6997c6d1e1504e3c111b693bfdbc148 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 30 Jan 2014 15:45:52 -0800 Subject: zram: promote zram from staging Zram has lived in staging for a LONG LONG time and have been fixed/improved by many contributors so code is clean and stable now. Of course, there are lots of product using zram in real practice. The major TV companys have used zram as swap since two years ago and recently our production team released android smart phone with zram which is used as swap, too and recently Android Kitkat start to use zram for small memory smart phone. And there was a report Google released their ChromeOS with zram, too and cyanogenmod have been used zram long time ago. And I heard some disto have used zram block device for tmpfs. In addition, I saw many report from many other peoples. For example, Lubuntu start to use it. The benefit of zram is very clear. With my experience, one of the benefit was to remove jitter of video application with backgroud memory pressure. It would be effect of efficient memory usage by compression but more issue is whether swap is there or not in the system. Recent mobile platforms have used JAVA so there are many anonymous pages. But embedded system normally are reluctant to use eMMC or SDCard as swap because there is wear-leveling and latency issues so if we do not use swap, it means we can't reclaim anoymous pages and at last, we could encounter OOM kill. :( Although we have real storage as swap, it was a problem, too. Because it sometime ends up making system very unresponsible caused by slow swap storage performance. Quote from Luigi on Google "Since Chrome OS was mentioned: the main reason why we don't use swap to a disk (rotating or SSD) is because it doesn't degrade gracefully and leads to a bad interactive experience. Generally we prefer to manage RAM at a higher level, by transparently killing and restarting processes. But we noticed that zram is fast enough to be competitive with the latter, and it lets us make more efficient use of the available RAM. " and he announced. http://www.spinics.net/lists/linux-mm/msg57717.html Other uses case is to use zram for block device. Zram is block device so anyone can format the block device and mount on it so some guys on the internet start zram as /var/tmp. http://forums.gentoo.org/viewtopic-t-838198-start-0.html Let's promote zram and enhance/maintain it instead of removing. Signed-off-by: Minchan Kim Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Nitin Gupta Acked-by: Pekka Enberg Cc: Bob Liu Cc: Greg Kroah-Hartman Cc: Hugh Dickins Cc: Jens Axboe Cc: Luigi Semenzato Cc: Mel Gorman Cc: Rik van Riel Cc: Seth Jennings Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/blockdev/zram.txt | 77 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) create mode 100644 Documentation/blockdev/zram.txt (limited to 'Documentation') diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt new file mode 100644 index 000000000000..765d790ae831 --- /dev/null +++ b/Documentation/blockdev/zram.txt @@ -0,0 +1,77 @@ +zram: Compressed RAM based block devices +---------------------------------------- + +Project home: http://compcache.googlecode.com/ + +* Introduction + +The zram module creates RAM based block devices named /dev/zram +( = 0, 1, ...). Pages written to these disks are compressed and stored +in memory itself. These disks allow very fast I/O and compression provides +good amounts of memory savings. Some of the usecases include /tmp storage, +use as swap disks, various caches under /var and maybe many more :) + +Statistics for individual zram devices are exported through sysfs nodes at +/sys/block/zram/ + +* Usage + +Following shows a typical sequence of steps for using zram. + +1) Load Module: + modprobe zram num_devices=4 + This creates 4 devices: /dev/zram{0,1,2,3} + (num_devices parameter is optional. Default: 1) + +2) Set Disksize + Set disk size by writing the value to sysfs node 'disksize'. + The value can be either in bytes or you can use mem suffixes. + Examples: + # Initialize /dev/zram0 with 50MB disksize + echo $((50*1024*1024)) > /sys/block/zram0/disksize + + # Using mem suffixes + echo 256K > /sys/block/zram0/disksize + echo 512M > /sys/block/zram0/disksize + echo 1G > /sys/block/zram0/disksize + +3) Activate: + mkswap /dev/zram0 + swapon /dev/zram0 + + mkfs.ext4 /dev/zram1 + mount /dev/zram1 /tmp + +4) Stats: + Per-device statistics are exported as various nodes under + /sys/block/zram/ + disksize + num_reads + num_writes + invalid_io + notify_free + discard + zero_pages + orig_data_size + compr_data_size + mem_used_total + +5) Deactivate: + swapoff /dev/zram0 + umount /dev/zram1 + +6) Reset: + Write any positive value to 'reset' sysfs node + echo 1 > /sys/block/zram0/reset + echo 1 > /sys/block/zram1/reset + + This frees all the memory allocated for the given device and + resets the disksize to zero. You must set the disksize again + before reusing the device. + +Please report any problems at: + - Mailing list: linux-mm-cc at laptop dot org + - Issue tracker: http://code.google.com/p/compcache/issues/list + +Nitin Gupta +ngupta@vflare.org -- cgit v1.2.3 From 49061236a9c2e18b31617cef10d27ba136068bac Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 30 Jan 2014 15:45:54 -0800 Subject: zram: remove old private project comment Remove the old private compcache project address so upcoming patches should be sent to LKML because we Linux kernel community will take care. Signed-off-by: Minchan Kim Cc: Nitin Gupta Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/blockdev/zram.txt | 6 ------ 1 file changed, 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 765d790ae831..2eccddffa6c8 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -1,8 +1,6 @@ zram: Compressed RAM based block devices ---------------------------------------- -Project home: http://compcache.googlecode.com/ - * Introduction The zram module creates RAM based block devices named /dev/zram @@ -69,9 +67,5 @@ Following shows a typical sequence of steps for using zram. resets the disksize to zero. You must set the disksize again before reusing the device. -Please report any problems at: - - Mailing list: linux-mm-cc at laptop dot org - - Issue tracker: http://code.google.com/p/compcache/issues/list - Nitin Gupta ngupta@vflare.org -- cgit v1.2.3 From 778c14affaf94a9e4953179d3e13a544ccce7707 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Thu, 30 Jan 2014 15:46:11 -0800 Subject: mm, oom: base root bonus on current usage A 3% of system memory bonus is sometimes too excessive in comparison to other processes. With commit a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer tries to avoid killing privileged tasks by subtracting 3% of overall memory (system or cgroup) from their per-task consumption. But as a result, all root tasks that consume less than 3% of overall memory are considered equal, and so it only takes 33+ privileged tasks pushing the system out of memory for the OOM killer to do something stupid and kill dhclient or other root-owned processes. For example, on a 32G machine it can't tell the difference between the 1M agetty and the 10G fork bomb member. The changelog describes this 3% boost as the equivalent to the global overcommit limit being 3% higher for privileged tasks, but this is not the same as discounting 3% of overall memory from _every privileged task individually_ during OOM selection. Replace the 3% of system memory bonus with a 3% of current memory usage bonus. By giving root tasks a bonus that is proportional to their actual size, they remain comparable even when relatively small. In the example above, the OOM killer will discount the 1M agetty's 256 badness points down to 179, and the 10G fork bomb's 262144 points down to 183500 points and make the right choice, instead of discounting both to 0 and killing agetty because it's first in the task list. Signed-off-by: David Rientjes Reported-by: Johannes Weiner Acked-by: Johannes Weiner Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 31f76178c987..f00bee144add 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1386,8 +1386,8 @@ may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. -There is an additional factor included in the badness score: root -processes are given 3% extra memory over other tasks. +There is an additional factor included in the badness score: the current memory +and swap usage is discounted by 3% for root processes. The amount of "allowed" memory depends on the context in which the oom killer was called. If it is due to the memory assigned to the allocating task's cpuset -- cgit v1.2.3 From 46bf16c44b90791445975463da671521fc430cae Mon Sep 17 00:00:00 2001 From: Richard Yao Date: Thu, 30 Jan 2014 15:46:12 -0800 Subject: Documentation/filesystems/vfs.txt: update file_operations documentation ->readv, ->writev and ->sendfile have been removed while ->show_fdinfo has been added. The documentation should reflect this. Signed-off-by: Richard Yao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/vfs.txt | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index deb48b5fd883..c53784c119c8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -782,7 +782,7 @@ struct file_operations ---------------------- This describes how the VFS can manipulate an open file. As of kernel -3.5, the following members are defined: +3.12, the following members are defined: struct file_operations { struct module *owner; @@ -803,9 +803,6 @@ struct file_operations { int (*aio_fsync) (struct kiocb *, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); - ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); - ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); - ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); int (*check_flags)(int); @@ -814,6 +811,7 @@ struct file_operations { ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned int); int (*setlease)(struct file *, long arg, struct file_lock **); long (*fallocate)(struct file *, int mode, loff_t offset, loff_t len); + int (*show_fdinfo)(struct seq_file *m, struct file *f); }; Again, all methods are called without any locks being held, unless @@ -864,12 +862,6 @@ otherwise noted. lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW commands - readv: called by the readv(2) system call - - writev: called by the writev(2) system call - - sendfile: called by the sendfile(2) system call - get_unmapped_area: called by the mmap(2) system call check_flags: called by the fcntl(2) system call for F_SETFL command -- cgit v1.2.3