linux-toradex.git/Documentation/sysctl, branch v4.4.38

pipe: limit the per-user amount of pages allocated in pipes

2016-06-08T01:14:35+00:00

commit 759c01142a5d0f364a462346168a56de28a80f52 upstream.

On not-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.
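
The soft/hard behaviour described above can be sketched as a small decision
function. This is a simplified model, not the kernel code: the function and
constant names are hypothetical, limits are assumed to be expressed in pages,
the 16-page default is derived from the 64 kB default pipe size and 4 kB pages
mentioned in this message, and treating "at or above the limit" as exceeding
it is an assumption.

```python
# Simplified model of the per-user pipe limits (names are hypothetical).
DEFAULT_PIPE_PAGES = 16  # 64 kB default pipe size / 4 kB page size


def pages_for_new_pipe(user_pages, soft_limit, hard_limit):
    """Return the page count granted to a new pipe, or None if refused.

    user_pages: pages this user already has allocated in pipes.
    soft_limit / hard_limit: limits in pages; 0 disables the limit.
    """
    if hard_limit and user_pages >= hard_limit:
        return None  # hard limit reached: no new pipe for this user
    if soft_limit and user_pages >= soft_limit:
        return 1  # soft limit reached: new pipe gets a single page
    return DEFAULT_PIPE_PAGES  # below both limits: default-sized pipe


print(pages_for_new_pipe(0, 16384, 0))
print(pages_for_new_pipe(16384, 16384, 0))
print(pages_for_new_pipe(20000, 16384, 20000))
```

With both limits set to zero the function always grants the default size,
which mirrors the "disabled by setting them to zero" rule.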

The limits are controlled by two new sysctls: pipe-user-pages-soft and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).
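
The arithmetic in the paragraph above can be checked with a short
calculation. The variable names are illustrative only, and the sketch follows
the message's simplification of counting one pipe per file descriptor.

```python
# Worked version of the memory estimate from the commit message.
KB = 1024
MB = 1024 * KB

PAGE_SIZE = 4 * KB            # a pipe limited to a single page
DEFAULT_PIPE_SIZE = 64 * KB   # default pipe size
FDS_PER_PROCESS = 1024        # default number of FDs per process
PROCESSES = 256

# Pipes created at full size before the soft limit is reached.
full_size_pipes = FDS_PER_PROCESS
soft_limit_bytes = full_size_pipes * DEFAULT_PIPE_SIZE

# Every further pipe is limited to one page.
total_pipes = PROCESSES * FDS_PER_PROCESS
single_page_pipes = total_pipes - full_size_pipes
total_bytes = soft_limit_bytes + single_page_pipes * PAGE_SIZE

print(soft_limit_bytes // MB, "MB at the soft limit")
print(total_bytes // MB, "MB in total")
```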

Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa 
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds 
Signed-off-by: Willy Tarreau 
Signed-off-by: Al Viro 
Cc: Moritz Muehlenhoff 
Signed-off-by: Greg Kroah-Hartman

Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory

2015-11-09T23:11:24+00:00

The original document referenced cap_vm_enough_memory because
cap_vm_enough_memory used to invoke __vm_enough_memory; it no longer
does.

Signed-off-by: Chun Chen 
Acked-by: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup

2015-11-06T03:34:48+00:00

In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) that some other CPU is holding.

IOW, we are often looking at the stacktrace of the victim and not the
actual offender.

Introduce a sysctl / cmdline parameter that makes it possible to have the
hardlockup detector perform an all-CPU backtrace.
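
A minimal sketch of how such a setting could be inspected from userspace.
The message does not name the sysctl; kernel.hardlockup_all_cpu_backtrace is
assumed here, and the helper name read_sysctl_int is hypothetical. The helper
returns None when the file is missing or unreadable, so it is safe to run on
kernels built without the hardlockup detector.

```python
from pathlib import Path


def read_sysctl_int(name, root="/proc/sys"):
    """Read an integer sysctl such as 'kernel.foo' from <root>/kernel/foo.

    Returns None if the entry does not exist, cannot be read, or does not
    contain a single integer.
    """
    path = Path(root).joinpath(*name.split("."))
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


# Assumed sysctl name; not stated in the commit message itself.
value = read_sysctl_int("kernel.hardlockup_all_cpu_backtrace")
if value is None:
    print("sysctl not available on this system")
else:
    print("all-CPU backtrace on hard lockup:", "on" if value else "off")
```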

Signed-off-by: Jiri Kosina 
Reviewed-by: Aaron Tomlin 
Cc: Ulrich Obergfell 
Acked-by: Don Zickus 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

net: qdisc: enhance default_qdisc documentation

2015-09-17T23:09:22+00:00

Aside from some language cleanup, point out which interfaces are not
covered or only partly covered by this setting.

Signed-off-by: Phil Sutter 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller

mm/page_alloc.c: fix a misleading comment

2015-09-08T22:35:28+00:00

The comment says that the per-cpu batchsize and zone watermarks are
determined by present_pages, which is definitely wrong; they are both
calculated from managed_pages.  Fix it.

Signed-off-by: Yaowei Bai 
Acked-by: Michal Hocko 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Documentation: mm: fix location of extfrag_index

2015-07-24T13:05:56+00:00

/proc/extfrag_index does not exist.  This file is in debugfs.  Fix the
description of extfrag_threshold to reflect this.

Signed-off-by: Rabin Vincent 
Signed-off-by: Jonathan Corbet

coredump: use from_kuid/kgid when formatting corename

2015-06-26T00:00:43+00:00

When adding __printf attribute to cn_printf, gcc reports some issues:

  fs/coredump.c:213:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kuid_t' [-Wformat=]
       err = cn_printf(cn, "%d", cred->uid);
       ^
  fs/coredump.c:217:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kgid_t' [-Wformat=]
       err = cn_printf(cn, "%d", cred->gid);
       ^

These warnings come from the fact that the value of uid/gid needs to be
extracted from the kuid_t/kgid_t structure before being used as an
integer.  More precisely, cred->uid and cred->gid need to be converted to
either user-namespace uid/gid or to init_user_ns uid/gid.

Use init_user_ns in order not to break existing ABI, and document this in
Documentation/sysctl/kernel.txt.

While at it, format uid and gid values with %u instead of %d because
uid_t/__kernel_uid32_t and gid_t/__kernel_gid32_t are unsigned int.
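
The difference between the two format specifiers shows up for large ids.
The sketch below reinterprets the same 32 bits as signed and unsigned, which
is roughly what %d and %u do with a 32-bit value; the uid 4294967294 is a
hypothetical example and the helper name is made up for illustration.

```python
import struct


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


uid = 4294967294  # hypothetical large uid, fits in an unsigned 32-bit int

print("signed view   (like %d):", as_signed_32(uid))
print("unsigned view (like %u):", uid)
```

Ids below 2^31 look the same under both views, which is why the problem only
appears with unusually large values.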

Signed-off-by: Nicolas Iooss 
Acked-by: "Eric W. Biederman" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

watchdog: add watchdog_cpumask sysctl to assist nohz

2015-06-25T00:49:40+00:00

Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running on
with a new kernel.watchdog_cpumask sysctl.

In the current system, the watchdog subsystem runs a periodic timer that
schedules the watchdog kthread to run.  However, nohz_full cores are
designed to allow userspace application code running on those cores to
have 100% access to the CPU.  So the watchdog system prevents the
nohz_full application code from being able to run the way it wants to,
thus the motivation to suppress the watchdog on nohz_full cores, which
this patchset provides by default.

However, if we disable the watchdog globally, then the housekeeping
cores can't benefit from the watchdog functionality.  So we allow
disabling it only on some cores.  See Documentation/lockup-watchdogs.txt
for more information.
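
As an illustration of "only on the housekeeping cores", the sketch below
computes the set of CPUs left for the watchdog once the nohz_full CPUs are
removed. It assumes CPU sets are written in the comma-and-range list form
(for example "0,2-4"), handles only that basic form, and uses hypothetical
helper names; it does not read or write the sysctl itself.

```python
def parse_cpulist(text):
    """Parse a CPU list such as '0,2-4' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus):
    """Format a set of CPU numbers back into list form, merging runs."""
    runs = []
    for cpu in sorted(cpus):
        if runs and cpu == runs[-1][1] + 1:
            runs[-1][1] = cpu  # extend the current consecutive run
        else:
            runs.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


# Hypothetical 8-CPU machine booted with nohz_full=2-7.
all_cpus = set(range(8))
nohz_full = parse_cpulist("2-7")
housekeeping = all_cpus - nohz_full
print("watchdog would run on CPUs:", format_cpulist(housekeeping))
```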

[jhubbard@nvidia.com: fix a watchdog crash in some configurations]
Signed-off-by: Chris Metcalf 
Acked-by: Don Zickus 
Cc: Ingo Molnar 
Cc: Ulrich Obergfell 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Signed-off-by: John Hubbard 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Doc/sysctl/kernel.txt: document threads-max

2015-04-17T13:04:07+00:00

File /proc/sys/kernel/threads-max controls the maximum number of threads
that can be created using fork().

[akpm@linux-foundation.org: fix typo, per Guenter]
Signed-off-by: Heinrich Schuchardt 
Cc: Oleg Nesterov 
Cc: Ingo Molnar 
Cc: Guenter Roeck 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: allow compaction of unevictable pages

2015-04-15T23:35:17+00:00

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru.  Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory.  When the value is set to 1, allocations
succeed.
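
The effect described by the test can be mimicked with a toy model. This is
not the kernel's compaction algorithm and not the author's test program:
memory is reduced to a row of 1 MB slots, a huge page is assumed to need two
adjacent, aligned free slots (2 MB), and "compaction" simply packs the locked
slots at the start of the row. All names are hypothetical.

```python
def free_huge_pages(slots, compact_unevictable_allowed):
    """Count 2 MB huge pages that fit into free, aligned pairs of slots.

    slots: list of bools, True = slot holds a locked mapping, False = free.
    """
    if compact_unevictable_allowed:
        # Toy compaction: migrate every locked slot to the front.
        used = sum(slots)
        slots = [True] * used + [False] * (len(slots) - used)
    return sum(
        1
        for i in range(0, len(slots) - 1, 2)
        if not slots[i] and not slots[i + 1]
    )


# Eight locked 1 MB mappings with every other one unmapped afterwards.
fragmented = [True, False] * 4

print("allowed=0:", free_huge_pages(fragmented, False))
print("allowed=1:", free_huge_pages(fragmented, True))
```

In this model the fragmented layout offers no huge page when locked slots
may not move, and two once they may, matching the outcome reported above in
direction only.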

Signed-off-by: Eric B Munson 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Christoph Lameter 
Acked-by: David Rientjes 
Acked-by: Rik van Riel 
Cc: Vlastimil Babka 
Cc: Thomas Gleixner 
Cc: Christoph Lameter 
Cc: Peter Zijlstra 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds