<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm/oom_kill.c, branch v3.0.80</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>oom: fix integer overflow of points in oom_badness</title>
<updated>2012-01-06T22:13:51+00:00</updated>
<author>
<name>Frantisek Hrbata</name>
<email>fhrbata@redhat.com</email>
</author>
<published>2011-12-20T01:11:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=724c6ae2912dbfc5c6a9e309f92b786fbc141462'/>
<id>724c6ae2912dbfc5c6a9e309f92b786fbc141462</id>
<content type='text'>
commit ff05b6f7ae762b6eb464183eec994b28ea09f6dd upstream.

An integer overflow will happen on 64bit archs if task's sum of rss,
swapents and nr_ptes exceeds (2^31)/1000 value.  This was introduced by
commit

f755a04 oom: use pte pages in OOM score

where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000).  So there could be an int overflow while computing

176          points *= 1000;

and points may have negative value. Meaning the oom score for a mem hog task
will be one.

196          if (points &lt;= 0)
197                  return 1;

For example:
[ 3366]     0  3366 35390480 24303939   5       0             0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child

Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.

In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.

The points variable should be of type long instead of int to prevent the
int overflow.

Signed-off-by: Frantisek Hrbata &lt;fhrbata@redhat.com&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ff05b6f7ae762b6eb464183eec994b28ea09f6dd upstream.

An integer overflow will happen on 64bit archs if task's sum of rss,
swapents and nr_ptes exceeds (2^31)/1000 value.  This was introduced by
commit

f755a04 oom: use pte pages in OOM score

where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000).  So there could be an int overflow while computing

176          points *= 1000;

and points may have negative value. Meaning the oom score for a mem hog task
will be one.

196          if (points &lt;= 0)
197                  return 1;

For example:
[ 3366]     0  3366 35390480 24303939   5       0             0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child

Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.

In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.

The points variable should be of type long instead of int to prevent the
int overflow.

Signed-off-by: Frantisek Hrbata &lt;fhrbata@redhat.com&gt;
Acked-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>oom: task-&gt;mm == NULL doesn't mean the memory was freed</title>
<updated>2011-08-05T04:58:42+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2011-07-30T14:35:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=84416db67616ae44e7111c1c904a53cc79b95b6a'/>
<id>84416db67616ae44e7111c1c904a53cc79b95b6a</id>
<content type='text'>
commit c027a474a68065391c8773f6e83ed5412657e369 upstream.

exit_mm() sets -&gt;mm == NULL then it does mmput()-&gt;exit_mmap() which
frees the memory.

However select_bad_process() checks -&gt;mm != NULL before TIF_MEMDIE,
so it continues to kill other tasks even if we have the oom-killed
task freeing its memory.

Change select_bad_process() to check -&gt;mm after TIF_MEMDIE, but skip
the tasks which have already passed exit_notify() to ensure a zombie
with TIF_MEMDIE set can't block oom-killer. Alternatively we could
probably clear TIF_MEMDIE after exit_mmap().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c027a474a68065391c8773f6e83ed5412657e369 upstream.

exit_mm() sets -&gt;mm == NULL then it does mmput()-&gt;exit_mmap() which
frees the memory.

However select_bad_process() checks -&gt;mm != NULL before TIF_MEMDIE,
so it continues to kill other tasks even if we have the oom-killed
task freeing its memory.

Change select_bad_process() to check -&gt;mm after TIF_MEMDIE, but skip
the tasks which have already passed exit_notify() to ensure a zombie
with TIF_MEMDIE set can't block oom-killer. Alternatively we could
probably clear TIF_MEMDIE after exit_mmap().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>oom: replace PF_OOM_ORIGIN with toggling oom_score_adj</title>
<updated>2011-05-25T15:39:10+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-05-25T00:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=72788c385604523422592249c19cba0187021e9b'/>
<id>72788c385604523422592249c19cba0187021e9b</id>
<content type='text'>
There's a kernel-wide shortage of per-process flags, so it's always
helpful to trim one when possible without incurring a significant penalty.
 It's even more important when you're planning on adding a per- process
flag yourself, which I plan to do shortly for transparent hugepages.

PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a
tendency to allocate large amounts of memory and should be preferred for
killing over other tasks.  We'd rather immediately kill the task making
the errant syscall rather than penalizing an innocent task.

This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.

The process's old oom_score_adj is stored and then set to
OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old
value is then reinstated when the process should no longer be considered a
high priority for oom killing.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Izik Eidus &lt;ieidus@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's a kernel-wide shortage of per-process flags, so it's always
helpful to trim one when possible without incurring a significant penalty.
 It's even more important when you're planning on adding a per- process
flag yourself, which I plan to do shortly for transparent hugepages.

PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a
tendency to allocate large amounts of memory and should be preferred for
killing over other tasks.  We'd rather immediately kill the task making
the errant syscall rather than penalizing an innocent task.

This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.

The process's old oom_score_adj is stored and then set to
OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old
value is then reinstated when the process should no longer be considered a
high priority for oom killing.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Izik Eidus &lt;ieidus@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: use pte pages in OOM score</title>
<updated>2011-04-28T18:28:21+00:00</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-04-27T22:26:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f755a042d82b51b54f3bdd0890e5ea56c0fb6807'/>
<id>f755a042d82b51b54f3bdd0890e5ea56c0fb6807</id>
<content type='text'>
PTE pages eat up memory just like anything else, but we do not account for
them in any way in the OOM scores.  They are also _guaranteed_ to get
freed up when a process is OOM killed, while RSS is not.

Reported-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.36+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PTE pages eat up memory just like anything else, but we do not account for
them in any way in the OOM scores.  They are also _guaranteed_ to get
freed up when a process is OOM killed, while RSS is not.

Reported-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.36+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom-kill: remove boost_dying_task_prio()</title>
<updated>2011-04-14T23:06:56+00:00</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-04-14T22:22:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=341aea2bc48bf652777fb015cc2b3dfa9a451817'/>
<id>341aea2bc48bf652777fb015cc2b3dfa9a451817</id>
<content type='text'>
This is an almost-revert of commit 93b43fa ("oom: give the dying task a
higher priority").

That commit dramatically improved oom killer logic when a fork-bomb
occurs.  But I've found that it has nasty corner case.  Now cpu cgroup has
strange default RT runtime.  It's 0!  That said, if a process under cpu
cgroup promote RT scheduling class, the process never run at all.

If an admin inserts a !RT process into a cpu cgroup by setting
rtruntime=0, usually it runs perfectly because a !RT task isn't affected
by the rtruntime knob.  But if it promotes an RT task via an explicit
setscheduler() syscall or an OOM, the task can't run at all.  In short,
the oom killer doesn't work at all if admins are using cpu cgroup and don't
touch the rtruntime knob.

Eventually, kernel may hang up when oom kill occur.  I and the original
author Luis agreed to disable this logic.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Luis Claudio R. Goncalves &lt;lclaudio@uudg.org&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is an almost-revert of commit 93b43fa ("oom: give the dying task a
higher priority").

That commit dramatically improved oom killer logic when a fork-bomb
occurs.  But I've found that it has nasty corner case.  Now cpu cgroup has
strange default RT runtime.  It's 0!  That said, if a process under cpu
cgroup promote RT scheduling class, the process never run at all.

If an admin inserts a !RT process into a cpu cgroup by setting
rtruntime=0, usually it runs perfectly because a !RT task isn't affected
by the rtruntime knob.  But if it promotes an RT task via an explicit
setscheduler() syscall or an OOM, the task can't run at all.  In short,
the oom killer doesn't work at all if admins are using cpu cgroup and don't
touch the rtruntime knob.

Eventually, kernel may hang up when oom kill occur.  I and the original
author Luis agreed to disable this logic.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Luis Claudio R. Goncalves &lt;lclaudio@uudg.org&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib, arch: add filter argument to show_mem and fix private implementations</title>
<updated>2011-03-25T00:49:37+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-03-24T22:18:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b2b755b5f10eb32fbdc73a9907c07006b17f714b'/>
<id>b2b755b5f10eb32fbdc73a9907c07006b17f714b</id>
<content type='text'>
Commit ddd588b5dd55 ("oom: suppress nodes that are not allowed from
meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
resulted in build warnings on all architectures that implement their own
versions of show_mem():

	lib/lib.a(show_mem.o): In function `show_mem':
	show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
	arch/sparc/mm/built-in.o:(.text+0xd70): first defined here

The fix is to remove __show_mem() and add its argument to show_mem() in
all implementations to prevent this breakage.

Architectures that implement their own show_mem() actually don't do
anything with the argument yet, but they could be made to filter nodes
that aren't allowed in the current context in the future just like the
generic implementation.

Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Reported-by: James Bottomley &lt;James.Bottomley@hansenpartnership.com&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit ddd588b5dd55 ("oom: suppress nodes that are not allowed from
meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
resulted in build warnings on all architectures that implement their own
versions of show_mem():

	lib/lib.a(show_mem.o): In function `show_mem':
	show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
	arch/sparc/mm/built-in.o:(.text+0xd70): first defined here

The fix is to remove __show_mem() and add its argument to show_mem() in
all implementations to prevent this breakage.

Architectures that implement their own show_mem() actually don't do
anything with the argument yet, but they could be made to filter nodes
that aren't allowed in the current context in the future just like the
generic implementation.

Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Reported-by: James Bottomley &lt;James.Bottomley@hansenpartnership.com&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: give current access to memory reserves if it's trying to die</title>
<updated>2011-03-24T02:46:33+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-03-23T23:42:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f9434ad1552427fab49336e1a6e3ef121895b9d1'/>
<id>f9434ad1552427fab49336e1a6e3ef121895b9d1</id>
<content type='text'>
When a memcg is oom and current has already received a SIGKILL, then give
it access to memory reserves with a higher scheduling priority so that it
may quickly exit and free its memory.

This is identical to the global oom killer and is done even before
checking for panic_on_oom: a pending SIGKILL here while panic_on_oom is
selected is guaranteed to have come from userspace; the thread only needs
access to memory reserves to exit and thus we don't unnecessarily panic
the machine until the kernel has no last resort to free memory.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a memcg is oom and current has already received a SIGKILL, then give
it access to memory reserves with a higher scheduling priority so that it
may quickly exit and free its memory.

This is identical to the global oom killer and is done even before
checking for panic_on_oom: a pending SIGKILL here while panic_on_oom is
selected is guaranteed to have come from userspace; the thread only needs
access to memory reserves to exit and thus we don't unnecessarily panic
the machine until the kernel has no last resort to free memory.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: suppress nodes that are not allowed from meminfo on oom kill</title>
<updated>2011-03-23T00:44:01+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-03-22T23:30:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ddd588b5dd55f14320379961e47683db4e4c1d90'/>
<id>ddd588b5dd55f14320379961e47683db4e4c1d90</id>
<content type='text'>
The oom killer is extremely verbose for machines with a large number of
cpus and/or nodes.  This verbosity can often be harmful if it causes other
important messages to be scrolled from the kernel log and incurs a
signicant time delay, specifically for kernels with CONFIG_NODES_SHIFT &gt;
8.

This patch causes only memory information to be displayed for nodes that
are allowed by current's cpuset when dumping the VM state.  Information
for all other nodes is irrelevant to the oom condition; we don't care if
there's an abundance of memory elsewhere if we can't access it.

This only affects the behavior of dumping memory information when an oom
is triggered.  Other dumps, such as for sysrq+m, still display the
unfiltered form when using the existing show_mem() interface.

Additionally, the per-cpu pageset statistics are extremely verbose in oom
killer output, so it is now suppressed.  This removes

	nodes_weight(current-&gt;mems_allowed) * (1 + nr_cpus)

lines from the oom killer output.

Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed
nodes.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The oom killer is extremely verbose for machines with a large number of
cpus and/or nodes.  This verbosity can often be harmful if it causes other
important messages to be scrolled from the kernel log and incurs a
signicant time delay, specifically for kernels with CONFIG_NODES_SHIFT &gt;
8.

This patch causes only memory information to be displayed for nodes that
are allowed by current's cpuset when dumping the VM state.  Information
for all other nodes is irrelevant to the oom condition; we don't care if
there's an abundance of memory elsewhere if we can't access it.

This only affects the behavior of dumping memory information when an oom
is triggered.  Other dumps, such as for sysrq+m, still display the
unfiltered form when using the existing show_mem() interface.

Additionally, the per-cpu pageset statistics are extremely verbose in oom
killer output, so it is now suppressed.  This removes

	nodes_weight(current-&gt;mems_allowed) * (1 + nr_cpus)

lines from the oom killer output.

Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed
nodes.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: avoid deferring oom killer if exiting task is being traced</title>
<updated>2011-03-23T00:43:58+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-03-22T23:30:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=edd45544c6f09550df0a5491aa8a07af24767e73'/>
<id>edd45544c6f09550df0a5491aa8a07af24767e73</id>
<content type='text'>
The oom killer naturally defers killing anything if it finds an eligible
task that is already exiting and has yet to detach its -&gt;mm.  This avoids
unnecessarily killing tasks when one is already in the exit path and may
free enough memory that the oom killer is no longer needed.  This is
detected by PF_EXITING since threads that have already detached its -&gt;mm
are no longer considered at all.

The problem with always deferring when a thread is PF_EXITING, however, is
that it may never actually exit when being traced, specifically if another
task is tracing it with PTRACE_O_TRACEEXIT.  The oom killer does not want
to defer in this case since there is no guarantee that thread will ever
exit without intervention.

This patch will now only defer the oom killer when a thread is PF_EXITING
and no ptracer has stopped its progress in the exit path.  It also ensures
that a child is sacrificed for the chosen parent only if it has a
different -&gt;mm as the comment implies: this ensures that the thread group
leader is always targeted appropriately.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrey Vagin &lt;avagin@openvz.org&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.38.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The oom killer naturally defers killing anything if it finds an eligible
task that is already exiting and has yet to detach its -&gt;mm.  This avoids
unnecessarily killing tasks when one is already in the exit path and may
free enough memory that the oom killer is no longer needed.  This is
detected by PF_EXITING since threads that have already detached its -&gt;mm
are no longer considered at all.

The problem with always deferring when a thread is PF_EXITING, however, is
that it may never actually exit when being traced, specifically if another
task is tracing it with PTRACE_O_TRACEEXIT.  The oom killer does not want
to defer in this case since there is no guarantee that thread will ever
exit without intervention.

This patch will now only defer the oom killer when a thread is PF_EXITING
and no ptracer has stopped its progress in the exit path.  It also ensures
that a child is sacrificed for the chosen parent only if it has a
different -&gt;mm as the comment implies: this ensures that the thread group
leader is always targeted appropriately.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrey Vagin &lt;avagin@openvz.org&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.38.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: skip zombies when iterating tasklist</title>
<updated>2011-03-23T00:43:58+00:00</updated>
<author>
<name>Andrey Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2011-03-22T23:30:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=30e2b41f20b6238f51e7cffb879c7a0f0073f5fe'/>
<id>30e2b41f20b6238f51e7cffb879c7a0f0073f5fe</id>
<content type='text'>
We shouldn't defer oom killing if a thread has already detached its -&gt;mm
and still has TIF_MEMDIE set.  Memory needs to be freed, so find kill
other threads that pin the same -&gt;mm or find another task to kill.

Signed-off-by: Andrey Vagin &lt;avagin@openvz.org&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.38.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We shouldn't defer oom killing if a thread has already detached its -&gt;mm
and still has TIF_MEMDIE set.  Memory needs to be freed, so find kill
other threads that pin the same -&gt;mm or find another task to kill.

Signed-off-by: Andrey Vagin &lt;avagin@openvz.org&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.38.x]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
