<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm, branch v2.6.16.29</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>memory hotplug: solve config broken: undefined reference to `online_page'</title>
<updated>2006-08-08T15:35:33+00:00</updated>
<author>
<name>Yasunori Goto</name>
<email>y-goto@jp.fujitsu.com</email>
</author>
<published>2006-08-08T15:35:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5443cf44e9f256b5bd85b7ea621ccb179c4217d3'/>
<id>5443cf44e9f256b5bd85b7ea621ccb179c4217d3</id>
<content type='text'>
Memory hotplug code of i386 adds memory to only highmem.  So, if
CONFIG_HIGHMEM is not set, CONFIG_MEMORY_HOTPLUG shouldn't be set.
Otherwise, it causes compile error.

In addition, many architecture can't use memory hotplug feature yet.  So, I
introduce CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG.

Signed-off-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Memory hotplug code of i386 adds memory to only highmem.  So, if
CONFIG_HIGHMEM is not set, CONFIG_MEMORY_HOTPLUG shouldn't be set.
Otherwise, it causes compile error.

In addition, many architecture can't use memory hotplug feature yet.  So, I
introduce CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG.

Signed-off-by: Yasunori Goto &lt;y-goto@jp.fujitsu.com&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pdflush: handle resume wakeups</title>
<updated>2006-08-08T09:58:01+00:00</updated>
<author>
<name>Pavel Machek</name>
<email>pavel@suse.cz</email>
</author>
<published>2006-08-08T09:58:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d92165febdccd522e2a8b55008ffdb0ed121141a'/>
<id>d92165febdccd522e2a8b55008ffdb0ed121141a</id>
<content type='text'>
2.6.16 needs this. It was merged into 2.6.18-rc1.

pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that
it hasn't been given any work to do then this is considered an error.

That all broke when swsusp came along - because a timer-delivered
wakeup to a frozen pdflush thread will just get lost.  This causes the
pdflush thread to get lost as well: the writeback timer is supposed to
be re-armed by pdflush in process context, but pdflush doesn't execute
the callout which does this.

Fix that up by ignoring the return value from try_to_freeze(): jsut
proceed, see if we have any work pending and only go back to sleep if
that is not the case.

Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Pavel Machek &lt;pavel@suse.cz&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
2.6.16 needs this. It was merged into 2.6.18-rc1.

pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that
it hasn't been given any work to do then this is considered an error.

That all broke when swsusp came along - because a timer-delivered
wakeup to a frozen pdflush thread will just get lost.  This causes the
pdflush thread to get lost as well: the writeback timer is supposed to
be re-armed by pdflush in process context, but pdflush doesn't execute
the callout which does this.

Fix that up by ignoring the return value from try_to_freeze(): jsut
proceed, see if we have any work pending and only go back to sleep if
that is not the case.

Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Pavel Machek &lt;pavel@suse.cz&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] tmpfs: time granularity fix for [acm]time going backwards</title>
<updated>2006-06-22T19:16:12+00:00</updated>
<author>
<name>Robin H. Johnson</name>
<email>robbat2@gentoo.org</email>
</author>
<published>2006-06-13T17:06:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2fb0b930b535b5e7ae8a5c8880d8ba941e508421'/>
<id>2fb0b930b535b5e7ae8a5c8880d8ba941e508421</id>
<content type='text'>
I noticed a strange behavior in a tmpfs file system the other day, while
building packages - occasionally, and seemingly at random, make decided to
rebuild a target. However, only on tmpfs.

A file would be created, and if checked, it had a sub-second timestamp.
However, after an utimes related call where sub-seconds should be set, they
were zeroed instead. In the case that a file was created, and utimes(...,NULL)
was used on it in the same second, the timestamp on the file moved backwards.

After some digging, I found that this was being caused by tmpfs not having a
time granularity set, thus inheriting the default 1 second granularity.

Hugh adds: yes, we missed tmpfs when the s_time_gran mods went into 2.6.11.
Unfortunately, the granularity of CURRENT_TIME, often used in filesystems,
does not match the default granularity set by alloc_super.  A few more such
discrepancies have been found, but this is the most important to fix now.

Signed-off-by: Robin H. Johnson &lt;robbat2@gentoo.org&gt;
Acked-by: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I noticed a strange behavior in a tmpfs file system the other day, while
building packages - occasionally, and seemingly at random, make decided to
rebuild a target. However, only on tmpfs.

A file would be created, and if checked, it had a sub-second timestamp.
However, after an utimes related call where sub-seconds should be set, they
were zeroed instead. In the case that a file was created, and utimes(...,NULL)
was used on it in the same second, the timestamp on the file moved backwards.

After some digging, I found that this was being caused by tmpfs not having a
time granularity set, thus inheriting the default 1 second granularity.

Hugh adds: yes, we missed tmpfs when the s_time_gran mods went into 2.6.11.
Unfortunately, the granularity of CURRENT_TIME, often used in filesystems,
does not match the default granularity set by alloc_super.  A few more such
discrepancies have been found, but this is the most important to fix now.

Signed-off-by: Robin H. Johnson &lt;robbat2@gentoo.org&gt;
Acked-by: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Cpuset: might sleep checking zones allowed fix</title>
<updated>2006-06-05T17:18:13+00:00</updated>
<author>
<name>Paul Jackson</name>
<email>pj@sgi.com</email>
</author>
<published>2006-05-23T00:56:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e81ccf5afaf9cff1ed27c29ff96795c146a3d571'/>
<id>e81ccf5afaf9cff1ed27c29ff96795c146a3d571</id>
<content type='text'>
Fix an infrequently encountered 'sleeping function called
from invalid context' in the cpuset hooks in __alloc_pages.
Could sleep while interrupts disabled.

The routine cpuset_zone_allowed() is called by code in
mm/page_alloc.c __alloc_pages() to determine if a zone is
allowed in the current tasks cpuset.  This routine can sleep,
for certain GFP_KERNEL allocations, if the zone is on a memory
node not allowed in the current cpuset, but might be allowed
in a parent cpuset.

But we can't sleep in __alloc_pages() if in interrupt, nor
if called for a GFP_ATOMIC request (__GFP_WAIT not set in
gfp_flags).

The rule was intended to be:
  Don't call cpuset_zone_allowed() if you can't sleep, unless you
  pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
  the code that might scan up ancestor cpusets and sleep.

This rule was being violated due to a bogus change made (by myself,
pj) to __alloc_pages() as part of the November 2005 effort to
cleanup its logic.

The bogus change can be seen at:
  http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
  [PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags

This was first noticed on a tight memory system, in code that
was disabling interrupts and doing allocation requests with
__GFP_WAIT not set, which resulted in __might_sleep() writing
complaints to the log "Debug: sleeping function called ...",
when the code in cpuset_zone_allowed() tried to take the
callback_sem cpuset semaphore.

Special thanks to Dave Chinner, for figuring this out,
and a tip of the hat to Nick Piggin who warned me of this
back in Nov 2005, before I was ready to listen.

Signed-off-by: Paul Jackson &lt;pj@sgi.com&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix an infrequently encountered 'sleeping function called
from invalid context' in the cpuset hooks in __alloc_pages.
Could sleep while interrupts disabled.

The routine cpuset_zone_allowed() is called by code in
mm/page_alloc.c __alloc_pages() to determine if a zone is
allowed in the current tasks cpuset.  This routine can sleep,
for certain GFP_KERNEL allocations, if the zone is on a memory
node not allowed in the current cpuset, but might be allowed
in a parent cpuset.

But we can't sleep in __alloc_pages() if in interrupt, nor
if called for a GFP_ATOMIC request (__GFP_WAIT not set in
gfp_flags).

The rule was intended to be:
  Don't call cpuset_zone_allowed() if you can't sleep, unless you
  pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
  the code that might scan up ancestor cpusets and sleep.

This rule was being violated due to a bogus change made (by myself,
pj) to __alloc_pages() as part of the November 2005 effort to
cleanup its logic.

The bogus change can be seen at:
  http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
  [PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags

This was first noticed on a tight memory system, in code that
was disabling interrupts and doing allocation requests with
__GFP_WAIT not set, which resulted in __might_sleep() writing
complaints to the log "Debug: sleeping function called ...",
when the code in cpuset_zone_allowed() tried to take the
callback_sem cpuset semaphore.

Special thanks to Dave Chinner, for figuring this out,
and a tip of the hat to Nick Piggin who warned me of this
back in Nov 2005, before I was ready to listen.

Signed-off-by: Paul Jackson &lt;pj@sgi.com&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] page migration: Fix fallback behavior for dirty pages</title>
<updated>2006-05-20T22:00:33+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-05-01T19:16:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1d4532d4d7351b552220d9ef0a1901a33f00ee5f'/>
<id>1d4532d4d7351b552220d9ef0a1901a33f00ee5f</id>
<content type='text'>
Currently we check PageDirty() in order to make the decision to swap out
the page.  However, the dirty information may be only be contained in the
ptes pointing to the page.  We need to first unmap the ptes before checking
for PageDirty().  If unmap is successful then the page count of the page
will also be decreased so that pageout() works properly.

This is a fix necessary for 2.6.17.  Without this fix we may migrate dirty
pages for filesystems without migration functions.  Filesystems may keep
pointers to dirty pages.  Migration of dirty pages can result in the
filesystem keeping pointers to freed pages.

Unmapping is currently not be separated out from removing all the
references to a page and moving the mapping.  Therefore try_to_unmap will
be called again in migrate_page() if the writeout is successful.  However,
it wont do anything since the ptes are already removed.

The coming updates to the page migration code will restructure the code
so that this is no longer necessary.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently we check PageDirty() in order to make the decision to swap out
the page.  However, the dirty information may be only be contained in the
ptes pointing to the page.  We need to first unmap the ptes before checking
for PageDirty().  If unmap is successful then the page count of the page
will also be decreased so that pageout() works properly.

This is a fix necessary for 2.6.17.  Without this fix we may migrate dirty
pages for filesystems without migration functions.  Filesystems may keep
pointers to dirty pages.  Migration of dirty pages can result in the
filesystem keeping pointers to freed pages.

Unmapping is currently not be separated out from removing all the
references to a page and moving the mapping.  Therefore try_to_unmap will
be called again in migrate_page() if the writeout is successful.  However,
it wont do anything since the ptes are already removed.

The coming updates to the page migration code will restructure the code
so that this is no longer necessary.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] add migratepage address space op to shmem</title>
<updated>2006-05-20T22:00:33+00:00</updated>
<author>
<name>Lee Schermerhorn</name>
<email>Lee.Schermerhorn@hp.com</email>
</author>
<published>2006-04-22T09:35:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=75178298c6c807a7274ab48ef9c4d6be7d0114d8'/>
<id>75178298c6c807a7274ab48ef9c4d6be7d0114d8</id>
<content type='text'>
Basic problem: pages of a shared memory segment can only be migrated once.

In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a
migratepage address space op.  Therefore, migrate_pages() falls back to
default processing.  In this path, it will try to pageout() dirty pages.
Once a shared memory page has been migrated it becomes dirty, so
migrate_pages() will try to page it out.  However, because the page count
is 3 [cache + current + pte], pageout() will return PAGE_KEEP because
is_page_cache_freeable() returns false.  This will abort all subsequent
migrations.

This patch adds a migratepage address space op to shared memory segments to
avoid taking the default path.  We use the "migrate_page()" function
because it knows how to migrate dirty pages.  This allows shared memory
segment pages to migrate, subject to other conditions such as # pte's
referencing the page [page_mapcount(page)], when requested.

I think this is safe.  If we're migrating a shared memory page, then we
found the page via a page table, so it must be in memory.

Can be verified with memtoy and the shmem-mbind-test script, both
available at:  http://free.linux.hp.com/~lts/Tools/

Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Basic problem: pages of a shared memory segment can only be migrated once.

In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a
migratepage address space op.  Therefore, migrate_pages() falls back to
default processing.  In this path, it will try to pageout() dirty pages.
Once a shared memory page has been migrated it becomes dirty, so
migrate_pages() will try to page it out.  However, because the page count
is 3 [cache + current + pte], pageout() will return PAGE_KEEP because
is_page_cache_freeable() returns false.  This will abort all subsequent
migrations.

This patch adds a migratepage address space op to shared memory segments to
avoid taking the default path.  We use the "migrate_page()" function
because it knows how to migrate dirty pages.  This allows shared memory
segment pages to migrate, subject to other conditions such as # pte's
referencing the page [page_mapcount(page)], when requested.

I think this is safe.  If we're migrating a shared memory page, then we
found the page via a page table, so it must be in memory.

Can be verified with memtoy and the shmem-mbind-test script, both
available at:  http://free.linux.hp.com/~lts/Tools/

Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Acked-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Remove cond_resched in gather_stats()</title>
<updated>2006-05-20T22:00:32+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-04-20T09:43:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=602e0343a69b4300de4b889d644f95983dfdc9cd'/>
<id>602e0343a69b4300de4b889d644f95983dfdc9cd</id>
<content type='text'>
gather_stats() is called with a spinlock held from check_pte_range.  We
cannot reschedule with a lock held.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
gather_stats() is called with a spinlock held from check_pte_range.  We
cannot reschedule with a lock held.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] fix MADV_REMOVE vulnerability (CVE-2006-1524 for real this time)</title>
<updated>2006-04-17T21:52:57+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh@veritas.com</email>
</author>
<published>2006-04-17T21:46:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=00ec474c9bed7883f1b3e5f46e3bf09f7de69975'/>
<id>00ec474c9bed7883f1b3e5f46e3bf09f7de69975</id>
<content type='text'>
madvise_remove needs to respect file and mmap protections.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
madvise_remove needs to respect file and mmap protections.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Fix buddy list race that could lead to page lru list corruptions</title>
<updated>2006-04-17T20:16:05+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>piggin@cyberone.com.au</email>
</author>
<published>2006-04-10T23:54:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2b53303840e1a8f1c7d007d988e8f497248ca270'/>
<id>2b53303840e1a8f1c7d007d988e8f497248ca270</id>
<content type='text'>
Rohit found an obscure bug causing buddy list corruption.

page_is_buddy is using a non-atomic test (PagePrivate &amp;&amp; page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.

Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).

Signed-off-by: Martin Bligh &lt;mbligh@google.com&gt;
Signed-off-by: Rohit Seth &lt;rohitseth@google.com&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rohit found an obscure bug causing buddy list corruption.

page_is_buddy is using a non-atomic test (PagePrivate &amp;&amp; page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.

Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).

Signed-off-by: Martin Bligh &lt;mbligh@google.com&gt;
Signed-off-by: Rohit Seth &lt;rohitseth@google.com&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] fix free swap cache latency</title>
<updated>2006-03-17T15:51:26+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh@veritas.com</email>
</author>
<published>2006-03-17T07:04:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6f5e6b9e69bf043074a0edabe3d271899c34eb79'/>
<id>6f5e6b9e69bf043074a0edabe3d271899c34eb79</id>
<content type='text'>
Lee Revell reported 28ms latency when process with lots of swapped memory
exits.

2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all.  We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.

Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Acked-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lee Revell reported 28ms latency when process with lots of swapped memory
exits.

2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all.  We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.

Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Acked-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
