<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/asm-generic, branch v2.6.25-rc2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Make topology fallback macros reference their arguments.</title>
<updated>2008-02-12T04:37:29+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>andi@firstfloor.org</email>
</author>
<published>2008-02-11T19:03:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=271cad6d7e91ff8eea18976311692f99cd667ad3'/>
<id>271cad6d7e91ff8eea18976311692f99cd667ad3</id>
<content type='text'>
This avoids warnings with unreferenced variables in the !NUMA case.
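
Here is a minimal C sketch of the idea. It assumes a fallback of the form `((void)(arg), 0)`; the exact macros in asm-generic/topology.h may differ. The argument is evaluated and then discarded, so a local variable that is only passed to the macro no longer triggers an unused-variable warning. demo_local_node() is a hypothetical caller.

```c
/*
 * Sketch of !NUMA fallbacks that reference their arguments: the (void)
 * cast evaluates and discards the argument, so it counts as "used",
 * and the comma operator makes the whole expression yield node 0.
 */
#define cpu_to_node(cpu)	((void)(cpu), 0)
#define parent_node(node)	((void)(node), 0)

/* Hypothetical caller: if cpu_to_node() ignored its argument entirely,
 * 'cpu' would be flagged as an unused variable. */
static int demo_local_node(void)
{
	int cpu = 2;			/* hypothetical CPU id */
	return cpu_to_node(cpu);	/* always node 0 without NUMA */
}
```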

Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This avoids warnings with unreferenced variables in the !NUMA case.

Signed-off-by: Andi Kleen &lt;ak@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>asm-generic: remove fastcall</title>
<updated>2008-02-08T17:22:31+00:00</updated>
<author>
<name>Harvey Harrison</name>
<email>harvey.harrison@gmail.com</email>
</author>
<published>2008-02-08T12:19:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=144b2a91468bdc0d4fa64b220c152fb58b8ffe05'/>
<id>144b2a91468bdc0d4fa64b220c152fb58b8ffe05</id>
<content type='text'>
Signed-off-by: Harvey Harrison &lt;harvey.harrison@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Harvey Harrison &lt;harvey.harrison@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT</title>
<updated>2008-02-08T17:22:30+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2008-02-08T12:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7fa3031500ec9b0a7460c8c23751799006ffee74'/>
<id>7fa3031500ec9b0a7460c8c23751799006ffee74</id>
<content type='text'>
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.

Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case.  Not
only that, but under such conditions A.OUT core dumps are not produced either.

To make this work, this patch also does the following:

 (1) Makes the existence of the contents of linux/a.out.h contingent on
     CONFIG_ARCH_SUPPORTS_AOUT.

 (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
     core dumping code.

 (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline.  This
     is then included only where needed.  This means that this bit of arch
     code will be stored in the appropriate A.OUT binfmt module rather than
     the core kernel.

 (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
     needed) and FRV.

This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.
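
The following hypothetical sketch shows what "contingent on CONFIG_ARCH_SUPPORTS_AOUT" means at the preprocessor level. elf_should_try_aout_library() is an illustrative name, not a kernel function. The config symbol is deliberately left undefined here, which matches an architecture without A.OUT support.

```c
/*
 * Hypothetical sketch: CONFIG_ARCH_SUPPORTS_AOUT would normally be set
 * by the architecture's Kconfig.  It is left undefined here, so the
 * A.OUT path is compiled out.
 */
static int elf_should_try_aout_library(void)
{
#ifdef CONFIG_ARCH_SUPPORTS_AOUT
	return 1;	/* ELF loader may go looking for A.OUT libraries */
#else
	return 0;	/* A.OUT support compiled out entirely */
#endif
}
```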

[jdike@addtoit.com: uml: re-remove accidentally restored code]
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Jeff Dike &lt;jdike@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.

Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case.  Not
only that, but under such conditions A.OUT core dumps are not produced either.

To make this work, this patch also does the following:

 (1) Makes the existence of the contents of linux/a.out.h contingent on
     CONFIG_ARCH_SUPPORTS_AOUT.

 (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
     core dumping code.

 (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline.  This
     is then included only where needed.  This means that this bit of arch
     code will be stored in the appropriate A.OUT binfmt module rather than
     the core kernel.

 (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
     needed) and FRV.

This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.

[jdike@addtoit.com: uml: re-remove accidentally restored code]
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Jeff Dike &lt;jdike@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tty: let architectures override the user/kernel macros.</title>
<updated>2008-02-08T17:22:24+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2008-02-08T12:18:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aa7738a5f503abea5445cdd8cc2d501502c748ae'/>
<id>aa7738a5f503abea5445cdd8cc2d501502c748ae</id>
<content type='text'>
Give architectures that support the new termios2 the possibility to override the
user_termios_to_kernel_termios and kernel_termios_to_user_termios macros.  As
soon as all architectures that use the generic variant have been converted, the
ifdefs can go away again.  The architectures in question are avr32, frv, powerpc
and s390.
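
The override works through the usual #ifndef fallback pattern. Below is a self-contained sketch of it. A hypothetical struct demo_termios and a hypothetical arch_user_to_kernel() stand in for the real types and arch code. The architecture defines one macro itself, and the generic header fills in only the one it left undefined.

```c
#include "string.h"

/* Hypothetical stand-in for struct termios. */
struct demo_termios {
	unsigned int c_iflag;
	unsigned int c_oflag;
};

static int arch_calls;	/* counts uses of the arch-specific path */

/* Arch-specific variant (e.g. termios2-aware); in the kernel this
 * would come from the architecture's own header. */
static int arch_user_to_kernel(struct demo_termios *k,
			       const struct demo_termios *u)
{
	arch_calls++;
	memcpy(k, u, sizeof(*k));
	return 0;
}
#define user_termios_to_kernel_termios(k, u) arch_user_to_kernel(k, u)

/* Generic header: only supply the fallbacks the arch did not define. */
#ifndef user_termios_to_kernel_termios
#define user_termios_to_kernel_termios(k, u) (memcpy((k), (u), sizeof(*(k))), 0)
#endif
#ifndef kernel_termios_to_user_termios
#define kernel_termios_to_user_termios(u, k) (memcpy((u), (k), sizeof(*(u))), 0)
#endif
```

Once every architecture provides its own versions, the #ifndef guards become dead code. That is why the commit expects the ifdefs to go away again.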

Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Haavard Skinnemoen &lt;hskinnemoen@atmel.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Give architectures that support the new termios2 the possibility to override the
user_termios_to_kernel_termios and kernel_termios_to_user_termios macros.  As
soon as all architectures that use the generic variant have been converted, the
ifdefs can go away again.  The architectures in question are avr32, frv, powerpc
and s390.

Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Haavard Skinnemoen &lt;hskinnemoen@atmel.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add cmpxchg_local to asm-generic for per cpu atomic operations</title>
<updated>2008-02-07T16:42:30+00:00</updated>
<author>
<name>Mathieu Desnoyers</name>
<email>mathieu.desnoyers@polymtl.ca</email>
</author>
<published>2008-02-07T08:16:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=068fbad288a2c18b75b0425fb56d241f018a1cb5'/>
<id>068fbad288a2c18b75b0425fb56d241f018a1cb5</id>
<content type='text'>
Emulates cmpxchg_local by disabling interrupts around the variable modification.
This is not reentrant wrt NMIs and MCEs. It is only protected against normal
interrupts, but this is enough for architectures without such interrupt sources
or if used in a context where the data is not shared with such handlers.

It can be used as a fallback for architectures lacking a real cmpxchg
instruction.

For architectures that have a real cmpxchg but do not have NMIs or MCEs, it
should be measured whether the generic or the architecture-specific cmpxchg is
faster.

asm-generic/cmpxchg.h defines a cmpxchg that uses cmpxchg_local. It is meant to
be used as a cmpxchg fallback for architectures that do not support SMP.
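
A userspace analogue of the emulation follows, offered only as a sketch. Blocking all signals with sigprocmask() stands in for disabling local interrupts. Like the kernel fallback, it therefore guards against asynchronous handlers on the current thread but not against other CPUs. demo_cmpxchg_local(), demo_cmpxchg() and the CONFIG_SMP guard shown here are illustrative. The kernel versions also handle several operand sizes.

```c
#define _POSIX_C_SOURCE 200809L
#include "signal.h"
#include "stddef.h"

/*
 * Compare-and-exchange made atomic only with respect to asynchronous
 * handlers on this thread: signals are blocked for the duration, the
 * way interrupts are disabled in the kernel fallback.
 */
static unsigned long demo_cmpxchg_local(volatile unsigned long *ptr,
					unsigned long old,
					unsigned long new_val)
{
	sigset_t all[1], saved[1];
	unsigned long prev;

	sigfillset(all);
	sigprocmask(SIG_BLOCK, all, saved);	/* analogue of local_irq_save */
	prev = *ptr;
	if (prev == old)
		*ptr = new_val;
	sigprocmask(SIG_SETMASK, saved, NULL);	/* analogue of local_irq_restore */
	return prev;	/* caller checks prev == old to detect success */
}

/* On a uniprocessor build there are no other CPUs to race with, so the
 * full cmpxchg can simply reuse the local variant. */
#ifndef CONFIG_SMP
#define demo_cmpxchg(ptr, o, n) demo_cmpxchg_local((ptr), (o), (n))
#endif
```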

* Patch series comments

Using cmpxchg_local improves the performance of the fast path, with speedups
ranging from 66% on a Pentium 4 to 14% on AMD64.

In detail:

Tested-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Measurements on a Pentium4, 3GHz, Hyperthread.
SLUB Performance testing
========================
1. Kmalloc: Repeatedly allocate then free test

* slub HEAD, test 1
kmalloc(8) = 201 cycles         kfree = 351 cycles
kmalloc(16) = 198 cycles        kfree = 359 cycles
kmalloc(32) = 200 cycles        kfree = 381 cycles
kmalloc(64) = 224 cycles        kfree = 394 cycles
kmalloc(128) = 285 cycles       kfree = 424 cycles
kmalloc(256) = 411 cycles       kfree = 546 cycles
kmalloc(512) = 480 cycles       kfree = 619 cycles
kmalloc(1024) = 623 cycles      kfree = 750 cycles
kmalloc(2048) = 686 cycles      kfree = 811 cycles
kmalloc(4096) = 482 cycles      kfree = 538 cycles
kmalloc(8192) = 680 cycles      kfree = 734 cycles
kmalloc(16384) = 713 cycles     kfree = 843 cycles

* Slub HEAD, test 2
kmalloc(8) = 190 cycles         kfree = 351 cycles
kmalloc(16) = 195 cycles        kfree = 360 cycles
kmalloc(32) = 201 cycles        kfree = 370 cycles
kmalloc(64) = 245 cycles        kfree = 389 cycles
kmalloc(128) = 283 cycles       kfree = 413 cycles
kmalloc(256) = 409 cycles       kfree = 547 cycles
kmalloc(512) = 476 cycles       kfree = 616 cycles
kmalloc(1024) = 628 cycles      kfree = 753 cycles
kmalloc(2048) = 684 cycles      kfree = 811 cycles
kmalloc(4096) = 480 cycles      kfree = 539 cycles
kmalloc(8192) = 661 cycles      kfree = 746 cycles
kmalloc(16384) = 741 cycles     kfree = 856 cycles

* cmpxchg_local Slub test
kmalloc(8) = 83 cycles          kfree = 363 cycles
kmalloc(16) = 85 cycles         kfree = 372 cycles
kmalloc(32) = 92 cycles         kfree = 377 cycles
kmalloc(64) = 115 cycles        kfree = 397 cycles
kmalloc(128) = 179 cycles       kfree = 438 cycles
kmalloc(256) = 314 cycles       kfree = 564 cycles
kmalloc(512) = 398 cycles       kfree = 615 cycles
kmalloc(1024) = 573 cycles      kfree = 745 cycles
kmalloc(2048) = 629 cycles      kfree = 816 cycles
kmalloc(4096) = 473 cycles      kfree = 548 cycles
kmalloc(8192) = 659 cycles      kfree = 745 cycles
kmalloc(16384) = 724 cycles     kfree = 843 cycles

2. Kmalloc: alloc/free test

* slub HEAD, test 1
kmalloc(8)/kfree = 322 cycles
kmalloc(16)/kfree = 318 cycles
kmalloc(32)/kfree = 318 cycles
kmalloc(64)/kfree = 325 cycles
kmalloc(128)/kfree = 318 cycles
kmalloc(256)/kfree = 328 cycles
kmalloc(512)/kfree = 328 cycles
kmalloc(1024)/kfree = 328 cycles
kmalloc(2048)/kfree = 328 cycles
kmalloc(4096)/kfree = 678 cycles
kmalloc(8192)/kfree = 1013 cycles
kmalloc(16384)/kfree = 1157 cycles

* Slub HEAD, test 2
kmalloc(8)/kfree = 323 cycles
kmalloc(16)/kfree = 318 cycles
kmalloc(32)/kfree = 318 cycles
kmalloc(64)/kfree = 318 cycles
kmalloc(128)/kfree = 318 cycles
kmalloc(256)/kfree = 328 cycles
kmalloc(512)/kfree = 328 cycles
kmalloc(1024)/kfree = 328 cycles
kmalloc(2048)/kfree = 328 cycles
kmalloc(4096)/kfree = 648 cycles
kmalloc(8192)/kfree = 1009 cycles
kmalloc(16384)/kfree = 1105 cycles

* cmpxchg_local Slub test
kmalloc(8)/kfree = 112 cycles
kmalloc(16)/kfree = 103 cycles
kmalloc(32)/kfree = 103 cycles
kmalloc(64)/kfree = 103 cycles
kmalloc(128)/kfree = 112 cycles
kmalloc(256)/kfree = 111 cycles
kmalloc(512)/kfree = 111 cycles
kmalloc(1024)/kfree = 111 cycles
kmalloc(2048)/kfree = 121 cycles
kmalloc(4096)/kfree = 650 cycles
kmalloc(8192)/kfree = 1042 cycles
kmalloc(16384)/kfree = 1149 cycles

Tested-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Measurements on an AMD64 2.0 GHz dual-core

In this test, we seem to remove 10 cycles from the kmalloc fast path.
On small allocations, this gives a 14% performance increase.  The kfree
fast path also seems to gain about 10 cycles.

1. Kmalloc: Repeatedly allocate then free test

* cmpxchg_local slub
kmalloc(8) = 63 cycles      kfree = 126 cycles
kmalloc(16) = 66 cycles     kfree = 129 cycles
kmalloc(32) = 76 cycles     kfree = 138 cycles
kmalloc(64) = 100 cycles    kfree = 288 cycles
kmalloc(128) = 128 cycles   kfree = 309 cycles
kmalloc(256) = 170 cycles   kfree = 315 cycles
kmalloc(512) = 221 cycles   kfree = 357 cycles
kmalloc(1024) = 324 cycles  kfree = 393 cycles
kmalloc(2048) = 354 cycles  kfree = 440 cycles
kmalloc(4096) = 394 cycles  kfree = 330 cycles
kmalloc(8192) = 523 cycles  kfree = 481 cycles
kmalloc(16384) = 643 cycles kfree = 649 cycles

* Base
kmalloc(8) = 74 cycles      kfree = 113 cycles
kmalloc(16) = 76 cycles     kfree = 116 cycles
kmalloc(32) = 85 cycles     kfree = 133 cycles
kmalloc(64) = 111 cycles    kfree = 279 cycles
kmalloc(128) = 138 cycles   kfree = 294 cycles
kmalloc(256) = 181 cycles   kfree = 304 cycles
kmalloc(512) = 237 cycles   kfree = 327 cycles
kmalloc(1024) = 340 cycles  kfree = 379 cycles
kmalloc(2048) = 378 cycles  kfree = 433 cycles
kmalloc(4096) = 399 cycles  kfree = 329 cycles
kmalloc(8192) = 528 cycles  kfree = 624 cycles
kmalloc(16384) = 651 cycles kfree = 737 cycles

2. Kmalloc: alloc/free test

* cmpxchg_local slub
kmalloc(8)/kfree = 96 cycles
kmalloc(16)/kfree = 97 cycles
kmalloc(32)/kfree = 97 cycles
kmalloc(64)/kfree = 97 cycles
kmalloc(128)/kfree = 97 cycles
kmalloc(256)/kfree = 105 cycles
kmalloc(512)/kfree = 108 cycles
kmalloc(1024)/kfree = 105 cycles
kmalloc(2048)/kfree = 107 cycles
kmalloc(4096)/kfree = 390 cycles
kmalloc(8192)/kfree = 626 cycles
kmalloc(16384)/kfree = 662 cycles

* Base
kmalloc(8)/kfree = 116 cycles
kmalloc(16)/kfree = 116 cycles
kmalloc(32)/kfree = 116 cycles
kmalloc(64)/kfree = 116 cycles
kmalloc(128)/kfree = 116 cycles
kmalloc(256)/kfree = 126 cycles
kmalloc(512)/kfree = 126 cycles
kmalloc(1024)/kfree = 126 cycles
kmalloc(2048)/kfree = 126 cycles
kmalloc(4096)/kfree = 384 cycles
kmalloc(8192)/kfree = 749 cycles
kmalloc(16384)/kfree = 786 cycles

Tested-by: Christoph Lameter &lt;clameter@sgi.com&gt;
I can confirm Mathieu's measurements now:

Athlon64:

regular NUMA/discontig

1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -&gt; 79 cycles kfree -&gt; 92 cycles
10000 times kmalloc(16) -&gt; 79 cycles kfree -&gt; 93 cycles
10000 times kmalloc(32) -&gt; 88 cycles kfree -&gt; 95 cycles
10000 times kmalloc(64) -&gt; 124 cycles kfree -&gt; 132 cycles
10000 times kmalloc(128) -&gt; 157 cycles kfree -&gt; 247 cycles
10000 times kmalloc(256) -&gt; 200 cycles kfree -&gt; 257 cycles
10000 times kmalloc(512) -&gt; 250 cycles kfree -&gt; 277 cycles
10000 times kmalloc(1024) -&gt; 337 cycles kfree -&gt; 314 cycles
10000 times kmalloc(2048) -&gt; 365 cycles kfree -&gt; 330 cycles
10000 times kmalloc(4096) -&gt; 352 cycles kfree -&gt; 240 cycles
10000 times kmalloc(8192) -&gt; 456 cycles kfree -&gt; 340 cycles
10000 times kmalloc(16384) -&gt; 646 cycles kfree -&gt; 471 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -&gt; 124 cycles
10000 times kmalloc(16)/kfree -&gt; 124 cycles
10000 times kmalloc(32)/kfree -&gt; 124 cycles
10000 times kmalloc(64)/kfree -&gt; 124 cycles
10000 times kmalloc(128)/kfree -&gt; 124 cycles
10000 times kmalloc(256)/kfree -&gt; 132 cycles
10000 times kmalloc(512)/kfree -&gt; 132 cycles
10000 times kmalloc(1024)/kfree -&gt; 132 cycles
10000 times kmalloc(2048)/kfree -&gt; 132 cycles
10000 times kmalloc(4096)/kfree -&gt; 319 cycles
10000 times kmalloc(8192)/kfree -&gt; 486 cycles
10000 times kmalloc(16384)/kfree -&gt; 539 cycles

cmpxchg_local NUMA/discontig

1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -&gt; 55 cycles kfree -&gt; 90 cycles
10000 times kmalloc(16) -&gt; 55 cycles kfree -&gt; 92 cycles
10000 times kmalloc(32) -&gt; 70 cycles kfree -&gt; 91 cycles
10000 times kmalloc(64) -&gt; 100 cycles kfree -&gt; 141 cycles
10000 times kmalloc(128) -&gt; 128 cycles kfree -&gt; 233 cycles
10000 times kmalloc(256) -&gt; 172 cycles kfree -&gt; 251 cycles
10000 times kmalloc(512) -&gt; 225 cycles kfree -&gt; 275 cycles
10000 times kmalloc(1024) -&gt; 325 cycles kfree -&gt; 311 cycles
10000 times kmalloc(2048) -&gt; 346 cycles kfree -&gt; 330 cycles
10000 times kmalloc(4096) -&gt; 351 cycles kfree -&gt; 238 cycles
10000 times kmalloc(8192) -&gt; 450 cycles kfree -&gt; 342 cycles
10000 times kmalloc(16384) -&gt; 630 cycles kfree -&gt; 546 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -&gt; 81 cycles
10000 times kmalloc(16)/kfree -&gt; 81 cycles
10000 times kmalloc(32)/kfree -&gt; 81 cycles
10000 times kmalloc(64)/kfree -&gt; 81 cycles
10000 times kmalloc(128)/kfree -&gt; 81 cycles
10000 times kmalloc(256)/kfree -&gt; 91 cycles
10000 times kmalloc(512)/kfree -&gt; 90 cycles
10000 times kmalloc(1024)/kfree -&gt; 91 cycles
10000 times kmalloc(2048)/kfree -&gt; 90 cycles
10000 times kmalloc(4096)/kfree -&gt; 318 cycles
10000 times kmalloc(8192)/kfree -&gt; 483 cycles
10000 times kmalloc(16384)/kfree -&gt; 536 cycles

Changelog:
- Ran through checkpatch.

Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Emulates cmpxchg_local by disabling interrupts around the variable modification.
This is not reentrant wrt NMIs and MCEs. It is only protected against normal
interrupts, but this is enough for architectures without such interrupt sources
or if used in a context where the data is not shared with such handlers.

It can be used as a fallback for architectures lacking a real cmpxchg
instruction.

For architectures that have a real cmpxchg but do not have NMIs or MCEs, it
should be measured whether the generic or the architecture-specific cmpxchg is
faster.

asm-generic/cmpxchg.h defines a cmpxchg that uses cmpxchg_local. It is meant to
be used as a cmpxchg fallback for architectures that do not support SMP.

* Patch series comments

Using cmpxchg_local improves the performance of the fast path, with speedups
ranging from 66% on a Pentium 4 to 14% on AMD64.

In detail:

Tested-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Measurements on a Pentium4, 3GHz, Hyperthread.
SLUB Performance testing
========================
1. Kmalloc: Repeatedly allocate then free test

* slub HEAD, test 1
kmalloc(8) = 201 cycles         kfree = 351 cycles
kmalloc(16) = 198 cycles        kfree = 359 cycles
kmalloc(32) = 200 cycles        kfree = 381 cycles
kmalloc(64) = 224 cycles        kfree = 394 cycles
kmalloc(128) = 285 cycles       kfree = 424 cycles
kmalloc(256) = 411 cycles       kfree = 546 cycles
kmalloc(512) = 480 cycles       kfree = 619 cycles
kmalloc(1024) = 623 cycles      kfree = 750 cycles
kmalloc(2048) = 686 cycles      kfree = 811 cycles
kmalloc(4096) = 482 cycles      kfree = 538 cycles
kmalloc(8192) = 680 cycles      kfree = 734 cycles
kmalloc(16384) = 713 cycles     kfree = 843 cycles

* Slub HEAD, test 2
kmalloc(8) = 190 cycles         kfree = 351 cycles
kmalloc(16) = 195 cycles        kfree = 360 cycles
kmalloc(32) = 201 cycles        kfree = 370 cycles
kmalloc(64) = 245 cycles        kfree = 389 cycles
kmalloc(128) = 283 cycles       kfree = 413 cycles
kmalloc(256) = 409 cycles       kfree = 547 cycles
kmalloc(512) = 476 cycles       kfree = 616 cycles
kmalloc(1024) = 628 cycles      kfree = 753 cycles
kmalloc(2048) = 684 cycles      kfree = 811 cycles
kmalloc(4096) = 480 cycles      kfree = 539 cycles
kmalloc(8192) = 661 cycles      kfree = 746 cycles
kmalloc(16384) = 741 cycles     kfree = 856 cycles

* cmpxchg_local Slub test
kmalloc(8) = 83 cycles          kfree = 363 cycles
kmalloc(16) = 85 cycles         kfree = 372 cycles
kmalloc(32) = 92 cycles         kfree = 377 cycles
kmalloc(64) = 115 cycles        kfree = 397 cycles
kmalloc(128) = 179 cycles       kfree = 438 cycles
kmalloc(256) = 314 cycles       kfree = 564 cycles
kmalloc(512) = 398 cycles       kfree = 615 cycles
kmalloc(1024) = 573 cycles      kfree = 745 cycles
kmalloc(2048) = 629 cycles      kfree = 816 cycles
kmalloc(4096) = 473 cycles      kfree = 548 cycles
kmalloc(8192) = 659 cycles      kfree = 745 cycles
kmalloc(16384) = 724 cycles     kfree = 843 cycles

2. Kmalloc: alloc/free test

* slub HEAD, test 1
kmalloc(8)/kfree = 322 cycles
kmalloc(16)/kfree = 318 cycles
kmalloc(32)/kfree = 318 cycles
kmalloc(64)/kfree = 325 cycles
kmalloc(128)/kfree = 318 cycles
kmalloc(256)/kfree = 328 cycles
kmalloc(512)/kfree = 328 cycles
kmalloc(1024)/kfree = 328 cycles
kmalloc(2048)/kfree = 328 cycles
kmalloc(4096)/kfree = 678 cycles
kmalloc(8192)/kfree = 1013 cycles
kmalloc(16384)/kfree = 1157 cycles

* Slub HEAD, test 2
kmalloc(8)/kfree = 323 cycles
kmalloc(16)/kfree = 318 cycles
kmalloc(32)/kfree = 318 cycles
kmalloc(64)/kfree = 318 cycles
kmalloc(128)/kfree = 318 cycles
kmalloc(256)/kfree = 328 cycles
kmalloc(512)/kfree = 328 cycles
kmalloc(1024)/kfree = 328 cycles
kmalloc(2048)/kfree = 328 cycles
kmalloc(4096)/kfree = 648 cycles
kmalloc(8192)/kfree = 1009 cycles
kmalloc(16384)/kfree = 1105 cycles

* cmpxchg_local Slub test
kmalloc(8)/kfree = 112 cycles
kmalloc(16)/kfree = 103 cycles
kmalloc(32)/kfree = 103 cycles
kmalloc(64)/kfree = 103 cycles
kmalloc(128)/kfree = 112 cycles
kmalloc(256)/kfree = 111 cycles
kmalloc(512)/kfree = 111 cycles
kmalloc(1024)/kfree = 111 cycles
kmalloc(2048)/kfree = 121 cycles
kmalloc(4096)/kfree = 650 cycles
kmalloc(8192)/kfree = 1042 cycles
kmalloc(16384)/kfree = 1149 cycles

Tested-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Measurements on an AMD64 2.0 GHz dual-core

In this test, we seem to remove 10 cycles from the kmalloc fast path.
On small allocations, this gives a 14% performance increase.  The kfree
fast path also seems to gain about 10 cycles.

1. Kmalloc: Repeatedly allocate then free test

* cmpxchg_local slub
kmalloc(8) = 63 cycles      kfree = 126 cycles
kmalloc(16) = 66 cycles     kfree = 129 cycles
kmalloc(32) = 76 cycles     kfree = 138 cycles
kmalloc(64) = 100 cycles    kfree = 288 cycles
kmalloc(128) = 128 cycles   kfree = 309 cycles
kmalloc(256) = 170 cycles   kfree = 315 cycles
kmalloc(512) = 221 cycles   kfree = 357 cycles
kmalloc(1024) = 324 cycles  kfree = 393 cycles
kmalloc(2048) = 354 cycles  kfree = 440 cycles
kmalloc(4096) = 394 cycles  kfree = 330 cycles
kmalloc(8192) = 523 cycles  kfree = 481 cycles
kmalloc(16384) = 643 cycles kfree = 649 cycles

* Base
kmalloc(8) = 74 cycles      kfree = 113 cycles
kmalloc(16) = 76 cycles     kfree = 116 cycles
kmalloc(32) = 85 cycles     kfree = 133 cycles
kmalloc(64) = 111 cycles    kfree = 279 cycles
kmalloc(128) = 138 cycles   kfree = 294 cycles
kmalloc(256) = 181 cycles   kfree = 304 cycles
kmalloc(512) = 237 cycles   kfree = 327 cycles
kmalloc(1024) = 340 cycles  kfree = 379 cycles
kmalloc(2048) = 378 cycles  kfree = 433 cycles
kmalloc(4096) = 399 cycles  kfree = 329 cycles
kmalloc(8192) = 528 cycles  kfree = 624 cycles
kmalloc(16384) = 651 cycles kfree = 737 cycles

2. Kmalloc: alloc/free test

* cmpxchg_local slub
kmalloc(8)/kfree = 96 cycles
kmalloc(16)/kfree = 97 cycles
kmalloc(32)/kfree = 97 cycles
kmalloc(64)/kfree = 97 cycles
kmalloc(128)/kfree = 97 cycles
kmalloc(256)/kfree = 105 cycles
kmalloc(512)/kfree = 108 cycles
kmalloc(1024)/kfree = 105 cycles
kmalloc(2048)/kfree = 107 cycles
kmalloc(4096)/kfree = 390 cycles
kmalloc(8192)/kfree = 626 cycles
kmalloc(16384)/kfree = 662 cycles

* Base
kmalloc(8)/kfree = 116 cycles
kmalloc(16)/kfree = 116 cycles
kmalloc(32)/kfree = 116 cycles
kmalloc(64)/kfree = 116 cycles
kmalloc(128)/kfree = 116 cycles
kmalloc(256)/kfree = 126 cycles
kmalloc(512)/kfree = 126 cycles
kmalloc(1024)/kfree = 126 cycles
kmalloc(2048)/kfree = 126 cycles
kmalloc(4096)/kfree = 384 cycles
kmalloc(8192)/kfree = 749 cycles
kmalloc(16384)/kfree = 786 cycles

Tested-by: Christoph Lameter &lt;clameter@sgi.com&gt;
I can confirm Mathieu's measurements now:

Athlon64:

regular NUMA/discontig

1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -&gt; 79 cycles kfree -&gt; 92 cycles
10000 times kmalloc(16) -&gt; 79 cycles kfree -&gt; 93 cycles
10000 times kmalloc(32) -&gt; 88 cycles kfree -&gt; 95 cycles
10000 times kmalloc(64) -&gt; 124 cycles kfree -&gt; 132 cycles
10000 times kmalloc(128) -&gt; 157 cycles kfree -&gt; 247 cycles
10000 times kmalloc(256) -&gt; 200 cycles kfree -&gt; 257 cycles
10000 times kmalloc(512) -&gt; 250 cycles kfree -&gt; 277 cycles
10000 times kmalloc(1024) -&gt; 337 cycles kfree -&gt; 314 cycles
10000 times kmalloc(2048) -&gt; 365 cycles kfree -&gt; 330 cycles
10000 times kmalloc(4096) -&gt; 352 cycles kfree -&gt; 240 cycles
10000 times kmalloc(8192) -&gt; 456 cycles kfree -&gt; 340 cycles
10000 times kmalloc(16384) -&gt; 646 cycles kfree -&gt; 471 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -&gt; 124 cycles
10000 times kmalloc(16)/kfree -&gt; 124 cycles
10000 times kmalloc(32)/kfree -&gt; 124 cycles
10000 times kmalloc(64)/kfree -&gt; 124 cycles
10000 times kmalloc(128)/kfree -&gt; 124 cycles
10000 times kmalloc(256)/kfree -&gt; 132 cycles
10000 times kmalloc(512)/kfree -&gt; 132 cycles
10000 times kmalloc(1024)/kfree -&gt; 132 cycles
10000 times kmalloc(2048)/kfree -&gt; 132 cycles
10000 times kmalloc(4096)/kfree -&gt; 319 cycles
10000 times kmalloc(8192)/kfree -&gt; 486 cycles
10000 times kmalloc(16384)/kfree -&gt; 539 cycles

cmpxchg_local NUMA/discontig

1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -&gt; 55 cycles kfree -&gt; 90 cycles
10000 times kmalloc(16) -&gt; 55 cycles kfree -&gt; 92 cycles
10000 times kmalloc(32) -&gt; 70 cycles kfree -&gt; 91 cycles
10000 times kmalloc(64) -&gt; 100 cycles kfree -&gt; 141 cycles
10000 times kmalloc(128) -&gt; 128 cycles kfree -&gt; 233 cycles
10000 times kmalloc(256) -&gt; 172 cycles kfree -&gt; 251 cycles
10000 times kmalloc(512) -&gt; 225 cycles kfree -&gt; 275 cycles
10000 times kmalloc(1024) -&gt; 325 cycles kfree -&gt; 311 cycles
10000 times kmalloc(2048) -&gt; 346 cycles kfree -&gt; 330 cycles
10000 times kmalloc(4096) -&gt; 351 cycles kfree -&gt; 238 cycles
10000 times kmalloc(8192) -&gt; 450 cycles kfree -&gt; 342 cycles
10000 times kmalloc(16384) -&gt; 630 cycles kfree -&gt; 546 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -&gt; 81 cycles
10000 times kmalloc(16)/kfree -&gt; 81 cycles
10000 times kmalloc(32)/kfree -&gt; 81 cycles
10000 times kmalloc(64)/kfree -&gt; 81 cycles
10000 times kmalloc(128)/kfree -&gt; 81 cycles
10000 times kmalloc(256)/kfree -&gt; 91 cycles
10000 times kmalloc(512)/kfree -&gt; 90 cycles
10000 times kmalloc(1024)/kfree -&gt; 91 cycles
10000 times kmalloc(2048)/kfree -&gt; 90 cycles
10000 times kmalloc(4096)/kfree -&gt; 318 cycles
10000 times kmalloc(8192)/kfree -&gt; 483 cycles
10000 times kmalloc(16384)/kfree -&gt; 536 cycles

Changelog:
- Ran through checkpatch.

Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Unexport asm/page.h</title>
<updated>2008-02-07T16:42:30+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>k.shutemov@gmail.com</email>
</author>
<published>2008-02-07T08:15:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ed7b1889da256977574663689b598d88950bbd23'/>
<id>ed7b1889da256977574663689b598d88950bbd23</id>
<content type='text'>
Do not export asm/page.h during make headers_install.  This removes PAGE_SIZE
from userspace headers.
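
Because PAGE_SIZE is no longer exported, userspace code that needs the page size can query it at run time instead. Here is a minimal sketch using the POSIX sysconf() interface. runtime_page_size() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include "unistd.h"

/* Ask the running system for its page size instead of relying on a
 * compile-time PAGE_SIZE macro from kernel headers. */
static long runtime_page_size(void)
{
	return sysconf(_SC_PAGESIZE);
}
```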

Signed-off-by: Kirill A. Shutemov &lt;k.shutemov@gmail.com&gt;
Reviewed-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not export asm/page.h during make headers_install.  This removes PAGE_SIZE
from userspace headers.

Signed-off-by: Kirill A. Shutemov &lt;k.shutemov@gmail.com&gt;
Reviewed-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Unexport asm/elf.h</title>
<updated>2008-02-07T16:42:30+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>k.shutemov@gmail.com</email>
</author>
<published>2008-02-07T08:15:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6cc931b9b5ec652c90b928f3ec2163f261552c91'/>
<id>6cc931b9b5ec652c90b928f3ec2163f261552c91</id>
<content type='text'>
Do not export asm/elf.h during make headers_install.

Signed-off-by: Kirill A. Shutemov &lt;k.shutemov@gmail.com&gt;
Reviewed-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not export asm/elf.h during make headers_install.

Signed-off-by: Kirill A. Shutemov &lt;k.shutemov@gmail.com&gt;
Reviewed-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Unexport asm/user.h and linux/user.h</title>
<updated>2008-02-07T16:42:29+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>k.shutemov@gmail.com</email>
</author>
<published>2008-02-07T08:15:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c1445db9f72db0537c43a2eab6e1b0f6741162f5'/>
<id>c1445db9f72db0537c43a2eab6e1b0f6741162f5</id>
<content type='text'>
Do not export asm/user.h and linux/user.h during make headers_install.

Signed-off-by: Kirill A. Shutemov &lt;k.shutemov@gmail.com&gt;
Reviewed-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Acked-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not export asm/user.h and linux/user.h during make headers_install.

Signed-off-by: Kirill A. Shutemov &lt;k.shutemov@gmail.com&gt;
Reviewed-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Acked-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>remove support for un-needed _extratext section</title>
<updated>2008-02-06T18:41:01+00:00</updated>
<author>
<name>Robin Getz</name>
<email>rgetz@blackfin.uclinux.org</email>
</author>
<published>2008-02-06T09:36:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a3b81113fb6658629f4ebaabf8dd3067cd341020'/>
<id>a3b81113fb6658629f4ebaabf8dd3067cd341020</id>
<content type='text'>
When passing a zero address to kallsyms_lookup(), the kernel thought it was
a valid kernel address, even though it is not.  This is because is_ksym_addr()
called is_kernel_extratext() and checked against labels that don't exist on
many archs (which default to zero).  Since PPC was the only arch which
defined _extra_text (in 2005), and it no longer needs it, this patch removes
_extra_text support.

For some history (provided by Jon):
 http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019734.html
 http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019736.html
 http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019751.html

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Robin Getz &lt;rgetz@blackfin.uclinux.org&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Jon Loeliger &lt;jdl@freescale.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When passing a zero address to kallsyms_lookup(), the kernel thought it was
a valid kernel address, even though it is not.  This is because is_ksym_addr()
called is_kernel_extratext() and checked against labels that don't exist on
many archs (which default to zero).  Since PPC was the only arch which
defined _extra_text (in 2005), and it no longer needs it, this patch removes
_extra_text support.

For some history (provided by Jon):
 http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019734.html
 http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019736.html
 http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019751.html

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Robin Getz &lt;rgetz@blackfin.uclinux.org&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Jon Loeliger &lt;jdl@freescale.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Sam Ravnborg &lt;sam@ravnborg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>taskstats scaled time cleanup</title>
<updated>2008-02-06T18:41:00+00:00</updated>
<author>
<name>Michael Neuling</name>
<email>mikey@neuling.org</email>
</author>
<published>2008-02-06T09:36:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=06b8e878a9bc9301201cffe186eba99c4185f20a'/>
<id>06b8e878a9bc9301201cffe186eba99c4185f20a</id>
<content type='text'>
This moves the ability to scale cputime into generic code.  This allows us
to fix the issue in kernel/timer.c (noticed by Balbir) where we could only
add an unscaled value to the scaled utime/stime.

This adds a cputime_to_scaled function.  As before, the POWERPC version
does the scaling based on the last SPURR/PURR ratio calculated.  The
generic and s390 (only other arch to implement asm/cputime.h) versions are
both NOPs.

Also moves the SPURR and PURR snapshots closer.

Signed-off-by: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Jay Lan &lt;jlan@engr.sgi.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This moves the ability to scale cputime into generic code.  This allows us
to fix the issue in kernel/timer.c (noticed by Balbir) where we could only
add an unscaled value to the scaled utime/stime.

This adds a cputime_to_scaled function.  As before, the POWERPC version
does the scaling based on the last SPURR/PURR ratio calculated.  The
generic and s390 (only other arch to implement asm/cputime.h) versions are
both NOPs.

Also moves the SPURR and PURR snapshots closer.

Signed-off-by: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Jay Lan &lt;jlan@engr.sgi.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
