<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/asm-generic/bitops, branch tegra-10.11.4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>arch, hweight: Fix compilation errors</title>
<updated>2010-05-04T17:25:27+00:00</updated>
<author>
<name>Borislav Petkov</name>
<email>borislav.petkov@amd.com</email>
</author>
<published>2010-05-03T12:57:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4677d4a53e0d565742277e8913e91c821453e63e'/>
<id>4677d4a53e0d565742277e8913e91c821453e63e</id>
<content type='text'>
Fix function prototype visibility issues when compiling for non-x86
architectures. Tested with crosstool
(ftp://ftp.kernel.org/pub/tools/crosstool/) with alpha, ia64 and sparc
targets.

Signed-off-by: Borislav Petkov &lt;borislav.petkov@amd.com&gt;
LKML-Reference: &lt;20100503130736.GD26107@aftab&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix function prototype visibility issues when compiling for non-x86
architectures. Tested with crosstool
(ftp://ftp.kernel.org/pub/tools/crosstool/) with alpha, ia64 and sparc
targets.

Signed-off-by: Borislav Petkov &lt;borislav.petkov@amd.com&gt;
LKML-Reference: &lt;20100503130736.GD26107@aftab&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Add optimized popcnt variants</title>
<updated>2010-04-06T22:52:11+00:00</updated>
<author>
<name>Borislav Petkov</name>
<email>borislav.petkov@amd.com</email>
</author>
<published>2010-03-05T16:34:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d61931d89be506372d01a90d1755f6d0a9fafe2d'/>
<id>d61931d89be506372d01a90d1755f6d0a9fafe2d</id>
<content type='text'>
Add support for the hardware version of the Hamming weight function,
popcnt, present in CPUs which advertize it under CPUID, Function
0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the
default lib/hweight.c sw versions.

A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
a 3x speedup on a F10h machine.

Signed-off-by: Borislav Petkov &lt;borislav.petkov@amd.com&gt;
LKML-Reference: &lt;20100318112015.GC11152@aftab&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support for the hardware version of the Hamming weight function,
popcnt, present in CPUs which advertize it under CPUID, Function
0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the
default lib/hweight.c sw versions.

A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
a 3x speedup on a F10h machine.

Signed-off-by: Borislav Petkov &lt;borislav.petkov@amd.com&gt;
LKML-Reference: &lt;20100318112015.GC11152@aftab&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bitops: Optimize hweight() by making use of compile-time evaluation</title>
<updated>2010-04-06T22:52:11+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2010-02-01T14:03:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1527bc8b928dd1399c3d3467dd47d9ede210978a'/>
<id>1527bc8b928dd1399c3d3467dd47d9ede210978a</id>
<content type='text'>
Rename the extisting runtime hweight() implementations to
__arch_hweight(), rename the compile-time versions to __const_hweight()
and then have hweight() pick between them.

Suggested-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100318111929.GB11152@aftab&gt;
Acked-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
LKML-Reference: &lt;1265028224.24455.154.camel@laptop&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename the extisting runtime hweight() implementations to
__arch_hweight(), rename the compile-time versions to __const_hweight()
and then have hweight() pick between them.

Suggested-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100318111929.GB11152@aftab&gt;
Acked-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
LKML-Reference: &lt;1265028224.24455.154.camel@laptop&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking: Convert __raw_spin* functions to arch_spin*</title>
<updated>2009-12-14T22:55:32+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2009-12-02T19:01:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0199c4e68d1f02894bdefe4b5d9e9ee4aedd8d62'/>
<id>0199c4e68d1f02894bdefe4b5d9e9ee4aedd8d62</id>
<content type='text'>
Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: linux-arch@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: linux-arch@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>locking: Convert raw_spinlock to arch_spinlock</title>
<updated>2009-12-14T22:55:32+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2009-12-02T18:49:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=445c89514be242b1b0080056d50bdc1b72adeb5c'/>
<id>445c89514be242b1b0080056d50bdc1b72adeb5c</id>
<content type='text'>
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: linux-arch@vger.kernel.org

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: linux-arch@vger.kernel.org

</pre>
</div>
</content>
</entry>
<entry>
<title>asm-generic: rename atomic.h to atomic-long.h</title>
<updated>2009-06-11T19:02:17+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2009-05-13T22:56:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=72099ed2719fc5829bd79c6ca9d1783ed026eb37'/>
<id>72099ed2719fc5829bd79c6ca9d1783ed026eb37</id>
<content type='text'>
The existing asm-generic/atomic.h only defines the
atomic_long type. This renames it to atomic-long.h
so we have a place to add a truly generic atomic.h
that can be used on all non-SMP systems.

Signed-off-by: Remis Lima Baima &lt;remis.developer@googlemail.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The existing asm-generic/atomic.h only defines the
atomic_long type. This renames it to atomic-long.h
so we have a place to add a truly generic atomic.h
that can be used on all non-SMP systems.

Signed-off-by: Remis Lima Baima &lt;remis.developer@googlemail.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, generic: mark complex bitops.h inlines as __always_inline</title>
<updated>2009-01-13T17:56:30+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>andi@firstfloor.org</email>
</author>
<published>2009-01-12T22:01:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c8399943bdb70fef78798b97f975506ecc99e039'/>
<id>c8399943bdb70fef78798b97f975506ecc99e039</id>
<content type='text'>
Impact: reduce kernel image size

Hugh Dickins noticed that older gcc versions when the kernel
is built for code size didn't inline some of the bitops.

Mark all complex x86 bitops that have more than a single
asm statement or two as always inline to avoid this problem.

Probably should be done for other architectures too.

Ingo then found a better fix that only requires
a single line change, but it unfortunately only
works on gcc 4.3.

On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.

With gcc 4.1 and a defconfig like build:

    6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
    6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining

~20k / 0.3% difference.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Impact: reduce kernel image size

Hugh Dickins noticed that older gcc versions when the kernel
is built for code size didn't inline some of the bitops.

Mark all complex x86 bitops that have more than a single
asm statement or two as always inline to avoid this problem.

Probably should be done for other architectures too.

Ingo then found a better fix that only requires
a single line change, but it unfortunately only
works on gcc 4.3.

On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.

With gcc 4.1 and a defconfig like build:

    6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
    6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining

~20k / 0.3% difference.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bitops: use __fls for fls64 on 64-bit archs</title>
<updated>2008-04-26T17:21:16+00:00</updated>
<author>
<name>Alexander van Heukelum</name>
<email>heukelum@mailshack.com</email>
</author>
<published>2008-03-15T17:32:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d57594c203b1e7b54373080a797f0cbfa4aade68'/>
<id>d57594c203b1e7b54373080a797f0cbfa4aade68</id>
<content type='text'>
Use __fls for fls64 on 64-bit archs. The implementation for
64-bit archs is moved from x86_64 to asm-generic.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use __fls for fls64 on 64-bit archs. The implementation for
64-bit archs is moved from x86_64 to asm-generic.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>generic: introduce a generic __fls implementation</title>
<updated>2008-04-26T17:21:16+00:00</updated>
<author>
<name>Alexander van Heukelum</name>
<email>heukelum@mailshack.com</email>
</author>
<published>2008-03-15T17:30:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d9dff22e8ad06ad330968c9e3d3a2fb55a5f9c3'/>
<id>7d9dff22e8ad06ad330968c9e3d3a2fb55a5f9c3</id>
<content type='text'>
Add a generic __fls implementation in the same spirit as
the generic __ffs one. It finds the last (most significant)
set bit in the given long value.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a generic __fls implementation in the same spirit as
the generic __ffs one. It finds the last (most significant)
set bit in the given long value.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps</title>
<updated>2008-04-26T17:21:16+00:00</updated>
<author>
<name>Alexander van Heukelum</name>
<email>heukelum@mailshack.com</email>
</author>
<published>2008-03-11T15:17:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=64970b68d2b3ed32b964b0b30b1b98518fde388e'/>
<id>64970b68d2b3ed32b964b0b30b1b98518fde388e</id>
<content type='text'>
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.

On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:

In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

On x86_64, 52 locations are optimized with a minimal increase in
code size:

Current #testing defconfig:
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392637  846592  724424 6963653  6a41c5 vmlinux

After removing the x86_64 specific optimization for find_next_*bit:
	94 x bsf, 79 x find_next_*bit
   text    data     bss     dec     hex filename
   5392358  846592  724424 6963374  6a40ae vmlinux

After this patch (making the optimization generic):
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392396  846592  724424 6963412  6a40d4 vmlinux

[ tglx@linutronix.de: build fixes ]

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.

On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:

In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

On x86_64, 52 locations are optimized with a minimal increase in
code size:

Current #testing defconfig:
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392637  846592  724424 6963653  6a41c5 vmlinux

After removing the x86_64 specific optimization for find_next_*bit:
	94 x bsf, 79 x find_next_*bit
   text    data     bss     dec     hex filename
   5392358  846592  724424 6963374  6a40ae vmlinux

After this patch (making the optimization generic):
	146 x bsf, 27 x find_next_*bit
   text    data     bss     dec     hex filename
   5392396  846592  724424 6963412  6a40d4 vmlinux

[ tglx@linutronix.de: build fixes ]

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
</feed>
