linux-toradex.git/include/asm-generic/bitops, branch tegra-10.11.4

arch, hweight: Fix compilation errors

2010-05-04T17:25:27+00:00

Fix function prototype visibility issues when compiling for non-x86
architectures. Tested with crosstool
(ftp://ftp.kernel.org/pub/tools/crosstool/) with alpha, ia64 and sparc
targets.

Signed-off-by: Borislav Petkov 
LKML-Reference: <20100503130736.GD26107@aftab>
Signed-off-by: H. Peter Anvin

x86: Add optimized popcnt variants

2010-04-06T22:52:11+00:00

Add support for the hardware version of the Hamming weight function,
popcnt, present in CPUs which advertise it under CPUID, Function
0x0000_0001_ECX[23]. On CPUs which don't support it, we fall back to the
default lib/hweight.c software versions.

A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
a 3x speedup on a F10h machine.

Signed-off-by: Borislav Petkov 
LKML-Reference: <20100318112015.GC11152@aftab>
Signed-off-by: H. Peter Anvin
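
The popcnt selection described above can be illustrated in user space.
This is only a sketch under stated assumptions: the kernel patches the
instruction in at boot through its alternatives mechanism, whereas this
example swaps in a plain runtime branch. The names cpu_has_popcnt(),
sw_hweight64() and hweight64_sketch() are hypothetical, and the hardware
path is only compiled for x86-64 with a GCC-compatible compiler.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#endif

/* Software Hamming weight, in the style of a classic bit-parallel count. */
static inline unsigned int sw_hweight64(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;
    w  = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
    w  = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical helper: tests CPUID function 1, ECX bit 23 (popcnt). */
static inline int cpu_has_popcnt(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 23) & 1;
#else
    return 0; /* assumption: no hardware path on other targets */
#endif
}

/* Use the instruction when advertised, else the software version. */
static inline unsigned int hweight64_sketch(uint64_t w)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (cpu_has_popcnt()) {
        uint64_t res;

        __asm__("popcnt %1, %0" : "=r"(res) : "r"(w) : "cc");
        return (unsigned int)res;
    }
#endif
    return sw_hweight64(w);
}
```

Both paths must return the same count for every input, which is what
makes the substitution safe.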

bitops: Optimize hweight() by making use of compile-time evaluation

2010-04-06T22:52:11+00:00

Rename the existing runtime hweight() implementations to
__arch_hweight(), rename the compile-time versions to __const_hweight()
and then have hweight() pick between them.

Suggested-by: H. Peter Anvin 
Signed-off-by: Peter Zijlstra 
LKML-Reference: <20100318111929.GB11152@aftab>
Acked-by: H. Peter Anvin 
LKML-Reference: <1265028224.24455.154.camel@laptop>
Signed-off-by: H. Peter Anvin
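
A minimal sketch of how such a selection between a compile-time and a
runtime variant might look, assuming a GCC-compatible compiler that
provides __builtin_constant_p(). The upper-case macro names and
arch_hweight32_sketch() are hypothetical stand-ins, not the kernel's
identifiers.

```c
#include <stdint.h>

/* Compile-time form: a pure expression the compiler can fold away. */
#define CONST_HWEIGHT8(w)                          \
    ((unsigned int)((!!((w) & (1ULL << 0))) +      \
                    (!!((w) & (1ULL << 1))) +      \
                    (!!((w) & (1ULL << 2))) +      \
                    (!!((w) & (1ULL << 3))) +      \
                    (!!((w) & (1ULL << 4))) +      \
                    (!!((w) & (1ULL << 5))) +      \
                    (!!((w) & (1ULL << 6))) +      \
                    (!!((w) & (1ULL << 7)))))
#define CONST_HWEIGHT16(w) (CONST_HWEIGHT8(w) + CONST_HWEIGHT8((w) >> 8))
#define CONST_HWEIGHT32(w) (CONST_HWEIGHT16(w) + CONST_HWEIGHT16((w) >> 16))

/* Runtime form: stands in for an architecture-specific implementation. */
static inline unsigned int arch_hweight32_sketch(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;
    w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    w  = (w + (w >> 4)) & 0x0f0f0f0fu;
    return (w * 0x01010101u) >> 24;
}

/* Pick the foldable expression for constants, the function otherwise. */
#define HWEIGHT32(w) \
    (__builtin_constant_p(w) ? CONST_HWEIGHT32(w) : arch_hweight32_sketch(w))
```

With a literal argument such as HWEIGHT32(0xF0F0F0F0u) the compiler can
reduce the whole expression to a constant; with a variable argument the
runtime function is called.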

locking: Convert __raw_spin* functions to arch_spin*

2009-12-14T22:55:32+00:00

Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra 
Acked-by: David S. Miller 
Acked-by: Ingo Molnar 
Cc: linux-arch@vger.kernel.org

locking: Convert raw_spinlock to arch_spinlock

2009-12-14T22:55:32+00:00

The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested converting the raw_ locks to arch_ locks and cleaning up
the name space instead of using an artificial name like core_spin,
atomic_spin or whatever.

No functional change.

Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra 
Acked-by: David S. Miller 
Acked-by: Ingo Molnar 
Cc: linux-arch@vger.kernel.org

asm-generic: rename atomic.h to atomic-long.h

2009-06-11T19:02:17+00:00

The existing asm-generic/atomic.h only defines the
atomic_long type. This renames it to atomic-long.h
so we have a place to add a truly generic atomic.h
that can be used on all non-SMP systems.

Signed-off-by: Remis Lima Baima 
Signed-off-by: Arnd Bergmann 
Acked-by: Ingo Molnar

x86, generic: mark complex bitops.h inlines as __always_inline

2009-01-13T17:56:30+00:00

Impact: reduce kernel image size

Hugh Dickins noticed that older gcc versions, when the kernel
is built for code size, didn't inline some of the bitops.

Mark all complex x86 bitops that have more than a single
asm statement or two as always inline to avoid this problem.

Probably should be done for other architectures too.

Ingo then found a better fix that only requires
a single line change, but it unfortunately only
works on gcc 4.3.

On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.

With gcc 4.1 and a defconfig like build:

    6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
    6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining

~20k / 0.3% difference.

Signed-off-by: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Ingo Molnar
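
For illustration, forcing inlining of a small bit operation might look
like the sketch below. It assumes a GCC-compatible compiler;
SKETCH_ALWAYS_INLINE and test_bit_sketch() are hypothetical names chosen
so they do not collide with any existing definition.

```c
#include <limits.h>

/* Assumption: GCC-style attribute; overrides the inlining heuristics. */
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Returns 1 if bit nr is set in the bitmap at addr, else 0. */
static SKETCH_ALWAYS_INLINE int
test_bit_sketch(unsigned long nr, const unsigned long *addr)
{
    return (int)((addr[nr / SKETCH_BITS_PER_LONG] >>
                  (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

Without the attribute, a compiler optimizing for size is free to emit
such a helper out of line and call it from every user.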

bitops: use __fls for fls64 on 64-bit archs

2008-04-26T17:21:16+00:00

Use __fls for fls64 on 64-bit archs. The implementation for
64-bit archs is moved from x86_64 to asm-generic.

Signed-off-by: Alexander van Heukelum 
Signed-off-by: Ingo Molnar
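
The relation between fls64 and __fls can be sketched as follows. This
assumes a 64-bit word and uses the GCC/Clang builtin __builtin_clzll()
as a stand-in for __fls; fls_index_sketch() and fls64_sketch() are
hypothetical names.

```c
#include <stdint.h>

/*
 * Stand-in for __fls: index (0..63) of the most significant set bit.
 * The result is undefined for x == 0, so callers must check first.
 */
static inline unsigned int fls_index_sketch(uint64_t x)
{
    return 63u - (unsigned int)__builtin_clzll(x);
}

/*
 * fls64-style result: 1-based position of the last set bit,
 * with 0 meaning "no bit set".
 */
static inline int fls64_sketch(uint64_t x)
{
    if (x == 0)
        return 0;
    return (int)fls_index_sketch(x) + 1;
}
```

The explicit zero check is what lets the 1-based function be built on
top of a 0-based primitive that has no defined result for zero.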

generic: introduce a generic __fls implementation

2008-04-26T17:21:16+00:00

Add a generic __fls implementation in the same spirit as
the generic __ffs one. It finds the last (most significant)
set bit in the given long value.

Signed-off-by: Alexander van Heukelum 
Signed-off-by: Ingo Molnar
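
A generic "find last set bit" routine can be written as a binary search
over the word. The sketch below assumes a 64-bit value and is not a copy
of the kernel function; generic_fls_sketch() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Index (0..63) of the most significant set bit of word.
 * The result is meaningless for word == 0; callers must check first.
 * Each step tests the upper part of the remaining range and, if it is
 * empty, shifts the lower part up and lowers the candidate index.
 */
static inline unsigned int generic_fls_sketch(uint64_t word)
{
    unsigned int num = 63;

    if (!(word & (~0ULL << 32))) { num -= 32; word <<= 32; }
    if (!(word & (~0ULL << 48))) { num -= 16; word <<= 16; }
    if (!(word & (~0ULL << 56))) { num -= 8;  word <<= 8;  }
    if (!(word & (~0ULL << 60))) { num -= 4;  word <<= 4;  }
    if (!(word & (~0ULL << 62))) { num -= 2;  word <<= 2;  }
    if (!(word & (~0ULL << 63)))   num -= 1;
    return num;
}
```

The search needs six comparisons for a 64-bit word regardless of where
the bit is, and uses no architecture-specific instruction.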

x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps

2008-04-26T17:21:16+00:00

This moves an optimization for searching constant-sized small
bitmaps from x86_64-specific to generic code.

On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:

In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.

On x86_64, 52 locations are optimized with a minimal increase in
code size:

Current #testing defconfig:
    146 x bsf, 27 x find_next_*bit
       text    data     bss     dec     hex filename
    5392637  846592  724424 6963653  6a41c5 vmlinux

After removing the x86_64 specific optimization for find_next_*bit:
    94 x bsf, 79 x find_next_*bit
       text    data     bss     dec     hex filename
    5392358  846592  724424 6963374  6a40ae vmlinux

After this patch (making the optimization generic):
    146 x bsf, 27 x find_next_*bit
       text    data     bss     dec     hex filename
    5392396  846592  724424 6963412  6a40d4 vmlinux

[ tglx@linutronix.de: build fixes ]

Signed-off-by: Ingo Molnar
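
The idea behind the small-bitmap optimization can be sketched as below:
when the size is a compile-time constant no larger than one word, the
search reduces to a mask and a single "find first set" operation, so no
call is needed. This assumes a GCC-compatible compiler;
find_next_bit_slow() and find_next_bit_sketch() are hypothetical names,
__builtin_ctzl() stands in for __ffs, and the offset >= size guard is an
addition of this sketch.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* General path: stands in for the out-of-line library routine. */
static unsigned long find_next_bit_slow(const unsigned long *addr,
                                        unsigned long size,
                                        unsigned long offset)
{
    for (unsigned long i = offset; i < size; i++)
        if (addr[i / SKETCH_BITS_PER_LONG] &
            (1UL << (i % SKETCH_BITS_PER_LONG)))
            return i;
    return size;
}

/* Returns the index of the next set bit at or after offset, else size. */
static inline unsigned long find_next_bit_sketch(const unsigned long *addr,
                                                 unsigned long size,
                                                 unsigned long offset)
{
    unsigned long value;

    if (offset >= size)
        return size;

    /* Constant size below one word: add a sentinel bit at position size
     * so the bit search returns size when nothing else is set. */
    if (__builtin_constant_p(size) && size < SKETCH_BITS_PER_LONG) {
        value = *addr & (~0UL << offset);
        value |= 1UL << size;
        return (unsigned long)__builtin_ctzl(value);
    }

    /* Exactly one word: no room for a sentinel, so test for zero. */
    if (__builtin_constant_p(size) && size == SKETCH_BITS_PER_LONG) {
        value = *addr & (~0UL << offset);
        return value ? (unsigned long)__builtin_ctzl(value) : size;
    }

    /* Size is not constant or spans several words. */
    return find_next_bit_slow(addr, size, offset);
}
```

Whether the fast path is actually taken depends on the compiler seeing
size as a constant after inlining; the result is the same either way.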