linux-toradex.git/arch/arm/lib, branch v2.6.32.28

ARM: 6482/2: Fix find_next_zero_bit and related assembly

2010-12-09T21:27:03+00:00

commit 0e91ec0c06d2cd15071a6021c94840a50e6671aa upstream.

The find_next_bit, find_first_bit, find_next_zero_bit
and find_first_zero_bit functions were not properly
clamping to the maxbit argument at the bit level. They
were instead only checking maxbit at the byte level.
To fix this, add a compare and a conditional move
instruction to the end of the common bit-within-the-
byte code used by all the functions and be sure not to
clobber the maxbit argument before it is used.

Reviewed-by: Nicolas Pitre 
Tested-by: Stephen Warren 
Signed-off-by: James Jones 
Signed-off-by: Russell King 
Signed-off-by: Greg Kroah-Hartman

Merge branch 'for-rmk' of git://linux-arm.org/linux-2.6

2009-09-19T12:47:57+00:00

ARM: 5701/1: ARM: copy_page.S: take into account the size of the cache line

2009-09-15T21:07:02+00:00

The optimized version of copy_page() was written with the assumption that
the cache line size is 32 bytes. On Cortex-A8 the cache line size is 64 bytes.

This patch tries to generalize copy_page() to work with any cache line
size, provided the cache line size is a multiple of 16 bytes and the page
size is a multiple of twice the cache line size.

After this optimization we get a ~25% speedup on OMAP3 (tested in
userspace).

Here is a test for kernel space which triggers copy-on-write after fork():

 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <sys/wait.h>

 #define BUF_SIZE (10000*4096)
 #define NFORK 200

 int main(int argc, char **argv)
 {
         char *buf = malloc(BUF_SIZE);
         int i;

         memset(buf, 0, BUF_SIZE);

         for(i = 0; i < NFORK; i++) {
                 if (fork()) {
                         wait(NULL);
                 } else {
                         int j;

                         for(j = 0; j < BUF_SIZE; j+= 4096)
                                 buf[j] = (j & 0xFF) + 1;
                         break;
                 }
         }

         free(buf);
         return 0;
 }

Before the optimization this test takes ~66 seconds; after the
optimization it takes ~56 seconds.

Signed-off-by: Siarhei Siamashka 
Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Russell King

Nicolas Pitre has a new email address

2009-09-15T16:37:12+00:00

Due to problems at cam.org, my nico@cam.org email address is no longer
valid.  From now on, nico@fluxnic.net should be used instead.

Signed-off-by: Nicolas Pitre 
Signed-off-by: Linus Torvalds

Merge branch 'for-rmk-2.6.32' of git://git.pengutronix.de/git/ukl/linux-2.6 into devel-stable

2009-08-15T15:51:48+00:00

Complete irq tracing support for ARM

2009-08-13T18:34:37+00:00

Before this patch, enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completely.

I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y.

Signed-off-by: Uwe Kleine-König 
Cc: Russell King 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 

Signed-off-by: Uwe Kleine-König

Thumb-2: Implement the unified arch/arm/lib functions

2009-07-24T11:32:57+00:00

This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/*
files.

Signed-off-by: Catalin Marinas

Thumb-2: Add some .align statements to the .S files

2009-07-24T11:32:52+00:00

Since Thumb-2 instructions can be 16 bits wide, data in the .text
sections may not be aligned to a 32-bit word, which leads to unaligned
access exceptions. This patch does not affect the ARM code generation.

Signed-off-by: Catalin Marinas

Merge branch 'copy_user' of git://git.marvell.com/orion into devel

2009-06-14T09:59:32+00:00

[ARM] alternative copy_to_user: more precise fallback threshold

2009-05-30T05:10:15+00:00

Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option.  This
is however not as precise as a kernel based test to measure the real
speed of each method.

This adds a simple test bench to show the time needed for each method.
With this, the optimal size threshold for the alternative implementation
can be determined with more confidence.  It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not surprising, given that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.

One might suggest that such a test be used to determine the optimal
threshold at run time instead, but results are close enough to 64 on
the tested targets that use this alternative copy_to_user implementation,
so adding the overhead of a variable threshold is probably not worth it
for now.

Signed-off-by: Nicolas Pitre