<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/arm/lib, branch v2.6.32.28</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ARM: 6482/2: Fix find_next_zero_bit and related assembly</title>
<updated>2010-12-09T21:27:03+00:00</updated>
<author>
<name>James Jones</name>
<email>jajones@nvidia.com</email>
</author>
<published>2010-11-23T23:21:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=822d9b61eb5d4a48fa4fc4b6be64b9a496c852af'/>
<id>822d9b61eb5d4a48fa4fc4b6be64b9a496c852af</id>
<content type='text'>
commit 0e91ec0c06d2cd15071a6021c94840a50e6671aa upstream.

The find_next_bit, find_first_bit, find_next_zero_bit
and find_first_zero_bit functions were not properly
clamping to the maxbit argument at the bit level. They
were instead only checking maxbit at the byte level.
To fix this, add a compare and a conditional move
instruction to the end of the common bit-within-the-
byte code used by all the functions and be sure not to
clobber the maxbit argument before it is used.

Reviewed-by: Nicolas Pitre &lt;nicolas.pitre@linaro.org&gt;
Tested-by: Stephen Warren &lt;swarren@nvidia.com&gt;
Signed-off-by: James Jones &lt;jajones@nvidia.com&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0e91ec0c06d2cd15071a6021c94840a50e6671aa upstream.

The find_next_bit, find_first_bit, find_next_zero_bit
and find_first_zero_bit functions were not properly
clamping to the maxbit argument at the bit level. They
were instead only checking maxbit at the byte level.
To fix this, add a compare and a conditional move
instruction to the end of the common bit-within-the-
byte code used by all the functions and be sure not to
clobber the maxbit argument before it is used.

Reviewed-by: Nicolas Pitre &lt;nicolas.pitre@linaro.org&gt;
Tested-by: Stephen Warren &lt;swarren@nvidia.com&gt;
Signed-off-by: James Jones &lt;jajones@nvidia.com&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-rmk' of git://linux-arm.org/linux-2.6</title>
<updated>2009-09-19T12:47:57+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk+kernel@arm.linux.org.uk</email>
</author>
<published>2009-09-19T12:47:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=40d743b8c16a8cf6e30c1d941aa6147f9550ea75'/>
<id>40d743b8c16a8cf6e30c1d941aa6147f9550ea75</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 5701/1: ARM: copy_page.S: take into account the size of the cache line</title>
<updated>2009-09-15T21:07:02+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill@shutemov.name</email>
</author>
<published>2009-09-15T09:26:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dca230f00d737353e2dffae489c916b41971921f'/>
<id>dca230f00d737353e2dffae489c916b41971921f</id>
<content type='text'>
Optimized version of copy_page() was written with assumption that cache
line size is 32 bytes. On Cortex-A8 cache line size is 64 bytes.

This patch tries to generalize copy_page() to work with any cache line
size if cache line size is multiple of 16 and page size is multiple of
two cache line size.

After this optimization we've got ~25% speedup on OMAP3(tested in
userspace).

There is test for kernelspace which trigger copy-on-write after fork():

 #include &lt;stdlib.h&gt;
 #include &lt;string.h&gt;
 #include &lt;unistd.h&gt;

 #define BUF_SIZE (10000*4096)
 #define NFORK 200

 int main(int argc, char **argv)
 {
         char *buf = malloc(BUF_SIZE);
         int i;

         memset(buf, 0, BUF_SIZE);

         for(i = 0; i &lt; NFORK; i++) {
                 if (fork()) {
                         wait(NULL);
                 } else {
                         int j;

                         for(j = 0; j &lt; BUF_SIZE; j+= 4096)
                                 buf[j] = (j &amp; 0xFF) + 1;
                         break;
                 }
         }

         free(buf);
         return 0;
 }

Before optimization this test takes ~66 seconds, after optimization
takes ~56 seconds.

Signed-off-by: Siarhei Siamashka &lt;siarhei.siamashka@nokia.com&gt;
Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Optimized version of copy_page() was written with assumption that cache
line size is 32 bytes. On Cortex-A8 cache line size is 64 bytes.

This patch tries to generalize copy_page() to work with any cache line
size if cache line size is multiple of 16 and page size is multiple of
two cache line size.

After this optimization we've got ~25% speedup on OMAP3(tested in
userspace).

There is test for kernelspace which trigger copy-on-write after fork():

 #include &lt;stdlib.h&gt;
 #include &lt;string.h&gt;
 #include &lt;unistd.h&gt;

 #define BUF_SIZE (10000*4096)
 #define NFORK 200

 int main(int argc, char **argv)
 {
         char *buf = malloc(BUF_SIZE);
         int i;

         memset(buf, 0, BUF_SIZE);

         for(i = 0; i &lt; NFORK; i++) {
                 if (fork()) {
                         wait(NULL);
                 } else {
                         int j;

                         for(j = 0; j &lt; BUF_SIZE; j+= 4096)
                                 buf[j] = (j &amp; 0xFF) + 1;
                         break;
                 }
         }

         free(buf);
         return 0;
 }

Before optimization this test takes ~66 seconds, after optimization
takes ~56 seconds.

Signed-off-by: Siarhei Siamashka &lt;siarhei.siamashka@nokia.com&gt;
Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Nicolas Pitre has a new email address</title>
<updated>2009-09-15T16:37:12+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@fluxnic.net</email>
</author>
<published>2009-09-14T07:25:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2f82af08fcc7dc01a7e98a49a5995a77e32a2925'/>
<id>2f82af08fcc7dc01a7e98a49a5995a77e32a2925</id>
<content type='text'>
Due to problems at cam.org, my nico@cam.org email address is no longer
valid.  FRom now on, nico@fluxnic.net should be used instead.

Signed-off-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to problems at cam.org, my nico@cam.org email address is no longer
valid.  FRom now on, nico@fluxnic.net should be used instead.

Signed-off-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-rmk-2.6.32' of git://git.pengutronix.de/git/ukl/linux-2.6 into devel-stable</title>
<updated>2009-08-15T15:51:48+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk@dyn-67.arm.linux.org.uk</email>
</author>
<published>2009-08-15T15:51:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9b2616c2e8cc98ca98bbb40cad83a8d3d859e840'/>
<id>9b2616c2e8cc98ca98bbb40cad83a8d3d859e840</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Complete irq tracing support for ARM</title>
<updated>2009-08-13T18:34:37+00:00</updated>
<author>
<name>Uwe Kleine-König</name>
<email>u.kleine-koenig@pengutronix.de</email>
</author>
<published>2009-08-13T18:38:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0d928b0b616d1c5c5fe76019a87cba171ca91633'/>
<id>0d928b0b616d1c5c5fe76019a87cba171ca91633</id>
<content type='text'>
Before this patch enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completly.

I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before this patch enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completly.

I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Thumb-2: Implement the unified arch/arm/lib functions</title>
<updated>2009-07-24T11:32:57+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2009-07-24T11:32:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b592783a2e8b7721a99730bd549aab5208f36af'/>
<id>8b592783a2e8b7721a99730bd549aab5208f36af</id>
<content type='text'>
This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/*
files.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;



</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/*
files.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;



</pre>
</div>
</content>
</entry>
<entry>
<title>Thumb-2: Add some .align statements to the .S files</title>
<updated>2009-07-24T11:32:52+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2009-07-24T11:32:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=88987ef91b99cf99bc5d167caeb31d4958fbf931'/>
<id>88987ef91b99cf99bc5d167caeb31d4958fbf931</id>
<content type='text'>
Since the Thumb-2 instructions can be 16-bit wide, data in the .text
sections may not be aligned to a 32-bit word and this leads to unaligned
exceptions. This patch does not affect the ARM code generation.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the Thumb-2 instructions can be 16-bit wide, data in the .text
sections may not be aligned to a 32-bit word and this leads to unaligned
exceptions. This patch does not affect the ARM code generation.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'copy_user' of git://git.marvell.com/orion into devel</title>
<updated>2009-06-14T09:59:32+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk@dyn-67.arm.linux.org.uk</email>
</author>
<published>2009-06-14T09:59:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=98797a241e28b787b84d308b867ec4c5fe7bbdf8'/>
<id>98797a241e28b787b84d308b867ec4c5fe7bbdf8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[ARM] alternative copy_to_user: more precise fallback threshold</title>
<updated>2009-05-30T05:10:15+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2009-05-30T01:55:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c626e3f5ca1d95ad2204d3128c26e7678714eb55'/>
<id>c626e3f5ca1d95ad2204d3128c26e7678714eb55</id>
<content type='text'>
Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option.  This
is however not as precise as a kernel based test to measure the real
speed of each method.

This adds a simple test bench to show the time needed for each method.
With this, the optimal size treshold for the alternative implementation
can be determined with more confidence.  It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not a surprise knowing that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.

One might suggest that such test be used to determine the optimal
threshold at run time instead, but results are near enough to 64 on
tested targets concerned by this alternative copy_to_user implementation,
so adding some overhead associated with a variable threshold is probably
not worth it for now.

Signed-off-by: Nicolas Pitre &lt;nico@marvell.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option.  This
is however not as precise as a kernel based test to measure the real
speed of each method.

This adds a simple test bench to show the time needed for each method.
With this, the optimal size treshold for the alternative implementation
can be determined with more confidence.  It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not a surprise knowing that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.

One might suggest that such test be used to determine the optimal
threshold at run time instead, but results are near enough to 64 on
tested targets concerned by this alternative copy_to_user implementation,
so adding some overhead associated with a variable threshold is probably
not worth it for now.

Signed-off-by: Nicolas Pitre &lt;nico@marvell.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
