<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/arm/include/asm/Kbuild, branch v3.10.41</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm</title>
<updated>2012-12-12T19:30:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-12-12T19:30:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b1286f4e9ac14c8973140b338b4d3c5691264d3b'/>
<id>b1286f4e9ac14c8973140b338b4d3c5691264d3b</id>
<content type='text'>
Pull ARM updates from Russell King:
 "Here's the updates for ARM for this merge window, which cover quite a
  variety of areas.

  There's a bunch of patch series from Will tackling various bugs like
  the PROT_NONE handling, ASID allocation, cluster boot protocol and
  ASID TLB tagging updates.

  We move to a build-time sorted exception table rather than doing the
  sorting at run-time, add support for the secure computing filter, and
  some updates to the perf code.  We also have sorted out the placement
  of some headers, fixed some build warnings, fixed some hotplug
  problems with the per-cpu TWD code."

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (73 commits)
  ARM: 7594/1: Add .smp entry for REALVIEW_EB
  ARM: 7599/1: head: Remove boot-time HYP mode check for v5 and below
  ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets.
  ARM: 7595/1: syscall: rework ordering in syscall_trace_exit
  ARM: 7596/1: mmci: replace readsl/writesl with ioread32_rep/iowrite32_rep
  ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch.
  ARM: 7593/1: nommu: do not enable DCACHE_WORD_ACCESS when !CONFIG_MMU
  ARM: 7592/1: nommu: prevent generation of kernel unaligned memory accesses
  ARM: 7591/1: nommu: Enable the strict alignment (CR_A) bit only if ARCH &lt; v6
  ARM: 7590/1: /proc/interrupts: limit the display of IPIs to online CPUs only
  ARM: 7587/1: implement optimized percpu variable access
  ARM: 7589/1: integrator: pass the lm resource to amba
  ARM: 7588/1: amba: create a resource parent registrator
  ARM: 7582/2: rename kvm_seq to vmalloc_seq so to avoid confusion with KVM
  ARM: 7585/1: kernel: fix nr_cpu_ids check in DT logical map init
  ARM: 7584/1: perf: fix link error when CONFIG_HW_PERF_EVENTS is not selected
  ARM: gic: use a private mapping for CPU target interfaces
  ARM: kernel: add logical mappings look-up
  ARM: kernel: add cpu logical map DT init in setup_arch
  ARM: kernel: add device tree init map function
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ARM updates from Russell King:
 "Here's the updates for ARM for this merge window, which cover quite a
  variety of areas.

  There's a bunch of patch series from Will tackling various bugs like
  the PROT_NONE handling, ASID allocation, cluster boot protocol and
  ASID TLB tagging updates.

  We move to a build-time sorted exception table rather than doing the
  sorting at run-time, add support for the secure computing filter, and
  some updates to the perf code.  We also have sorted out the placement
  of some headers, fixed some build warnings, fixed some hotplug
  problems with the per-cpu TWD code."

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (73 commits)
  ARM: 7594/1: Add .smp entry for REALVIEW_EB
  ARM: 7599/1: head: Remove boot-time HYP mode check for v5 and below
  ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets.
  ARM: 7595/1: syscall: rework ordering in syscall_trace_exit
  ARM: 7596/1: mmci: replace readsl/writesl with ioread32_rep/iowrite32_rep
  ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch.
  ARM: 7593/1: nommu: do not enable DCACHE_WORD_ACCESS when !CONFIG_MMU
  ARM: 7592/1: nommu: prevent generation of kernel unaligned memory accesses
  ARM: 7591/1: nommu: Enable the strict alignment (CR_A) bit only if ARCH &lt; v6
  ARM: 7590/1: /proc/interrupts: limit the display of IPIs to online CPUs only
  ARM: 7587/1: implement optimized percpu variable access
  ARM: 7589/1: integrator: pass the lm resource to amba
  ARM: 7588/1: amba: create a resource parent registrator
  ARM: 7582/2: rename kvm_seq to vmalloc_seq so to avoid confusion with KVM
  ARM: 7585/1: kernel: fix nr_cpu_ids check in DT logical map init
  ARM: 7584/1: perf: fix link error when CONFIG_HW_PERF_EVENTS is not selected
  ARM: gic: use a private mapping for CPU target interfaces
  ARM: kernel: add logical mappings look-up
  ARM: kernel: add cpu logical map DT init in setup_arch
  ARM: kernel: add device tree init map function
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 7587/1: implement optimized percpu variable access</title>
<updated>2012-12-03T11:16:36+00:00</updated>
<author>
<name>Rob Herring</name>
<email>rob.herring@calxeda.com</email>
</author>
<published>2012-11-29T19:39:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=14318efb322e2fe1a034c69463d725209eb9d548'/>
<id>14318efb322e2fe1a034c69463d725209eb9d548</id>
<content type='text'>
Use the previously unused TPIDRPRW register to store percpu offsets.
TPIDRPRW is only accessible in PL1, so it can only be used in the kernel.

This replaces 2 loads with a mrc instruction for each percpu variable
access. With hackbench, the performance improvement is 1.4% on Cortex-A9
(highbank). Taking an average of 30 runs of "hackbench -l 1000" yields:

Before: 6.2191
After: 6.1348

Will Deacon reported similar delta on v6 with 11MPCore.

The asm "memory clobber" are needed here to ensure the percpu offset
gets reloaded. Testing by Will found that this would not happen in
__schedule() which is a bit of a special case as preemption is disabled
but the execution can move cores.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the previously unused TPIDRPRW register to store percpu offsets.
TPIDRPRW is only accessible in PL1, so it can only be used in the kernel.

This replaces 2 loads with a mrc instruction for each percpu variable
access. With hackbench, the performance improvement is 1.4% on Cortex-A9
(highbank). Taking an average of 30 runs of "hackbench -l 1000" yields:

Before: 6.2191
After: 6.1348

Will Deacon reported similar delta on v6 with 11MPCore.

The asm "memory clobber" are needed here to ensure the percpu offset
gets reloaded. Testing by Will found that this would not happen in
__schedule() which is a bit of a special case as preemption is disabled
but the execution can move cores.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing,x86: Add a TSC trace_clock</title>
<updated>2012-11-13T20:48:27+00:00</updated>
<author>
<name>David Sharp</name>
<email>dhsharp@google.com</email>
</author>
<published>2012-11-13T20:18:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8cbd9cc6254065c97c4bac42daa55ba1abe73a8e'/>
<id>8cbd9cc6254065c97c4bac42daa55ba1abe73a8e</id>
<content type='text'>
In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com

Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@linux.intel.com&gt;
Signed-off-by: David Sharp &lt;dhsharp@google.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com

Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@linux.intel.com&gt;
Signed-off-by: David Sharp &lt;dhsharp@google.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>UAPI: (Scripted) Disintegrate arch/arm/include/asm</title>
<updated>2012-10-12T12:05:52+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-10-12T12:05:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cb8db5d4578ac9d996200ab59aa655344d305f5b'/>
<id>cb8db5d4578ac9d996200ab59aa655344d305f5b</id>
<content type='text'>
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Dave Jones &lt;davej@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Dave Jones &lt;davej@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 7494/1: use generic termios.h</title>
<updated>2012-08-25T08:22:31+00:00</updated>
<author>
<name>Rob Herring</name>
<email>rob.herring@calxeda.com</email>
</author>
<published>2012-08-15T15:28:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e780c452cfea10cbfae1c96dc16e506758138b6f'/>
<id>e780c452cfea10cbfae1c96dc16e506758138b6f</id>
<content type='text'>
As pointed out by Arnd Bergmann, this fixes a couple of issues but will
increase code size:

The original macro user_termio_to_kernel_termios was not endian safe. It
used an unsigned short ptr to access the low bits in a 32-bit word.

Both user_termio_to_kernel_termios and kernel_termios_to_user_termio are
missing error checking on put_user/get_user and copy_to/from_user.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Tested-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As pointed out by Arnd Bergmann, this fixes a couple of issues but will
increase code size:

The original macro user_termio_to_kernel_termios was not endian safe. It
used an unsigned short ptr to access the low bits in a 32-bit word.

Both user_termio_to_kernel_termios and kernel_termios_to_user_termio are
missing error checking on put_user/get_user and copy_to/from_user.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Tested-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 7493/1: use generic unaligned.h</title>
<updated>2012-08-25T08:22:30+00:00</updated>
<author>
<name>Rob Herring</name>
<email>rob.herring@calxeda.com</email>
</author>
<published>2012-08-15T15:28:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d25c881aa3aeea447d0511e391a06c56783d173c'/>
<id>d25c881aa3aeea447d0511e391a06c56783d173c</id>
<content type='text'>
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Tested-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Tested-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 7491/1: use generic version of identical asm headers</title>
<updated>2012-08-25T08:22:30+00:00</updated>
<author>
<name>Rob Herring</name>
<email>rob.herring@calxeda.com</email>
</author>
<published>2012-08-15T15:28:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a8052d8445a24771905d71d13f917146f9e822c'/>
<id>4a8052d8445a24771905d71d13f917146f9e822c</id>
<content type='text'>
Inspired by the AArgh64 claim that it should be separate from ARM and one
reason was being able to use more asm-generic headers. Doing a diff of
arch/arm/include/asm and include/asm-generic there are numerous asm
headers which are functionally identical to their asm-generic counterparts.
Delete the ARM version and use the generic ones.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Tested-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Inspired by the AArgh64 claim that it should be separate from ARM and one
reason was being able to use more asm-generic headers. Doing a diff of
arch/arm/include/asm and include/asm-generic there are numerous asm
headers which are functionally identical to their asm-generic counterparts.
Delete the ARM version and use the generic ones.

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Tested-by: Thomas Petazzoni &lt;thomas.petazzoni@free-electrons.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 7006/1: Migrate to asm-generic wrapper support</title>
<updated>2011-10-17T08:12:40+00:00</updated>
<author>
<name>Stephen Boyd</name>
<email>sboyd@codeaurora.org</email>
</author>
<published>2011-07-26T17:45:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3f8e288033ec7f52b570efad7c2eb42741f6d710'/>
<id>3f8e288033ec7f52b570efad7c2eb42741f6d710</id>
<content type='text'>
With d8ecc5c (kbuild: asm-generic support, 2011-04-27) we can
remove a handful of asm-generic wrappers in ARM code. Since the
generic version of sizes.h doesn't contain SZ_48M, we replace
the 4 users of SZ_48M with the equivalent SZ_32M + SZ_16M.

Signed-off-by: Stephen Boyd &lt;sboyd@codeaurora.org&gt;
Cc: Imre Kaloz &lt;kaloz@openwrt.org&gt;
Acked-by: Krzysztof Halasa &lt;khc@pm.waw.pl&gt;
Cc: Eric Miao &lt;eric.y.miao@gmail.com&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With d8ecc5c (kbuild: asm-generic support, 2011-04-27) we can
remove a handful of asm-generic wrappers in ARM code. Since the
generic version of sizes.h doesn't contain SZ_48M, we replace
the 4 users of SZ_48M with the equivalent SZ_32M + SZ_16M.

Signed-off-by: Stephen Boyd &lt;sboyd@codeaurora.org&gt;
Cc: Imre Kaloz &lt;kaloz@openwrt.org&gt;
Acked-by: Krzysztof Halasa &lt;khc@pm.waw.pl&gt;
Cc: Eric Miao &lt;eric.y.miao@gmail.com&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>archs: replace unifdef-y with header-y</title>
<updated>2010-08-14T20:26:51+00:00</updated>
<author>
<name>Sam Ravnborg</name>
<email>sam@ravnborg.org</email>
</author>
<published>2010-08-14T08:20:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bf56fba6703592149e1bcd19220c60eb42dff9b7'/>
<id>bf56fba6703592149e1bcd19220c60eb42dff9b7</id>
<content type='text'>
unifdef-y and header-y have same semantic, so drop unifdef-y

Signed-off-by: Sam Ravnborg &lt;sam@ravnborg.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
unifdef-y and header-y have same semantic, so drop unifdef-y

Signed-off-by: Sam Ravnborg &lt;sam@ravnborg.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>byteorder: make swab.h include asm/swab.h like a regular header</title>
<updated>2009-01-15T03:56:50+00:00</updated>
<author>
<name>Harvey Harrison</name>
<email>harvey.harrison@gmail.com</email>
</author>
<published>2009-01-14T03:27:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=74d96f018673759d04d032c137d132f6447bfb1e'/>
<id>74d96f018673759d04d032c137d132f6447bfb1e</id>
<content type='text'>
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.

Signed-off-by: Harvey Harrison &lt;harvey.harrison@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.

Signed-off-by: Harvey Harrison &lt;harvey.harrison@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
