linux-toradex.git/arch/arm64, branch v4.4.73

arm64: ensure extension of smp_store_release value

2017-06-14T11:16:27+00:00

commit 994870bead4ab19087a79492400a5478e2906196 upstream.

When an inline assembly operand's type is narrower than the register it
is allocated to, the least significant bits of the register (up to the
operand type's width) are valid, and any other bits are permitted to
contain any arbitrary value. This aligns with the AAPCS64 parameter
passing rules.

Our __smp_store_release() implementation does not account for this, and
implicitly assumes that operands have been zero-extended to the width of
the type being stored to. Thus, we may store unknown values to memory
when the value type is narrower than the pointer type (e.g. when storing
a char to a long).

This patch fixes the issue by casting the value operand to the same
width as the pointer operand in all cases, which ensures that the value
is zero-extended as we expect. We use the same union trickery as
__smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
pointers are potentially cast to narrower width integers in unreachable
paths.

A whitespace issue at the top of __smp_store_release() is also
corrected.

No changes are necessary for __smp_load_acquire(). Load instructions
implicitly clear any upper bits of the register, and the compiler will
only consider the least significant bits of the register as valid
regardless.

Fixes: 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
Fixes: 878a84d5a8a1 ("arm64: add missing data types in smp_load_acquire/smp_store_release")
Cc:  # 3.14.x-
Acked-by: Will Deacon 
Signed-off-by: Mark Rutland 
Cc: Matthias Kaehlcke 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: armv8_deprecated: ensure extension of addr

2017-06-14T11:16:27+00:00

commit 55de49f9aa17b0b2b144dd2af587177b9aadf429 upstream.

Our compat swp emulation holds the compat user address in an unsigned
int, which it passes to __user_swpX_asm(). When a 32-bit value is passed
in a register, the upper 32 bits of the register are unknown, and we
must extend the value to 64 bits before we can use it as a base address.

This patch casts the address to unsigned long to ensure it has been
suitably extended, avoiding the potential issue, and silencing a related
warning from clang.

Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm")
Cc:  # 3.19.x-
Acked-by: Will Deacon 
Signed-off-by: Mark Rutland 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: entry: improve data abort handling of tagged pointers

2017-06-14T11:16:27+00:00

commit 276e93279a630657fff4b086ba14c95955912dfa upstream.

This backport has a minor difference from the upstream commit: it adds
the asm-uaccess.h file, which is not present in 4.4, because 4.4 does
not have commit b4b8664d291a ("arm64: don't pull uaccess.h into *.S").

Original patch description:

When handling a data abort from EL0, we currently zero the top byte of
the faulting address, as we assume the address is a TTBR0 address, which
may contain a non-zero address tag. However, the address may be a TTBR1
address, in which case we should not zero the top byte. This patch fixes
that. The effect is that the full TTBR1 address is passed to the task's
signal handler (or printed out in the kernel log).

When handling a data abort from EL1, we leave the faulting address
intact, as we assume it's either a TTBR1 address or a TTBR0 address with
tag 0x00. This is true as far as I'm aware, we don't seem to access a
tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
forget about address tags, and code added in the future may not always
remember to remove tags from addresses before accessing them. So add tag
handling to the EL1 data abort handler as well. This also makes it
consistent with the EL0 data abort handler.

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Reviewed-by: Dave Martin 
Acked-by: Will Deacon 
Signed-off-by: Kristina Martsenko 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: hw_breakpoint: fix watchpoint matching for tagged pointers

2017-06-14T11:16:27+00:00

commit 7dcd9dd8cebe9fa626af7e2358d03a37041a70fb upstream.

This backport has a few small differences from the upstream commit:
 - The address tag is removed in watchpoint_handler() instead of
   get_distance_from_watchpoint(), because 4.4 does not have commit
   fdfeff0f9e3d ("arm64: hw_breakpoint: Handle inexact watchpoint
   addresses").
 - A macro is backported (untagged_addr), as it is not present in 4.4.

Original patch description:

When we take a watchpoint exception, the address that triggered the
watchpoint is found in FAR_EL1. We compare it to the address of each
configured watchpoint to see which one was hit.

The configured watchpoint addresses are untagged, while the address in
FAR_EL1 will have an address tag if the data access was done using a
tagged address. The tag needs to be removed to compare the address to
the watchpoints.

Currently we don't remove it, and as a result can report the wrong
watchpoint as being hit (specifically, always either the highest TTBR0
watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Acked-by: Mark Rutland 
Acked-by: Will Deacon 
Signed-off-by: Kristina Martsenko 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: uaccess: ensure extension of access_ok() addr

2017-05-25T12:30:15+00:00

commit a06040d7a791a9177581dcf7293941bd92400856 upstream.

Our access_ok() simply hands its arguments over to __range_ok(), which
implicitly assummes that the addr parameter is 64 bits wide. This isn't
necessarily true for compat code, which might pass down a 32-bit address
parameter.

In these cases, we don't have a guarantee that the address has been zero
extended to 64 bits, and the upper bits of the register may contain
unknown values, potentially resulting in a suprious failure.

Avoid this by explicitly casting the addr parameter to an unsigned long
(as is done on other architectures), ensuring that the parameter is
widened appropriately.

Fixes: 0aea86a2176c ("arm64: User access library functions")
Acked-by: Will Deacon 
Signed-off-by: Mark Rutland 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: xchg: hazard against entire exchange variable

2017-05-25T12:30:15+00:00

commit fee960bed5e857eb126c4e56dd9ff85938356579 upstream.

The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard
against other accesses to the memory location being exchanged. However,
the pointer passed to the constraint is a u8 pointer, and thus the
hazard only applies to the first byte of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, as demonstrated with the following test case:

union u {
	unsigned long l;
	unsigned int i[2];
};

unsigned long update_char_hazard(union u *u)
{
	unsigned int a, b;

	a = u->i[1];
	asm ("str %1, %0" : "+Q" (*(char *)&u->l) : "r" (0UL));
	b = u->i[1];

	return a ^ b;
}

unsigned long update_long_hazard(union u *u)
{
	unsigned int a, b;

	a = u->i[1];
	asm ("str %1, %0" : "+Q" (*(long *)&u->l) : "r" (0UL));
	b = u->i[1];

	return a ^ b;
}

The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when
using -O2 or above:

0000000000000000 :
   0:	d2800001 	mov	x1, #0x0                   	// #0
   4:	f9000001 	str	x1, [x0]
   8:	d2800000 	mov	x0, #0x0                   	// #0
   c:	d65f03c0 	ret

0000000000000010 :
  10:	b9400401 	ldr	w1, [x0,#4]
  14:	d2800002 	mov	x2, #0x0                   	// #0
  18:	f9000002 	str	x2, [x0]
  1c:	b9400400 	ldr	w0, [x0,#4]
  20:	4a000020 	eor	w0, w1, w0
  24:	d65f03c0 	ret

This patch fixes the issue by passing an unsigned long pointer into the
+Q constraint, as we do for our cmpxchg code. This may hazard against
more than is necessary, but this is better than missing a necessary
hazard.

Fixes: 305d454aaa29 ("arm64: atomics: implement native {relaxed, acquire, release} atomics")
Acked-by: Will Deacon 
Signed-off-by: Mark Rutland 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses

2017-05-20T12:27:02+00:00

commit c667186f1c01ca8970c785888868b7ffd74e51ee upstream.

Our 32bit CP14/15 handling inherited some of the ARMv7 code for handling
the trapped system registers, completely missing the fact that the
fields for Rt and Rt2 are now 5 bit wide, and not 4...

Let's fix it, and provide an accessor for the most common Rt case.

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
Signed-off-by: Greg Kroah-Hartman

bpf, arm64: fix jit branch offset related to ldimm64

2017-05-14T11:32:58+00:00

[ Upstream commit ddc665a4bb4b728b4e6ecec8db1b64efa9184b9c ]

When the instruction right before the branch destination is
a 64 bit load immediate, we currently calculate the wrong
jump offset in the ctx->offset[] array as we only account
one instruction slot for the 64 bit load immediate although
it uses two BPF instructions. Fix it up by setting the offset
into the right slot after we incremented the index.

Before (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  54ffff82  b.cs 0x00000020
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

After (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  540000a2  b.cs 0x00000044
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

Also, add a couple of test cases to make sure JITs pass
this test. Tested on Cavium ThunderX ARMv8. The added
test cases all pass after the fix.

Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
Reported-by: David S. Miller 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Xi Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2

2017-03-12T05:37:28+00:00

commit 8f36ebaf21fdae99c091c67e8b6fab33969f2667 upstream.

When we fault in a page, we flush it to the PoC (Point of Coherency)
if the faulting vcpu has its own caches off, so that it can observe
the page we just brought it.

But if the vcpu has its caches on, we skip that step. Bad things
happen when *another* vcpu tries to access that page with its own
caches disabled. At that point, there is no garantee that the
data has made it to the PoC, and we access stale data.

The obvious fix is to always flush to PoC when a page is faulted
in, no matter what the state of the vcpu is.

Fixes: 2d58b733c876 ("arm64: KVM: force cache clean on page fault when caches are off")
Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
Signed-off-by: Greg Kroah-Hartman

crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes

2017-02-09T07:02:45+00:00

commit 11e3b725cfc282efe9d4a354153e99d86a16af08 upstream.

Update the ARMv8 Crypto Extensions and the plain NEON AES implementations
in CBC and CTR modes to return the next IV back to the skcipher API client.
This is necessary for chaining to work correctly.

Note that for CTR, this is only done if the request is a round multiple of
the block size, since otherwise, chaining is impossible anyway.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Herbert Xu 
Signed-off-by: Greg Kroah-Hartman