linux-toradex.git/arch, branch v3.10.73

ARM: at91: pm: fix at91rm9200 standby

2015-03-26T14:01:01+00:00

commit 84e871660bebfddb9a62ebd6f19d02536e782f0a upstream.

at91rm9200 standby and suspend to ram has been broken since
00482a4078f4. It is wrongly using AT91_BASE_SYS which is a physical address
and actually doesn't correspond to any register on at91rm9200.

Use the correct at91_ramc_base[0] instead.

Fixes: 00482a4078f4 (ARM: at91: implement the standby function for pm/cpuidle)

Signed-off-by: Alexandre Belloni 
Signed-off-by: Nicolas Ferre 
Signed-off-by: Greg Kroah-Hartman

powerpc/smp: Wait until secondaries are active & online

2015-03-26T14:01:00+00:00

commit 875ebe940d77a41682c367ad799b4f39f128d3fa upstream.

Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

  BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

  CPU: 0
  Comm: watchdog/130

The problem is that we aren't ensuring the CPU active bit is set for the
secondary before allowing the master to continue on. The master unparks
the secondary CPU's kthreads and the scheduler looks for a CPU to run
on. It calls select_task_rq() and realises the suggested CPU is not in
the cpus_allowed mask. It then ends up in select_fallback_rq(), and
since the active bit isnt't set we choose some other CPU to run on.

This seems to have been introduced by 6acbfb96976f "sched: Fix hotplug
vs. set_cpus_allowed_ptr()", which changed from setting active before
online to setting active after online. However that was in turn fixing a
bug where other code assumed an active CPU was also online, so we can't
just revert that fix.

The simplest fix is just to spin waiting for both active & online to be
set. We already have a barrier prior to set_cpu_online() (which also
sets active), to ensure all other setup is completed before online &
active are set.

Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Michael Ellerman 
Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

x86/vdso: Fix the build on GCC5

2015-03-26T14:01:00+00:00

commit e893286918d2cde3a94850d8f7101cd1039e0c62 upstream.

On gcc5 the kernel does not link:

  ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.

Because prior GCC versions always emitted NOPs on ALIGN directives, but
gcc5 started omitting them.

.LSTARTFDEDLSI1 says:

        /* HACK: The dwarf2 unwind routines will subtract 1 from the
           return address to get an address in the middle of the
           presumed call instruction.  Since we didn't get here via
           a call, we need to include the nop before the real start
           to make up for it.  */
        .long .LSTART_sigreturn-1-.     /* PC-relative start address */

But commit 69d0627a7f6e ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.

Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses
vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN".

So fix this by adding to that point at least a single NOP and make the
function ALIGN possibly with more NOPs then.

Kudos for reporting and diagnosing should go to Richard.

Reported-by: Richard Biener 
Signed-off-by: Jiri Slaby 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/fpu: Drop_fpu() should not assume that tsk equals current

2015-03-26T14:01:00+00:00

commit f4c3686386393c120710dd34df2a74183ab805fd upstream.

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()->restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Borislav Petkov 
Reviewed-by: Rik van Riel 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Pekka Riikonen 
Cc: Quentin Casasnovas 
Cc: Suresh Siddha 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()

2015-03-26T14:01:00+00:00

commit a7c80ebcac3068b1c3cb27d538d29558c30010c8 upstream.

math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.

This triggers:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
   dump_stack
   ___might_sleep
   ? _raw_spin_unlock_irqrestore
   __might_sleep
   down_read
   ? _raw_spin_unlock_irqrestore
   print_vma_addr
   signal_fault
   sys32_rt_sigreturn

Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.

[ Note: this is only the first step. math_state_restore() should
        not check used_math(), it should set this flag. While
	init_fpu() should simply die. ]

Signed-off-by: Oleg Nesterov 
Signed-off-by: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Pekka Riikonen 
Cc: Quentin Casasnovas 
Cc: Rik van Riel 
Cc: Suresh Siddha 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

crypto: aesni - fix memory usage in GCM decryption

2015-03-26T14:01:00+00:00

commit ccfe8c3f7e52ae83155cb038753f4c75b774ca8a upstream.

The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.

The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.

In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.

Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.

Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.

[1] http://www.chronox.de/libkcapi.html

CC: Tadeusz Struk 
Signed-off-by: Stephan Mueller 
Signed-off-by: Herbert Xu 
Signed-off-by: Greg Kroah-Hartman

sparc64: Fix several bugs in memmove().

2015-03-26T14:00:55+00:00

[ Upstream commit 2077cef4d5c29cf886192ec32066f783d6a80db8 ]

Firstly, handle zero length calls properly.  Believe it or not there
are a few of these happening during early boot.

Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src.  The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.

For example, considering NG4memcpy, the main unrolled loop begins like
this:

     load   src + 0x00
     load   src + 0x08
     load   src + 0x10
     load   src + 0x18
     load   src + 0x20
     store  dst + 0x00

Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call.  That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.

To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well.  We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.

Reported-by: David Ahern 
Reported-by: Bob Picco 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc: Touch NMI watchdog when walking cpus and calling printk

2015-03-26T14:00:55+00:00

[ Upstream commit 31aaa98c248da766ece922bbbe8cc78cfd0bc920 ]

With the increase in number of CPUs calls to functions that dump
output to console (e.g., arch_trigger_all_cpu_backtrace) can take
a long time to complete. If IRQs are disabled eventually the NMI
watchdog kicks in and creates more havoc. Avoid by telling the NMI
watchdog everything is ok.

Signed-off-by: David Ahern 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc: perf: Make counting mode actually work

2015-03-26T14:00:55+00:00

[ Upstream commit d51291cb8f32bfae6b331e1838651f3ddefa73a5 ]

Currently perf-stat (aka, counting mode) does not work:

$ perf stat ls
...
 Performance counter stats for 'ls':

          1.585665      task-clock (msec)         #    0.580 CPUs utilized
                24      context-switches          #    0.015 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                86      page-faults               #    0.054 M/sec
         cycles
         stalled-cycles-frontend
         stalled-cycles-backend
         instructions
         branches
         branch-misses

       0.002735100 seconds time elapsed

The reason is that state is never reset (stays with PERF_HES_UPTODATE set).
Add a call to sparc_pmu_enable_event during the added_event handling.
Clean up the encoding since pmu_start calls sparc_pmu_enable_event which
does the same. Passing PERF_EF_RELOAD to sparc_pmu_start means the call
to sparc_perf_event_set_period can be removed as well.

With this patch:

$ perf stat ls
...
 Performance counter stats for 'ls':

          1.552890      task-clock (msec)         #    0.552 CPUs utilized
                24      context-switches          #    0.015 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                86      page-faults               #    0.055 M/sec
         5,748,997      cycles                    #    3.702 GHz
         stalled-cycles-frontend:HG
         stalled-cycles-backend:HG
         1,684,362      instructions:HG           #    0.29  insns per cycle
           295,133      branches:HG               #  190.054 M/sec
            28,007      branch-misses:HG          #    9.49% of all branches

       0.002815665 seconds time elapsed

Signed-off-by: David Ahern 
Acked-by: Bob Picco 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

sparc: perf: Remove redundant perf_pmu_{en|dis}able calls

2015-03-26T14:00:54+00:00

[ Upstream commit 5b0d4b5514bbcce69b516d0742f2cfc84ebd6db3 ]

perf_pmu_disable is called by core perf code before pmu->del and the
enable function is called by core perf code afterwards. No need to
call again within sparc_pmu_del.

Ditto for pmu->add and sparc_pmu_add.

Signed-off-by: David Ahern 
Acked-by: Bob Picco 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman