linux-toradex.git/arch/x86, branch v3.14.37

x86/vdso: Fix the build on GCC5

2015-03-26T14:06:58+00:00

commit e893286918d2cde3a94850d8f7101cd1039e0c62 upstream.

On gcc5 the kernel does not link:

  ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.

Because prior GCC versions always emitted NOPs on ALIGN directives, but
gcc5 started omitting them.

.LSTARTFDEDLSI1 says:

        /* HACK: The dwarf2 unwind routines will subtract 1 from the
           return address to get an address in the middle of the
           presumed call instruction.  Since we didn't get here via
           a call, we need to include the nop before the real start
           to make up for it.  */
        .long .LSTART_sigreturn-1-.     /* PC-relative start address */

But commit 69d0627a7f6e ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.

Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses
vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN".

So fix this by adding to that point at least a single NOP and make the
function ALIGN possibly with more NOPs then.

Kudos for reporting and diagnosing should go to Richard.

Reported-by: Richard Biener 
Signed-off-by: Jiri Slaby 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/fpu: Drop_fpu() should not assume that tsk equals current

2015-03-26T14:06:58+00:00

commit f4c3686386393c120710dd34df2a74183ab805fd upstream.

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()->restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Borislav Petkov 
Reviewed-by: Rik van Riel 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Pekka Riikonen 
Cc: Quentin Casasnovas 
Cc: Suresh Siddha 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()

2015-03-26T14:06:58+00:00

commit a7c80ebcac3068b1c3cb27d538d29558c30010c8 upstream.

math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.

This triggers:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
   dump_stack
   ___might_sleep
   ? _raw_spin_unlock_irqrestore
   __might_sleep
   down_read
   ? _raw_spin_unlock_irqrestore
   print_vma_addr
   signal_fault
   sys32_rt_sigreturn

Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.

[ Note: this is only the first step. math_state_restore() should
        not check used_math(), it should set this flag. While
	init_fpu() should simply die. ]

Signed-off-by: Oleg Nesterov 
Signed-off-by: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Pekka Riikonen 
Cc: Quentin Casasnovas 
Cc: Rik van Riel 
Cc: Suresh Siddha 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

crypto: aesni - fix memory usage in GCM decryption

2015-03-26T14:06:57+00:00

commit ccfe8c3f7e52ae83155cb038753f4c75b774ca8a upstream.

The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.

The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.

In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.

Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.

Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.

[1] http://www.chronox.de/libkcapi.html

CC: Tadeusz Struk 
Signed-off-by: Stephan Mueller 
Signed-off-by: Herbert Xu 
Signed-off-by: Greg Kroah-Hartman

KVM: emulate: fix CMPXCHG8B on 32-bit hosts

2015-03-18T12:31:28+00:00

commit 4ff6f8e61eb7f96d3ca535c6d240f863ccd6fb7d upstream.

This has been broken for a long time: it broke first in 2.6.35, then was
almost fixed in 2.6.36 but this one-liner slipped through the cracks.
The bug shows up as an infinite loop in Windows 7 (and newer) boot on
32-bit hosts without EPT.

Windows uses CMPXCHG8B to write to page tables, which causes a
page fault if running without EPT; the emulator is then called from
kvm_mmu_page_fault.  The loop then happens if the higher 4 bytes are
not 0; the common case for this is that the NX bit (bit 63) is 1.

Fixes: 6550e1f165f384f3a46b60a1be9aba4bc3c2adad
Fixes: 16518d5ada690643453eb0aef3cc7841d3623c2d
Reported-by: Erik Rull 
Tested-by: Erik Rull 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization

2015-03-18T12:31:25+00:00

commit 956421fbb74c3a6261903f3836c0740187cf038b upstream.

'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.

Signed-off-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86, mm/ASLR: Fix stack randomization on 64-bit systems

2015-03-06T22:43:32+00:00

commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream.

The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.

The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":

  static unsigned long randomize_stack_top(unsigned long stack_top)
  {
           unsigned int random_variable = 0;

           if ((current->flags & PF_RANDOMIZE) &&
                   !(current->personality & ADDR_NO_RANDOMIZE)) {
                   random_variable = get_random_int() & STACK_RND_MASK;
                   random_variable <<= PAGE_SHIFT;
           }
           return PAGE_ALIGN(stack_top) + random_variable;
           return PAGE_ALIGN(stack_top) - random_variable;
  }

Note that, it declares the "random_variable" variable as "unsigned int".
Since the result of the shifting operation between STACK_RND_MASK (which
is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):

	  random_variable <<= PAGE_SHIFT;

then the two leftmost bits are dropped when storing the result in the
"random_variable". This variable shall be at least 34 bits long to hold
the (22+12) result.

These two dropped bits have an impact on the entropy of process stack.
Concretely, the total stack entropy is reduced by four: from 2^28 to
2^30 (One fourth of expected entropy).

This patch restores back the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().

The successful fix can be tested with:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
  7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
  7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
  7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
  7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
  ...

Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.

Signed-off-by: Hector Marco-Gisbert 
Signed-off-by: Ismael Ripoll 
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Al Viro 
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov 
Signed-off-by: Greg Kroah-Hartman

KVM: x86: update masterclock values on TSC writes

2015-03-06T22:43:31+00:00

commit 7f187922ddf6b67f2999a76dcb71663097b75497 upstream.

When the guest writes to the TSC, the masterclock TSC copy must be
updated as well along with the TSC_OFFSET update, otherwise a negative
tsc_timestamp is calculated at kvm_guest_time_update.

Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
"if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
becomes redundant, so remove the do_request boolean and collapse
everything into a single condition.

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

mm/hugetlb: pmd_huge() returns true for non-present hugepage

2015-03-06T22:43:27+00:00

commit cbef8478bee55775ac312a574aad48af7bb9cf9f upstream.

Migrating hugepages and hwpoisoned hugepages are considered as non-present
hugepages, and they are referenced via migration entries and hwpoison
entries in their page table slots.

This behavior causes race condition because pmd_huge() doesn't tell
non-huge pages from migrating/hwpoisoned hugepages.  follow_page_mask() is
one example where the kernel would call follow_page_pte() for such
hugepage while this function is supposed to handle only normal pages.

To avoid this, this patch makes pmd_huge() return true when pmd_none() is
true *and* pmd_present() is false.  We don't have to worry about mixing up
non-present pmd entry with normal pmd (pointing to leaf level pte entry)
because pmd_present() is true in normal pmd.

The same race condition could happen in (x86-specific) gup_pmd_range(),
where this patch simply adds pmd_present() check instead of pmd_huge().
This is because gup_pmd_range() is fast path.  If we have non-present
hugepage in this function, we will go into gup_huge_pmd(), then return 0
at flag mask check, and finally fall back to the slow path.

Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi 
Cc: Hugh Dickins 
Cc: James Hogan 
Cc: David Rientjes 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Rik van Riel 
Cc: Andrea Arcangeli 
Cc: Luiz Capitulino 
Cc: Nishanth Aravamudan 
Cc: Lee Schermerhorn 
Cc: Steve Capper 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

x86,kvm,vmx: Preserve CR4 across VM entry

2015-02-11T06:54:49+00:00

commit d974baa398f34393db76be45f7d4d04fbdbb4a0a upstream.

CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry.  Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.

Signed-off-by: Andy Lutomirski 
Acked-by: Paolo Bonzini 
Cc: Petr Matousek 
Cc: Gleb Natapov 
Signed-off-by: Linus Torvalds 
[wangkai: Backport to 3.10: adjust context]
Signed-off-by: Wang Kai 
Signed-off-by: Greg Kroah-Hartman