linux-toradex.git/arch/x86, branch v3.2.34

xen/mmu: Use Xen specific TLB flush instead of the generic one.

2012-11-16T16:47:03+00:00

commit 95a7d76897c1e7243d4137037c66d15cbf2cce76 upstream.

As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.

This patch gives around 50% speed improvement when migrating
idle guest's from one host to another.

Oracle-bug: 14630170

Tested-by:  Jingjie Jiang 
Suggested-by:  Mukesh Rathor 
Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Ben Hutchings

x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility

2012-11-16T16:46:46+00:00

commit f6365201d8a21fb347260f89d6e9b3e718d63c70 upstream.

The X86_32-only disable_hlt/enable_hlt mechanism was used by the
32-bit floppy driver. Its effect was to replace the use of the
HLT instruction inside default_idle() with cpu_relax() - essentially
it turned off the use of HLT.

This workaround was commented in the code as:

 "disable hlt during certain critical i/o operations"

 "This halt magic was a workaround for ancient floppy DMA
  wreckage. It should be safe to remove."

H. Peter Anvin additionally adds:

 "To the best of my knowledge, no-hlt only existed because of
  flaky power distributions on 386/486 systems which were sold to
  run DOS.  Since DOS did no power management of any kind,
  including HLT, the power draw was fairly uniform; when exposed
  to the much hhigher noise levels you got when Linux used HLT
  caused some of these systems to fail.

  They were by far in the minority even back then."

Alan Cox further says:

 "Also for the Cyrix 5510 which tended to go castors up if a HLT
  occurred during a DMA cycle and on a few other boxes HLT during
  DMA tended to go astray.

  Do we care ? I doubt it. The 5510 was pretty obscure, the 5520
  fixed it, the 5530 is probably the oldest still in any kind of
  use."

So, let's finally drop this.

Signed-off-by: Len Brown 
Signed-off-by: Josh Boyer 
Signed-off-by: Andrew Morton 
Acked-by: "H. Peter Anvin" 
Acked-by: Alan Cox 
Cc: Stephen Hemminger 
Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org
[ If anyone cares then alternative instruction patching could be
  used to replace HLT with a one-byte NOP instruction. Much simpler. ]
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings

xen/x86: don't corrupt %eip when returning from a signal handler

2012-10-30T23:26:52+00:00

commit a349e23d1cf746f8bdc603dcc61fae9ee4a695f6 upstream.

In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal.  The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).

The occurs because handle_signal() is incorrectly thinking that there
is a system call that needs to restarted so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).

If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516) then handle_signal() only corrupted %eax (by setting it to
-EINTR).  This may cause the application to crash or have incorrect
behaviour.

handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax.  For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.

xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.

Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().

There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.

Signed-off-by: David Vrabel 
Acked-by: Jan Beulich 
Acked-by: Ian Campbell 
Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Ben Hutchings

oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()

2012-10-30T23:26:45+00:00

commit 44009105081b51417f311f4c3be0061870b6b8ed upstream.

The "event" variable is a u16 so the shift will always wrap to zero
making the line a no-op.

Signed-off-by: Dan Carpenter 
Signed-off-by: Robert Richter 
Signed-off-by: Ben Hutchings

xen/bootup: allow {read|write}_cr8 pvops call.

2012-10-30T23:26:45+00:00

commit 1a7bbda5b1ab0e02622761305a32dc38735b90b2 upstream.

We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.

This fixes the case when an user tries to suspend the machine
and it blows up in save_processor_state b/c 'read_cr8' is set
to NULL and we get:

kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[]  [] save_processor_state+0x212/0x270

.. snip..
Call Trace:
 [] do_suspend_lowlevel+0xf/0xac
 [] ? x86_acpi_suspend_lowlevel+0x10c/0x150
 [] acpi_suspend_enter+0x57/0xd5

Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Ben Hutchings

xen/bootup: allow read_tscp call for Xen PV guests.

2012-10-30T23:26:44+00:00

commit cd0608e71e9757f4dae35bcfb4e88f4d1a03a8ab upstream.

The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.

Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Ben Hutchings

efi: initialize efi.runtime_version to make query_variable_info/update_capsule workable

2012-10-17T02:49:56+00:00

commit d6cf86d8f23253225fe2a763d627ecf7dfee9dae upstream.

A value of efi.runtime_version is checked before calling
update_capsule()/query_variable_info() as follows.
But it isn't initialized anywhere.


static efi_status_t virt_efi_query_variable_info(u32 attr,
                                                 u64 *storage_space,
                                                 u64 *remaining_space,
                                                 u64 *max_variable_size)
{
        if (efi.runtime_version < EFI_2_00_SYSTEM_TABLE_REVISION)
                return EFI_UNSUPPORTED;


This patch initializes a value of efi.runtime_version at boot time.

Signed-off-by: Seiji Aguchi 
Acked-by: Matthew Garrett 
Signed-off-by: Matt Fleming 
Signed-off-by: Ben Hutchings

mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP

2012-10-17T02:49:45+00:00

commit 027ef6c87853b0a9df53175063028edb4950d476 upstream.

In many places !pmd_present has been converted to pmd_none.  For pmds
that's equivalent and pmd_none is quicker so using pmd_none is better.

However (unless we delete pmd_present) we should provide an accurate
pmd_present too.  This will avoid the risk of code thinking the pmd is non
present because it's under __split_huge_page_map, see the pmd_mknotpresent
there and the comment above it.

If the page has been mprotected as PROT_NONE, it would also lead to a
pmd_present false negative in the same way as the race with
split_huge_page.

Because the PSE bit stays on at all times (both during split_huge_page and
when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit,
but checking the PROTNONE bit too is still good to remember pmd_present
must always keep PROT_NONE into account.

This explains a not reproducible BUG_ON that was seldom reported on the
lists.

The same issue is in pmd_large, it would go wrong with both PROT_NONE and
if it races with split_huge_page.

Signed-off-by: Andrea Arcangeli 
Acked-by: Rik van Riel 
Cc: Johannes Weiner 
Cc: Hugh Dickins 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

x86/alternatives: Fix p6 nops on non-modular kernels

2012-10-10T02:31:15+00:00

commit cb09cad44f07044d9810f18f6f9a6a6f3771f979 upstream.

Probably a leftover from the early days of self-patching, p6nops
are marked __initconst_or_module, which causes them to be
discarded in a non-modular kernel.  If something later triggers
patching, it will overwrite kernel code with garbage.

Reported-by: Tomas Racek 
Signed-off-by: Avi Kivity 
Cc: Michael Tokarev 
Cc: Borislav Petkov 
Cc: Marcelo Tosatti 
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori 
Cc: H. Peter Anvin 
Cc: Alan Cox 
Cc: Alan Cox 
Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

xen/boot: Disable NUMA for PV guests.

2012-10-10T02:31:04+00:00

commit 8d54db795dfb1049d45dc34f0dddbc5347ec5642 upstream.

The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.

In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:

"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).

This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.

This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)

We have this dump then:
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
Scanning NUMA topology in Northbridge 24
Number of physical nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000040000000
Node 1 MemBase 0000000040000000 Limit 0000000138000000
Node 2 MemBase 0000000138000000 Limit 00000001f8000000
Node 3 MemBase 00000001f8000000 Limit 0000000238000000
Initmem setup node 0 0000000000000000-0000000040000000
  NODE_DATA [000000003ffd9000 - 000000003fffffff]
Initmem setup node 1 0000000040000000-0000000138000000
  NODE_DATA [0000000137fd9000 - 0000000137ffffff]
Initmem setup node 2 0000000138000000-00000001f8000000
  NODE_DATA [00000001f095e000 - 00000001f0984fff]
Initmem setup node 3 00000001f8000000-0000000238000000
Cannot find 159744 bytes in node 3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] __alloc_bootmem_node+0x43/0x96
Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
RIP: e030:[]  [] __alloc_bootmem_node+0x43/0x96
.. snip..
  [] sparse_early_usemaps_alloc_node+0x64/0x178
  [] sparse_init+0xe4/0x25a
  [] paging_init+0x13/0x22
  [] setup_arch+0x9c6/0xa9b
  [] ? printk+0x3c/0x3e
  [] start_kernel+0xe5/0x468
  [] x86_64_start_reservations+0xba/0xc1
  [] ? xen_setup_runstate_info+0x2c/0x36
  [] xen_start_kernel+0x565/0x56c
"

so we just disable NUMA scanning by setting numa_off=1.

Reported-and-Tested-by: Andre Przywara 
Acked-by: Andre Przywara 
Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Ben Hutchings