Age | Commit message (Collapse) | Author |
|
commit fa5ce5f94c0f2bfa41ba68d2d2524298e1fc405e upstream.
New ARM binutils don't allow extraneous whitespace inside
of brackets, which causes this error on all mach-w90x900
defconfigs:
arch/arm/kernel/entry-armv.S: Assembler messages:
arch/arm/kernel/entry-armv.S:214: Error: ARM register expected -- `ldr r0,[ r6,#(0x10C)]'
arch/arm/kernel/entry-armv.S:214: Error: ARM register expected -- `ldr r0,[ r6,#(0x110)]'
arch/arm/kernel/entry-armv.S:430: Error: ARM register expected -- `ldr r0,[ r6,#(0x10C)]'
arch/arm/kernel/entry-armv.S:430: Error: ARM register expected -- `ldr r0,[ r6,#(0x110)]'
This removes the whitespace in order to build the kernel
again.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Wan ZongShun <mcuos.com@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2815774bb38445006074e16251b9ef5123bdc616 upstream.
Recent assembler versions complain about extraneous
whitespace inside [] brackets. This fixes all of
these instances for the samsung platforms. We should
backport this to all kernels that might need to
be built with new binutils.
arch/arm/kernel/entry-armv.S: Assembler messages:
arch/arm/kernel/entry-armv.S:214: Error: ARM register expected -- `ldr r2,[ r6,#(0x10)]'
arch/arm/kernel/entry-armv.S:214: Error: ARM register expected -- `ldr r0,[ r6,#(0x14)]'
arch/arm/kernel/entry-armv.S:430: Error: ARM register expected -- `ldr r2,[ r6,#(0x10)]'
arch/arm/kernel/entry-armv.S:430: Error: ARM register expected -- `ldr r0,[ r6,#(0x14)]'
arch/arm/mach-s3c24xx/sleep-s3c2410.S: Assembler messages:
arch/arm/mach-s3c24xx/sleep-s3c2410.S:48: Error: ARM register expected -- `ldr r7,[ r4 ]'
arch/arm/mach-s3c24xx/sleep-s3c2410.S:49: Error: ARM register expected -- `ldr r8,[ r5 ]'
arch/arm/mach-s3c24xx/sleep-s3c2410.S:50: Error: ARM register expected -- `ldr r9,[ r6 ]'
arch/arm/mach-s3c24xx/sleep-s3c2410.S:64: Error: ARM register expected -- `streq r7,[ r4 ]'
arch/arm/mach-s3c24xx/sleep-s3c2410.S:65: Error: ARM register expected -- `streq r8,[ r5 ]'
arch/arm/mach-s3c24xx/sleep-s3c2410.S:66: Error: ARM register expected -- `streq r9,[ r6 ]'
arch/arm/kernel/debug.S: Assembler messages:
arch/arm/kernel/debug.S:83: Error: ARM register expected -- `ldr r2,[ r2,#((0x0B0)+(((0x56000000)-(0x50000000))+(0xF6000000+(0x01000000))))-((0)+(((0x56000000)-(0x50000000))+(0xF6000000+(0x01000000))))]'
arch/arm/kernel/debug.S:83: Error: ARM register expected -- `ldr r2,[ r3,#(0x18)]'
arch/arm/kernel/debug.S:85: Error: ARM register expected -- `ldr r2,[ r2,#((0x0B0)+(((0x56000000)-(0x50000000))+(0xF6000000+(0x01000000))))-((0)+(((0x56000000)-(0x50000000))+(0xF6000000+(0x01000000))))]'
arch/arm/kernel/debug.S:85: Error: ARM register expected -- `ldr r2,[ r3,#(0x18)]'
arch/arm/mach-s3c24xx/pm-h1940.S: Assembler messages:
arch/arm/mach-s3c24xx/pm-h1940.S:33: Error: ARM register expected -- `ldr pc,[ r0,#((0x0B8)+(((0x56000000)-(0x50000000))+(0xF6000000+(0x01000000))))-(((0x56000000)-(0x50000000))+(0xF6000000+(0x01000000)))]'
arch/arm/mach-s3c24xx/sleep-s3c2412.S: Assembler messages:
arch/arm/mach-s3c24xx/sleep-s3c2412.S:60: Error: ARM register expected -- `ldrne r9,[ r1 ]'
arch/arm/mach-s3c24xx/sleep-s3c2412.S:61: Error: ARM register expected -- `strne r9,[ r1 ]'
arch/arm/mach-s3c24xx/sleep-s3c2412.S:62: Error: ARM register expected -- `ldrne r9,[ r2 ]'
arch/arm/mach-s3c24xx/sleep-s3c2412.S:63: Error: ARM register expected -- `strne r9,[ r2 ]'
arch/arm/mach-s3c24xx/sleep-s3c2412.S:64: Error: ARM register expected -- `ldrne r9,[ r3 ]'
arch/arm/mach-s3c24xx/sleep-s3c2412.S:65: Error: ARM register expected -- `strne r9,[ r3 ]'
arch/arm/kernel/debug.S:83: Error: ARM register expected -- `ldr r2,[ r3,#(0x08)]'
arch/arm/kernel/debug.S:83: Error: ARM register expected -- `ldr r2,[ r3,#(0x18)]'
arch/arm/kernel/debug.S:83: Error: ARM register expected -- `ldr r2,[ r3,#(0x10)]'
arch/arm/kernel/debug.S:85: Error: ARM register expected -- `ldr r2,[ r3,#(0x08)]'
arch/arm/kernel/debug.S:85: Error: ARM register expected -- `ldr r2,[ r3,#(0x18)]'
arch/arm/kernel/debug.S:85: Error: ARM register expected -- `ldr r2,[ r3,#(0x10)]'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Cc: Ben Dooks <ben-linux@fluff.org>
[bwh: Backported to 3.2: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1de63d60cd5b0d33a812efa455d5933bf1564a51 upstream.
There was a serious problem in samsung-laptop that its platform driver is
designed to run under BIOS and running under EFI can cause the machine to
become bricked or can cause Machine Check Exceptions.
Discussion about this problem:
https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
https://bugzilla.kernel.org/show_bug.cgi?id=47121
The patches to fix this problem:
efi: Make 'efi_enabled' a function to query EFI facilities
83e68189745ad931c2afd45d8ee3303929233e7f
samsung-laptop: Disable on EFI hardware
e0094244e41c4d0c7ad69920681972fc45d8ce34
Unfortunately this problem comes back again if users specify "noefi" option.
This parameter clears EFI_BOOT and that driver continues to run even if running
under EFI. Refer to the document, this parameter should clear
EFI_RUNTIME_SERVICES instead.
Documentation/kernel-parameters.txt:
===============================================================================
...
noefi [X86] Disable EFI runtime services support.
...
===============================================================================
Documentation/x86/x86_64/uefi.txt:
===============================================================================
...
- If some or all EFI runtime services don't work, you can try following
kernel command line parameters to turn off some or all EFI runtime
services.
noefi turn off all EFI runtime services
...
===============================================================================
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/511C2C04.2070108@jp.fujitsu.com
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream.
A user reported the following oops when a backup process reads
/proc/kcore:
BUG: unable to handle kernel paging request at ffffbb00ff33b000
IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
[...]
Call Trace:
[<ffffffff811b8aaa>] read_kcore+0x17a/0x370
[<ffffffff811ad847>] proc_reg_read+0x77/0xc0
[<ffffffff81151687>] vfs_read+0xc7/0x130
[<ffffffff811517f3>] sys_read+0x53/0xa0
[<ffffffff81449692>] system_call_fastpath+0x16/0x1b
Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.
The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD. If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.
This patch adds the necessary pud_large() check.
Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.coM>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 32068f6527b8f1822a30671dedaf59c567325026 upstream.
Enable hyperv_clocksource only if its advertised as a feature.
XenServer 6 returns the signature which is checked in
ms_hyperv_platform(), but it does not offer all features. Currently the
clocksource is enabled unconditionally in ms_hyperv_init_platform(), and
the result is a hanging guest.
Hyper-V spec Bit 1 indicates the availability of Partition Reference
Counter. Register the clocksource only if this bit is set.
The guest in question prints this in dmesg:
[ 0.000000] Hypervisor detected: Microsoft HyperV
[ 0.000000] HyperV: features 0x70, hints 0x0
This bug can be reproduced easily be setting 'viridian=1' in a HVM domU
.cfg file. A workaround without this patch is to boot the HVM guest with
'clocksource=jiffies'.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: http://lkml.kernel.org/r/1359940959-32168-1-git-send-email-kys@microsoft.com
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit cb214ede7657db458fd0b2a25ea0b28dbf900ebc upstream.
When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.
The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.
This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.
The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are smaller than 255, BIOS need to pass the
control to the OS with xapic mode, according to x2apic-spec,
chapter 2.9.
Current code handle x2apic when BIOS pass with xapic mode
enabled:
When user specifies x2apic_phys, or FADT indicates PHYSICAL:
1. During madt oem check, apic driver is set with xapic logical
or xapic phys driver at first.
2. enable_IR_x2apic() will enable x2apic_mode.
3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
will install the correct x2apic phys driver and use x2apic phys mode.
Otherwise it will skip the driver will let x2apic_cluster_probe to
take over to install x2apic cluster driver (wrong one) even though FADT
indicates PHYSICAL, because x2apic_phys_probe does not check
FADT PHYSICAL.
Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.
Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ea0dcf903e7d76aa5d483d876215fedcfdfe140f upstream.
Provide systems that do not support x2apic cluster mode
a mechanism to select x2apic physical mode using the
FADT FORCE_APIC_PHYSICAL_DESTINATION_MODE bit.
Changes from v1: (based on Suresh's comments)
- removed #ifdef CONFIG_ACPI
- removed #include <linux/acpi.h>
Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1335313436-32020-1-git-send-email-greg.pearson@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e575a86fdc50d013bf3ad3aa81d9100e8e6cc60d upstream.
Without this patch, it is trivial to determine kernel page
mappings by examining the error code reported to dmesg[1].
Instead, declare the entire kernel memory space as a violation
of a present page.
Additionally, since show_unhandled_signals is enabled by
default, switch branch hinting to the more realistic
expectation, and unobfuscate the setting of the PF_PROT bit to
improve readability.
[1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/
Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit f03574f2d5b2d6229dcdf2d322848065f72953c7 upstream.
This code was an optimization for 32-bit NUMA systems.
It has probably been the cause of a number of subtle bugs over
the years, although the conditions to excite them would have
been hard to trigger. Essentially, we remap part of the kernel
linear mapping area, and then sometimes part of that area gets
freed back in to the bootmem allocator. If those pages get
used by kernel data structures (say mem_map[] or a dentry),
there's no big deal. But, if anyone ever tried to use the
linear mapping for these pages _and_ cared about their physical
address, bad things happen.
For instance, say you passed __GFP_ZERO to the page allocator
and then happened to get handed one of these pages, it zero the
remapped page, but it would make a pte to the _old_ page.
There are probably a hundred other ways that it could screw
with things.
We don't need to hang on to performance optimizations for
these old boxes any more. All my 32-bit NUMA systems are long
dead and buried, and I probably had access to more than most
people.
This code is causing real things to break today:
https://lkml.org/lkml/2013/1/9/376
I looked in to actually fixing this, but it requires surgery
to way too much brittle code, as well as stuff like
per_cpu_ptr_to_phys().
[ hpa: Cc: this for -stable, since it is a memory corruption issue.
However, an alternative is to simply mark NUMA as depends BROKEN
rather than EXPERIMENTAL in the X86_32 subclause... ]
Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: For 3.2, using the suggested alternative]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 15bc8d8457875f495c59d933b05770ba88d1eacb upstream.
On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesnt have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.
If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Lets collect the ACRS/FPRS before
saving them.
This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
[bwh: Backported to 3.2 as done in 3.0 by Jiri Slaby]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jiri Slaby <jslaby@suse.cz>
|
|
commit d107a204154ddd79339203c2deeb7433f0cf6777 upstream.
The Chip Select Configuration Register must be programmed to 0x2 in
order to achieve the correct behavior of the Static Memory Controller.
Without this patch devices wired to DFI and accessed through SMC cannot
be accessed after resume from S2.
Do not rely on the boot loader to program the CSMSADRCFG register by
programming it in the kernel smemc module.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 7139bc1579901b53db7e898789e916ee2fb52d78 upstream.
This patch goes a long way toward fixing the minifail bug, and
it significantly improves the stability of SMP machines such as
the rp3440. When write protecting a page for COW, we need to
purge the existing translation. Otherwise, the COW break
doesn't occur as expected because the TLB may still have a stale entry
which allows writes.
[jejb: fix up checkpatch errors]
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6a040ce72598159a74969a2d01ab0ba5ee6536b3 upstream.
The DDW code uses a eeh_dev struct from the pci_dev. However, this is
not set until eeh_add_device_late is called.
Since pci_bus_add_devices is called before eeh_add_device_late, the PCI
devices are added to the bus, making drivers' probe hooks to be called.
These will call set_dma_mask, which will call the DDW code, which will
require the eeh_dev struct from pci_dev. This would result in a crash,
due to a NULL dereference.
Calling eeh_add_device_late after pci_bus_add_devices would make the
system BUG, because device files shouldn't be added to devices there
were not added to the system. So, a new function is needed to add such
files only after pci_bus_add_devices have been called.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.
This fixes CVE-2013-0228 / XSA-42
Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:
-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]
Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G D ---------------
2.6.32-356.el6.i686 #1
Call Trace:
[<c08476df>] ? panic+0x6e/0x122
[<c084b63c>] ? oops_end+0xbc/0xd0
[<c084b260>] ? do_general_protection+0x0/0x210
[<c084a9b7>] ? error_code+0x73/
-------------
Petr says: "
I've analysed the bug and I think that xen_iret() cannot cope with
mangled DS, in this case zeroed out (null selector/descriptor) by either
xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
entry was invalidated by the reproducer. "
Jan took a look at the preliminary patch and came up a fix that solves
this problem:
"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."
The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.
Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment. In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.
Reviewed-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9899d11f654474d2d54ea52ceaa2a1f4db3abd68 upstream.
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack. However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.
set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.
As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it. Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.
Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().
While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 95cf00fa5d5e2a200a2c044c84bde8389a237e02 upstream.
Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
step.c was always very wrong.
1. update_debugctlmsr() was simply unneeded. The child sleeps
TASK_TRACED, __switch_to_xtra(next_p => child) should notice
TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
needed.
2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
should always match the state of current's TIF_BLOCKSTEP bit.
3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
register or the caller can be preempted in between.
4. It is not safe to play with TIF_BLOCKSTEP if task != current.
DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
other if the task is running. The tracee is stopped but it
can be SIGKILL'ed right before set/clear_tsk_thread_flag().
However, now that uprobes uses user_enable_single_step(current)
we can't simply remove update_debugctlmsr(). So this patch adds
the additional "task == current" check and disables irqs to avoid
the race with interrupts/preemption.
Unfortunately this patch doesn't solve the last problem, we need
another fix. Probably we should teach ptrace_stop() to set/clear
single/block stepping after resume.
And afaics there is yet another problem: perf can play with
MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
__switch_to_xtra() has problems.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 848e8f5f0ad3169560c516fff6471be65f76e69f upstream.
No functional changes, preparation for the next fix and for uprobes
single-step fixes.
Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the
new helper, set_task_blockstep().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 40a1ef95da85843696fc3ebe5fce39b0db32669f upstream.
For some reason they didn't get replaced so far by their
paravirt equivalents, resulting in code to be run with
interrupts disabled that doesn't expect so (causing, in the
observed case, a BUG_ON() to trigger) when syscall auditing is
enabled.
David (Cc-ed) came up with an identical fix, so likely this can
be taken to count as an ack from him.
Reported-by: Peter Moody <pmoody@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/5108E01902000078000BA9C5@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Tested-by: Peter Moody <pmoody@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 712ba9e9afc4b3d3d6fa81565ca36fe518915c01 upstream.
efi.runtime_version is erroneously being set to the value of the
vendor's firmware revision instead of that of the implemented EFI
specification. We can't deduce which EFI functions are available based
on the revision of the vendor's firmware since the version scheme is
likely to be unique to each vendor.
What we really need to know is the revision of the implemented EFI
specification, which is available in the EFI System Table header.
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c94082656dac74257f63e91f78d5d458ac781fa5 upstream.
The traps are referred to by their numbers and it can be difficult to
understand them while reading the code without context. This patch adds
enumeration of the trap numbers and replaces the numbers with the correct
enum for x86.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20120310000710.GA32667@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cherry-picked-for: v2.3.37
Signed-off-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e43b3cec711a61edf047adf6204d542f3a659ef8 upstream.
early_pci_allowed() and read_pci_config_16() are only available if
CONFIG_PCI is defined.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ab3cd8670e0b3fcde7f029e1503ed3c5138e9571 upstream.
Mark static arrays as __initconst so they get removed when the init
sections are flushed.
Reported-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit a9acc5365dbda29f7be2884efb63771dc24bd815 upstream.
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table. So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.
Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.
[ hpa: made a number of stylistic changes, marked arrays as static
const, and made less verbose; use "memblock=debug" for full
verbosity. ]
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 83e68189745ad931c2afd45d8ee3303929233e7f upstream.
Originally 'efi_enabled' indicated whether a kernel was booted from
EFI firmware. Over time its semantics have changed, and it now
indicates whether or not we are booted on an EFI machine with
bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.
The immediate motivation for this patch is the bug report at,
https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
which details how running a platform driver on an EFI machine that is
designed to run under BIOS can cause the machine to become
bricked. Also, the following report,
https://bugzilla.kernel.org/show_bug.cgi?id=47121
details how running said driver can also cause Machine Check
Exceptions. Drivers need a new means of detecting whether they're
running on an EFI machine, as sadly the expression,
if (!efi_enabled)
hasn't been a sufficient condition for quite some time.
Users actually want to query 'efi_enabled' for different reasons -
what they really want access to is the list of available EFI
facilities.
For instance, the x86 reboot code needs to know whether it can invoke
the ResetSystem() function provided by the EFI runtime services, while
the ACPI OSL code wants to know whether the EFI config tables were
mapped successfully. There are also checks in some of the platform
driver code to simply see if they're running on an EFI machine (which
would make it a bad idea to do BIOS-y things).
This patch is a prereq for the samsung-laptop fix patch.
Cc: David Airlie <airlied@linux.ie>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Peter Jones <pjones@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Steve Langasek <steve.langasek@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust context (a lot)
- Add efi_is_native() function from commit 5189c2a7c776
('x86: efi: Turn off efi_enabled after setup on mixed fw/kernel')
- Make efi_init() bail out when booted non-native, as it would previously
not be called in this case
- Drop inapplicable changes to start_kernel()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream.
At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.
In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.
Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit b8f2c21db390273c3eaf0e5308faeaeb1e233840 upstream.
Update efi_call_phys_prelog to install an identity mapping of all available
memory. This corrects a bug on very large systems with more then 512 GB in
which bios would not be able to access addresses above not in the mapping.
The result is a crash that looks much like this.
BUG: unable to handle kernel paging request at 000000effd870020
IP: [<0000000078bce331>] 0x78bce330
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU 0
Pid: 0, comm: swapper/0 Tainted: G W 3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
RIP: 0010:[<0000000078bce331>] [<0000000078bce331>] 0x78bce330
RSP: 0000:ffffffff81601d28 EFLAGS: 00010006
RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
FS: 0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
Stack:
0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
Call Trace:
[<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
[<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
[<ffffffff81035946>] ? efi_call4+0x46/0x80
[<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
[<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
[<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
[<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
[<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
[<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
Code: Bad RIP value.
RIP [<0000000078bce331>] 0x78bce330
RSP <ffffffff81601d28>
CR2: 000000effd870020
---[ end trace ead828934fef5eab ]---
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 15653371c67c3fbe359ae37b720639dd4c7b42c5 upstream.
Subhash Jadavani reported this partial backtrace:
Now consider this call stack from MMC block driver (this is on the ARMv7
based board):
[<c001b50c>] (v7_dma_inv_range+0x30/0x48) from [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
[<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
[<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from [<c0017ff8>] (dma_map_sg+0x3c/0x114)
This is caused by incrementing the struct page pointer, and running off
the end of the sparsemem page array. Fix this by incrementing by pfn
instead, and convert the pfn to a struct page.
Suggested-by: James Bottomley <JBottomley@Parallels.com>
Tested-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d3286144c92ec876da9e30320afa875699b7e0f1 upstream.
Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
incoherent MMIO, just do nothing and move on.
Reported-by: Ben Collins <ben.c@servergy.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Tested-by: Ben Collins <ben.c@servergy.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9174adbee4a9a49d0139f5d71969852b36720809 upstream.
This fixes CVE-2013-0190 / XSA-40
There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path. This can result in the kernel crashing.
In the classic kernel case, the relevant code looked a little like:
popl %eax # Error code from hypervisor
jz 5f
addl $16,%esp
jmp iret_exc # Hypervisor said iret fault
5: addl $16,%esp
# Hypervisor said segment selector fault
Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.
In the PVOPS case, the code looks like:
popl_cfi %eax # Error from the hypervisor
lea 16(%esp),%esp # Add $16 before choosing fault path
CFI_ADJUST_CFA_OFFSET -16
jz 5f
addl $16,%esp # Incorrectly adjust %esp again
jmp iret_exc
It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present. At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.
This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6f16f4998f98e42e3f2dedf663cfb691ff0324af upstream.
We currently use a temporary 1MB section aligned to a 1MB boundary for
mapping the provided device tree until the final page table is created.
However, if the device tree happens to cross that 1MB boundary, the end
of it remains unmapped and the kernel crashes when it attempts to access
it. Given no restriction on the location of that DTB, it could end up
with only a few bytes mapped at the end of a section.
Solve this issue by mapping two consecutive sections.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2:
- Adjust context
- The mapping is not conditional; drop the 'ne' suffixes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 568dca15aa2a0f4ddee255894ec393a159f13147 upstream.
Patrik Kluba reports that the preempt count becomes invalid due
to the preempt_enable() call being unbalanced with a
preempt_disable() call in the vfp assembly routines. This happens
because preempt_enable() and preempt_disable() update preempt
counts under PREEMPT_COUNT=y but the vfp assembly routines do so
under PREEMPT=y. In a configuration where PREEMPT=n and
DEBUG_ATOMIC_SLEEP=y, PREEMPT_COUNT=y and so the preempt_enable()
call in VFP_bounce() keeps subtracting from the preempt count
until it goes negative.
Fix this by always using PREEMPT_COUNT to decided when to update
preempt counts in the ARM assembly code.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Patrik Kluba <pkluba@dension.com>
Tested-by: Patrik Kluba <pkluba@dension.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ed4f20943cd4c7b55105c04daedf8d63ab6d499c upstream.
Converting a 64 Bit TOD format value to nanoseconds means that the value
must be divided by 4.096. In order to achieve that we multiply with 125
and divide by 512.
When used within sched_clock() this triggers an overflow after appr.
417 days. Resulting in a sched_clock() return value that is much smaller
than previously and therefore may cause all sort of weird things in
subsystems that rely on a monotonic sched_clock() behaviour.
To fix this implement a tod_to_ns() helper function which converts TOD
values without overflow and call this function from both places that
open coded the conversion: sched_clock() and kvm_s390_handle_wait().
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 4a71997a3279a339e7336ea5d0cd27282e2dea44 upstream.
Ensure that the aux table is properly initialized, even when optional features
are missing. Without this, the FDPIC loader did not work. This was meant to
be included in commit d5ab780305bb6d60a7b5a74f18cf84eb6ad153b1.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3b4bc7bccc7857274705b05cf81a0c72cfd0b0dd upstream.
This patch fixes some code that implements a work-around to a hardware bug in
the ac97 controller on the pxa27x. A bug in the controller's warm reset
functionality requires that the mfp used by the controller as the AC97_nRESET
line be temporarily reconfigured as a generic output gpio (AF0) and manually
held high for the duration of the warm reset cycle. This is what was done in
the original code, but it was broken long ago by commit fb1bf8cd
([ARM] pxa: introduce processor specific pxa27x_assert_ac97reset())
which changed the mfp to a GPIO input instead of a high output.
The fix requires the ac97 controller to obtain the gpio via gpio_request_one(),
with arguments that configure the gpio as an output initially driven high.
Tested on a palm treo 680 machine. Reportedly, this broken code only prevents a
warm reset on hardware that lacks a pull-up on the line, which appears to be the
case for me.
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 6cb9c3697585c47977c42c5cc1b9fc49247ac530 ]
Modifying the huge pte's requires that all the underlying pte's be
modified.
Version 2: added missing flush_tlb_page()
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e6449c9b2d90c1bd9a5985bf05ddebfd1631cd6b upstream.
The missing NULL terminator can cause a panic on
PPC405 boards during boot:
Linux/PowerPC load: console=ttyS0,115200 root=/dev/mtdblock1 rootfstype=squashfs,jffs2 noinitrd init=/etc/preinit
Finalizing device tree... flat tree at 0x6a5160
bootconsole [udbg0] enabled
Page fault in user mode with in_atomic() = 1 mm = (null)
NIP = c0275f50 MSR = fffffffe
Oops: Weird page fault, sig: 11 [#1]
PowerPC 40x Platform
Modules linked in:
NIP: c0275f50 LR: c0275f60 CTR: c0280000
REGS: c0275eb0 TRAP: 636f7265 Not tainted (3.7.1)
MSR: fffffffe <VEC,VSX,EE,PR,FP,ME,SE,BE,IR,DR,PMM,RI> CR: c06a6190 XER: 00000001
TASK = c02662a8[0] 'swapper' THREAD: c0274000
GPR00: c0275ec0 c000c658 c027c4bf 00000000 c0275ee0 c000a0ec c020a1a8 c020a1f0
GPR08: c020f631 c020f404 c025f078 c025f080 c0275f10
Call Trace:
---[ end trace 31fd0ba7d8756001 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
The panic happens since commit 9597abe00c1bab2aedce6b49866bf6d1e81c9eed
(sections: fix section conflicts in arch/powerpc), however the root
cause of this is that the NULL terminator were not added in commit
a4f740cf33f7f6c164bbde3c0cdbcc77b0c4997c (of/flattree: Add of_flat_dt_match()
helper function).
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ce73ec6db47af84d1466402781ae0872a9e7873c upstream.
The locking in update_vsyscall_tz() is not only unnecessary because the vdso
code copies the data unproteced in __kernel_gettimeofday() but also
introduces a hard to reproduce race condition between update_vsyscall()
and update_vsyscall_tz(), which causes user space process to loop
forever in vdso code.
The following patch removes the locking from update_vsyscall_tz().
Locking is not only unnecessary because the vdso code copies the data
unprotected in __kernel_gettimeofday() but also erroneous because updating
the tb_update_count is not atomic and introduces a hard to reproduce race
condition between update_vsyscall() and update_vsyscall_tz(), which further
causes user space process to loop forever in vdso code.
The below scenario describes the race condition,
x==0 Boot CPU other CPU
proc_P: x==0
timer interrupt
update_vsyscall
x==1 x++;sync settimeofday
update_vsyscall_tz
x==2 x++;sync
x==3 sync;x++
sync;x++
proc_P: x==3 (loops until x becomes even)
Because the ++ operator would be implemented as three instructions and not
atomic on powerpc.
A similar change was made for x86 in commit 6c260d58634
("x86: vdso: Remove bogus locking in update_vsyscall_tz")
Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 7bf9b7bef881aac820bf1f2e9951a17b09bd7e04 upstream.
find_vma() is *not* safe when somebody else is removing vmas. Not just
the return value might get bogus just as you are getting it (this instance
doesn't try to dereference the resulting vma), the search itself can get
buggered in rather spectacular ways. IOW, ->mmap_sem really, really is
not optional here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c24bf9b4cc6a0f330ea355d73bfdf1dae7e63a05 upstream.
The inb/outb macros for CRIS are broken from a number of points of view,
missing () around parameters and they have an unprotected if statement
in them. This was breaking the compile of IPMI on CRIS and thus I was
being annoyed by build regressions, so I fixed them.
Plus I don't think they would have worked at all, since the data values
were missing "&" and the outsl had a "3" instead of a "4" for the size.
From what I can tell, this stuff is not used at all, so this can't be
any more broken than it was before, anyway.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 8add1ecb81f541ef2fcb0b85a5470ad9ecfb4a84 upstream.
When poweroff machine, kernel_power_off() call disable_nonboot_cpus().
And if we have HOTPLUG_CPU configured, disable_nonboot_cpus() is not an
empty function but attempt to actually disable the nonboot cpus. Since
system state is SYSTEM_POWER_OFF, play_dead() won't be called and thus
disable_nonboot_cpus() hangs. Therefore, we make this patch to avoid
poweroff failure.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Hongliang Tao <taohl@lemote.com>
Signed-off-by: Hua Yan <yanh@lemote.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4211/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 11ee7e99f35ecb15f59b21da6a82d96d2cd3fcc8 upstream.
If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n,
the kernel fails when we run at a non zero offset. It turns out
we were incorrectly wrapping some of the relocatable kernel code
with CONFIG_CRASH_DUMP.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 864aa04cd02979c2c755cb28b5f4fe56039171c0 upstream.
When updating the page protection map after calculating the user_pgprot
value, the base protection map is temporarily stored in an unsigned long
type, causing truncation of the protection bits when LPAE is enabled.
This effectively means that calls to mprotect() will corrupt the upper
page attributes, clearing the XN bit unconditionally.
This patch uses pteval_t to store the intermediate protection values,
preserving the upper bits for 64-bit descriptors.
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6acf5a8c931da9d26c8dd77d784daaf07fa2bff0 upstream.
HPET_TN_FSB is not a proper mask bit; it merely toggles between MSI and
legacy interrupt delivery. The proper mask bit is HPET_TN_ENABLE, so
use both bits when (un)masking the interrupt.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5093E09002000078000A60E6@nat28.tlf.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e43a028752fed049e4bd94ef895542f96d79fa74 upstream.
When remembering the direction of a DCR transaction, we should write
to the same variable that we interpret on later when doing vcpu_run
again.
Signed-off-by: Alexander Graf <agraf@suse.de>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 87cac8f879a5ecd7109dbe688087e8810b3364eb upstream.
Newer kernels (linux-next with the transparent huge page patches)
use rrbm if the feature is announced via feature bit 66.
RRBM will cause intercepts, so KVM does not handle it right now,
causing an illegal instruction in the guest.
The easy solution is to disable the feature bit for the guest.
This fixes bugs like:
Kernel BUG at 0000000000124c2a [verbose debug info unavailable]
illegal operation: 0001 [#1] SMP
Modules linked in: virtio_balloon virtio_net ipv6 autofs4
CPU: 0 Not tainted 3.5.4 #1
Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670)
Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
00000000003cc000 0000000000000004 0000000000000000 0000000079800000
0000000000040000 0000000000000000 000000007bed3918 000000007cf40000
0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c
000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0
Krnl Code:>0000000000124c2a: b9810025 ogr %r2,%r5
0000000000124c2e: 41343000 la %r3,0(%r4,%r3)
0000000000124c32: a716fffa brct %r1,124c26
0000000000124c36: b9010022 lngr %r2,%r2
0000000000124c3a: e3d0f0800004 lg %r13,128(%r15)
0000000000124c40: eb22003f000c srlg %r2,%r2,63
[ 2150.713198] Call Trace:
[ 2150.713223] ([<00000000002312c4>] page_referenced_one+0x6c/0x27c)
[ 2150.713749] [<0000000000233812>] page_referenced+0x32a/0x410
[...]
CC: Alex Graf <agraf@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
Fix wii_memory_fixups() the following compile error on 3.0.y tree with
wii_defconfig on 3.0.y tree.
CC arch/powerpc/platforms/embedded6xx/wii.o
arch/powerpc/platforms/embedded6xx/wii.c: In function ‘wii_memory_fixups’:
arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format]
cc1: all warnings being treated as errors
make[2]: *** [arch/powerpc/platforms/embedded6xx/wii.o] Error 1
make[1]: *** [arch/powerpc/platforms/embedded6xx] Error 2
make: *** [arch/powerpc/platforms] Error 2
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
CONFIG_VFPv3 set
commit 39141ddfb63a664f26d3f42f64ee386e879b492c upstream.
After commit 846a136881b8f73c1f74250bf6acfaa309cab1f2 ("ARM: vfp: fix
saving d16-d31 vfp registers on v6+ kernels"), the OMAP 2430SDP board
started crashing during boot with omap2plus_defconfig:
[ 3.875122] mmcblk0: mmc0:e624 SD04G 3.69 GiB
[ 3.915954] mmcblk0: p1
[ 4.086639] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[ 4.093719] Modules linked in:
[ 4.096954] CPU: 0 Not tainted (3.6.0-02232-g759e00b #570)
[ 4.103149] PC is at vfp_reload_hw+0x1c/0x44
[ 4.107666] LR is at __und_usr_fault_32+0x0/0x8
It turns out that the context save/restore fix unmasked a latent bug
in commit 5aaf254409f8d58229107b59507a8235b715a960 ("ARM: 6203/1: Make
VFPv3 usable on ARMv6"). When CONFIG_VFPv3 is set, but the kernel is
booted on a pre-VFPv3 core, the code attempts to save and restore the
d16-d31 VFP registers. These are only present on non-D16 VFPv3+, so
this results in an undefined instruction exception. The code didn't
crash before commit 846a136 because the save and restore code was
only touching d0-d15, present on all VFP.
Fix by implementing a request from Russell King to add a new HWCAP
flag that affirmatively indicates the presence of the d16-d31
registers:
http://marc.info/?l=linux-arm-kernel&m=135013547905283&w=2
and some feedback from Måns to clarify the name of the HWCAP flag.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@linaro.org>
Cc: Måns Rullgård <mans.rullgard@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream.
On hosts without the XSAVE support unprivileged local user can trigger
oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
ioctl.
invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G D 3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
[<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
[<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70
QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.
Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[bwh: Backported to 3.2: both functions are in arch/x86/kvm/x86.c]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 441a179dafc0f99fc8b3a8268eef66958621082e upstream.
int sys32_rt_sigprocmask(int how, compat_sigset_t __user *set, compat_sigset_t __user *oset,
unsigned int sigsetsize)
{
sigset_t old_set, new_set;
int ret;
if (set && get_sigset32(set, &new_set, sigsetsize))
...
static int
get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz)
{
compat_sigset_t s;
int r;
if (sz != sizeof *set) panic("put_sigset32()");
In other words, rt_sigprocmask(69, (void *)69, 69) done by 32bit process
will promptly panic the box.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1dc831bf53fddcc6443f74a39e72db5bcea4f15d upstream.
- The code relies on rc_pci_fixup being called, which only happens
when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting
this causes a booting failure with a non-obvious cause.
- Update rc_pci_fixup to set the class properly, copying the
more modern style from other places
- Correct the rc_pci_fixup comment
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|