summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2010-09-20x86-64, compat: Retruncate rax after ia32 syscall entry tracingRoland McGrath
commit eefdca043e8391dcd719711716492063030b55ac upstream. In commit d4d6715, we reopened an old hole for a 64-bit ptracer touching a 32-bit tracee in system call entry. A %rax value set via ptrace at the entry tracing stop gets used whole as a 32-bit syscall number, while we only check the low 32 bits for validity. Fix it by truncating %rax back to 32 bits after syscall_trace_enter, in addition to testing the full 64 bits as has already been added. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20x86-64, compat: Test %rax for the syscall number, not %eaxH. Peter Anvin
commit 36d001c70d8a0144ac1d038f6876c484849a74de upstream. On 64 bits, we always, by necessity, jump through the system call table via %rax. For 32-bit system calls, in theory the system call number is stored in %eax, and the code was testing %eax for a valid system call number. At one point we loaded the stored value back from the stack to enforce zero-extension, but that was removed in checkin d4d67150165df8bf1cc05e532f6efca96f907cab. An actual 32-bit process will not be able to introduce a non-zero-extended number, but it can happen via ptrace. Instead of re-introducing the zero-extension, test what we are actually going to use, i.e. %rax. This only adds a handful of REX prefixes to the code. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-20x86: don't send SIGBUS for kernel page faultsGreg Kroah-Hartman
Based on commit 96054569190bdec375fe824e48ca1f4e3b53dd36 upstream, authored by Linus Torvalds. This is my backport to the .27 kernel tree, hopefully preserving the same functionality. Original commit message: It's wrong for several reasons, but the most direct one is that the fault may be for the stack accesses to set up a previous SIGBUS. When we have a kernel exception, the kernel exception handler does all the fixups, not some user-level signal handler. Even apart from the nested SIGBUS issue, it's also wrong to give out kernel fault addresses in the signal handler info block, or to send a SIGBUS when a system call already returns EFAULT. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13xen: drop xen_sched_clock in favour of using plain wallclock timeJeremy Fitzhardinge
commit 8a22b9996b001c88f2bfb54c6de6a05fc39e177a upstream. xen_sched_clock only counts unstolen time. In principle this should be useful to the Linux scheduler so that it knows how much time a process actually consumed. But in practice this doesn't work very well as the scheduler expects the sched_clock time to be synchronized between cpus. It also uses sched_clock to measure the time a task spends sleeping, in which case "unstolen time" isn't meaningful. So just use plain xen_clocksource_read to return wallclock nanoseconds for sched_clock. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-06.gitignore updatesAlexey Dobriyan
commit c17dad6905fc82d8f523399e5c3f014e81d61df6 upstream. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02x86, Calgary: Limit the max PHB number to 256Darrick J. Wong
commit d596043d71ff0d7b3d0bead19b1d68c55f003093 upstream. The x3950 family can have as many as 256 PCI buses in a single system, so change the limits to the maximum. Since there can only be 256 PCI buses in one domain, we no longer need the BUG_ON check. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> LKML-Reference: <20100701004519.GQ15515@tux1.beaverton.ibm.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02x86, Calgary: Increase max PHB numberDarrick J. Wong
commit 499a00e92dd9a75395081f595e681629eb1eebad upstream. Newer systems (x3950M2) can have 48 PHBs per chassis and 8 chassis, so bump the limits up and provide an explanation of the requirements for each class. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Corinna Schultz <cschultz@linux.vnet.ibm.com> LKML-Reference: <20100624212647.GI15515@tux1.beaverton.ibm.com> [ v2: Fixed build bug, added back PHBS_PER_CALGARY == 4 ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26clockevent: Prevent dead lock on clockevents_lockSuresh Siddha
This is a merge of two mainline commits, intended for stable@kernel.org submission for 2.6.27 kernel. commit f833bab87fca5c3ce13778421b1365845843b976 and commit 918aae42aa9b611a3663b16ae849fdedc67c2292 Changelog of both: Currently clockevents_notify() is called with interrupts enabled at some places and interrupts disabled at some other places. This results in a deadlock in this scenario. cpu A holds clockevents_lock in clockevents_notify() with irqs enabled cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled cpu C doing set_mtrr() which will try to rendezvous of all the cpus. This will result in C and A come to the rendezvous point and waiting for B. B is stuck forever waiting for the spinlock and thus not reaching the rendezvous point. Fix the clockevents code so that clockevents_lock is taken with interrupts disabled and thus avoid the above deadlock. Also call lapic_timer_propagate_broadcast() on the destination cpu so that we avoid calling smp_call_function() in the clockevents notifier chain. This issue left us wondering if we need to change the MTRR rendezvous logic to use stop machine logic (instead of smp_call_function) or add a check in spinlock debug code to see if there are other spinlocks which gets taken under both interrupts enabled/disabled conditions. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Brown Len" <len.brown@intel.com> Cc: stable@kernel.org LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> I got following warning on ia64 box: In function 'acpi_processor_power_verify': 642: warning: passing argument 2 of 'smp_call_function_single' from incompatible pointer type This smp_call_function_single() was introduced by a commit f833bab87fca5c3ce13778421b1365845843b976: The problem is that the lapic_timer_propagate_broadcast() has 2 versions: One is real code that modified in the above commit, and the other is NOP code that used when !ARCH_APICTIMER_STOPS_ON_C3: static void lapic_timer_propagate_broadcast(struct acpi_processor *pr) { } So I got warning because of !ARCH_APICTIMER_STOPS_ON_C3. We really want to do nothing here on !ARCH_APICTIMER_STOPS_ON_C3, so modify lapic_timer_propagate_broadcast() of real version to use smp_call_function_single() in it. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01x86, ia32_aout: do not kill argument mappingJiri Slaby
commit 318f6b228ba88a394ef560efc1bfe028ad5ae6b6 upstream. Do not set current->mm->mmap to NULL in 32-bit emulation on 64-bit load_aout_binary after flush_old_exec as it would destroy already set brpm mapping with arguments. Introduced by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba mm: variable length argument support where the argument mapping in bprm was added. [ hpa: this is a regression from 2.6.22... time to kill a.out? ] Signed-off-by: Jiri Slaby <jslaby@suse.cz> LKML-Reference: <1265831716-7668-1-git-send-email-jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ollie Wild <aaw@google.com> Cc: x86@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01KVM: x86: check for cr3 validity in ioctl_set_sregsMarcelo Tosatti
commit 59839dfff5eabca01cc4e20b45797a60a80af8cb upstream. Matt T. Yourst notes that kvm_arch_vcpu_ioctl_set_sregs lacks validity checking for the new cr3 value: "Userspace callers of KVM_SET_SREGS can pass a bogus value of cr3 to the kernel. This will trigger a NULL pointer access in gfn_to_rmap() when userspace next tries to call KVM_RUN on the affected VCPU and kvm attempts to activate the new non-existent page table root. This happens since kvm only validates that cr3 points to a valid guest physical memory page when code *inside* the guest sets cr3. However, kvm currently trusts the userspace caller (e.g. QEMU) on the host machine to always supply a valid page table root, rather than properly validating it along with the rest of the reloaded guest state." http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2687641&group_id=180599 Check for a valid cr3 address in kvm_arch_vcpu_ioctl_set_sregs, triple fault in case of failure. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01KVM: VMX: Check cpl before emulating debug register accessAvi Kivity
commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream. Debug registers may only be accessed from cpl 0. Unfortunately, vmx will code to emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01KVM: x86 emulator: limit instructions to 15 bytesAvi Kivity
commit eb3c79e64a70fb8f7473e30fa07e89c1ecc2c9bb upstream [ <cebbert@redhat.com>: backport to 2.6.27 ] While we are never normally passed an instruction that exceeds 15 bytes, smp games can cause us to attempt to interpret one, which will cause large latencies in non-preempt hosts. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06Revert: KVM: MMU: do not free active mmu pages in free_mmu_pages()Greg Kroah-Hartman
This reverts the commit d2127c8300fb1ec54af56faee17170e7a525326d, which was the commit f00be0cae4e6ad0a8c7be381c6d9be3586800b3e upstream. This was done based on comments saying it was causing problems. Cc: Gleb Natapov <gleb@redhat.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06x86/ptrace: make genregs[32]_get/set more robustLinus Torvalds
commit 04a1e62c2cec820501f93526ad1e46073b802dc4 upstream. The loop condition is fragile: we compare an unsigned value to zero, and then decrement it by something larger than one in the loop. All the callers should be passing in appropriately aligned buffer lengths, but it's better to just not rely on it, and have some appropriate defensive loop limits. Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18x86: GART: pci-gart_64.c: Use correct length in strncmpJoe Perches
commit 41855b77547fa18d90ed6a5d322983d3fdab1959 upstream. Signed-off-by: Joe Perches <joe@perches.com> LKML-Reference: <1257818330.12852.72.camel@Joe-Laptop.home> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18x86: Fix iommu=nodac parameter handlingTejun Heo
commit 2ae8bb75db1f3de422eb5898f2a063c46c36dba8 upstream. iommu=nodac should forbid dac instead of enabling it. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Matteo Frigo <athena@fftw.org> LKML-Reference: <4AE5B52A.4050408@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the ↵Darrick J. Wong
PCI tree commit 4528752f49c1f4025473d12bc5fa9181085c3f22 upstream. On a multi-node x3950M2 system, there's a slight oddity in the PCI device tree for all secondary nodes: 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01) \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) ...as compared to the primary node: 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01) \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) In both nodes, the LSI RAID controller hangs off a CalIOC2 device, but on the secondary nodes, the BIOS hides the VGA device and substitutes the device tree ending with the disk controller. It would seem that Calgary devices don't necessarily appear at the top of the PCI tree, which means that the current code to find the Calgary IOMMU that goes with a particular device is buggy. Rather than walk all the way to the top of the PCI device tree and try to match bus number with Calgary descriptor, the code needs to examine each parent of the particular device; if it encounters a Calgary with a matching bus number, simply use that. Otherwise, we BUG() when the bus number of the Calgary doesn't match the bus number of whatever's at the top of the device tree. Extra note: This patch appears to work correctly for the x3950 that came before the x3950 M2. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Jon D. Mason <jdmason@kudzu.us> Cc: Corinna Schultz <coschult@us.ibm.com> LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18x86: ASUS P4S800 reboot=bios quirkLeann Ogasawara
commit 4832ddda2ec4df96ea1eed334ae2dbd65fc1f541 upstream. Bug reporter noted their system with an ASUS P4S800 motherboard would hang when rebooting unless reboot=b was specified. Their dmidecode didn't contain descriptive System Information for Manufacturer or Product Name, so I used their Base Board Information to create a reboot quirk patch. The bug reporter confirmed this patch resolves the reboot hang. Handle 0x0001, DMI type 1, 25 bytes System Information Manufacturer: System Manufacturer Product Name: System Name Version: System Version Serial Number: SYS-1234567890 UUID: E0BFCD8B-7948-D911-A953-E486B4EEB67F Wake-up Type: Power Switch Handle 0x0002, DMI type 2, 8 bytes Base Board Information Manufacturer: ASUSTeK Computer INC. Product Name: P4S800 Version: REV 1.xx Serial Number: xxxxxxxxxxx BugLink: http://bugs.launchpad.net/bugs/366682 ASUS P4S800 will hang when rebooting unless reboot=b is specified. Add a quirk to reboot through the bios. Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> LKML-Reference: <1259972107.4629.275.camel@emiko> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18x86, apic: Enable lapic nmi watchdog on AMD Family 11hMikael Pettersson
commit 7d1849aff6687a135a8da3a75e32a00e3137a5e2 upstream. The x86 lapic nmi watchdog does not recognize AMD Family 11h, resulting in: NMI watchdog: CPU not supported As far as I can see from available documentation (the BKDM), family 11h looks identical to family 10h as far as the PMU is concerned. Extending the check to accept family 11h results in: Testing NMI watchdog ... OK. I've been running with this change on a Turion X2 Ultra ZM-82 laptop for a couple of weeks now without problems. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-08Enable ACPI PDC handshake for VIA/Centaur CPUsHarald Welte
commit d77b81974521c82fa6fda38dfff1b491dcc62a32 upstream. In commit 0de51088e6a82bc8413d3ca9e28bbca2788b5b53, we introduced the use of acpi-cpufreq on VIA/Centaur CPU's by removing a vendor check for VENDOR_INTEL. However, as it turns out, at least the Nano CPU's also need the PDC (processor driver capabilities) handshake in order to activate the methods required for acpi-cpufreq. Since arch_acpi_processor_init_pdc() contains another vendor check for Intel, the PDC is not initialized on VIA CPU's. The resulting behavior of a current mainline kernel on such systems is: acpi-cpufreq loads and it indicates CPU frequency changes. However, the CPU stays at a single frequency This trivial patch ensures that init_intel_pdc() is called on Intel and VIA/Centaur CPU's alike. Signed-off-by: Harald Welte <HaraldWelte@viatech.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09x86/amd-iommu: Workaround for erratum 63Joerg Roedel
commit c5cca146aa03e1f60fb179df65f0dbaf17bc64ed upstream. There is an erratum for IOMMU hardware which documents undefined behavior when forwarding SMI requests from peripherals and the DTE of that peripheral has a sysmgt value of 01b. This problem caused weird IO_PAGE_FAULTS in my case. This patch implements the suggested workaround for that erratum into the AMD IOMMU driver. The erratum is documented with number 63. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09x86/amd-iommu: Un__init function required on shutdownJoerg Roedel
commit ca0207114f1708b563f510b7781a360ec5b98359 upstream. The function iommu_feature_disable is required on system shutdown to disable the IOMMU but it is marked as __init. This may result in a panic if the memory is reused. This patch fixes this bug. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID (CVE-2009-3638)Avi Kivity
commit 6a54435560efdab1a08f429a954df4d6c740bddf upstream. The number of entries is multiplied by the entry size, which can overflow on 32-bit hosts. Bound the entry count instead. Reported-by: David Wagner <daw@cs.berkeley.edu> Signed-off-by: Avi Kivity <avi@redhat.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09x86-64: Fix register leak in 32-bit syscall audtingJan Beulich
commit 81766741fe1eee3884219e8daaf03f466f2ed52f upstream. Restoring %ebp after the call to audit_syscall_exit() is not only unnecessary (because the register didn't get clobbered), but in the sysenter case wasn't even doing the right thing: It loaded %ebp from a location below the top of stack (RBP < ARGOFFSET), i.e. arbitrary kernel data got passed back to user mode in the register. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Roland McGrath <roland@redhat.com> LKML-Reference: <4AE5CC4D020000780001BD13@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12KVM: x86: Disallow hypercalls for guest callers in rings > 0 [CVE-2009-3290]Jan Kiszka
[ backport to 2.6.27 by Chuck Ebbert <cebbert@redhat.com> ] commit 07708c4af1346ab1521b26a202f438366b7bcffd upstream. So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall base in the future once required. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12x86: Increase MIN_GAP to include randomized stackMichal Hocko
[ trivial backport to 2.6.27: Chuck Ebbert <cebbert@redhat.com> ] commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream. Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12x86: Don't leak 64-bit kernel register values to 32-bit processesJan Beulich
commit 24e35800cdc4350fc34e2bed37b608a9e13ab3b6 upstream x86: Don't leak 64-bit kernel register values to 32-bit processes While 32-bit processes can't directly access R8...R15, they can gain access to these registers by temporarily switching themselves into 64-bit mode. Therefore, registers not preserved anyway by called C functions (i.e. R8...R11) must be cleared prior to returning to user mode. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4AC34D73020000780001744A@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12x86-64: slightly stream-line 32-bit syscall entry codeJan Beulich
commit 295286a89107c353b9677bc604361c537fd6a1c0 upstream x86-64: slightly stream-line 32-bit syscall entry code [ required for following patch to apply properly ] Avoid updating registers or memory twice as well as needlessly loading or copying registers. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: MMU: protect kvm_mmu_change_mmu_pages with mmu_lockMarcelo Tosatti
(cherry picked from commit 7c8a83b75a38a807d37f5a4398eca2a42c8cf513) kvm_handle_hva, called by MMU notifiers, manipulates mmu data only with the protection of mmu_lock. Update kvm_mmu_change_mmu_pages callers to take mmu_lock, thus protecting against kvm_handle_hva. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: x86: check for cr3 validity in mmu_alloc_rootsMarcelo Tosatti
(cherry picked from commit 8986ecc0ef58c96eec48d8502c048f3ab67fd8e2) Verify the cr3 address stored in vcpu->arch.cr3 points to an existant memslot. If not, inject a triple fault. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: MMU: do not free active mmu pages in free_mmu_pages()Gleb Natapov
(cherry picked from commit f00be0cae4e6ad0a8c7be381c6d9be3586800b3e) free_mmu_pages() should only undo what alloc_mmu_pages() does. Free mmu pages from the generic VM destruction function, kvm_destroy_vm(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Fix PDPTR reloading on CR4 writesAvi Kivity
(cherry picked from commit a2edf57f510cce6a389cc14e58c6ad0a4296d6f9) The processor is documented to reload the PDPTRs while in PAE mode if any of the CR4 bits PSE, PGE, or PAE change. Linux relies on this behaviour when zapping the low mappings of PAE kernels during boot. The code already handled changes to CR4.PAE; augment it to also notice changes to PSE and PGE. This triggered while booting an F11 PAE kernel; the futex initialization code runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem ended up uninitialized, killing PI futexes and pulseaudio which uses them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Make paravirt tlb flush also reload the PAE PDPTRsAvi Kivity
(cherry picked from commit a8cd0244e9cebcf9b358d24c7e7410062f3665cb) The paravirt tlb flush may be used not only to flush TLBs, but also to reload the four page-directory-pointer-table entries, as it is used as a replacement for reloading CR3. Change the code to do the entire CR3 reloading dance instead of simply flushing the TLB. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: VMX: Handle vmx instruction vmexitsAvi Kivity
(cherry picked from commit e3c7cb6ad7191e92ba89d00a7ae5f5dd1ca0c214) IF a guest tries to use vmx instructions, inject a #UD to let it know the instruction is not implemented, rather than crashing. This prevents guest userspace from crashing the guest kernel. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Make EFER reads safe when EFER does not existAvi Kivity
(cherry picked from commit e286e86e6d2042d67d09244aa0e05ffef75c9d54) Some processors don't have EFER; don't oops if userspace wants us to read EFER when we check NX. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: SVM: Remove port 80 passthroughAvi Kivity
(cherry picked from commit 99f85a28a78e96d28907fe036e1671a218fee597) KVM optimizes guest port 80 accesses by passthing them through to the host. Some AMD machines die on port 80 writes, allowing the guest to hard-lock the host. Remove the port passthrough to avoid the problem. Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Tested-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: VMX: Don't allow uninhibited access to EFER on i386Avi Kivity
(cherry picked from commit 16175a796d061833aacfbd9672235f2d2725df65) vmx_set_msr() does not allow i386 guests to touch EFER, but they can still do so through the default: label in the switch. If they set EFER_LME, they can oops the host. Fix by having EFER access through the normal channel (which will check for EFER_LME) even on i386. Reported-and-tested-by: Benjamin Gilbert <bgilbert@cs.cmu.edu> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: VMX: Set IGMT bit in EPT entrySheng Yang
(cherry picked from commit 928d4bf747e9c290b690ff515d8f81e8ee226d97) There is a potential issue that, when guest using pagetable without vmexit when EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory, which would be inconsistent with host side and would cause host MCE due to inconsistent cache attribute. The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default memory type to protect host (notice that all memory mapped by KVM should be WB). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: MMU: increase per-vcpu rmap cache alloc sizeMarcelo Tosatti
(cherry picked from commit c41ef344de212bd918f7765af21b5008628c03e0) The page fault path can use two rmap_desc structures, if: - walk_addr's dirty pte update allocates one rmap_desc. - mmu_lock is dropped, sptes are zapped resulting in rmap_desc being freed. - fetch->mmu_set_spte allocates another rmap_desc. Increase to 4 for safety. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: set debug registers after "schedulable" sectionMarcelo Tosatti
(cherry picked from commit 29415c37f043d1d54dcf356601d738ff6633b72b) The vcpu thread can be preempted after the guest_debug_pre() callback, resulting in invalid debug registers on the new vcpu. Move it inside the non-preemptable section. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: add MC5_MISC msr read supportJoerg Roedel
(cherry picked from commit a89c1ad270ca7ad0eec2667bc754362ce7b142be) Currently KVM implements MC0-MC4_MISC read support. When booting Linux this results in KVM warnings in the kernel log when the guest tries to read MC5_MISC. Fix this warnings with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Reduce stack usage in kvm_pv_mmu_op()Dave Hansen
(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26) We're in a hot path. We can't use kmalloc() because it might impact performance. So, we just stick the buffer that we need into the kvm_vcpu_arch structure. This is used very often, so it is not really a waste. We also have to move the buffer structure's definition to the arch-specific x86 kvm header. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()Dave Hansen
(cherry picked from commit b772ff362ec6b821c8a5227a3355e263f917bfad) [sheng: fix KVM_GET_LAPIC using wrong size] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()Dave Hansen
(cherry picked from commit f0d662759a2465babdba1160749c446648c9d159) On my machine with gcc 3.4, kvm uses ~2k of stack in a few select functions. This is mostly because gcc fails to notice that the different case: statements could have their stack usage combined. It overflows very nicely if interrupts happen during one of these large uses. This patch uses two methods for reducing stack usage. 1. dynamically allocate large objects instead of putting on the stack. 2. Use a union{} member for all of the case variables. This tricks gcc into combining them all into a single stack allocation. (There's also a comment on this) Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: MMU: Fix setting the accessed bit on non-speculative sptesAvi Kivity
(cherry picked from commit 3201b5d9f0f7ef392886cd76dcd2c69186d9d5cd) The accessed bit was accidentally turned on in a random flag word, rather than, the spte itself, which was lucky, since it used the non-EPT compatible PT_ACCESSED_MASK. Fix by turning the bit on in the spte and changing it to use the portable accessed mask. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: MMU: Flush tlbs after clearing write permission when accessing dirty logAvi Kivity
(cherry picked from commit 171d595d3b3254b9a952af8d1f6965d2e85dcbaa) Otherwise, the cpu may allow writes to the tracked pages, and we lose some display bits or fail to migrate correctly. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()Avi Kivity
(cherry picked from commit 2245a28fe2e6fdb1bdabc4dcde1ea3a5c37e2a9e) It was generally safe due to slots_lock being held for write, but it wasn't very nice. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHAREDAvi Kivity
(cherry picked from commit acee3c04e8208c17aad1baff99baa68d71640a19) There is no reason to share internal memory slots with fork()ed instances. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: Load real mode segments correctlyAvi Kivity
(cherry picked from commit f4bbd9aaaae23007e4d79536d35a30cbbb11d407) Real mode segments to not reference the GDT or LDT; they simply compute base = selector * 16. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08KVM: VMX: Change segment dpl at reset to 3Avi Kivity
(cherry picked from commit a16b20da879430fdf245ed45461ed40ffef8db3c) This is more emulation friendly, if not 100% correct. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>