summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2008-08-20x86: fix setup code crashes on my old 486 boxJoerg Roedel
commit 7b27718bdb1b70166383dec91391df5534d449ee upstream yesterday I tried to reactivate my old 486 box and wanted to install a current Linux with latest kernel on it. But it turned out that the latest kernel does not boot because the machine crashes early in the setup code. After some debugging it turned out that the problem is the query_ist() function. If this interrupt with that function is called the machine simply locks up. It looks like a BIOS bug. Looking for a workaround for this problem I wrote the attached patch. It checks for the CPUID instruction and if it is not implemented it does not call the speedstep BIOS function. As far as I know speedstep should be available since some Pentium earliest. Alan Cox observed that it's available since the Pentium II, so cpuid levels 4 and 5 can be excluded altogether. H. Peter Anvin cleaned up the code some more: > Right in concept, but I dislike the implementation (duplication of the > CPU detect code we already have). Could you try this patch and see if > it works for you? which, with a small modification to fix a build error with it the resulting kernel boots on my machine. Signed-off-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20x86: amd opteron TOM2 mask val fixYinghai Lu
commit 8004dd965b13b01a96def054d420f6df7ff22d53 upstream. there is a typo in the mask value, need to remove that extra 0, to avoid 4bit clearing. Signed-off-by: Yinghal Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: maximilian attems <max@stro.at> Cc: Peter Palfrader <weasel@debian.org> Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20KVM: task switch: translate guest segment limit to virt-extension byte ↵Marcelo Tosatti
granular field (cherry picked from commit c93cd3a58845012df2d658fecd0ac99f7008d753) If 'g' is one then limit is 4kb granular. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20KVM: Avoid instruction emulation when event delivery is pendingAvi Kivity
(cherry-picked from commit 577bdc496614ced56d999bbb425e85adf2386490) When an event (such as an interrupt) is injected, and the stack is shadowed (and therefore write protected), the guest will exit. The current code will see that the stack is shadowed and emulate a few instructions, each time postponing the injection. Eventually the injection may succeed, but at that time the guest may be unwilling to accept the interrupt (for example, the TPR may have changed). This occurs every once in a while during a Windows 2008 boot. Fix by unshadowing the fault address if the fault was due to an event injection. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20KVM: task switch: use seg regs provided by subarch instead of reading from GDTMarcelo Tosatti
(cherry-picked from commit 34198bf8426276a2ce1e97056a0f02d43637e5ae) There is no guarantee that the old TSS descriptor in the GDT contains the proper base address. This is the case for Windows installation's reboot-via-triplefault. Use guest registers instead. Also translate the address properly. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20KVM: task switch: segment base is linear addressMarcelo Tosatti
(cherry picked from commit 98899aa0e0bf5de05850082be0eb837058c09ea5) The segment base is always a linear address, so translate before accessing guest memory. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06Kprobe smoke test lockdep warningPeter Zijlstra
[ Upstream commit d54191b85e294c46f05a2249b1f55ae54930bcc7 ] On Mon, 2008-04-21 at 18:54 -0400, Masami Hiramatsu wrote: > Thank you for reporting. > > Actually, kprobes tries to fixup thread's flags in post_kprobe_handler > (which is called from kprobe_exceptions_notify) by > trace_hardirqs_fixup_flags(pt_regs->flags). However, even the irq flag > is set in pt_regs->flags, true hardirq is still off until returning > from do_debug. Thus, lockdep assumes that hardirq is off without annotation. > > IMHO, one possible solution is that fixing hardirq flags right after > notify_die in do_debug instead of in post_kprobe_handler. My reply to BZ 10489: > [ 2.707509] Kprobe smoke test started > [ 2.709300] ------------[ cut here ]------------ > [ 2.709420] WARNING: at kernel/lockdep.c:2658 check_flags+0x4d/0x12c() > [ 2.709541] Modules linked in: > [ 2.709588] Pid: 1, comm: swapper Not tainted 2.6.25.jml.057 #1 > [ 2.709588] [<c0126acc>] warn_on_slowpath+0x41/0x51 > [ 2.709588] [<c010bafc>] ? save_stack_trace+0x1d/0x3b > [ 2.709588] [<c0140a83>] ? save_trace+0x37/0x89 > [ 2.709588] [<c011987d>] ? kernel_map_pages+0x103/0x11c > [ 2.709588] [<c0109803>] ? native_sched_clock+0xca/0xea > [ 2.709588] [<c0142958>] ? mark_held_locks+0x41/0x5c > [ 2.709588] [<c0382580>] ? kprobe_exceptions_notify+0x322/0x3af > [ 2.709588] [<c0142aff>] ? trace_hardirqs_on+0xf1/0x119 > [ 2.709588] [<c03825b3>] ? kprobe_exceptions_notify+0x355/0x3af > [ 2.709588] [<c0140823>] check_flags+0x4d/0x12c > [ 2.709588] [<c0143c9d>] lock_release+0x58/0x195 > [ 2.709588] [<c038347c>] ? __atomic_notifier_call_chain+0x0/0x80 > [ 2.709588] [<c03834d6>] __atomic_notifier_call_chain+0x5a/0x80 > [ 2.709588] [<c0383508>] atomic_notifier_call_chain+0xc/0xe > [ 2.709588] [<c013b6d4>] notify_die+0x2d/0x2f > [ 2.709588] [<c038168a>] do_debug+0x67/0xfe > [ 2.709588] [<c0381287>] debug_stack_correct+0x27/0x30 > [ 2.709588] [<c01564c0>] ? kprobe_target+0x1/0x34 > [ 2.709588] [<c0156572>] ? init_test_probes+0x50/0x186 > [ 2.709588] [<c04fae48>] init_kprobes+0x85/0x8c > [ 2.709588] [<c04e947b>] kernel_init+0x13d/0x298 > [ 2.709588] [<c04e933e>] ? kernel_init+0x0/0x298 > [ 2.709588] [<c04e933e>] ? kernel_init+0x0/0x298 > [ 2.709588] [<c0105ef7>] kernel_thread_helper+0x7/0x10 > [ 2.709588] ======================= > [ 2.709588] ---[ end trace 778e504de7e3b1e3 ]--- > [ 2.709588] possible reason: unannotated irqs-off. > [ 2.709588] irq event stamp: 370065 > [ 2.709588] hardirqs last enabled at (370065): [<c0382580>] kprobe_exceptions_notify+0x322/0x3af > [ 2.709588] hardirqs last disabled at (370064): [<c0381bb7>] do_int3+0x1d/0x7d > [ 2.709588] softirqs last enabled at (370050): [<c012b464>] __do_softirq+0xfa/0x100 > [ 2.709588] softirqs last disabled at (370045): [<c0107438>] do_softirq+0x74/0xd9 > [ 2.714751] Kprobe smoke test passed successfully how I love this stuff... Ok, do_debug() is a trap, this can happen at any time regardless of the machine's IRQ state. So the first thing we do is fix up the IRQ state. Then we call this die notifier stuff; and return with messed up IRQ state... YAY. So, kprobes fudges it.. notify_die(DIE_DEBUG) kprobe_exceptions_notify() post_kprobe_handler() modify regs->flags trace_hardirqs_fixup_flags(regs->flags); <--- must be it So what's the use of modifying flags if they're not meant to take effect at some point. /me tries to reproduce issue; enable kprobes test thingy && boot OK, that reproduces.. So the below makes it work - but I'm not getting this code; at the time I wrote that stuff I CC'ed each and every kprobe maintainer listed in the usual places but got no reposonse - can some please explain this stuff to me? Are the saved flags only for the TF bit or are they made in full effect later (and if so, where) ? Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06x86: io delay - add checking for NULL early paramCyrill Gorcunov
[ Upstream commit d6cd7effcc5e0047faf15ab0a54c980f1a616a07 ] Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: akpm@linux-foundation.org Cc: andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06x86: idle process - add checking for NULL early paramCyrill Gorcunov
[ Upstream commit ab6bc3e343fbe3be4a0f67225e849d0db6b4b7ac ] Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: akpm@linux-foundation.org Cc: andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01x86: fix kernel_physical_mapping_init() for large x86 systemsIngo Molnar
based on e22146e610bb7aed63282148740ab1d1b91e1d90 upstream Fix bug in kernel_physical_mapping_init() that causes kernel page table to be built incorrectly for systems with greater than 512GB of memory. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01x86: fix crash due to missing debugctlmsr on AMD K6-3Jan Kratochvil
commit d536b1f86591fb081c7a56eab04e711eb4dab951 upstream currently if you use PTRACE_SINGLEBLOCK on AMD K6-3 (i586) it will crash. Kernel now wrongly assumes existing DEBUGCTLMSR MSR register there. Removed the assumption also for some other non-K6 CPUs but I am not sure there (but it can only bring small inefficiency there if my assumption is wrong). Based on info from Roland McGrath, Chuck Ebbert and Mikulas Patocka. More info at: https://bugzilla.redhat.com/show_bug.cgi?id=456175 Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01x86, suspend, acpi: enter Big Real ModeH. Peter Anvin
Commit 3bf2e77453a87c22eb57ed4926760ac131c84459 upstream x86, suspend, acpi: enter Big Real Mode The explanation for recent video BIOS suspend quirk failures is that the VESA BIOS expects to be entered in Big Real Mode (*.limit = 0xffffffff) instead of ordinary Real Mode (*.limit = 0xffff). This patch changes the segment descriptors to Big Real Mode instead. The segment descriptor registers (what Intel calls "segment cache") is always active. The only thing that changes based on CR0.PE is how it is *loaded* and the interpretation of the CS flags. The segment descriptor registers contain of the following sub-registers: selector (the "visible" part), base, limit and flags. In protected mode or long mode, they are loaded from descriptors (or fs.base or gs.base can be manipulated directly in long mode.) In real mode, the only thing changed by a segment register load is the selector and the base, where the base <- selector << 4. In particular, *the limit and the flags are not changed*. As far as the handling of the CS flags: a code segment cannot be writable in protected mode, whereas it is "just another segment" in real mode, so there is some kind of quirk that kicks in for this when CR0.PE <- 0. I'm not sure if this is accomplished by actually changing the cs.flags register or just changing the interpretation; it might be something that is CPU-specific. In particular, the Transmeta CPUs had an explicit "CS is writable if you're in real mode" override, so even if you had loaded CS with an execute-only segment it'd be writable (but not readable!) on return to real mode. I'm not at all sure if that is how other CPUs behave. Signed-off-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-01x86 reboot quirks: add Dell Precision WorkStation T5400Ingo Molnar
commit fab3b58d3b242b5903f78d60d86803a8aecdf6de upstream as reported in: "reboot=bios is mandatory on Dell T5400 server." http://bugzilla.kernel.org/show_bug.cgi?id=11108 add a DMI reboot quirk. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01Patch Upstream: x86 ptrace: fix PTRACE_GETFPXREGS errorRoland McGrath
commit 45fdc3a7624a4a48185a04ae0abab5f9793d8952 upstream ptrace has always returned only -EIO for all failures to access registers. The user_regset calls are allowed to return a more meaningful variety of errors. The REGSET_XFP calls use -ENODEV for !cpu_has_fxsr hardware. Make ptrace return the traditional -EIO instead of the error code from the user_regset call. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: MMU: Fix potential race setting upper shadow ptes on nonpae hostsAvi Kivity
Original-Commit-Hash: c23a6fe17abf8562e675465f8d55ba1a551d314d The direct mapped shadow code (used for real mode and two dimensional paging) sets upper-level ptes using direct assignment rather than calling set_shadow_pte(). A nonpae host will split this into two writes, which opens up a race if another vcpu accesses the same memory area. Fix by calling set_shadow_pte() instead of assigning directly. Noticed by Izik Eidus. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destructionMarcelo Tosatti
Original-Commit-Hash: 3cc312f03e06a8fa39ecb4cc0189efc2bd888899 Flush the shadow mmu before removing regions to avoid stale entries. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: x86 emulator: Fix HLT instructionMohammed Gamal
Original-Commit-Hash: bcc542267538e9ba933d08b4cd4ebd796e03a3d7 This patch fixes issue encountered with HLT instruction under FreeDOS's HIMEM XMS Driver. The HLT instruction jumped directly to the done label and skips updating the EIP value, therefore causing the guest to spin endlessly on the same instruction. The patch changes the instruction so that it writes back the updated EIP value. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: VMX: Add ept_sync_context in flush_tlbSheng Yang
Original-Commit-Hash: 73f785350b92e1a3af945340f7d10f3978193cba Fix a potention issue caused by kvm_mmu_slot_remove_write_access(). The old behavior don't sync EPT TLB with modified EPT entry, which result in inconsistent content of EPT TLB and EPT table. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: mmu_shrink: kvm_mmu_zap_page requires slots_lock to be heldMarcelo Tosatti
Original-Commit-Hash: 64f6a0c041bd8fc100a0d655058bdbc31feda03c kvm_mmu_zap_page() needs slots lock held (rmap_remove->gfn_to_memslot, for example). Since kvm_lock spinlock is held in mmu_shrink(), do a non-blocking down_read_trylock(). Untested. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: SVM: fix suspend/resume supportJoerg Roedel
Original-Commit-Hash: ab6267b708bec563891294488f2e854be404bdaf On suspend the svm_hardware_disable function is called which frees all svm_data variables. On resume they are not re-allocated. This patch removes the deallocation of svm_data from the hardware_disable function to the hardware_unsetup function which is not called on suspend. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01KVM: VMX: Fix a wrong usage of vmcs_configSheng Yang
Original-Commit-Hash: 406046a9638a455876b030853862e576a4378d29 The function ept_update_paging_mode_cr0() write to CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's wrong because the variable may not consistent with the content in the CPU_BASE_VM_EXEC_CONTROL MSR. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-10Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix /dev/mem compatibility under PAT
2008-07-10arch/x86/kernel/.gitignore: Added vmlinux.lds to .gitignore file because it ↵Daniel Guilak
shouldn't be tracked. Signed-off-by: Daniel Guilak <daniel@danielguilak.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10x86: fix /dev/mem compatibility under PATVenkatesh Pallipadi
Add ioremap_default(), which gives a sane mapping without worrying about type conflicts. Use it in /dev/mem read in place of ioremap(), as with ioremap(), any mapping of the region (other than UC_MINUS) will cause a conflict and failure of /dev/mem read. Should address the vbetest failure reported at: http://bugzilla.kernel.org/show_bug.cgi?id=11057 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-07Revert "PCI: Correct last two HP entries in the bfsort whitelist"Jesse Barnes
This reverts commit a1676072558854b95336c8f7db76b0504e909a0a. It duplicates the change from 8d64c781f0c5fbfdf8016bd1634506ff2ad1376a and only one should be applied, otherwise some of the Dell quirks are lost. Thanks to Tony Camuso for catching this. Acked-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-05Merge branch 'x86/s2ram-fix' into x86/urgentIngo Molnar
2008-07-05x86 ACPI: fix resume from suspend to RAM on uniprocessor x86-64Rafael J. Wysocki
Since the trampoline code is now used for ACPI resume from suspend to RAM, the trampoline page tables have to be fixed up during boot not only on SMP systems, but also on UP systems that use the trampoline. Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10923 Reported-by: Dionisus Torimens <djtm@gmx.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: pm list <linux-pm@lists.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-05x86 ACPI: normalize segment descriptor register on resumeH. Peter Anvin
Some Dell laptops enter resume with apparent garbage in the segment descriptor registers (almost certainly the result of a botched transition from protected to real mode.) The only way to clean that up is to enter protected mode ourselves and clean out the descriptor registers. This fixes resume on Dell XPS M1210 and Dell D620. Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10927 Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: pm list <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: fix address truncation in pte mfn<->pfn conversion arch/x86/mm/init_64.c: early_memtest(): fix types x86: fix Intel Mac booting with EFI
2008-07-04xen: fix address truncation in pte mfn<->pfn conversionJeremy Fitzhardinge
When converting the page number in a pte/pmd/pud/pgd between machine and pseudo-physical addresses, the converted result was being truncated at 32-bits. This caused failures on machines with more than 4G of physical memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: "Christopher S. Aker" <caker@theshore.net> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03arch/x86/mm/init_64.c: early_memtest(): fix typesAndrew Morton
fix this warning: arch/x86/mm/init_64.c: In function 'early_memtest': arch/x86/mm/init_64.c:524: warning: passing argument 2 of 'find_e820_area_size' from incompatible pointer type Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-03x86: fix Intel Mac booting with EFIHugh Dickins
Fedora reports that mem_init()'s zap_low_mappings(), extended to SMP in 61165d7a035f6571c7576e7f51e7230157724c8d x86: fix app crashes after SMP resume causes 32-bit Intel Mac machines to reboot very early when booting with EFI. The EFI code appears to manage low mappings for itself when needed; but like many before it, confuses PSE with PAE. So it has only been mapping half the space it needed when PSE but not PAE. This remained unnoticed until we moved the SMP zap_low_mappings() before efi_enter_virtual_mode(). Presumably could have been noticed years ago if anyone ran a UP kernel on such machines? Reported-by: Peter Jones <pjones@redhat.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Peter Jones <pjones@redhat.com> Cc: Glauber Costa <gcosta@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Peter Jones <pjones@redhat.com>
2008-07-02Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix NODES_SHIFT Kconfig range
2008-07-01x86: fix NODES_SHIFT Kconfig rangeThomas Gleixner
commit 4323838215184f5a2f081e0d17b8d60731b03164 x86: change size of node ids from u8 to s16 set the range for NODES_SHIFT to 1..15. The possible range is 1..9 Fixes Bugzilla #10726 Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-30Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ptrace GET/SET FPXREGS broken x86: fix cpu hotplug crash x86: section/warning fixes x86: shift bits the right way in native_read_tscp
2008-06-30ptrace GET/SET FPXREGS brokenTAKADA Yoshihito
When I update kernel 2.6.25 from 2.6.24, gdb does not work. On 2.6.25, ptrace(PTRACE_GETFPXREGS, ...) returns ENODEV. But 2.6.24 kernel's ptrace() returns EIO. It is issue of compatibility. I attached test program as pt.c and patch for fix it. #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <signal.h> #include <errno.h> #include <sys/ptrace.h> #include <sys/types.h> struct user_fxsr_struct { unsigned short cwd; unsigned short swd; unsigned short twd; unsigned short fop; long fip; long fcs; long foo; long fos; long mxcsr; long reserved; long st_space[32]; /* 8*16 bytes for each FP-reg = 128 bytes */ long xmm_space[32]; /* 8*16 bytes for each XMM-reg = 128 bytes */ long padding[56]; }; int main(void) { pid_t pid; pid = fork(); switch(pid){ case -1:/* error */ break; case 0:/* child */ child(); break; default: parent(pid); break; } return 0; } int child(void) { ptrace(PTRACE_TRACEME); kill(getpid(), SIGSTOP); sleep(10); return 0; } int parent(pid_t pid) { int ret; struct user_fxsr_struct fpxregs; ret = ptrace(PTRACE_GETFPXREGS, pid, 0, &fpxregs); if(ret < 0){ printf("%d: %s.\n", errno, strerror(errno)); } kill(pid, SIGCONT); wait(pid); return 0; } /* in the kerel, at kernel/i387.c get_fpxregs() */ Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30x86: fix cpu hotplug crashZhang, Yanmin
Vegard Nossum reported crashes during cpu hotplug tests: http://marc.info/?l=linux-kernel&m=121413950227884&w=4 In function _cpu_up, the panic happens when calling __raw_notifier_call_chain at the second time. Kernel doesn't panic when calling it at the first time. If just say because of nr_cpu_ids, that's not right. By checking the source code, I found that function do_boot_cpu is the culprit. Consider below call chain: _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu. So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED, because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns NULL. Many resources are related to cpu_possible_map, so it's better not to change it. Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-26x86: section/warning fixesDaniel J Blueman
WARNING: arch/x86/mm/built-in.o(.text+0x3a1): Section mismatch in reference from the function set_pte_phys() to the function .init.text:spp_getpage() The function set_pte_phys() references the function __init spp_getpage(). This is often because set_pte_phys lacks a __init annotation or the annotation of spp_getpage is wrong. arch/x86/mm/init_64.c: In function 'early_memtest': arch/x86/mm/init_64.c:520: warning: passing argument 2 of 'find_e820_area_size' from incompatible pointer type Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Cc: "Linus Torvalds" <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24Merge branch 'kvm-updates-2.6.26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Remove now unused structs from kvm_para.h x86: KVM guest: Use the paravirt clocksource structs and functions KVM: Make kvm host use the paravirt clocksource structs x86: Make xen use the paravirt clocksource structs and functions x86: Add structs and functions for paravirt clocksource KVM: VMX: Fix host msr corruption with preemption enabled KVM: ioapic: fix lost interrupt when changing a device's irq KVM: MMU: Fix oops on guest userspace access to guest pagetable KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) KVM: MMU: Fix rmap_write_protect() hugepage iteration bug KVM: close timer injection race window in __vcpu_run KVM: Fix race between timer migration and vcpu migration
2008-06-24x86: KVM guest: Use the paravirt clocksource structs and functionsGerd Hoffmann
This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann
This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Make xen use the paravirt clocksource structs and functionsGerd Hoffmann
This patch updates the xen guest to use the pvclock structs and helper functions. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Add structs and functions for paravirt clocksourceGerd Hoffmann
This patch adds structs for the paravirt clocksource ABI used by both xen and kvm (pvclock-abi.h). It also adds some helper functions to read system time and wall clock time from a paravirtual clocksource (pvclock.[ch]). They are based on the xen code. They are enabled using CONFIG_PARAVIRT_CLOCK. Subsequent patches of this series will put the code in use. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24xen: remove support for non-PAE 32-bitJeremy Fitzhardinge
Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity
Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity
KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti
rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: close timer injection race window in __vcpu_runMarcelo Tosatti
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti
A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>