summaryrefslogtreecommitdiff
path: root/arch/x86_64/kernel
AgeCommit message (Collapse)Author
2007-07-21x86_64: cleanup of unneeded macrosGuillaume Thouvenin
Cleanup unneeded macros used for register space address calculation. Now we are using the EBDA to find the space address. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: reserve TCEs with the same address as MEM regionsMuli Ben-Yehuda
This works around a bug where DMAs that have the same addresses as some MEM regions do not go through. Not clear yet if this is due to a mis-configuration or something deeper. [akpm@linux-foundation.org: coding style fixlet] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: grab PLSSR too when a DMA error occursMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: make dump_error_regs a chip opMuli Ben-Yehuda
Provide seperate versions for Calgary and CalIOC2 Also print out the PCIe Root Complex Status on CalIOC2 errors Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: implement CalIOC2 TCE cache flush sequenceMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: add chip_ops and a quirk function for CalIOC2Muli Ben-Yehuda
[akpm@linux-foundation.org>: make calioc2_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: introduce CalIOC2 supportMuli Ben-Yehuda
CalIOC2 is a PCI-e implementation of the Calgary logic. Most of the programming details are the same, but some differ, e.g., TCE cache flush. This patch introduces CalIOC2 support - detection and various support routines. It's not expected to work yet (but will with follow-on patches). Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: abstract how we find the iommu_table for a deviceMuli Ben-Yehuda
... in preparation for doing it differently for CalIOC2. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: introduce chipset specific opsMuli Ben-Yehuda
Calgary and CalIOC2 share most of the same logic. Introduce struct cal_chipset_ops for quirks and tce flush logic which are [akpm@linux-foundation.org: make calgary_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: introduce handle_quirks() for various chipset quirksMuli Ben-Yehuda
Move the aic94xx split completion timeout handling there. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: update copyright noticeMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: generalize calgary_increase_split_completion_timeoutMuli Ben-Yehuda
... will be used by CalIOC2 later Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: check remote IRR bit before migrating level triggered irqEric W. Biederman
On x86_64 kernel, level triggered irq migration gets initiated in the context of that interrupt(after executing the irq handler) and following steps are followed to do the irq migration. 1. mask IOAPIC RTE entry; // write to IOAPIC RTE 2. EOI; // processor EOI write 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and // and interrupt vector due to per cpu vector // allocation. 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE Because of the per cpu vector allocation in x86_64 kernels, when the irq migrates to a different cpu, new vector(corresponding to the new cpu) will get allocated. An EOI write to local APIC has a side effect of generating an EOI write for level trigger interrupts (normally this is a broadcast to all IOAPICs). The EOI broadcast generated as a side effect of EOI write to processor may be delayed while the other IOAPIC writes (step 3 and 4) can go through. Normally, the EOI generated by local APIC for level trigger interrupt contains vector number. The IOAPIC will take this vector number and search the IOAPIC RTE entries for an entry with matching vector number and clear the remote IRR bit (indicate EOI). However, if the vector number is changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI is received later. This will cause the remote IRR to get stuck causing the interrupt hang (no more interrupt from this RTE). Current x86_64 kernel assumes that remote IRR bit is cleared by the time IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR bit and if it still set, delay the irq migration to the next interrupt arrival event(hopefully, next time remote IRR bit will get cleared before the IOAPIC RTE is reprogrammed). Initial analysis and patch from Nanhai. Clean up patch from Suresh. Rewritten to be less intrusive, and to contain a big fat comment by Eric. [akpm@linux-foundation.org: fix comments] Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nanhai Zou <nanhai.zou@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Keith Packard <keith.packard@intel.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: round_jiffies() for i386 and x86-64 non-critical/corrected MCE pollingVenki Pallipadi
This helps to reduce the frequency at which the CPU must be taken out of a lower-power state. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Tim Hockin <thockin@hockin.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: Make Alt-SysRq-p display the debug register contentsAlan Stern
This patch (as921) adds code to the show_regs() routine in i386 and x86_64 to print the contents of the debug registers along with all the others. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: PM_TRACE supportNigel Cunningham
Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: mcelog tolerant level cleanupTim Hockin
Background: The MCE handler has several paths that it can take, depending on various conditions of the MCE status and the value of the 'tolerant' knob. The exact semantics are not well defined and the code is a bit twisty. Description: This patch makes the MCE handler's behavior more clear by documenting the behavior for various 'tolerant' levels. It also fixes or enhances several small things in the handler. Specifically: * If RIPV is set it is not safe to restart, so set the 'no way out' flag rather than the 'kill it' flag. * Don't panic() on correctable MCEs. * If the _OVER bit is set *and* the _UC bit is set (meaning possibly dropped uncorrected errors), set the 'no way out' flag. * Use EIPV for testing whether an app can be killed (SIGBUS) rather than RIPV. According to docs, EIPV indicates that the error is related to the IP, while RIPV simply means the IP is valid to restart from. * Don't clear the MCi_STATUS registers until after the panic() path. This leaves the status bits set after the panic() so clever BIOSes can find them (and dumb BIOSes can do nothing). This patch also calls nonseekable_open() in mce_open (as suggested by akpm). Result: Tolerant levels behave almost identically to how they always have, but not it's well defined. There's a slightly higher chance of panic()ing when multiple errors happen (a good thing, IMHO). If you take an MBE and panic(), the error status bits are not cleared. Alternatives: None. Testing: I used software to inject correctable and uncorrectable errors. With tolerant = 3, the system usually survives. With tolerant = 2, the system usually panic()s (PCC) but not always. With tolerant = 1, the system always panic()s. When the system panic()s, the BIOS is able to detect that the cause of death was an MC4. I was not able to reproduce the case of a non-PCC error in userspace, with EIPV, with (tolerant < 3). That will be rare at best. Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: support poll() on /dev/mcelogTim Hockin
Background: /dev/mcelog is typically polled manually. This is less than optimal for situations where accurate accounting of MCEs is important. Calling poll() on /dev/mcelog does not work. Description: This patch adds support for poll() to /dev/mcelog. This results in immediate wakeup of user apps whenever the poller finds MCEs. Because the exception handler can not take any locks, it can not call the wakeup itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is caught at the next return from interrupt or exit from idle, calling the mce_user_notify() routine. This patch also disables the "fake panic" path of the mce_panic(), because it results in printk()s in the exception handler and crashy systems. This patch also does some small cleanup for essentially unused variables, and moves the user notification into the body of the poller, so it is only called once per poll, rather than once per CPU. Result: Applications can now poll() on /dev/mcelog. When an error is logged (whether through the poller or through an exception) the applications are woken up promptly. This should not affect any previous behaviors. If no MCEs are being logged, there is no overhead. Alternatives: I considered simply supporting poll() through the poller and not using TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error happening and the user application being notified is *the*most* critical window for us. Many uncorrectable errors can be logged to the network if given a chance. I also considered doing the MCE poll directly from the idle notifier, but decided that was overkill. Testing: I used an error-injecting DIMM to create lots of correctable DRAM errors and verified that my user app is woken up in sync with the polling interval. I also used the northbridge to inject uncorrectable ECC errors, and verified (printk() to the rescue) that the notify routine is called and the user app does wake up. I built with PREEMPT on and off, and verified that my machine survives MCEs. [wli@holomorphy.com: build fix] Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: O_EXCL on /dev/mcelogTim Hockin
Background: /dev/mcelog is a clear-on-read interface. It is currently possible for multiple users to open and read() the device. Users are protected from each other during any one read, but not across reads. Description: This patch adds support for O_EXCL to /dev/mcelog. If a user opens the device with O_EXCL, no other user may open the device (EBUSY). Likewise, any user that tries to open the device with O_EXCL while another user has the device will fail (EBUSY). Result: Applications can get exclusive access to /dev/mcelog. Applications that do not care will be unchanged. Alternatives: A simpler choice would be to only allow one open() at all, regardless of O_EXCL. Testing: I wrote an application that opens /dev/mcelog with O_EXCL and observed that any other app that tried to open /dev/mcelog would fail until the exclusive app had closed the device. Caveats: None. Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: extract helper function from e820_register_active_regionsDavid Rientjes
The logic in e820_find_active_regions() for determining the true active regions for an e820 entry given a range of PFN's is needed for e820_hole_size() as well. e820_hole_size() is called from the NUMA emulation code to determine the reserved area within an address range on a per-node basis. Its logic should duplicate that of finding active regions in an e820 entry because these are the only true ranges we may register anyway. [akpm@linux-foundation.org: cleanup] Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Quicklist support for x86_64Christoph Lameter
This adds caching of pgds and puds, pmds, pte. That way we can avoid costly zeroing and initialization of special mappings in the pgd. A second quicklist is useful to separate out PGD handling. We can carry the initialized pgds over to the next process needing them. Also clean up the pgd_list handling to use regular list macros. There is no need anymore to avoid the lru field. Move the add/removal of the pgds to the pgdlist into the constructor / destructor. That way the implementation is congruent with i386. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Luck, Tony" <tony.luck@intel.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: remove unused variable maxcpusJan Beulich
.. and adjust documentation to properly reflect options that are x86-64 specific. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: time.c white space wreckage cleanupThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: apic.c coding style janitor workThomas Gleixner
Fix coding style, white space wreckage and remove unused code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: fiuxp pt_reqs leftoversThomas Gleixner
The hpet_rtc_interrupt handler still uses pt_regs. Fix it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Fix APIC typoThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Remove dead code and other janitor work in tsc.cThomas Gleixner
Remove unused code and variables and do some codingstyle / whitespace cleanups while at it. Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Use generic xtime initThomas Gleixner
xtime can be initialized including the cmos update from the generic timekeeping code. Remove the arch specific implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: use generic cmos updateThomas Gleixner
Use the generic cmos update function in kernel/time/ntp.c Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: hpet tsc calibration fix broken smi detection logicThomas Gleixner
The current SMI detection logic in read_hpet_tsc() makes sure, that when a SMI happens between the read of the HPET counter and the read of the TSC, this wrong value is used for TSC calibration. This is not the intention of the function. The comparison must ensure, that we do _NOT_ use such a value. Fix the check to use calibration values where delta of the two TSC reads is smaller than a reasonable threshold. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Don't use softirq safe locks in smp_call_functionAndi Kleen
It is not fully softirq safe anyways. Can't do a WARN_ON unfortunately because it could trigger in the panic case. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: Add L3 cache support to AMD CPUID4 emulationAndi Kleen
With that an L3 cache is correctly reported in the cache information in /sys With fixes from Andreas Herrmann and Dean Gaudet and Joachim Deguara Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpuAndi Kleen
This implements new vDSO for x86-64. The concept is similar to the existing vDSOs on i386 and PPC. x86-64 has had static vsyscalls before, but these are not flexible enough anymore. A vDSO is a ELF shared library supplied by the kernel that is mapped into user address space. The vDSO mapping is randomized for each process for security reasons. Doing this was needed for clock_gettime, because clock_gettime always needs a syscall fallback and having one at a fixed address would have made buffer overflow exploits too easy to write. The vdso can be disabled with vdso=0 It currently includes a new gettimeofday implemention and optimized clock_gettime(). The gettimeofday implementation is slightly faster than the one in the old vsyscall. clock_gettime is significantly faster than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME. The new calls are generally faster than the old vsyscall. Advantages over the old x86-64 vsyscalls: - Extensible - Randomized - Cleaner - Easier to virtualize (the old static address range previously causes overhead e.g. for Xen because it has to create special page tables for it) Weak points: - glibc support still to be written The VM interface is partly based on Ingo Molnar's i386 version. Includes compile fix from Joachim Deguara Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Use string instruction memcpy/memset on AMD Fam10Andi Kleen
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Don't rely on a unique IO-APIC IDAndi Kleen
Linux 64bit only uses the IO-APIC ID as an internal cookie. In the future there could be some cases where the IO-APIC IDs are not unique because they share an 8 bit space with CPUs and if there are enough CPUs it is difficult to get them that. But Linux needs the io apic ID internally for its data structures. Assign unique IO APIC ids on table parsing. TBD do for 32bit too Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: add new nmi rescanDave Jiang
Provides a way for NMI reported errors on x86 to notify the EDAC subsystem pending ECC errors by writing to a software state variable. Here's the reworked patch. I added an EDAC stub to the kernel so we can have variables that are in the kernel even if EDAC is a module. I also implemented the idea of using the chip driver to select error detection mode via module parameter and eliminate the kernel compile option. Please review/test. Thx! Also, I only made changes to some of the chipset drivers since I am unfamiliar with the other ones. We can add similar changes as we go. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19lguest: the host codeRusty Russell
This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19x86_64: Put allocated ELF notes in read-only data segmentRoland McGrath
This changes the x86_64 linker script to use the asm-generic NOTES macro so that ELF note sections with SHF_ALLOC set are linked into the kernel image along with other read-only data. The PT_NOTE also points to their location. This paves the way for putting useful build-time information into ELF notes that can be found easily later in a kernel memory dump. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19use the new percpu interface for shared dataFenghua Yu
Currently most of the per cpu data, which is accessed by different cpus, has a ____cacheline_aligned_in_smp attribute. Move all this data to the new per cpu shared data section: .data.percpu.shared_aligned. This will seperate the percpu data which is referenced frequently by other cpus from the local only percpu data. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19define new percpu interface for shared dataFenghua Yu
per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19PM: Integrate beeping flag with existing acpi_sleep flagsPavel Machek
Move "debug during resume from s2ram" into the variable we already use for real-mode flags to simplify code. It also closes nasty trap for the user in acpi_sleep_setup; order of parameters actually mattered there, acpi_sleep=s3_bios,s3_mode doing something different from acpi_sleep=s3_mode,s3_bios. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19PM: Optional beeping during resume from suspend to RAMNigel Cunningham
Add a feature allowing the user to make the system beep during a resume from suspend to RAM, on x86_64 and i386. This is useful for the users with broken resume from RAM, so that they can verify if the control reaches the kernel after a wake-up event. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18Handle bogus %cs selector in single-step instruction decodingRoland McGrath
The code for LDT segment selectors was not robust in the face of a bogus selector set in %cs via ptrace before the single-step was done. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18xen: use the hvc console infrastructure for Xen consoleJeremy Fitzhardinge
Implement a Xen back-end for hvc console. * * * Add early printk support via hvc console, enable using "earlyprintk=xen" on the kernel command line. From: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Olof Johansson <olof@lixom.net>
2007-07-18usermodehelper: Tidy up waitingJeremy Fitzhardinge
Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>
2007-07-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits) KVM: Use CPU_DYING for disabling virtualization KVM: Tune hotplug/suspend IPIs KVM: Keep track of which cpus have virtualization enabled SMP: Allow smp_call_function_single() to current cpu i386: Allow smp_call_function_single() to current cpu x86_64: Allow smp_call_function_single() to current cpu HOTPLUG: Adapt thermal throttle to CPU_DYING HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING HOTPLUG: Add CPU_DYING notifier KVM: Clean up #includes KVM: Remove kvmfs in favor of the anonymous inodes source KVM: SVM: Reliably detect if SVM was disabled by BIOS KVM: VMX: Remove unnecessary code in vmx_tlb_flush() KVM: MMU: Fix Wrong tlb flush order KVM: VMX: Reinitialize the real-mode tss when entering real mode KVM: Avoid useless memory write when possible KVM: Fix x86 emulator writeback KVM: Add support for in-kernel pio handlers KVM: VMX: Fix interrupt checking on lightweight exit KVM: Adds support for in-kernel mmio handlers ...
2007-07-17x86_64: speedup touch_nmi_watchdogAndrew Morton
Avoid dirtying remote cpu's memory if it already has the correct value. Cc: Andi Kleen <ak@suse.de> Cc: Konrad Rzeszutek <konrad@darnok.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Inhibit NMI watchdog when Alt-SysRq-T operation is underwayKonrad Rzeszutek
On large memory configuration with not so fast CPUs the NMI watchdog is triggered when memory addresses are being gathered and printed. The code paths for Alt-SysRq-t are sprinkled with touch_nmi_watchdog in various places but not in this routine (or in the loop that utilizes this function). The patch has been tested for regression on large CPU+memory configuration (128 logical CPUs + 224 GB) and 1,2,4,16-CPU sockets with various memory sizes (1,2,4,6,20). Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17PTRACE_POKEDATA consolidationAlexey Dobriyan
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata() function. AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless return EPERM. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17PTRACE_PEEKDATA consolidationAlexey Dobriyan
Identical implementations of PTRACE_PEEKDATA go into generic_ptrace_peekdata() function. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>