summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/tsc.c
AgeCommit message (Collapse)Author
2015-10-20perf/x86: Fix time_shift in perf_event_mmap_pageAdrian Hunter
Commit: b20112edeadf ("perf/x86: Improve accuracy of perf/sched clock") allowed the time_shift value in perf_event_mmap_page to be as much as 32. Unfortunately the documented algorithms for using time_shift have it shifting an integer, whereas to work correctly with the value 32, the type must be u64. In the case of perf tools, Intel PT decodes correctly but the timestamps that are output (for example by perf script) have lost 32-bits of granularity so they look like they are not changing at all. Fix by limiting the shift to 31 and adjusting the multiplier accordingly. Also update the documentation of perf_event_mmap_page so that new code based on it will be more future-proof. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: b20112edeadf ("perf/x86: Improve accuracy of perf/sched clock") Link: http://lkml.kernel.org/r/1445001845-13688-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-18Merge branch 'perf/urgent' into perf/core, to pick up fixes before applying ↵Ingo Molnar
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-17Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - misc fixes all around the map - block non-root vm86(old) if mmap_min_addr != 0 - two small debuggability improvements - removal of obsolete paravirt op * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform: Fix Geode LX timekeeping in the generic x86 build x86/apic: Serialize LVTT and TSC_DEADLINE writes x86/ioapic: Force affinity setting in setup_ioapic_dest() x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method x86/ldt: Fix small LDT allocation for Xen x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text x86/cpu: Print family/model/stepping in hex x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0 x86/alternatives: Make optimize_nops() interrupt safe and synced x86/mm/srat: Print non-volatile flag in SRAT x86/cpufeatures: Enable cpuid for Intel SHA extensions
2015-09-16x86/platform: Fix Geode LX timekeeping in the generic x86 buildDavid Woodhouse
In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable") bypassed verification of the TSC on Geode LX. However, this code (now in the check_system_tsc_reliable() function in arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was set. OpenWRT has recently started building its generic Geode target for Geode GX, not LX, to include support for additional platforms. This broke the timekeeping on LX-based devices, because the TSC wasn't marked as reliable: https://dev.openwrt.org/ticket/20531 By adding a runtime check on is_geode_lx(), we can also include the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus fixing the problem. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Andres Salomon <dilinger@queued.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13perf/x86: Improve accuracy of perf/sched clockAdrian Hunter
When TSC is stable perf/sched clock is based on it. However the conversion from cycles to nanoseconds is not as accurate as it could be. Because CYC2NS_SCALE_FACTOR is 10, the accuracy is +/- 1/2048 The change is to calculate the maximum shift that results in a multiplier that is still a 32-bit number. For example all frequencies over 1 GHz will have a shift of 32, making the accuracy of the conversion +/- 1/(2^33). That is achieved by using the 'clocks_calc_mult_shift()' function. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1440147918-22250-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-03Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and atomic updates from Ingo Molnar: "Main changes in this cycle are: - Extend atomic primitives with coherent logic op primitives (atomic_{or,and,xor}()) and deprecate the old partial APIs (atomic_{set,clear}_mask()) The old ops were incoherent with incompatible signatures across architectures and with incomplete support. Now every architecture supports the primitives consistently (by Peter Zijlstra) - Generic support for 'relaxed atomics': - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return() - atomic_read_acquire() - atomic_set_release() This came out of porting qwrlock code to arm64 (by Will Deacon) - Clean up the fragile static_key APIs that were causing repeat bugs, by introducing a new one: DEFINE_STATIC_KEY_TRUE(name); DEFINE_STATIC_KEY_FALSE(name); which define a key of different types with an initial true/false value. Then allow: static_branch_likely() static_branch_unlikely() to take a key of either type and emit the right instruction for the case. To be able to know the 'type' of the static key we encode it in the jump entry (by Peter Zijlstra) - Static key self-tests (by Jason Baron) - qrwlock optimizations (by Waiman Long) - small futex enhancements (by Davidlohr Bueso) - ... and misc other changes" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) jump_label/x86: Work around asm build bug on older/backported GCCs locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics locking/qrwlock: Implement queue_write_unlock() using smp_store_release() locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t' locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic locking/static_keys: Make verify_keys() static jump label, locking/static_keys: Update docs locking/static_keys: Provide a selftest jump_label: Provide a self-test s390/uaccess, locking/static_keys: employ static_branch_likely() x86, tsc, locking/static_keys: Employ static_branch_likely() locking/static_keys: Add selftest locking/static_keys: Add a new static_key interface locking/static_keys: Rework update logic locking/static_keys: Add static_key_{en,dis}able() helpers ...
2015-09-01Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "The biggest changes in this cycle were: - Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC) primitives. (Andy Lutomirski) - Add new, comprehensible entry and exit handlers written in C. (Andy Lutomirski) - vm86 mode cleanups and fixes. (Brian Gerst) - 32-bit compat code cleanups. (Brian Gerst) The amount of simplification in low level assembly code is already palpable: arch/x86/entry/entry_32.S | 130 +---- arch/x86/entry/entry_64.S | 197 ++----- but more simplifications are planned. There's also the usual laudry mix of low level changes - see the changelog for details" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits) x86/asm: Drop repeated macro of X86_EFLAGS_AC definition x86/asm/msr: Make wrmsrl() a function x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer x86/asm: Add MONITORX/MWAITX instruction support x86/traps: Weaken context tracking entry assertions x86/asm/tsc: Add rdtscll() merge helper selftests/x86: Add syscall_nt selftest selftests/x86: Disable sigreturn_64 x86/vdso: Emit a GNU hash x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks x86/entry/32: Migrate to C exit path x86/entry/32: Remove 32-bit syscall audit optimizations x86/vm86: Rename vm86->v86flags and v86mask x86/vm86: Rename vm86->vm86_info to user_vm86 x86/vm86: Clean up vm86.h includes x86/vm86: Move the vm86 IRQ definitions to vm86.h x86/vm86: Use the normal pt_regs area for vm86 x86/vm86: Eliminate 'struct kernel_vm86_struct' x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86' x86/vm86: Move vm86 fields out of 'thread_struct' ...
2015-08-04perf/x86: Add a native_perf_sched_clock_from_tsc()Andi Kleen
PEBSv3 has a raw TSC time stamp in its memory buffer that later needs to to be converted to perf_clock. Add a native_sched_clock_from_tsc() that works the same as native_sched_clock(), but starts with an already given TSC value. Paravirt is ignored, it will just get the native clock. But there isn't a para virtualized PEBS anyway. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285767-27027-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03x86, tsc, locking/static_keys: Employ static_branch_likely()Peter Zijlstra
Because of the static_key restrictions we had to take an unconditional jump for the most likely case, causing $I bloat. Rewrite to use the new primitives. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31Merge branch 'x86/urgent' into x86/asm, before applying dependent patchesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc: Use rdtsc_ordered() in read_tsc() instead of get_cycles()Andy Lutomirski
There are two logical changes here. First, this removes a check for cpu_has_tsc. That check is unnecessary, as we don't register the TSC as a clocksource on systems that have no TSC. Second, it adds a barrier, thus preventing observable non-monotonicity. I suspect that the missing barrier was never a problem in practice because system calls themselves were heavy enough barriers to prevent user code from observing time warps due to speculation. (Without the corresponding barrier in the vDSO, however, non-monotonicity is easy to detect.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/c6ff621a053127a65b70f175443578db7a0711be.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc: Rename native_read_tsc() to rdtsc()Andy Lutomirski
Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc(). Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org [ Ported it to v4.2-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc: Replace rdtscll() with native_read_tsc()Andy Lutomirski
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is just a wrapper around native_read_tsc(). Unwrap it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/asm/tsc: Inline native_read_tsc() and remove __native_read_tsc()Andy Lutomirski
In the following commit: cdc7957d1954 ("x86: move native_read_tsc() offline") ... native_read_tsc() was moved out of line, presumably for some now-obsolete vDSO-related reason. Undo it. The entire rdtsc, shl, or sequence is only 11 bytes, and calls via rdtscl() and similar helpers were already inlined. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/d05ffe2aaf8468ca475ebc00efad7b2fa174af19.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()Adrian Hunter
If it takes longer than 12us to read the PIT counter lsb/msb, then the error margin will never fall below 500ppm within 50ms, and Fast TSC calibration will always fail. This patch detects when that will happen and fails fast. Note the failure message is not printed in that case because: 1. it will always happen on that class of hardware 2. the absence of the message is more informative than its presence Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/556EB717.9070607@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-23x86/tsc: Change Fast TSC calibration failed from error to infoAlexandre Demers
Many users see this message when booting without knowning that it is of no importance and that TSC calibration may have succeeded by another way. As explained by Paul Bolle in http://lkml.kernel.org/r/1348488259.1436.22.camel@x61.thuisdomein "Fast TSC calibration failed" should not be considered as an error since other calibration methods are being tried afterward. At most, those send a warning if they fail (not an error). So let's change the message from error to warning. [ tglx: Make if pr_info. It's really not important at all ] Fixes: c767a54ba065 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1418106470-6906-1-git-send-email-alexandre.f.demers@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-22x86, apic: Handle a bad TSC more gracefullyAndy Lutomirski
If the TSC is unusable or disabled, then this patch fixes: - Confusion while trying to clear old APIC interrupts. - Division by zero and incorrect programming of the TSC deadline timer. This fixes boot if the CPU has a TSC deadline timer but a missing or broken TSC. The failure to boot can be observed with qemu using -cpu qemu64,-tsc,+tsc-deadline This also happens to me in nested KVM for unknown reasons. With this patch, I can boot cleanly (although without a TSC). Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Bandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-05Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
2014-08-04Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and ↵Linus Torvalds
'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build/cleanup/debug updates from Ingo Molnar: "Robustify the build process with a quirk to avoid GCC reordering related bugs. Two code cleanups. Simplify entry_64.S CFI annotations, by Jan Beulich" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, build: Change code16gcc.h from a C header to an assembly header * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Simplify __HAVE_ARCH_CMPXCHG tests x86/tsc: Get rid of custom DIV_ROUND() macro * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/debug: Drop several unnecessary CFI annotations
2014-07-23clocksource: Move cycle_last validation to core codeThomas Gleixner
The only user of the cycle_last validation is the x86 TSC. In order to provide NMI safe accessor functions for clock monotonic and monotonic_raw we need to do that in the core. We can't do the TSC specific if (now < cycle_last) now = cycle_last; for the other wrapping around clocksources, but TSC has CLOCKSOURCE_MASK(64) which actually does not mask out anything so if now is less than cycle_last the subtraction will give a negative result. So we can check for that in clocksource_delta() and return 0 for that case. Implement and enable it for x86 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-02x86, tsc: Fix cpufreq lockupPeter Zijlstra
Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver locked up when doing cpu hotplug. Because we called set_cyc2ns_scale() from the time_cpufreq_notifier() unconditionally, it gets called multiple times for each freq change, instead of only the once, when the tsc_khz value actually changes. Because it gets called more than once, we run out of cyc2ns data slots and stall, waiting for a free one, but because we're half way offline, there's no consumers to free slots. By placing the call inside the condition that actually changes tsc_khz we avoid superfluous calls and avoid the problem. Reported-by: Mauro <registosites@hotmail.com> Tested-by: Mauro <registosites@hotmail.com> Fixes: 20d1c86a5776 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: <stable@vger.kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Bin Gao <bin.gao@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-19x86/tsc: Get rid of custom DIV_ROUND() macroMichal Nazarewicz
When invoced for positive values, DIV_ROUND macro defined in arch/x86/kernel/tsc.c behaves exactly like DIV_ROUND_CLOSEST from include/linux/kernel.h file, so remove the custom macro in favour of the shared one. [ hpa: changed line breaks ] Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Link: http://lkml.kernel.org/r/1403143116-21755-1-git-send-email-mina86@mina86.com Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-04-02Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso changes from Peter Anvin: "This is the revamp of the 32-bit vdso and the associated cleanups. This adds timekeeping support to the 32-bit vdso that we already have in the 64-bit vdso. Although 32-bit x86 is legacy, it is likely to remain in the embedded space for a very long time to come. This removes the traditional COMPAT_VDSO support; the configuration variable is reused for simply removing the 32-bit vdso, which will produce correct results but obviously suffer a performance penalty. Only one beta version of glibc was affected, but that version was unfortunately included in one OpenSUSE release. This is not the end of the vdso cleanups. Stefani and Andy have agreed to continue work for the next kernel cycle; in fact Andy has already produced another set of cleanups that came too late for this cycle. An incidental, but arguably important, change is that this ensures that unused space in the VVAR page is properly zeroed. It wasn't before, and would contain whatever garbage was left in memory by BIOS or the bootloader. Since the VVAR page is accessible to user space this had the potential of information leaks" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86, vdso: Fix the symbol versions on the 32-bit vDSO x86, vdso, build: Don't rebuild 32-bit vdsos on every make x86, vdso: Actually discard the .discard sections x86, vdso: Fix size of get_unmapped_area() x86, vdso: Finish removing VDSO32_PRELINK x86, vdso: Move more vdso definitions into vdso.h x86: Load the 32-bit vdso in place, just like the 64-bit vdsos x86, vdso32: handle 32 bit vDSO larger one page x86, vdso32: Disable stack protector, adjust optimizations x86, vdso: Zero-pad the VVAR page x86, vdso: Add 32 bit VDSO time support for 64 bit kernel x86, vdso: Add 32 bit VDSO time support for 32 bit kernel x86, vdso: Patch alternatives in the 32-bit VDSO x86, vdso: Introduce VVAR marco for vdso32 x86, vdso: Cleanup __vdso_gettimeofday() x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro x86, vdso: __vdso_clock_gettime() cleanup x86, vdso: Revamp vclock_gettime.c mm: Add new func _install_special_mapping() to mmap.c x86, vdso: Make vsyscall_gtod_data handling x86 generic ...
2014-03-19cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}Viresh Kumar
Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have not been used for some time, so remove them to clean up code a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-18x86, vdso: Make vsyscall_gtod_data handling x86 genericStefani Seibold
This patch move the vsyscall_gtod_data handling out of vsyscall_64.c into an additonal file vsyscall_gtod.c to make the functionality available for x86 32 bit kernel. It also adds a new vsyscall_32.c which setup the VVAR page. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-19x86, tsc: Fallback to normal calibration if fast MSR calibration failsThomas Gleixner
If we cannot calibrate TSC via MSR based calibration try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that to the caller. This value gets then propagated further to clockevents code resulting division by zero oops like the one below: divide error: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.13.0+ #47 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000 RIP: 0010:[<ffffffff810aec14>] [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0 RSP: 0000:ffff880075507e58 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0 Stack: ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88 ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168 ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000 Call Trace: [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30 [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0 [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8 [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0 [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205 [<ffffffff8177c910>] ? rest_init+0x90/0x90 [<ffffffff8177c91e>] kernel_init+0xe/0x120 [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0 [<ffffffff8177c910>] ? rest_init+0x90/0x90 Prevent this from happening by: 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero if it fails. 2) Check this return value in native_calibrate_tsc() and in case of zero fallback to use normal non-MSR based calibration. [mw: Added subject and changelog] Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bin Gao <bin.gao@linux.intel.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-09x86: Use preempt_disable_notrace() in cycles_2_ns()Steven Rostedt
When debug preempt is enabled, preempt_disable() can be traced by function and function graph tracing. There's a place in the function graph tracer that calls trace_clock() which eventually calls cycles_2_ns() outside of the recursion protection. When cycles_2_ns() calls preempt_disable() it gets traced and the graph tracer will go into a recursive loop causing a crash or worse, a triple fault. Simple fix is to use preempt_disable_notrace() in cycles_2_ns, which makes sense because the preempt_disable() tracing may use that code too, and it tracing it, even with recursion protection is rather pointless. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140204141315.2a968a72@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23sched/x86/tsc: Initialize multiplier to 0Peter Zijlstra
Since we keep the clock value linearly continuous on frequency change, make sure the initial multiplier is 0, such that our initial value is 0. Without this we compute the initial value at whatever the TSC has managed to reach since power-on. Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Fixes: 20d1c86a57762 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: paulmck@linux.vnet.ibm.com Cc: John Stultz <john.stultz@linaro.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: dyoung@redhat.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140123094804.GP30183@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-20Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel SoC changes from Ingo Molnar: "Improved Intel SoC platform support" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, tsc, apic: Unbreak static (MSR) calibration when CONFIG_X86_LOCAL_APIC=n x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs arch: x86: New MailBox support driver for Intel SOC's
2014-01-15x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCsBin Gao
On SoCs that have the calibration MSRs available, either there is no PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is driven from the same clock as the TSC, so calibration is redundant and just slows down the boot. TSC rate is caculated by this formula: <maximum core-clock to bus-clock ratio> * <maximum resolved frequency> The ratio and the resolved frequency ID can be obtained from MSR. See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5 for details. Signed-off-by: Bin Gao <bin.gao@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
2014-01-13sched/clock, x86: Avoid a runtime condition in native_sched_clock()Peter Zijlstra
Use a static_key to avoid touching tsc_disabled and a runtime condition in native_sched_clock() -- less cachelines touched is always better. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 215295 213039 (cold) local_clock: 301773 220773 216084 (warm) sched_clock: 38375 25659 25231 (warm) local_clock: 100371 27242 27601 (warm) rdtsc: 27340 24208 24203 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 237019 240055 (cold) local_clock: 396890 294819 299942 (warm) sched_clock: 38194 25609 25276 (warm) local_clock: 143452 71232 73232 (warm) rdtsc: 27345 24243 24244 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-hrz87bo37qke25bty6pnfy4b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13sched/clock, x86: Use a static_key for sched_clock_stablePeter Zijlstra
In order to avoid the runtime condition and variable load turn sched_clock_stable into a static_key. Also provide a shorter implementation of local_clock() and cpu_clock(int) when sched_clock_stable==1. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 221876 215295 (cold) local_clock: 301773 234692 220773 (warm) sched_clock: 38375 25602 25659 (warm) local_clock: 100371 33265 27242 (warm) rdtsc: 27340 24214 24208 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 235941 237019 (cold) local_clock: 396890 297017 294819 (warm) sched_clock: 38194 25233 25609 (warm) local_clock: 143452 71234 71232 (warm) rdtsc: 27345 24245 24243 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQsPeter Zijlstra
Use a ring-buffer like multi-version object structure which allows always having a coherent object; we use this to avoid having to disable IRQs while reading sched_clock() and avoids a problem when getting an NMI while changing the cyc2ns data. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 331312 257223 (cold) local_clock: 301773 310296 309889 (warm) sched_clock: 38375 38247 25280 (warm) local_clock: 100371 102713 85268 (warm) rdtsc: 27340 27289 24247 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 372706 301224 (cold) local_clock: 396890 399275 399870 (warm) sched_clock: 38194 38124 25630 (warm) local_clock: 143452 148698 129629 (warm) rdtsc: 27345 27365 24307 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13sched/clock, x86: Move some cyc2ns() code aroundPeter Zijlstra
There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c, so move it there. There are no cycles_2_ns() users. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23perf/x86: Add ability to calculate TSC from perf sample timestampsAdrian Hunter
For modern CPUs, perf clock is directly related to TSC. TSC can be calculated from perf clock and vice versa using a simple calculation. Two of the three componenets of that calculation are already exported in struct perf_event_mmap_page. This patch exports the third. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-14x86: delete __cpuinit usage from all x86 filesPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-03-15x86: tsc: Add support for new S3_NONSTOP featureFeng Tang
Add support for new S3_NONSTOP feature Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-02-04Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux ↵Thomas Gleixner
into timers/core Trivial conflict in arch/x86/Kconfig Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-01-15Round the calculated scale factor in set_cyc2ns_scale()Bernd Faust
During some experiments with an external clock (in a FPGA), we saw that the TSC clock drifted approx. 2.5ms per second. This drift was caused by the current way of calculating the scale. In our case cpu_khz had a value of 3292725. This resulted in a scale value of 310. But when doing the calculation by hand it shows that the actual value is 310.9886188491, so a value of 311 would be more precise. With this change the value is rounded. Signed-off-by: Bernd Faust <berndfaust@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-10-24x86: Allow tracing of functions in arch/x86/kernel/rtc.cDavid Vrabel
Move native_read_tsc() to tsc.c to allow profiling to be re-enabled for rtc.c. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1349698050-6560-1-git-send-email-david.vrabel@citrix.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>Joe Perches
Use a more current logging style: - Bare printks should have a KERN_<LEVEL> for consistency's sake - Add pr_fmt where appropriate - Neaten some macro definitions - Convert some Ok output to OK - Use "%s: ", __func__ in pr_fmt for summit - Convert some printks to pr_<level> Message output is not identical in all cases. Signed-off-by: Joe Perches <joe@perches.com> Cc: levinsasha928@gmail.com Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop [ merged two similar patches, tidied up the changelog ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-29Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner. * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ia64: vsyscall: Add missing paranthesis alarmtimer: Don't call rtc_timer_init() when CONFIG_RTC_CLASS=n x86: vdso: Put declaration before code x86-64: Inline vdso clock_gettime helpers x86-64: Simplify and optimize vdso clock_gettime monotonic variants kernel-time: fix s/then/than/ spelling errors time: remove no_sync_cmos_clock time: Avoid scary backtraces when warning of > 11% adj alarmtimer: Make sure we initialize the rtctimer ntp: Fix leap-second hrtimer livelock x86, tsc: Skip refined tsc calibration on systems with reliable TSC rtc: Provide flag for rtc devices that don't support UIE ia64: vsyscall: Use seqcount instead of seqlock x86: vdso: Use seqcount instead of seqlock x86: vdso: Remove bogus locking in update_vsyscall_tz() time: Remove bogus comments time: Fix change_clocksource locking time: x86: Fix race switching from vsyscall to non-vsyscall clock
2012-03-28Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
2012-03-20x86: kvmclock: abstract save/restore sched_clock_stateMarcelo Tosatti
Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Also move restore_sched_clock_state before __restore_processor_state, since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). Thanks to Igor Mammedov for tracking it down. Fixes suspend-to-disk with kvmclock. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-15x86, tsc: Skip refined tsc calibration on systems with reliable TSCAlok Kataria
While running the latest Linux as guest under VMware in highly over-committed situations, we have seen cases when the refined TSC algorithm fails to get a valid tsc_start value in tsc_refine_calibration_work from multiple attempts. As a result the kernel keeps on scheduling the tsc_irqwork task for later. Subsequently after several attempts when it gets a valid start value it goes through the refined calibration and either bails out or uses the new results. Given that the kernel originally read the TSC frequency from the platform, which is the best it can get, I don't think there is much value in refining it. So for systems which get the TSC frequency from the platform we should skip the refined tsc algorithm. We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is set only on VMware and for Moorestown Penwell both of which have there own TSC calibration methods. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Dirk Brandewie <dirk.brandewie@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: stable@kernel.org [jstultz: Reworked to simply not schedule the refining work, rather then scheduling the work and bombing out later] Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-13sched/x86: Fix overflow in cyc2ns_offsetSalman Qazi
When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value would be carried over from the previous kernel. The computation of cycns_offset in set_cyc2ns_scale is prone to an overflow, if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply *scale, even though there is enough room to store the final answer. We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor code to share the calculation with the previous fix in __cycles_2_ns(). Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-19Merge remote-tracking branch 'linus/master' into x86/urgentH. Peter Anvin
2012-01-17x86, tsc: Fix SMI induced variation in quick_pit_calibrate()Linus Torvalds
pit_expect_msb() returns success wrongly in the below SMI scenario: a. pit_verify_msb() has not yet seen the MSB transition. b. we are close to the MSB transition though and got a SMI immediately after returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB transition has happened somewhere during SMI execution. c. returned from SMI and we noted down the 'tsc', saw the pit MSB change now and exited the loop to calculate 'deltatsc'. Instead of noting the TSC at the MSB transition, we are way off because of the SMI. And as the SMI happened between the pit_verify_msb() and before the 'tsc' is recorded in the for loop, 'delattsc' (d1/d2 in quick_pit_calibrate()) will be small and quick_pit_calibrate() will not notice this error. Depending on whether SMI disturbance happens while computing d1 or d2, we will see the TSC calibrated value smaller or bigger than the expected value. As a result, in a cluster we were seeing a variation of approximately +/- 20MHz in the calibrated values, resulting in NTP failures. [ As far as the SMI source is concerned, this is a periodic SMI that gets disabled after ACPI is enabled by the OS. But the TSC calibration happens before the ACPI is enabled. ] To address this, change pit_expect_msb() so that - the 'tsc' is the TSC in between the two reads that read the MSB change from the PIT (same as before) - the 'delta' is the difference in TSC from *before* the MSB changed to *after* the MSB changed. Now the delta is twice as big as before (it covers four PIT accesses, roughly 4us) and quick_pit_calibrate() will loop a bit longer to get the calibrated value with in the 500ppm precision. As the delta (d1/d2) covers four PIT accesses, actual calibrated result might be closer to 250ppm precision. As the loop now takes longer to stabilize, double MAX_QUICK_PIT_MS to 50. SMI disturbance will showup as much larger delta's and the loop will take longer than usual for the result to be with in the accepted precision. Or will fallback to slow PIT calibration if it takes more than 50msec. Also while we are at this, remove the calibration correction that aims to get the result to the middle of the error bars. We really don't know which direction to correct into, so remove it. Reported-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1326843337.5291.4.camel@sbsiddha-mobl2 Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-01-11Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel config: Fix the APB_TIMER selection x86/mrst: Add additional debug prints for pb_keys x86/intel config: Revamp configuration to allow for Moorestown and Medfield x86/intel/scu/ipc: Match the changes in the x86 configuration x86/apb: Fix configuration constraints x86: Fix INTEL_MID silly x86/Kconfig: Cyclone-timer depends on x86-summit x86: Reduce clock calibration time during slave cpu startup x86/config: Revamp configuration for MID devices x86/sfi: Kill the IRQ as id hack
2011-12-05Merge branch 'fortglx/3.3/tip/timers/core' of ↵Thomas Gleixner
git://git.linaro.org/people/jstultz/linux into timers/core