summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/apic
AgeCommit message (Collapse)Author
2013-02-17x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systemsStoney Wang
commit cb214ede7657db458fd0b2a25ea0b28dbf900ebc upstream. When a HP ProLiant DL980 G7 Server boots a regular kernel, there will be intermittent lost interrupts which could result in a hang or (in extreme cases) data loss. The reason is that this system only supports x2apic physical mode, while the kernel boots with a logical-cluster default setting. This bug can be worked around by specifying the "x2apic_phys" or "nox2apic" boot option, but we want to handle this system without requiring manual workarounds. The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table. As all apicids are smaller than 255, BIOS need to pass the control to the OS with xapic mode, according to x2apic-spec, chapter 2.9. Current code handle x2apic when BIOS pass with xapic mode enabled: When user specifies x2apic_phys, or FADT indicates PHYSICAL: 1. During madt oem check, apic driver is set with xapic logical or xapic phys driver at first. 2. enable_IR_x2apic() will enable x2apic_mode. 3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe() will install the correct x2apic phys driver and use x2apic phys mode. Otherwise it will skip the driver will let x2apic_cluster_probe to take over to install x2apic cluster driver (wrong one) even though FADT indicates PHYSICAL, because x2apic_phys_probe does not check FADT PHYSICAL. Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the problem. Signed-off-by: Stoney Wang <song-bo.wang@hp.com> [ updated the changelog and simplified the code ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-24x86/irq/ioapic: Check for valid irq_cfg pointer in ↵Dimitri Sivanich
smp_irq_move_cleanup_interrupt Posting this patch to fix an issue concerning sparse irq's that I raised a while back. There was discussion about adding refcounting to sparse irqs (to fix other potential race conditions), but that does not appear to have been addressed yet. This covers the only issue of this type that I've encountered in this area. A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data. In create_irq_nr() there is a window where we have set vector_irq in __assign_irq_vector(), but not yet called irq_set_chip_data() to set the irq_cfg pointer. Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time, smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned irq, but panic when accessing irq_cfg. Only continue processing the irq if irq_cfg is non-NULL. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-06sections: fix section conflicts in arch/x86Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-19arch/x86: Remove unecessary semicolonsPeter Senna Tschudin
Found by http://coccinelle.lip6.fr/ Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Cc: avi@redhat.com Cc: mtosatti@redhat.com Cc: a.p.zijlstra@chello.nl Cc: rusty@rustcorp.com.au Cc: masami.hiramatsu.pt@hitachi.com Cc: suresh.b.siddha@intel.com Cc: joerg.roedel@amd.com Cc: agordeev@redhat.com Cc: yinghai@kernel.org Cc: bhelgaas@google.com Cc: liuj97@gmail.com Link: http://lkml.kernel.org/r/1347986174-30287-7-git-send-email-peter.senna@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-14x86, apic: fix broken legacy interrupts in the logical apic modeSuresh Siddha
Recent commit 332afa656e76458ee9cf0f0d123016a0658539e4 cleaned up a workaround that updates irq_cfg domain for legacy irq's that are handled by the IO-APIC. This was assuming that the recent changes in assign_irq_vector() were sufficient to remove the workaround. But this broke couple of AMD platforms. One of them seems to be sending interrupts to the offline cpu's, resulting in spurious "No irq handler for vector xx (irq -1)" messages when those cpu's come online. And the other platform seems to always send the interrupt to the last logical CPU (cpu-7). Recent changes had an unintended side effect of using only logical cpu-0 in the IO-APIC RTE (during boot for the legacy interrupts) and this broke the legacy interrupts not getting routed to the cpu-7 on the AMD platform, resulting in a boot hang. For now, reintroduce the removed workaround, (essentially not allowing the vector to change for legacy irq's when io-apic starts to handle the irq. Which also addressed the uninteded sife effect of just specifying cpu-0 in the IO-APIC RTE for those irq's during boot). Reported-and-tested-by: Robert Richter <robert.richter@amd.com> Reported-and-tested-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1344453412.29170.5.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-08-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Various fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, kcmp: The kcmp system call can be common arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case x86/mce: Add quirk for instruction recovery on Sandy Bridge processors x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h> x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs x86, nops: Missing break resulting in incorrect selection on Intel x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental
2012-07-26Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros
2012-07-26x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqsTomoki Sekiyama
In the current kernel, percpu variable `vector_irq' is not always cleared when a CPU is offlined. If the CPU that has the disabled irqs in vector_irq is hotplugged again, __setup_vector_irq() hits invalid irq vector and may crash. This bug can be reproduced as following; # echo 0 > /sys/devices/system/cpu/cpu7/online # modprobe -r some_driver_using_interrupts # vector_irq@cpu7 uncleared # echo 1 > /sys/devices/system/cpu/cpu7/online # kernel may crash To fix this problem, this patch clears vector_irq in __fixup_irqs() when the CPU is offlined. This also reverts commit f6175f5bfb4c, which partially fixes this bug by clearing vector in __clear_irq_vector(). But in environments with IOMMU IRQ remapper, it could fail because cfg->domain doesn't contain offlined CPUs. With this patch, the fix in __clear_irq_vector() can be reverted because every vector_irq is already cleared in __fixup_irqs() on offlined CPUs. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Yinghai Lu <yinghai@kernel.org> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20120726104732.2889.19144.stgit@kvmdev Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...
2012-07-22Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "This tree mostly involves various APIC driver cleanups/robustization, and vSMP motivated platform callback improvements/cleanups" Fix up trivial conflict due to printk cleanup right next to return value change. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()" x86/apic/x2apic: Use multiple cluster members for the irq destination only with the explicit affinity x86/apic/x2apic: Limit the vector reservation to the user specified mask x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership x86/vsmp: Fix vector_allocation_domain's return value irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up irq_set_affinity() for UP x86/vsmp: Fix linker error when CONFIG_PROC_FS is not set x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumask x86/apic/es7000+summit: Always make valid apicid from a cpumask x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid() x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and() x86/apic: Eliminate cpu_mask_to_apicid() operation x86/x2apic/cluster: Vector_allocation_domain() should return a value x86/apic/irq_remap: Silence a bogus pr_err() x86/vsmp: Ignore IOAPIC IRQ affinity if possible x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask x86/apic: Make cpu_mask_to_apicid() operations return error code x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector() x86/apic: Try to spread IRQ vectors to different priority levels x86/apic: Factor out default vector_allocation_domain() operation ...
2012-07-16apic: add apic_set_eoi_write for PV useMichael S. Tsirkin
KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-06x86/apic/x2apic: Use multiple cluster members for the irq destination only ↵Suresh Siddha
with the explicit affinity During boot or driver load etc, interrupt destination is setup using default target cpu's. Later the user (irqbalance etc) or the driver (irq_set_affinity/ irq_set_affinity_hint) can request the interrupt to be migrated to some specific set of cpu's. In the x2apic cluster routing, for the default scenario use single cpu as the interrupt destination and when there is an explicit interrupt affinity request, route the interrupt to multiple members of a x2apic cluster specified in the cpumask of the migration request. This will minmize the vector pressure when there are lot of interrupt sources and relatively few x2apic clusters (for example a single socket server). This will allow the performance critical interrupts to be routed to multiple cpu's in the x2apic cluster (irqbalance for example uses the cache siblings etc while specifying the interrupt destination) and allow non-critical interrupts to be serviced by a single logical cpu. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-4-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06x86/apic/x2apic: Limit the vector reservation to the user specified maskSuresh Siddha
For the x2apic cluster mode, vector for an interrupt is currently reserved on all the cpu's that are part of the x2apic cluster. But the interrupts will be routed only to the cluster (derived from the first cpu in the mask) members specified in the mask. So there is no need to reserve the vector in the unused cluster members. Modify __assign_irq_vector() to reserve the vectors based on the user specified irq destination mask. If the new mask is a proper subset of the currently used mask, cleanup the vector allocation on the unused cpu members. Also, allow the apic driver to tune the vector domain based on the affinity mask (which in most cases is the user-specified mask). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain ↵Suresh Siddha
membership Currently __assign_irq_vector() goes through each cpu in the specified mask until it finds a free vector in all the cpu's that are part of the same interrupt domain. We visit all the interrupt domain sibling cpus to reserve the free vector. So, when we fail to find a free vector in an interrupt domain, it is safe to continue our search with a cpu belonging to a new interrupt domain. No need to go through each cpu, if the domain containing that cpu is already visited. Use the irq_cfg's old_domain to track the visited domains and optimize the cpu traversal while finding a free vector in the given cpumask. NOTE: We can also optimize the search by using for_each_cpu() and skip the current cpu, if it is not the first cpu in the mask returned by the vector_allocation_domain(). But re-using the cfg->old_domain to track the visited domains will be slightly faster. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20Merge commit 'v3.5-rc3' into x86/debugIngo Molnar
Merge it in to pick up a fix that we are going to clean up in this branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18Merge branch 'x86/apic' into x86/platformIngo Molnar
Merge in x86/apic to solve a vector_allocation_domain() API change semantic merge conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-15irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up ↵Suresh Siddha
irq_set_affinity() for UP Move the ->irq_set_affinity() routines out of the #ifdef CONFIG_SMP sections and use config_enabled(CONFIG_SMP) checks inside those routines. Thus making those routines simple null stubs for !CONFIG_SMP and retaining those routines with no additional runtime overhead for CONFIG_SMP kernels. Cleans up the ifdef CONFIG_SMP in and around routines related to irq_set_affinity in io_apic and irq_remapping subsystems. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: torvalds@linux-foundation.org Cc: joerg.roedel@amd.com Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1339723729.3475.63.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-15Merge branch 'x86/cleanups' into x86/apicIngo Molnar
Merge in the cleanups because a followup x86/apic change relies on them. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumaskAlexander Gordeev
cpu_mask_to_apicid_and() always returns apicid of a single CPU, even in case multiple CPUs were requested. This update fixes a typo and forces apicid of a cluster to be returned. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614075043.GI3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/apic/es7000+summit: Always make valid apicid from a cpumaskAlexander Gordeev
In case of invalid parameters cpu_mask_to_apicid_and() might return apicid value of 0 (on Summit) or a uninitialized value (on ES7000), although it is supposed to return apicid of cpu-0 at least. Fix the operation to always return a valid apicid. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614075026.GH3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid()Alexander Gordeev
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614075010.GG3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and()Alexander Gordeev
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614074954.GF3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/apic: Eliminate cpu_mask_to_apicid() operationAlexander Gordeev
Since there are only two locations where cpu_mask_to_apicid() is called from, remove the operation and use only cpu_mask_to_apicid_and() instead. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Suggested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614074935.GE3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/x2apic/cluster: Vector_allocation_domain() should return a valueAlexander Gordeev
Since commit 8637e38 ("x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector()") vector_allocation_domain() operation indicates if a cpumask is dynamic or static. This update fixes the oversight and makes the operation to return a value. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614103933.GJ3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86: Add read_mostly declaration/definition to variables from smp.hVlad Zolotarov
Add "read-mostly" qualifier to the following variables in smp.h: - cpu_sibling_map - cpu_core_map - cpu_llc_shared_map - cpu_llc_id - cpu_number - x86_cpu_to_apicid - x86_bios_cpu_apicid - x86_cpu_to_logical_apicid As long as all the variables above are only written during the initialization, this change is meant to prevent the false sharing. More specifically, on vSMP Foundation platform x86_cpu_to_apicid shared the same internode_cache_line with frequently written lapic_events. From the analysis of the first 33 per_cpu variables out of 219 (memories they describe, to be more specific) the 8 have read_mostly nature (tlb_vector_offset, cpu_loops_per_jiffy, xen_debug_irq, etc.) and 25 are frequently written (irq_stack_union, gdt_page, exception_stacks, idt_desc, etc.). Assuming that the spread of the rest of the per_cpu variables is similar, identifying the read mostly memories will make more sense in terms of long-term code maintenance comparing to identifying frequently written memories. Signed-off-by: Vlad Zolotarov <vlad@scalemp.com> Acked-by: Shai Fultheim <shai@scalemp.com> Cc: Shai Fultheim (Shai@ScaleMP.com) <Shai@scalemp.com> Cc: ido@wizery.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1719258.EYKzE4Zbq5@vlad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_maskAlexander Gordeev
Currently cpu_mask_to_apicid() should not get a offline CPU with the cpumask. Otherwise some apic drivers might try to access non-existent per-cpu variables (i.e. x2apic). In that regard cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations are inconsistent. This fix makes the two operations do not rely on calling functions and always return the apicid for only online CPUs. As result, the meaning and implementations of cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations become straight. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131624.GG4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Make cpu_mask_to_apicid() operations return error codeAlexander Gordeev
Current cpu_mask_to_apicid() and cpu_mask_to_apicid_and() implementations have few shortcomings: 1. A value returned by cpu_mask_to_apicid() is written to hardware registers unconditionally. Should BAD_APICID get ever returned it will be written to a hardware too. But the value of BAD_APICID is not universal across all hardware in all modes and might cause unexpected results, i.e. interrupts might get routed to CPUs that are not configured to receive it. 2. Because the value of BAD_APICID is not universal it is counter- intuitive to return it for a hardware where it does not make sense (i.e. x2apic). 3. cpu_mask_to_apicid_and() operation is thought as an complement to cpu_mask_to_apicid() that only applies a AND mask on top of a cpumask being passed. Yet, as consequence of 18374d8 commit the two operations are inconsistent in that of: cpu_mask_to_apicid() should not get a offline CPU with the cpumask cpu_mask_to_apicid_and() should not fail and return BAD_APICID These limitations are impossible to realize just from looking at the operations prototypes. Most of these shortcomings are resolved by returning a error code instead of BAD_APICID. As the result, faults are reported back early rather than possibilities to cause a unexpected behaviour exist (in case of [1]). The only exception is setup_timer_IRQ0_pin() routine. Although obviously controversial to this fix, its existing behaviour is preserved to not break the fragile check_timer() and would better addressed in a separate fix. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131559.GF4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector()Alexander Gordeev
In case of static vector allocation domains (i.e. flat) if all vector numbers are exhausted, an attempt to assign a new vector will lead to useless scans through all CPUs in the cpumask, even though it is known that each new pass would fail. Make this corner case less painful by letting report whether the vector allocation domain depends on passed arguments or not and stop scanning early. The same could have been achived by introducing a static flag to the apic operations. But let's allow vector_allocation_domain() have more intelligence here and decide dynamically, in case we would need it in the future. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131542.GE4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Try to spread IRQ vectors to different priority levelsAlexander Gordeev
When assigning a new vector it is primarially done by adding 8 to the previously given out vector number. Hence, two consequently allocated vector numbers would likely fall into the same priority level. Try to spread vector numbers to different priority levels better by changing the step from 8 to 16. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131514.GD4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Factor out default vector_allocation_domain() operationAlexander Gordeev
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131449.GC4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqsTomoki Sekiyama
In current Linux, percpu variable `vector_irq' is not cleared on offlined cpus while disabling devices' irqs. If the cpu that has the disabled irqs in vector_irq is hotplugged, __setup_vector_irq() hits invalid irq vector and may crash. This bug can be reproduced as following; # echo 0 > /sys/devices/system/cpu/cpu7/online # modprobe -r some_driver_using_interrupts # vector_irq@cpu7 uncleared # echo 1 > /sys/devices/system/cpu/cpu7/online # kernel may crash This patch fixes this bug by clearing vector_irq in __clear_irq_vector() even if the cpu is offlined. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: yrl.pp-manager.tt@hitachi.com Cc: ltc-kernel@ml.yrl.intra.hitachi.co.jp Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/4FC340BE.7080101@hitachi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/apic: Factor out default cpu_mask_to_apicid() operationsAlexander Gordeev
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120605112340.GA11454@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/apic: Factor out default target_cpus() operationAlexander Gordeev
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120605112324.GA11449@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/apic: Trivial whitespace fixesAlexander Gordeev
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120605112310.GA11443@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/x2apic/cluster: Use all the members of one cluster specified in the ↵Suresh Siddha
smp_affinity mask for the interrupt destination If the HW implements round-robin interrupt delivery, this enables multiple cpu's (which are part of the user specified interrupt smp_affinity mask and belong to the same x2apic cluster) to service the interrupt. Also if the platform supports Power Aware Interrupt Routing, then this enables the interrupt to be routed to an idle cpu or a busy cpu depending on the perf/power bias tunable. We are now grouping all the cpu's in a cluster to one vector domain. So that will limit the total number of interrupt sources handled by Linux. Previously we support "cpu-count * available-vectors-per-cpu" interrupt sources but this will now reduce to "cpu-count/16 * available-vectors-per-cpu". Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: gorcunov@openvz.org Cc: agordeev@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337644682-19854-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/irq: Update irq_cfg domain unless the new affinity is a subset of the ↵Suresh Siddha
current domain Until now, irq_cfg domain is mostly static. Either all CPU's (used by flat mode) or one CPU (first CPU in the irq afffinity mask) to which irq is being migrated (this is used by the rest of apic modes). Upcoming x2apic cluster mode optimization patch allows the irq to be sent to any CPU in the x2apic cluster (if supported by the HW). So irq_cfg domain changes on the fly (depending on which CPU in the x2apic cluster is online). Instead of checking for any intersection between the new irq affinity mask and the current irq_cfg domain, check if the new irq affinity mask is a subset of the current irq_cfg domain. Otherwise proceed with updating the irq_cfg domain aswell as assigning vector's on all the CPUs specified in the new mask. This also cleans up a workaround in updating irq_cfg domain for legacy irq's that are handled by the IO-APIC. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yinghai@kernel.org Cc: gorcunov@openvz.org Cc: agordeev@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337644682-19854-1-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>Joe Perches
Use a more current logging style: - Bare printks should have a KERN_<LEVEL> for consistency's sake - Add pr_fmt where appropriate - Neaten some macro definitions - Convert some Ok output to OK - Use "%s: ", __func__ in pr_fmt for summit - Convert some printks to pr_<level> Message output is not identical in all cases. Signed-off-by: Joe Perches <joe@perches.com> Cc: levinsasha928@gmail.com Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop [ merged two similar patches, tidied up the changelog ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06x86/platform: Introduce APIC post-initialization callbackIdo Yariv
Some subarchitectures (such as vSMP) need to slightly adjust the underlying APIC structure. Add an APIC post-initialization callback to 'struct x86_platform_ops' for this purpose and use it for adjusting the APIC structure on vSMP systems. Signed-off-by: Ido Yariv <ido@wizery.com> Acked-by: Shai Fultheim <shai@scalemp.com> Link: http://lkml.kernel.org/r/1338675095-27260-1-git-send-email-ido@wizery.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-24x86: Return IRQ_SET_MASK_OK_NOCOPY from irq affinity functionsJiang Liu
The interrupt chip irq_set_affinity() functions copy the affinity mask to irq_data->affinity but return 0, i.e. IRQ_SET_MASK_OK. IRQ_SET_MASK_OK causes the core code to do another redundant copy. Return IRQ_SET_MASK_OK_NOCOPY to avoid this. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Cliff Wickman <cpw@sgi.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Keping Chen <chenkeping@huawei.com> Link: http://lkml.kernel.org/r/1333120296-13563-4-git-send-email-jiang.liu@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-23Merge branch 'delete-mca' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull the MCA deletion branch from Paul Gortmaker: "It was good that we could support MCA machines back in the day, but realistically, nobody is using them anymore. They were mostly limited to 386-sx 16MHz CPU and some 486 class machines and never more than 64MB of RAM. Even the enthusiast hobbyist community seems to have dried up close to ten years ago, based on what you can find searching various websites dedicated to the relatively short lived hardware. So lets remove the support relating to CONFIG_MCA. There is no point carrying this forward, wasting cycles doing routine maintenance on it; wasting allyesconfig build time on validating it, wasting I/O on git grep'ping over it, and so on." Let's see if anybody screams. It generally has compiled, and James Bottomley pointed out that there was a MCA extension from NCR that allowed for up to 4GB of memory and PPro-class machines. So in *theory* there may be users out there. But even James (technically listed as a maintainer) doesn't actually have a system, and while Alan Cox claims to have a machine in his cellar that he offered to anybody who wants to take it off his hands, he didn't argue for keeping MCA support either. So we could bring it back. But somebody had better speak up and talk about how they have actually been using said MCA hardware with modern kernels for us to do that. And David already took the patch to delete all the networking driver code (commit a5e371f61ad3: "drivers/net: delete all code/drivers depending on CONFIG_MCA"). * 'delete-mca' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: MCA: delete all remaining traces of microchannel bus support. scsi: delete the MCA specific drivers and driver code serial: delete the MCA specific 8250 support. arm: remove ability to select CONFIG_MCA
2012-05-22Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Most of the changes are about helping virtualized guest kernels achieve better performance." Fix up trivial conflicts with the iommu updates to arch/x86/kernel/apic/io_apic.c * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Implement EIO micro-optimization x86/apic: Add apic->eoi_write() callback x86/apic: Use symbolic APIC_EOI_ACK x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it x86/xen/apic: Add missing #include <xen/xen.h> x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQ x86/apic: Fix UP boot crash x86: Conditionally update time when ack-ing pending irqs xen/apic: implement io apic read with hypercall Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'" xen/x86: Implement x86_apic_ops x86/apic: Replace io_apic_ops with x86_io_apic_ops.
2012-05-18x86/apic: Implement EIO micro-optimizationMichael S. Tsirkin
We know both register and value for eoi beforehand, so there's no need to check it and no need to do math to calculate the msr. Saves instructions/branches on each EOI when using x2apic. I looked at the objdump output to verify that the generated code looks right and actually is shorter. The real improvemements will be on the KVM guest side though, those come in a later patch. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: gleb@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/e019d1a125316f10d3e3a4b2f6bda41473f4fb72.1337184153.git.mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-18x86/apic: Add apic->eoi_write() callbackMichael S. Tsirkin
Add eoi_write callback so that kvm can override eoi accesses without touching the rest of the apic. As a side-effect, this will enable a micro-optimization for apics using msr. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: gleb@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0df425d746c49ac2ecc405174df87752869629d2.1337184153.git.mst@redhat.com [ tidied it up a bit ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-17MCA: delete all remaining traces of microchannel bus support.Paul Gortmaker
Hardware with MCA bus is limited to 386 and 486 class machines that are now 20+ years old and typically with less than 32MB of memory. A quick search on the internet, and you see that even the MCA hobbyist/enthusiast community has lost interest in the early 2000 era and never really even moved ahead from the 2.4 kernels to the 2.6 series. This deletes anything remaining related to CONFIG_MCA from core kernel code and from the x86 architecture. There is no point in carrying this any further into the future. One complication to watch for is inadvertently scooping up stuff relating to machine check, since there is overlap in the TLA name space (e.g. arch/x86/boot/mca.c). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: James Bottomley <JBottomley@Parallels.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-08x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQMárton Németh
The local function io_apic_level_ack_pending() is only called from io_apic_level_ack_pending(). The later function is only compiled if CONFIG_GENERIC_PENDING_IRQ is defined. Move the io_apic_level_ack_pending() to the existing #ifdef CONFIG_GENERIC_PENDING_IRQ code block. This will remove the following warning message during compiling without CONFIG_GENERIC_PENDING_IRQ defined: * arch/x86/kernel/apic/io_apic.c:382: warning: ‘io_apic_level_ack_pending’ defined but not used Signed-off-by: Márton Németh <nm127@freemail.hu> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Link: http://lkml.kernel.org/r/1336461860.2296.3.camel@sbsiddha-mobl2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07x86: Conditionally update time when ack-ing pending irqsShai Fultheim
On virtual environments, apic_read could take a long time. As a result, under certain conditions the ack pending loop may exit without any queued irqs left, but after more than one second. A warning will be printed needlessly in this case. If the loop is about to exit regardless of max_loops, don't update it. Signed-off-by: Shai Fultheim <shai@scalemp.com> [ rebased and reworded the commit message] Signed-off-by: Ido Yariv <ido@wizery.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07iommu: rename intr_remapping.[ch] to irq_remapping.[ch]Suresh Siddha
Make the file names consistent with the naming conventions of irq subsystem. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07iommu: rename intr_remapping references to irq_remappingSuresh Siddha
Make the code consistent with the naming conventions of irq subsystem. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07x86, iommu/vt-d: Clean up interfaces for interrupt remappingJoerg Roedel
Remove the Intel specific interfaces from dmar.h and remove asm/irq_remapping.h which is only used for io_apic.c anyway. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-07iommu/vt-d: Convert MSI remapping setup to remap_opsJoerg Roedel
This patch introduces remapping-ops for setting ups MSI interrupts. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>