summaryrefslogtreecommitdiff
path: root/arch/powerpc/sysdev/xive
AgeCommit message (Collapse)Author
2021-09-03Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch into next. That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c. Between cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when creating the IPIs"), which moved request_irq() out of xive_init_ipis(), and 17df41fec5b8 ("powerpc: use IRQF_NO_DEBUG for IPIs") which added IRQF_NO_DEBUG to that request_irq() call, which has now moved.
2021-08-18powerpc/xive: Do not mark xive_request_ipi() as __initNathan Chancellor
Compiling ppc64le_defconfig with clang-14 shows a modpost warning: WARNING: modpost: vmlinux.o(.text+0xa74e0): Section mismatch in reference from the function xive_setup_cpu_ipi() to the function .init.text:xive_request_ipi() The function xive_setup_cpu_ipi() references the function __init xive_request_ipi(). This is often because xive_setup_cpu_ipi lacks a __init annotation or the annotation of xive_request_ipi is wrong. xive_request_ipi() is called from xive_setup_cpu_ipi(), which is not __init, so xive_request_ipi() should not be marked __init. Remove the attribute so there is no more warning. Fixes: cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when creating the IPIs") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210816185711.21563-1-nathan@kernel.org
2021-08-13powerpc: rename powerpc_debugfs_root to arch_debugfs_dirAneesh Kumar K.V
No functional change in this patch. arch_debugfs_dir is the generic kernel name declared in linux/debugfs.h for arch-specific debugfs directory. Architectures like x86/s390 already use the name. Rename powerpc specific powerpc_debugfs_root to arch_debugfs_dir. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
2021-08-12powerpc/xive: Do not skip CPU-less nodes when creating the IPIsCédric Le Goater
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups. We can not directly allocate and request the IPI on demand because bringup_up() is called under the IRQ sparse lock. The alternative is to allocate the IPIs for all possible nodes at startup and to request the mapping on demand when the first CPU of a node is brought up. Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210807072057.184698-1-clg@kaod.org
2021-08-10KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restoreCédric Le Goater
On P10, the feature doing an automatic "save & restore" of a VCPU interrupt context is set by default in OPAL. When a VP context is pulled out, the state of the interrupt registers are saved by the XIVE interrupt controller under the internal NVP structure representing the VP. This saves a costly store/load in guest entries and exits. If OPAL advertises the "save & restore" feature in the device tree, it should also have set the 'H' bit in the CAM line. Check that when vCPUs are connected to their ICP in KVM before going any further. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org
2021-08-10powerpc: use IRQF_NO_DEBUG for IPIsCédric Le Goater
There is no need to use the lockup detector ("noirqdebug") for IPIs. The ipistorm benchmark measures a ~10% improvement on high systems when this flag is set. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210719130614.195886-1-clg@kaod.org
2021-08-10powerpc/xive: Use XIVE domain under xmon and debugfsCédric Le Goater
The default domain of the PCI/MSIs is not the XIVE domain anymore. To list the IRQ mappings under XMON and debugfs, query the IRQ data from the low level XIVE domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-32-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add a msi_free() handler to clear XIVE dataCédric Le Goater
The MSI domain clears the IRQ with msi_domain_free(), which calls irq_domain_free_irqs_top(), which clears the handler data. This is a problem for the XIVE controller since we need to unmap MMIO pages and free a specific XIVE structure. The 'msi_free()' handler is called before irq_domain_free_irqs_top() when the handler data is still available. Use that to clear the XIVE controller data. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-10-clg@kaod.org
2021-08-10powerpc/xive: Remove irqd_is_started() check when setting the affinityCédric Le Goater
In the early days of XIVE support, commit cffb717ceb8e ("powerpc/xive: Ensure active irqd when setting affinity") tried to fix an issue related to interrupt migration. If the root cause was related to CPU unplug, it should have been fixed and there is no reason to keep the irqd_is_started() check. This test is also breaking affinity setting of MSIs which can set before starting the associated IRQ. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-8-clg@kaod.org
2021-08-10powerpc/xive: Drop unmask of MSIs at startupCédric Le Goater
That was a workaround in the XIVE domain because of the lack of MSI domain. This is now handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-7-clg@kaod.org
2021-08-10powerpc/xive: Ease debugging of xive_irq_set_affinity()Cédric Le Goater
pr_debug() is easier to activate and it helps to know how the kernel configures the HW when tweaking the IRQ subsystem. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-5-clg@kaod.org
2021-08-10powerpc/xive: Add support for IRQ domain hierarchyCédric Le Goater
This adds handlers to allocate/free IRQs in a domain hierarchy. We could try to use xive_irq_domain_map() in xive_irq_domain_alloc() but we rely on xive_irq_alloc_data() to set the IRQ handler data and duplicating the code is simpler. xive_irq_free_data() needs to be called when IRQ are freed to clear the MMIO mappings and free the XIVE handler data, xive_irq_data structure. This is going to be a problem with MSI domains which we will address later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-4-clg@kaod.org
2021-07-05powerpc/xive: Fix error handling when allocating an IPICédric Le Goater
This is a smatch warning: arch/powerpc/sysdev/xive/common.c:1161 xive_request_ipi() warn: unsigned 'xid->irq' is never less than zero. Fixes: fd6db2892eba ("powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler") Cc: stable@vger.kernel.org # v5.13 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701152412.1507612-1-clg@kaod.org
2021-06-10powerpc: Move the use of irq_domain_add_nomap() behind a config optionMarc Zyngier
Only a handful of old PPC systems are still using the old 'nomap' variant of the irqdomain library. Move the associated definitions behind a configuration option, which will allow us to make some more radical changes. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "A few misc subsystems and some of MM. 175 patches. Subsystems affected by this patch series: ia64, kbuild, scripts, sh, ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub, kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap, mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits) mm/memory-failure: unnecessary amount of unmapping mm/mmzone.h: fix existing kernel-doc comments and link them to core-api mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1 net: page_pool: use alloc_pages_bulk in refill code path net: page_pool: refactor dma_map into own function page_pool_dma_map SUNRPC: refresh rq_pages using a bulk page allocator SUNRPC: set rq_page_end differently mm/page_alloc: inline __rmqueue_pcplist mm/page_alloc: optimize code layout for __alloc_pages_bulk mm/page_alloc: add an array-based interface to the bulk page allocator mm/page_alloc: add a bulk page allocator mm/page_alloc: rename alloced to allocated mm/page_alloc: duplicate include linux/vmalloc.h mm, page_alloc: avoid page_to_pfn() in move_freepages() mm/Kconfig: remove default DISCONTIGMEM_MANUAL mm: page_alloc: dump migrate-failed pages mm/mempolicy: fix mpol_misplaced kernel-doc mm/mempolicy: rewrite alloc_pages_vma documentation mm/mempolicy: rewrite alloc_pages documentation mm/mempolicy: rename alloc_pages_current to alloc_pages ...
2021-04-30powerpc/xive: remove unnecessary unmap_kernel_rangeNicholas Piggin
iounmap will remove ptes. Link: https://lkml.kernel.org/r/20210322021806.892164-4-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Cédric Le Goater <clg@kaod.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-17powerpc/xive: Use the "ibm, chip-id" property only under PowerNVCédric Le Goater
The 'chip_id' field of the XIVE CPU structure is used to choose a target for a source located on the same chip. For that, the XIVE driver queries the chip identifier from the "ibm,chip-id" property and compares it to a 'src_chip' field identifying the chip of a source. This information is only available on the PowerNV platform, 'src_chip' being assigned to XIVE_INVALID_CHIP_ID under pSeries. The "ibm,chip-id" property is also not available on all platforms. It was first introduced on PowerNV and later, under QEMU for pSeries/KVM. However, the property is not part of PAPR and does not exist under pSeries/PowerVM. Assign 'chip_id' to XIVE_INVALID_CHIP_ID by default and let the PowerNV platform override the value with the "ibm,chip-id" property. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210413130352.1183267-1-clg@kaod.org
2021-04-14powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handlerCédric Le Goater
Instead of calling irq_create_mapping() to map the IPI for a node, introduce an 'alloc' handler. This is usually an extension to support hierarchy irq_domains which is not exactly the case for XIVE-IPI domain. However, we can now use the irq_domain_alloc_irqs() routine which allocates the IRQ descriptor on the specified node, even better for cache performance on multi node machines. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-10-clg@kaod.org
2021-04-14powerpc/xive: Map one IPI interrupt per nodeCédric Le Goater
ipistorm [*] can be used to benchmark the raw interrupt rate of an interrupt controller by measuring the number of IPIs a system can sustain. When applied to the XIVE interrupt controller of POWER9 and POWER10 systems, a significant drop of the interrupt rate can be observed when crossing the second node boundary. This is due to the fact that a single IPI interrupt is used for all CPUs of the system. The structure is shared and the cache line updates impact greatly the traffic between nodes and the overall IPI performance. As a workaround, the impact can be reduced by deactivating the IRQ lockup detector ("noirqdebug") which does a lot of accounting in the Linux IRQ descriptor structure and is responsible for most of the performance penalty. As a fix, this proposal allocates an IPI interrupt per node, to be shared by all CPUs of that node. It solves the scaling issue, the IRQ lockup detector still has an impact but the XIVE interrupt rate scales linearly. It also improves the "noirqdebug" case as showed in the tables below. * P9 DD2.2 - 2s * 64 threads "noirqdebug" Mint/s Mint/s chips cpus IPI/sys IPI/chip IPI/chip IPI/sys -------------------------------------------------------------- 1 0-15 4.984023 4.875405 4.996536 5.048892 0-31 10.879164 10.544040 10.757632 11.037859 0-47 15.345301 14.688764 14.926520 15.310053 0-63 17.064907 17.066812 17.613416 17.874511 2 0-79 11.768764 21.650749 22.689120 22.566508 0-95 10.616812 26.878789 28.434703 28.320324 0-111 10.151693 31.397803 31.771773 32.388122 0-127 9.948502 33.139336 34.875716 35.224548 * P10 DD1 - 4s (not homogeneous) 352 threads "noirqdebug" Mint/s Mint/s chips cpus IPI/sys IPI/chip IPI/chip IPI/sys -------------------------------------------------------------- 1 0-15 2.409402 2.364108 2.383303 2.395091 0-31 6.028325 6.046075 6.089999 6.073750 0-47 8.655178 8.644531 8.712830 8.724702 0-63 11.629652 11.735953 12.088203 12.055979 0-79 14.392321 14.729959 14.986701 14.973073 0-95 12.604158 13.004034 17.528748 17.568095 2 0-111 9.767753 13.719831 19.968606 20.024218 0-127 6.744566 16.418854 22.898066 22.995110 0-143 6.005699 19.174421 25.425622 25.417541 0-159 5.649719 21.938836 27.952662 28.059603 0-175 5.441410 24.109484 31.133915 31.127996 3 0-191 5.318341 24.405322 33.999221 33.775354 0-207 5.191382 26.449769 36.050161 35.867307 0-223 5.102790 29.356943 39.544135 39.508169 0-239 5.035295 31.933051 42.135075 42.071975 0-255 4.969209 34.477367 44.655395 44.757074 4 0-271 4.907652 35.887016 47.080545 47.318537 0-287 4.839581 38.076137 50.464307 50.636219 0-303 4.786031 40.881319 53.478684 53.310759 0-319 4.743750 43.448424 56.388102 55.973969 0-335 4.709936 45.623532 59.400930 58.926857 0-351 4.681413 45.646151 62.035804 61.830057 [*] https://github.com/antonblanchard/ipistorm Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-9-clg@kaod.org
2021-04-14powerpc/xive: Fix xmon command "dxi"Cédric Le Goater
When under xmon, the "dxi" command dumps the state of the XIVE interrupts. If an interrupt number is specified, only the state of the associated XIVE interrupt is dumped. This form of the command lacks an irq_data parameter which is nevertheless used by xmon_xive_get_irq_config(), leading to an xmon crash. Fix that by doing a lookup in the system IRQ mapping to query the IRQ descriptor data. Invalid interrupt numbers, or not belonging to the XIVE IRQ domain, OPAL event interrupt number for instance, should be caught by the previous query done at the firmware level. Fixes: 97ef27507793 ("powerpc/xive: Fix xmon support on the PowerNV platform") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-8-clg@kaod.org
2021-04-14powerpc/xive: Simplify the dump of XIVE interrupts under xmonCédric Le Goater
Move the xmon routine under XIVE subsystem and rework the loop on the interrupts taking into account the xive_irq_domain to filter out IPIs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-7-clg@kaod.org
2021-04-14powerpc/xive: Drop check on irq_data in xive_core_debug_show()Cédric Le Goater
When looping on IRQ descriptor, irq_data is always valid. Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-6-clg@kaod.org
2021-04-14powerpc/xive: Simplify xive_core_debug_show()Cédric Le Goater
Now that the IPI interrupt has its own domain, the checks on the HW interrupt number XIVE_IPI_HW_IRQ and on the chip can be replaced by a check on the domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-5-clg@kaod.org
2021-04-14powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQCédric Le Goater
The IPI interrupt has its own domain now. Testing the HW interrupt number is not needed anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-4-clg@kaod.org
2021-04-14powerpc/xive: Introduce an IPI interrupt domainCédric Le Goater
The IPI interrupt is a special case of the XIVE IRQ domain. When mapping and unmapping the interrupts in the Linux interrupt number space, the HW interrupt number 0 (XIVE_IPI_HW_IRQ) is checked to distinguish the IPI interrupt from other interrupts of the system. Simplify the XIVE interrupt domain by introducing a specific domain for the IPI. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331144514.892250-3-clg@kaod.org
2021-03-29powerpc/xive: use true and false for bool variableYang Li
fixed the following coccicheck: ./arch/powerpc/sysdev/xive/spapr.c:552:8-9: WARNING: return of 0/1 in function 'xive_spapr_match' with return type bool Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1615793096-83758-1-git-send-email-yang.lee@linux.alibaba.com
2020-12-11powerpc/xive: Improve error reporting of OPAL callsCédric Le Goater
Introduce a vp_err() macro to standardize error reporting. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-13-clg@kaod.org
2020-12-11powerpc/xive: Simplify xive_do_source_eoi()Cédric Le Goater
Previous patches removed the need of the first argument which was a hack for Firwmware EOI. Remove it and flatten the routine which has became simpler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-12-clg@kaod.org
2020-12-11powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FWCédric Le Goater
This flag was used to support the P9 DD1 and we have stopped supporting this CPU when DD2 came out. See skiboot commit: https://github.com/open-power/skiboot/commit/0b0d15e3c170 Also, remove eoi handler which is now unused. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-11-clg@kaod.org
2020-12-11powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FWCédric Le Goater
This flag was used to support the PHB4 LSIs on P9 DD1 and we have stopped supporting this CPU when DD2 came out. See skiboot commit: https://github.com/open-power/skiboot/commit/0b0d15e3c170 Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-10-clg@kaod.org
2020-12-11powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUGCédric Le Goater
This flag was used to support the PHB4 LSIs on P9 DD1 and we have stopped supporting this CPU when DD2 came out. See skiboot commit: https://github.com/open-power/skiboot/commit/0b0d15e3c170 Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-9-clg@kaod.org
2020-12-11powerpc/xive: Add a debug_show handler to the XIVE irq_domainCédric Le Goater
Full state of the Linux interrupt descriptors can be dumped under debugfs when compiled with CONFIG_GENERIC_IRQ_DEBUGFS. Add support for the XIVE interrupt controller. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-7-clg@kaod.org
2020-12-11powerpc/xive: Add a name to the IRQ domainCédric Le Goater
We hope one day to handle multiple irq_domain in the XIVE driver. Start simple by setting the name using the DT node. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-6-clg@kaod.org
2020-12-11powerpc/xive: Introduce XIVE_IPI_HW_IRQCédric Le Goater
The XIVE driver deals with CPU IPIs in a peculiar way. Each CPU has its own XIVE IPI interrupt allocated at the HW level, for PowerNV, or at the hypervisor level for pSeries. In practice, these interrupts are not always used. pSeries/PowerVM prefers local doorbells for local threads since they are faster. On PowerNV, global doorbells are also preferred for the same reason. The mapping in the Linux is reduced to a single interrupt using HW interrupt number 0 and a custom irq_chip to handle EOI. This can cause performance issues in some benchmark (ipistorm) on multichip systems. Clarify the use of the 0 value, it will help in improving multichip support. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-4-clg@kaod.org
2020-12-11powerpc/xive: Rename XIVE_IRQ_NO_EOI to show its a flagCédric Le Goater
This is a simple cleanup to identify easily all flags of the XIVE interrupt structure. The interrupts flagged with XIVE_IRQ_FLAG_NO_EOI are the escalations used to wake up vCPUs in KVM. They are handled very differently from the rest. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-3-clg@kaod.org
2020-09-18powerpc/xive: Make debug routines staticCédric Le Goater
This fixes a compile error with W=1. CC arch/powerpc/sysdev/xive/common.o ../arch/powerpc/sysdev/xive/common.c:1568:6: error: no previous prototype for ‘xive_debug_show_cpu’ [-Werror=missing-prototypes] void xive_debug_show_cpu(struct seq_file *m, int cpu) ^~~~~~~~~~~~~~~~~~~ ../arch/powerpc/sysdev/xive/common.c:1602:6: error: no previous prototype for ‘xive_debug_show_irq’ [-Werror=missing-prototypes] void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d) ^~~~~~~~~~~~~~~~~~~ Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914211007.2285999-5-clg@kaod.org
2020-07-30powerpc: fix function annotations to avoid section mismatch warnings with gcc-10Vladis Dronov
Certain warnings are emitted for powerpc code when building with a gcc-10 toolset: WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in reference from the function remove_pmd_table() to the function .meminit.text:split_kernel_mapping() The function remove_pmd_table() references the function __meminit split_kernel_mapping(). This is often because remove_pmd_table lacks a __meminit annotation or the annotation of split_kernel_mapping is wrong. Add the appropriate __init and __meminit annotations to make modpost not complain. In all the cases there are just a single callsite from another __init or __meminit function: __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table() __init prom_init() -> setup_secure_guest() __init xive_spapr_init() -> xive_spapr_disabled() Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200729133741.62789-1-vdronov@redhat.com
2020-06-22powerpc/xive: Ignore kmemleak false positivesAlexey Kardashevskiy
xive_native_provision_pages() allocates memory and passes the pointer to OPAL so kmemleak cannot find the pointer usage in the kernel memory and produces a false positive report (below) (even if the kernel did scan OPAL memory, it is unable to deal with __pa() addresses anyway). This silences the warning. unreferenced object 0xc000200350c40000 (size 65536): comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s) hex dump (first 32 bytes): 02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00 ....P........... 01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000081ff046c>] xive_native_alloc_vp_block+0x120/0x250 [<00000000d555d524>] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm] [<00000000d69b9c9f>] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm] [<000000006acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm] [<0000000089c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm] [<00000000902ae91e>] ksys_ioctl+0x184/0x1b0 [<00000000f3e68bd7>] sys_ioctl+0x48/0xb0 [<0000000001b2c127>] system_call_exception+0x124/0x1f0 [<00000000d2b2ee40>] system_call_common+0xe8/0x214 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612043303.84894-1-aik@ozlabs.ru
2020-05-28powerpc/xive: Share the event-queue page with the Hypervisor.Ram Pai
XIVE interrupt controller uses an Event Queue (EQ) to enqueue event notifications when an exception occurs. The EQ is a single memory page provided by the O/S defining a circular buffer, one per server and priority couple. On baremetal, the EQ page is configured with an OPAL call. On pseries, an extra hop is necessary and the guest OS uses the hcall H_INT_SET_QUEUE_CONFIG to configure the XIVE interrupt controller. The XIVE controller being Hypervisor privileged, it will not be allowed to enqueue event notifications for a Secure VM unless the EQ pages are shared by the Secure VM. Hypervisor/Ultravisor still requires support for the TIMA and ESB page fault handlers. Until this is complete, QEMU can use the emulated XIVE device for Secure VMs, option "kernel_irqchip=off" on the QEMU pseries machine. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Cedric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200426020518.GC5853@oc0525413822.ibm.com
2020-05-28powerpc/xive: Do not expose a debugfs file when XIVE is disabledCédric Le Goater
The XIVE interrupt mode can be disabled with the "xive=off" kernel parameter, in which case there is nothing to present to the user in the associated /sys/kernel/debug/powerpc/xive file. Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429075122.1216388-4-clg@kaod.org
2020-05-26powerpc/xive: Clear the page tables for the ESB IO mappingCédric Le Goater
Commit 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler") fixed an issue in the FW assisted dump of machines using hash MMU and the XIVE interrupt mode under the POWER hypervisor. It forced the mapping of the ESB page of interrupts being mapped in the Linux IRQ number space to make sure the 'crash kexec' sequence worked during such an event. But it didn't handle the un-mapping. This mapping is now blocking the removal of a passthrough IO adapter under the POWER hypervisor because it expects the guest OS to have cleared all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations". Remove these mapping in the IRQ data cleanup routine. Under KVM, this cleanup is not required because the ESB pages for the adapter interrupts are un-mapped from the guest by the hypervisor in the KVM XIVE native device. This is now redundant but it's harmless. Fixes: 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org
2020-05-20Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge our topic branch shared with the kvm-ppc tree. This brings in one commit that touches the XIVE interrupt controller logic across core and KVM code.
2020-05-07powerpc/xive: Enforce load-after-store ordering when StoreEOI is activeCédric Le Goater
When an interrupt has been handled, the OS notifies the interrupt controller with a EOI sequence. On a POWER9 system using the XIVE interrupt controller, this can be done with a load or a store operation on the ESB interrupt management page of the interrupt. The StoreEOI operation has less latency and improves interrupt handling performance but it was deactivated during the POWER9 DD2.0 timeframe because of ordering issues. We use the LoadEOI today but we plan to reactivate StoreEOI in future architectures. There is usually no need to enforce ordering between ESB load and store operations as they should lead to the same result. E.g. a store trigger and a load EOI can be executed in any order. Assuming the interrupt state is PQ=10, a store trigger followed by a load EOI will return a Q bit. In the reverse order, it will create a new interrupt trigger from HW. In both cases, the handler processing interrupts is notified. In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to disable temporarily the interrupt source (mask/unmask). When the source is reenabled, the OS can detect if interrupts were received while the source was disabled and reinject them. This process needs special care when StoreEOI is activated. The ESB load and store operations should be correctly ordered because a XIVE_ESB_STORE_EOI operation could leave the source enabled if it has not completed before the loads. For those cases, we enforce Load-after-Store ordering with a special load operation offset. To avoid performance impact, this ordering is only enforced when really needed, that is when interrupt sources are temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not be needed for other loads. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org
2020-04-20powerpc/xive: Define xive_native_alloc_irq_on_chip()Haren Myneni
This function allocates IRQ on a specific chip. VAS needs per chip IRQ allocation and will have IRQ handler per VAS instance. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587016720.2275.1047.camel@hbabu-laptop
2020-03-27powerpc/xive: Add a debugfs file to dump internal XIVE stateCédric Le Goater
As does XMON, the debugfs file /sys/kernel/debug/powerpc/xive exposes the XIVE internal state of the machine CPUs and interrupts. Available on the PowerNV and sPAPR platforms. Signed-off-by: Cédric Le Goater <clg@kaod.org> [mpe: Make the debugfs file 0400] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-5-clg@kaod.org
2020-03-27powerpc/xmon: Add source flags to output of XIVE interruptsCédric Le Goater
Some firmwares or hypervisors can advertise different source characteristics. Track their value under XMON. What we are mostly interested in is the StoreEOI flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-4-clg@kaod.org
2020-03-27powerpc/xive: Fix xmon support on the PowerNV platformCédric Le Goater
The PowerNV platform has multiple IRQ chips and the xmon command dumping the state of the XIVE interrupt should only operate on the XIVE IRQ chip. Fixes: 5896163f7f91 ("powerpc/xmon: Improve output of XIVE interrupts") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-3-clg@kaod.org
2020-03-27powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIsCédric Le Goater
When a CPU is brought up, an IPI number is allocated and recorded under the XIVE CPU structure. Invalid IPI numbers are tracked with interrupt number 0x0. On the PowerNV platform, the interrupt number space starts at 0x10 and this works fine. However, on the sPAPR platform, it is possible to allocate the interrupt number 0x0 and this raises an issue when CPU 0 is unplugged. The XIVE spapr driver tracks allocated interrupt numbers in a bitmask and it is not correctly updated when interrupt number 0x0 is freed. It stays allocated and it is then impossible to reallocate. Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms. Reported-by: David Gibson <david@gibson.dropbear.id.au> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-2-clg@kaod.org
2020-01-22powerpc/xive: Discard ESB load value when interrupt is invalidFrederic Barrat
A load on an ESB page returning all 1's means that the underlying device has invalidated the access to the PQ state of the interrupt through mmio. It may happen, for example when querying a PHB interrupt while the PHB is in an error state. In that case, we should consider the interrupt to be invalid when checking its state in the irq_get_irqchip_state() handler. Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> [clg: wrote a commit log, introduced XIVE_ESB_INVALID ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200113130118.27969-1-clg@kaod.org
2019-12-05powerpc/xive: Skip ioremap() of ESB pages for LSI interruptsCédric Le Goater
The PCI INTx interrupts and other LSI interrupts are handled differently under a sPAPR platform. When the interrupt source characteristics are queried, the hypervisor returns an H_INT_ESB flag to inform the OS that it should be using the H_INT_ESB hcall for interrupt management and not loads and stores on the interrupt ESB pages. A default -1 value is returned for the addresses of the ESB pages. The driver ignores this condition today and performs a bogus IO mapping. Recent changes and the DEBUG_VM configuration option make the bug visible with : kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1 NIP: c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000 REGS: c0000000fa45f0d0 TRAP: 0700 Not tainted (5.4.0-0.rc6.git0.1.fc32.ppc64le) ... NIP ioremap_page_range+0x4c4/0x6e0 LR ioremap_page_range+0x74/0x6e0 Call Trace: ioremap_page_range+0x74/0x6e0 (unreliable) do_ioremap+0x8c/0x120 __ioremap_caller+0x128/0x140 ioremap+0x30/0x50 xive_spapr_populate_irq_data+0x170/0x260 xive_irq_domain_map+0x8c/0x170 irq_domain_associate+0xb4/0x2d0 irq_create_mapping+0x1e0/0x3b0 irq_create_fwspec_mapping+0x27c/0x3e0 irq_create_of_mapping+0x98/0xb0 of_irq_parse_and_map_pci+0x168/0x230 pcibios_setup_device+0x88/0x250 pcibios_setup_bus_devices+0x54/0x100 __of_scan_bus+0x160/0x310 pcibios_scan_phb+0x330/0x390 pcibios_init+0x8c/0x128 do_one_initcall+0x60/0x2c0 kernel_init_freeable+0x290/0x378 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x80 Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191203163642.2428-1-clg@kaod.org