summaryrefslogtreecommitdiff
path: root/drivers/virtio
AgeCommit message (Collapse)Author
2016-07-26mm: fix build warnings in <linux/compaction.h>Minchan Kim
Randy reported below build error. > In file included from ../include/linux/balloon_compaction.h:48:0, > from ../mm/balloon_compaction.c:11: > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default] > static inline int compaction_register_node(struct node *node) > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default] > static inline void compaction_unregister_node(struct node *node) > It was caused by non-lru page migration which needs compaction.h but compaction.h doesn't include any header to be standalone. I think proper header for non-lru page migration is migrate.h rather than compaction.h because migrate.h has already headers needed to work non-lru page migration indirectly like isolate_mode_t, migrate_mode MIGRATEPAGE_SUCCESS. [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix] Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26mm: balloon: use general non-lru movable page featureMinchan Kim
Now, VM has a feature to migrate non-lru movable pages so balloon doesn't need custom migration hooks in migrate.c and compaction.c. Instead, this patch implements the page->mapping->a_ops-> {isolate|migrate|putback} functions. With that, we could remove hooks for ballooning in general migration functions and make balloon compaction simple. [akpm@linux-foundation.org: compaction.h requires that the includer first include node.h] Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rafael Aquini <aquini@redhat.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22virtio_balloon: fix PFN format for virtio-1Michael S. Tsirkin
Everything should be LE when using virtio-1, but the linux balloon driver does not seem to care about that. Reported-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-01virtio: Silence uninitialized variable warningDan Carpenter
Smatch complains that we might not initialize "queue". The issue is callers like setup_vq() from virtio_pci_modern.c where "num" could be something like 2 and "vring_align" is 64. In that case, vring_size() is less than PAGE_SIZE. It won't happen in real life, but we're getting the value of "num" from a register so it's not really possible to tell what value it holds with static analysis. Let's just silence the warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07virtio: virtio 1.0 cs04 spec compliance for resetMichael S. Tsirkin
The spec says: after writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing the device. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2016-03-20Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost updates from Michael Tsirkin: "New features, performance improvements, cleanups: - basic polling support for vhost - rework virtio to optionally use DMA API, fixing it on Xen - balloon stats gained a new entry - using the new napi_alloc_skb speeds up virtio net - virtio blk stats can now be read while another VCPU is busy inflating or deflating the balloon plus misc cleanups in various places" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb() vhost_net: basic polling support vhost: introduce vhost_vq_avail_empty() vhost: introduce vhost_has_work() virtio_balloon: Allow to resize and update the balloon stats in parallel virtio_balloon: Use a workqueue instead of "vballoon" kthread virtio/s390: size of SET_IND payload virtio/s390: use dev_to_virtio vhost: rename vhost_init_used() vhost: rename cross-endian helpers virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH vring: Use the DMA API on Xen virtio_pci: Use the DMA API if enabled virtio_mmio: Use the DMA API if enabled virtio: Add improved queue allocation API virtio_ring: Support DMA APIs vring: Introduce vring_use_dma_api() s390/dma: Allow per device dma ops alpha/dma: use common noop dma ops dma: Provide simple noop dma ops
2016-03-17virtio_balloon: export 'available' memory to balloon statisticsIgor Redko
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory statistics protocol, corresponding to 'Available' in /proc/meminfo. It indicates to the hypervisor how big the balloon can be inflated without pushing the guest system to swap. Signed-off-by: Igor Redko <redkoi@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16Merge tag 'pci-v4.6-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for v4.6: Enumeration: - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas) - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas Resource management: - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas) - Don't assign or reassign immutable resources (Bjorn Helgaas) - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas) - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas) - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas) - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas) - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas) - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas) - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) Virtualization: - Wait for up to 1000ms after FLR reset (Alex Williamson) - Support SR-IOV on any function type (Kelly Zytaruk) - Add ACS quirk for all Cavium devices (Manish Jaggi) AER: - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas) - Restore pci_ops pointer while calling original pci_ops (David Daney) - Fix aer_inject error codes (Jean Delvare) - Use dev_warn() in aer_inject (Jean Delvare) - Log actual error causes in aer_inject (Jean Delvare) - Log aer_inject error injections (Jean Delvare) VPD: - Prevent VPD access for buggy devices (Babu Moger) - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas) - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas) - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas) - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas) - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas) - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas) - Update VPD definitions (Hannes Reinecke) - Allow access to VPD attributes with size 0 (Hannes Reinecke) - Determine actual VPD size on first access (Hannes Reinecke) Generic host bridge driver: - Move structure definitions to separate header file (David Daney) - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney) - Expose pci_host_common_probe() for use by other drivers (David Daney) Altera host bridge driver: - Fix altera_pcie_link_is_up() (Ley Foon Tan) Cavium ThunderX host bridge driver: - Add PCIe host driver for ThunderX processors (David Daney) - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney) Freescale i.MX6 host bridge driver: - Add DT bindings to configure PHY Tx driver settings (Justin Waters) - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach) - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach) - Remove broken Gen2 workaround (Lucas Stach) - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach) Freescale Layerscape host bridge driver: - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi) Intel VMD host bridge driver: - Attach VMD resources to parent domain's resource tree (Jon Derrick) - Set bus resource start to 0 (Keith Busch) Microsoft Hyper-V host bridge driver: - Add fwnode_handle to x86 pci_sysdata (Jake Oshins) - Look up IRQ domain by fwnode_handle (Jake Oshins) - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins) NVIDIA Tegra host bridge driver: - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding) - Implement ->{add,remove}_bus() callbacks (Thierry Reding) - Remove unused struct tegra_pcie.num_ports field (Thierry Reding) - Track bus -> CPU mapping (Thierry Reding) - Remove misleading PHYS_OFFSET (Thierry Reding) Renesas R-Car host bridge driver: - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman) Synopsys DesignWare host bridge driver: - ARC: Add PCI support (Joao Pinto) - Add generic dw_pcie_wait_for_link() (Joao Pinto) - Add default link up check if sub-driver doesn't override (Joao Pinto) - Add driver for prototyping kits based on ARC SDP (Joao Pinto) TI Keystone host bridge driver: - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin) Xilinx AXI host bridge driver: - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada) - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada) - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada) - Update Zynq binding with Microblaze node (Bharat Kumar Gogada) - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada) Xilinx NWL host bridge driver: - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada) Miscellaneous: - Check device_attach() return value always (Bjorn Helgaas) - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas) - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas) - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas) - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas) - Remove includes of asm/pci-bridge.h (Bjorn Helgaas) - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas) - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas) - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler) - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas) - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa) - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig) - Move pci_dma_* helpers to common code (Christoph Hellwig) - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus) - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson) - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)" * tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits) PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition PCI: designware: Add driver for prototyping kits based on ARC SDP PCI: designware: Add default link up check if sub-driver doesn't override PCI: designware: Add generic dw_pcie_wait_for_link() PCI: Cleanup pci/pcie/Kconfig whitespace PCI: Simplify pci_create_attr() control flow PCI: Don't leak memory if sysfs_create_bin_file() fails PCI: Simplify sysfs ROM cleanup PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource MIPS: Loongson 3: Use temporary struct resource * to avoid repetition ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource ia64/PCI: Use ioremap() instead of open-coded equivalent ia64/PCI: Use temporary struct resource * to avoid repetition PCI: Clean up pci_map_rom() whitespace PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices PCI: thunder: Add PCIe host driver for ThunderX processors PCI: generic: Expose pci_host_common_probe() for use by other drivers PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe() ...
2016-03-11virtio_balloon: Allow to resize and update the balloon stats in parallelPetr Mladek
The virtio balloon statistics are not updated when the balloon is being resized. But it seems that both tasks could be done in parallel. stats_handle_request() updates the statistics in the balloon structure and then communicates with the host. update_balloon_stats() calls all_vm_events() that just reads some per-CPU variables. The values might change during and after the call but it is expected and happens even without this patch. update_balloon_stats() also calls si_meminfo(). It is a bit more complex function. It too just reads some variables and looks lock-less safe. In each case, it seems to be called lock-less on several similar locations, e.g. from post_status() in dm_thread_func(), or from vmballoon_send_get_target(). The communication with the host is done via a separate virtqueue, see vb->stats_vq vs. vb->inflate_vq and vb->deflate_vq. Therefore it could be used in parallel with fill_balloon() and leak_balloon(). This patch splits the existing work into two pieces. One is for updating the balloon stats. The other is for resizing of the balloon. It seems that they can be proceed in parallel without any extra locking. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11virtio_balloon: Use a workqueue instead of "vballoon" kthreadPetr Mladek
This patch moves the deferred work from the "vballoon" kthread into a system freezable workqueue. We do not need to maintain and run a dedicated kthread. Also the event driven workqueues API makes the logic much easier. Especially, we do not longer need an own wait queue, wait function, and freeze point. The conversion is pretty straightforward. One cycle of the main loop is put into a work. The work is queued instead of waking the kthread. fill_balloon() and leak_balloon() have a limit for the amount of modified pages. The work re-queues itself when necessary. For this, we make fill_balloon() to return the number of really modified pages. Note that leak_balloon() already did this. virtballoon_restore() queues the work only when really needed. The only complication is that we need to prevent queuing the work when the balloon is being removed. It was easier before because the kthread simply removed itself from the wait queue. We need an extra boolean and spin lock now. My initial idea was to use a dedicated workqueue. Michael S. Tsirkin suggested using a system one. Tejun Heo confirmed that the system workqueue has a pretty high concurrency level (256) by default. Therefore we need not be afraid of too long blocking. Signed-off-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-08PCI: Add QEMU top-level IDs for (sub)vendor & deviceRobin H. Johnson
Introduce PCI_VENDOR/PCI_SUBVENDOR/PCI_SUBDEVICE defines to replace the constants scattered in the kernel already used to detect QEMU. They are defined in the QEMU codebase per docs/specs/pci-ids.txt. Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-03-02vring: Use the DMA API on XenAndy Lutomirski
Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
2016-03-02virtio_pci: Use the DMA API if enabledAndy Lutomirski
This switches to vring_create_virtqueue, simplifying the driver and adding DMA API support. This fixes virtio-pci on platforms and busses that have IOMMUs. This will break the experimental QEMU Q35 IOMMU support until QEMU is fixed. In exchange, it fixes physical virtio hardware as well as virtio-pci running under Xen. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02virtio_mmio: Use the DMA API if enabledAndy Lutomirski
This switches to vring_create_virtqueue, simplifying the driver and adding DMA API support. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02virtio: Add improved queue allocation APIAndy Lutomirski
This leaves vring_new_virtqueue alone for compatbility, but it adds two new improved APIs: vring_create_virtqueue: Creates a virtqueue backed by automatically allocated coherent memory. (Some day it this could be extended to support non-coherent memory, too, if there ends up being a platform on which it's worthwhile.) __vring_new_virtqueue: Creates a virtqueue with a manually-specified layout. This should allow mic_virtio to work much more cleanly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02virtio_ring: Support DMA APIsAndy Lutomirski
virtio_ring currently sends the device (usually a hypervisor) physical addresses of its I/O buffers. This is okay when DMA addresses and physical addresses are the same thing, but this isn't always the case. For example, this never works on Xen guests, and it is likely to fail if a physical "virtio" device ever ends up behind an IOMMU or swiotlb. The immediate use case for me is to enable virtio on Xen guests. For that to work, we need vring to support DMA address translation as well as a corresponding change to virtio_pci or to another driver. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02vring: Introduce vring_use_dma_api()Andy Lutomirski
This is a kludge, but no one has come up with a a better idea yet. We'll introduce DMA API support guarded by vring_use_dma_api(). Eventually we may be able to return true on more and more systems, and hopefully we can get rid of vring_use_dma_api() entirely some day. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02virtio-pci: read the right virtio_pci_notify_cap fieldLadi Prosek
Looks like a copy-paste bug. The value is used as an optimization and a wrong value probably isn't causing any serious damage. Found when porting this code to Windows. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-26virtio_pci: fix use after free on releaseMichael S. Tsirkin
KASan detected a use-after-free error in virtio-pci remove code. In virtio_pci_remove(), vp_dev is still used after being freed in unregister_virtio_device() (in virtio_pci_release_dev() more precisely). To fix, keep a reference until cleanup is done. Fixes: 63bd62a08ca4 ("virtio_pci: defer kfree until release callback") Reported-by: Jerome Marchand <jmarchan@redhat.com> Cc: stable@vger.kernel.org Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Jerome Marchand <jmarchan@redhat.com>
2016-01-12virtio: make find_vqs() checkpatch.pl-friendlyStefan Hajnoczi
checkpatch.pl wants arrays of strings declared as follows: static const char * const names[] = { "vq-1", "vq-2", "vq-3" }; Currently the find_vqs() function takes a const char *names[] argument so passing checkpatch.pl's const char * const names[] results in a compiler error due to losing the second const. This patch adjusts the find_vqs() prototype and updates all virtio transports. This makes it possible for virtio_balloon.c, virtio_input.c, virtgpu_kms.c, and virtio_rpmsg_bus.c to use the checkpatch.pl-friendly type. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
2016-01-12virtio_balloon: fix race by fill and leakMinchan Kim
During my compaction-related stuff, I encountered a bug with ballooning. With repeated inflating and deflating cycle, guest memory( ie, cat /proc/meminfo | grep MemTotal) is decreased and couldn't be recovered. The reason is balloon_lock doesn't cover release_pages_balloon so struct virtio_balloon fields could be overwritten by race of fill_balloon(e,g, vb->*pfns could be critical). This patch fixes it in my test. Cc: <stable@vger.kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-12virtio_ring: use virt_store_mbMichael S. Tsirkin
We need a full barrier after writing out event index, using virt_store_mb there seems better than open-coding. As usual, we need a wrapper to account for strong barriers. It's tempting to use this in vhost as well, for that, we'll need a variant of smp_store_mb that works on __user pointers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-12-07virtio_ring: shadow available ring flags & indexVenkatesh Srinivas
Improves cacheline transfer flow of available ring header. Virtqueues are implemented as a pair of rings, one producer->consumer avail ring and one consumer->producer used ring; preceding the avail ring in memory are two contiguous u16 fields -- avail->flags and avail->idx. A producer posts work by writing to avail->idx and a consumer reads avail->idx. The flags and idx fields only need to be written by a producer CPU and only read by a consumer CPU; when the producer and consumer are running on different CPUs and the virtio_ring code is structured to only have source writes/sink reads, we can continuously transfer the avail header cacheline between 'M' states between cores. This flow optimizes core -> core bandwidth on certain CPUs. (see: "Software Optimization Guide for AMD Family 15h Processors", Section 11.6; similar language appears in the 10h guide and should apply to CPUs w/ exclusive caches, using LLC as a transfer cache) Unfortunately the existing virtio_ring code issued reads to the avail->idx and read-modify-writes to avail->flags on the producer. This change shadows the flags and index fields in producer memory; the vring code now reads from the shadows and only ever writes to avail->flags and avail->idx, allowing the cacheline to transfer core -> core optimally. In a concurrent version of vring_bench, the time required for 10,000,000 buffer checkout/returns was reduced by ~2% (average across many runs) on an AMD Piledriver (15h) CPU: (w/o shadowing): Performance counter stats for './vring_bench': 5,451,082,016 L1-dcache-loads ... 2.221477739 seconds time elapsed (w/ shadowing): Performance counter stats for './vring_bench': 5,405,701,361 L1-dcache-loads ... 2.168405376 seconds time elapsed The further away (in a NUMA sense) virtio producers and consumers are from each other, the more we expect to benefit. Physical implementations of virtio devices and implementations of virtio where the consumer polls vring avail indexes (vhost) should also benefit. Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-07virtio: Do not drop __GFP_HIGH in alloc_indirectMichal Hocko
b92b1b89a33c ("virtio: force vring descriptors to be allocated from lowmem") tried to exclude highmem pages for descriptors so it cleared __GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH which doesn't make much sense for this fix because __GFP_HIGH only controls access to memory reserves and it doesn't have any influence on the zone selection. Some of the call paths use GFP_ATOMIC and dropping __GFP_HIGH will reduce their changes for success because the lack of access to memory reserves. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
2015-12-07virtio: fix memory leak of virtio ida cache layersSuman Anna
The virtio core uses a static ida named virtio_index_ida for assigning index numbers to virtio devices during registration. The ida core may allocate some internal idr cache layers and an ida bitmap upon any ida allocation, and all these layers are truely freed only upon the ida destruction. The virtio_index_ida is not destroyed at present, leading to a memory leak when using the virtio core as a module and atleast one virtio device is registered and unregistered. Fix this by invoking ida_destroy() in the virtio core module exit. Cc: stable@vger.kernel.org Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-08virtio_balloon: do not change memory amount visible via /proc/meminfoDenis V. Lunev
Balloon device is frequently used as a mean of cooperative memory control in between guest and host to manage memory overcommitment. This is the typical case for any hosting workload when KVM guest is provided for end-user. Though there is a problem in this setup. The end-user and hosting provider have signed SLA agreement in which some amount of memory is guaranted for the guest. The good thing is that this memory will be given to the guest when the guest will really need it (f.e. with OOM in guest and with VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing is that end-user does not know this. Balloon by default reduce the amount of memory exposed to the end-user each time when the page is stolen from guest or returned back by using adjust_managed_page_count and thus /proc/meminfo shows reduced amount of memory. Fortunately the solution is simple, we should just avoid to call adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-08virtio_ballon: change stub of release_pages_by_pfnDenis V. Lunev
and rename it to release_pages_balloon. The function originally takes arrays of pfns and now it takes pointer to struct virtio_ballon. This change is necessary to conditionally call adjust_managed_page_count in the next patch. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-08virtio_mmio: add ACPI probingGraeme Gregory
Added the match table and pointers for ACPI probing to the driver. This uses the same identifier for virt devices as being used for qemu ARM64 ACPI support. http://git.linaro.org/people/shannon.zhao/qemu.git/commit/d0bf1955a3ecbab4b51d46f8c5dda02b7e14a17e Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-08-06virtio-input: reset device and detach unused during removeJason Wang
Spec requires a device reset during cleanup, so do it and avoid warn in virtio core. And detach unused buffers to avoid memory leak. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-07-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost cross endian support from Michael Tsirkin: "I have just queued some more bugfix patches today but none fix regressions and none are related to these ones, so it looks like a good time for a merge for -rc1. The motivation for this is support for legacy BE guests on the new LE hosts. There are two redeeming properties that made me merge this: - It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers. - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY, and when it's clear, there's zero overhead (as some point it was tested by compiling with and without the patches, got the same stripped binary). Maybe we could create a Kconfig symbol to enforce the second point: prevent people from enabling it eg on x86. I will look into this" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-pci: alloc only resources actually used. macvtap/tun: cross-endian support for little-endian hosts vhost: cross-endian support for legacy devices virtio: add explicit big-endian support to memory accessors vhost: introduce vhost_is_little_endian() helper vringh: introduce vringh_is_little_endian() helper macvtap: introduce macvtap_is_little_endian() helper tun: add tun_is_little_endian() helper virtio: introduce virtio_is_little_endian() helper
2015-07-01Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits) modules: only use mod->param_lock if CONFIG_MODULES param: fix module param locks when !CONFIG_SYSFS. rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() module: add per-module param_lock module: make perm const params: suppress unused variable error, warn once just in case code changes. modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'. kernel/module.c: avoid ifdefs for sig_enforce declaration kernel/workqueue.c: remove ifdefs over wq_power_efficient kernel/params.c: export param_ops_bool_enable_only kernel/params.c: generalize bool_enable_only kernel/module.c: use generic module param operaters for sig_enforce kernel/params: constify struct kernel_param_ops uses sysfs: tightened sysfs permission checks module: Rework module_addr_{min,max} module: Use __module_address() for module_address_lookup() module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING module: Optimize __module_address() using a latched RB-tree rbtree: Implement generic latch_tree seqlock: Introduce raw_read_seqcount_latch() ...
2015-06-24virtio-pci: alloc only resources actually used.Gerd Hoffmann
Move resource allocation from common code to legacy and modern code. Only request resources actually used, i.e. bar0 in legacy mode and the bar(s) specified by capabilities in modern mode. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-06-23Merge tag 'pci-v4.2-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.2 merge window: Enumeration - Move pci_ari_enabled() to global header (Alex Williamson) - Account for ARI in _PRT lookups (Alex Williamson) - Remove unused pci_scan_bus_parented() (Yijing Wang) Resource management - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas) - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas) - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan) - Add pci_bus_addr_t (Yinghai Lu) PCI device hotplug - Wait for pciehp command completion where necessary (Alex Williamson) - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki) - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki) - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki) - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas) - Clean up pciehp debug logging (Bjorn Helgaas) Power management - Remove redundant PCIe port type checking (Yijing Wang) - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang) - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang) - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas) - Simplify Clock Power Management setting (Bjorn Helgaas) Virtualization - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus) MSI - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin) - Remove unused pci_msi_off() (Bjorn Helgaas) - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Drop pci_msi_off() calls during probe (Michael S. Tsirkin) APM X-Gene host bridge driver - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang) - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang) - Disable Configuration Request Retry Status for v1 silicon (Duc Dang) - Allow config access to Root Port even when link is down (Duc Dang) Broadcom iProc host bridge driver - Allow override of device tree IRQ mapping function (Hauke Mehrtens) - Add BCMA PCIe driver (Hauke Mehrtens) - Directly add PCI resources (Hauke Mehrtens) - Free resource list after registration (Hauke Mehrtens) Freescale i.MX6 host bridge driver - Add speed change timeout message (Troy Kisky) - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas) Freescale Layerscape host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) - Factor out ls_pcie_establish_link() (Bjorn Helgaas) Marvell MVEBU host bridge driver - Remove mvebu_pcie_scan_bus() (Yijing Wang) NVIDIA Tegra host bridge driver - Remove tegra_pcie_scan_bus() (Yijing Wang) Synopsys DesignWare host bridge driver - Consolidate outbound iATU programming functions (Jisheng Zhang) - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang) - Add support for x8 links (Zhou Wang) - Wait for link to come up with consistent style (Bjorn Helgaas) - Use pci_scan_root_bus() for simplicity (Yijing Wang) TI DRA7xx host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) Miscellaneous - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas) - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas) - Remove unused pcibios_select_root() (again) (Bjorn Helgaas) - Remove unused pci_dma_burst_advice() (Bjorn Helgaas) - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)" * tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits) PCI: pciehp: Inline the "handle event" functions into the ISR PCI: pciehp: Rename queue_interrupt_event() to pciehp_queue_interrupt_event() PCI: pciehp: Make queue_interrupt_event() void PCI: xgene: Allow config access to Root Port even when link is down PCI: xgene: Disable Configuration Request Retry Status for v1 silicon PCI: pciehp: Clean up debug logging x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing PCI: imx6: Add #define PCIE_RC_LCSR PCI: imx6: Use "u32", not "uint32_t" PCI: Remove unused pci_scan_bus_parented() xen/pcifront: Don't use deprecated function pci_scan_bus_parented() PCI: imx6: Add speed change timeout message PCI/ASPM: Simplify Clock Power Management setting PCI: designware: Wait for link to come up with consistent style PCI: layerscape: Factor out ls_pcie_establish_link() PCI: layerscape: Use dw_pcie_link_up() consistently PCI: dra7xx: Use dw_pcie_link_up() consistently x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A PCI: pciehp: Wait for hotplug command completion where necessary PCI: Remove unused pci_dma_burst_advice() ...
2015-06-04virtio_pci: Clear stale cpumask when setting irq affinityJiang Liu
The cpumask vp_dev->msix_affinity_masks[info->msix_vector] may contain staled information when vp_set_vq_affinity() gets called, so clear it before setting the new cpu bit mask. Cc: stable@vger.kernel.org Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-05-28kernel/params: constify struct kernel_param_ops usesLuis R. Rodriguez
Most code already uses consts for the struct kernel_param_ops, sweep the kernel for the last offending stragglers. Other than include/linux/moduleparam.h and kernel/params.c all other changes were generated with the following Coccinelle SmPL patch. Merge conflicts between trees can be handled with Coccinelle. In the future git could get Coccinelle merge support to deal with patch --> fail --> grammar --> Coccinelle --> new patch conflicts automatically for us on patches where the grammar is available and the patch is of high confidence. Consider this a feature request. Test compiled on x86_64 against: * allnoconfig * allmodconfig * allyesconfig @ const_found @ identifier ops; @@ const struct kernel_param_ops ops = { }; @ const_not_found depends on !const_found @ identifier ops; @@ -struct kernel_param_ops ops = { +const struct kernel_param_ops ops = { }; Generated-by: Coccinelle SmPL Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Junio C Hamano <gitster@pobox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: cocci@systeme.lip6.fr Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-07virtio_pci: drop pci_msi_off() call during probeMichael S. Tsirkin
The PCI core now disables MSI and MSI-X for all devices during enumeration regardless of CONFIG_PCI_MSI. Remove device-specific code to disable MSI/MSI-X. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-04-15virtio: drop virtio_device_is_legacy_onlyMichael S. Tsirkin
virtio_device_is_legacy_only is now unused, drop it from core. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15virtio_pci: support non-legacy balloon devicesMichael S. Tsirkin
virtio_device_is_legacy_only is always false now, drop the test from virtio pci. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15virtio_mmio: support non-legacy balloon devicesMichael S. Tsirkin
virtio_device_is_legacy_only is always false now, drop the test from virtio mmio. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15virtio: balloon might not be a legacy deviceMichael S. Tsirkin
We added transitional device support to balloon driver, so we don't need to black-list it in core anymore. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15virtio_balloon: transitional interfaceMichael S. Tsirkin
Virtio 1.0 doesn't include a modern balloon device. But it's not a big change to support a transitional balloon device: this has the advantage of supporting existing drivers, transparently. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-01virtio_pci_modern: switch to type-safe io accessorsMichael S. Tsirkin
As Rusty noted, we were accessing queue_enable with an incorrect width. Switch to type-safe accessors so we don't make this mistake again in the future. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-01virtio_pci_modern: type-safe io accessorsMichael S. Tsirkin
The spec is very clear on this: 4.1.3.1 Driver Requirements: PCI Device Layout The driver MUST access each field using the “natural” access method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit fields and 8-bit accesses for 8-bit fields. Add type-safe wrappers to prevent access with incorrect width. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-29Add virtio-input driver.Gerd Hoffmann
virtio-input is basically evdev-events-over-virtio, so this driver isn't much more than reading configuration from config space and forwarding incoming events to the linux input layer. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-17virtio_mmio: fix access width for mmioMichael S. Tsirkin
Going over the virtio mmio code, I noticed that it doesn't correctly access modern device config values using "natural" accessors: it uses readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states: 4.2.2.2 Driver Requirements: MMIO Device Register Layout ... The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers described in table 4.1. For the device-specific configuration space, the driver MUST use 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and aligned accesses for 32 and 64 bit wide fields. Borrow code from virtio_pci_modern to do this correctly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-13virtio_mmio: generation supportMichael S. Tsirkin
virtio_mmio currently lacks generation support which makes multi-byte field access racy. Fix by getting the value at offset 0xfc for version 2 devices. Nothing we can do for version 1, so return generation id 0. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-10virtio-balloon: do not call blocking ops when !TASK_RUNNINGMichael S. Tsirkin
virtio balloon has this code: wait_event_interruptible(vb->config_change, (diff = towards_target(vb)) != 0 || vb->need_stats_update || kthread_should_stop() || freezing(current)); Which is a problem because towards_target() call might block after wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing the task_struct::state collision typical of nesting of sleeping primitives See also http://lwn.net/Articles/628628/ or Thomas's bug report http://article.gmane.org/gmane.linux.kernel.virtualization/24846 for a fuller explanation. To fix, rewrite using wait_woken. Cc: stable@vger.kernel.org Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-10virtio_balloon: set DRIVER_OK before using deviceMichael S. Tsirkin
virtio spec requires that all drivers set DRIVER_OK before using devices. While balloon isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio balloon might violate this rule: probe calls kthread_run before setting DRIVER_OK, which might run immediately and cause balloon to inflate/deflate. To fix, call virtio_device_ready before running the kthread. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
2015-02-17virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.Rusty Russell
I noticed this with the console device. It's not *wrong*, just a bit weird. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-11virtio_pci: use 16-bit accessor for queue_enable.Rusty Russell
Since PCI is little endian, 8-bit access might work, but the spec section is very clear on this: 4.1.3.1 Driver Requirements: PCI Device Layout The driver MUST access each field using the “natural” access method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit fields and 8-bit accesses for 8-bit fields. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>