summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2015-12-14sfc: push partner queue for skb->xmit_moreMartin Habets
[ Upstream commit b2663a4f30e85ec606b806f5135413e6d5c78d1e ] When the IP stack passes SKBs the sfc driver puts them in 2 different TX queues (called partners), one for checksummed and one for not checksummed. If the SKB has xmit_more set the driver will delay pushing the work to the NIC. When later it does decide to push the buffers this patch ensures it also pushes the partner queue, if that also has any delayed work. Before this fix the work in the partner queue would be left for a long time and cause a netdev watchdog. Fixes: 70b33fb ("sfc: add support for skb->xmit_more") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-12-03stmmac: Correctly report PTP capabilities.Phil Reid
[ Upstream commit e6dbe1eb2db0d7a14991c06278dd3030c45fb825 ] priv->hwts_*_en indicate if timestamping is enabled/disabled at run time. But priv->dma_cap.time_stamp and priv->dma_cap.atime_stamp indicates HW is support for PTPv1/PTPv2. Signed-off-by: Phil Reid <preid@electromag.com.au> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-12-03net/mlx4: Copy/set only sizeof struct mlx4_eqe bytesCarol L Soto
[ Upstream commit c02b05011fadf8e409e41910217ca689f2fc9d91 ] When doing memcpy/memset of EQEs, we should use sizeof struct mlx4_eqe as the base size and not caps.eqe_size which could be bigger. If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt data in the master context. When using a 64 byte stride, the memcpy copied over 63 bytes to the slave_eq structure. This resulted in copying over the entire eqe of interest, including its ownership bit -- and also 31 bytes of garbage into the next WQE in the slave EQ -- which did NOT include the ownership bit (and therefore had no impact). However, once the stride is increased to 128, we are overwriting the ownership bits of *three* eqes in the slave_eq struct. This results in an incorrect ownership bit for those eqes, which causes the eq to seem to be full. The issue therefore surfaced only once 128-byte EQEs started being used in SRIOV and (overarchitectures that have 128/256 byte cache-lines such as PPC) - e.g after commit 77507aa249ae "net/mlx4_core: Enable CQE/EQE stride support". Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support') Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-11-15sfc: Fix memcpy() with const destination compiler warning.David S. Miller
[ Upstream commit 1d20a16062e771b6e26b843c0cde3b17c1146e00 ] drivers/net/ethernet/sfc/selftest.c: In function ‘efx_iterate_state’: drivers/net/ethernet/sfc/selftest.c:388:9: warning: passing argument 1 of ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers] This is because the msg[] member of struct efx_loopback_payload is marked as 'const'. Remove that. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27igb: do not re-init SR-IOV during probeStefan Assmann
[ Upstream commit 6423fc34160939142d72ffeaa2db6408317f54df ] During driver probing the following code path is triggered. igb_probe ->igb_sw_init ->igb_probe_vfs ->igb_pci_enable_sriov ->igb_sriov_reinit Doing the SR-IOV re-init is not necessary during probing since we're starting from scratch. Here we can call igb_enable_sriov() right away. Running igb_sriov_reinit() during igb_probe() also seems to cause occasional packet loss on some onboard 82576 NICs. Reproduced on Dell and HP servers with onboard 82576 NICs. Example: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01) Subsystem: Dell Device [1028:0481] Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27net: eth: altera: fix napi poll_list corruptionAtsushi Nemoto
[ Upstream commit 4548a697e4969d695047cebd6d9af5e2f6cc728e ] tse_poll() calls __napi_complete() with irq enabled. This leads napi poll_list corruption and may stop all napi drivers working. Use napi_complete() instead of __napi_complete(). Signed-off-by: Atsushi Nemoto <nemoto@toshiba-tops.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27igb: Fix oops caused by missing queue pairingShota Suzuki
[ Upstream commit 72ddef0506da852dc82f078f37ced8ef4d74a2bf ] When initializing igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is set if adapter->rss_queues exceeds half of max_rss_queues in igb_init_queue_configuration(). On the other hand, IGB_FLAG_QUEUE_PAIRS is not set even if the number of queues exceeds half of max_combined in igb_set_channels() when changing the number of queues by "ethtool -L". In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size of adapter->msix_entries[], an overflow can occur in igb_set_interrupt_capability(), which in turn leads to an oops. Fix this problem as follows: - When changing the number of queues by "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same way as initializing igb driver. - When increasing the size of q_vector, reallocate it appropriately. (With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.) Another possible way to fix this problem is to cap the queues at its initial number, which is the number of the initial online cpus. But this is not the optimal way because we cannot increase queues when another cpu becomes online. Note that before commit cd14ef54d25b ("igb: Change to use statically allocated array for MSIx entries"), this problem did not cause oops but just made the number of queues become 1 because of entering msi_only mode in igb_set_interrupt_capability(). Fixes: 907b7835799f ("igb: Add ethtool support to configure number of channels") CC: stable <stable@vger.kernel.org> Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27bna: fix interrupts storm caused by erroneous packetsIvan Vecera
[ Upstream commit ade4dc3e616e33c80d7e62855fe1b6f9895bc7c3 ] The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter increment from the beginning of the NAPI processing loop after the check for erroneous packets so they are never accounted. This counter is used to inform firmware about number of processed completions (packets). As these packets are never acked the firmware fires IRQs for them again and again. Fixes: e29aa33 ("bna: Enable Multi Buffer RX") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-07tg3: Fix temperature reportingJean Delvare
[ Upstream commit d3d11fe08ccc9bff174fc958722b5661f0932486 ] The temperature registers appear to report values in degrees Celsius while the hwmon API mandates values to be exposed in millidegrees Celsius. Do the conversion so that the values reported by "sensors" are correct. Fixes: aed93e0bf493 ("tg3: Add hwmon support for temperature") Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Prashant Sreedharan <prashant@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Cc: stable@vger.kernel.org [v3.6+] Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-09-28net/mlx4_core: Fix wrong index in propagating port change event to VFsJack Morgenstein
[ Upstream commit 1c1bf34951e8d17941bf708d1901c47e81b15d55 ] The port-change event processing in procedure mlx4_eq_int() uses "slave" as the vf_oper array index. Since the value of "slave" is the PF function index, the result is that the PF link state is used for deciding to propagate the event for all the VFs. The VF link state should be used, so the VF function index should be used here. Fixes: 948e306d7d64 ('net/mlx4: Add VF link state support') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05bnx2x: fix lockdep splatEric Dumazet
[ Upstream commit d53c66a5b80698620f7c9ba2372fff4017e987b8 ] Michel reported following lockdep splat [ 44.718117] INFO: trying to register non-static key. [ 44.723081] the code is fine but needs lockdep annotation. [ 44.728559] turning off the locking correctness validator. [ 44.734036] CPU: 8 PID: 5483 Comm: ethtool Not tainted 4.1.0 [ 44.770289] Call Trace: [ 44.772741] [<ffffffff816eb1cd>] dump_stack+0x4c/0x65 [ 44.777879] [<ffffffff8111d921>] ? console_unlock+0x1f1/0x510 [ 44.783708] [<ffffffff811121f5>] __lock_acquire+0x1d05/0x1f10 [ 44.789538] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90 [ 44.795276] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 44.801967] [<ffffffff8111390d>] ? trace_hardirqs_on+0xd/0x10 [ 44.807793] [<ffffffff811330fa>] ? hrtimer_try_to_cancel+0x4a/0x250 [ 44.814142] [<ffffffff81112ba6>] lock_acquire+0xb6/0x290 [ 44.819537] [<ffffffff810d6675>] ? flush_work+0x5/0x280 [ 44.824844] [<ffffffff810d66ad>] flush_work+0x3d/0x280 [ 44.830061] [<ffffffff810d6675>] ? flush_work+0x5/0x280 [ 44.835366] [<ffffffff816f3c43>] ? schedule_hrtimeout_range+0x13/0x20 [ 44.841889] [<ffffffff8112ec9b>] ? usleep_range+0x4b/0x50 [ 44.847365] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90 [ 44.853102] [<ffffffff810d8585>] ? __cancel_work_timer+0x105/0x1c0 [ 44.859359] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 44.866045] [<ffffffff810d851f>] __cancel_work_timer+0x9f/0x1c0 [ 44.872048] [<ffffffffa0010982>] ? bnx2x_func_stop+0x42/0x90 [bnx2x] [ 44.878481] [<ffffffff810d8670>] cancel_work_sync+0x10/0x20 [ 44.884134] [<ffffffffa00259e5>] bnx2x_chip_cleanup+0x245/0x730 [bnx2x] [ 44.890829] [<ffffffff8110ce02>] ? up+0x32/0x50 [ 44.895439] [<ffffffff811306b5>] ? del_timer_sync+0x5/0xd0 [ 44.901005] [<ffffffffa005596d>] bnx2x_nic_unload+0x20d/0x8e0 [bnx2x] [ 44.907527] [<ffffffff811f1aef>] ? might_fault+0x5f/0xb0 [ 44.912921] [<ffffffffa005851c>] bnx2x_reload_if_running+0x2c/0x50 [bnx2x] [ 44.919879] [<ffffffffa005a3c5>] bnx2x_set_ringparam+0x2b5/0x460 [bnx2x] [ 44.926664] [<ffffffff815d498b>] dev_ethtool+0x55b/0x1c40 [ 44.932148] [<ffffffff815dfdc7>] ? rtnl_lock+0x17/0x20 [ 44.937364] [<ffffffff815e7f8b>] dev_ioctl+0x17b/0x630 [ 44.942582] [<ffffffff815abf8d>] sock_do_ioctl+0x5d/0x70 [ 44.947972] [<ffffffff815ac013>] sock_ioctl+0x73/0x280 [ 44.953192] [<ffffffff8124c1c8>] do_vfs_ioctl+0x88/0x5b0 [ 44.958587] [<ffffffff8110d0b3>] ? up_read+0x23/0x40 [ 44.963631] [<ffffffff812584cc>] ? __fget_light+0x6c/0xa0 [ 44.969105] [<ffffffff8124c781>] SyS_ioctl+0x91/0xb0 [ 44.974149] [<ffffffff816f4dd7>] system_call_fastpath+0x12/0x6f As bnx2x_init_ptp() is only called if bp->flags contains PTP_SUPPORTED, we also need to guard bnx2x_stop_ptp() with same condition, otherwise ptp_task workqueue is not initialized and kernel barfs on cancel_work_sync() Fixes: eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support") Reported-by: Michel Lespinasse <walken@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michal Kalderon <Michal.Kalderon@qlogic.com> Cc: Ariel Elior <Ariel.Elior@qlogic.com> Cc: Yuval Mintz <Yuval.Mintz@qlogic.com> Cc: David Decotigny <decot@google.com> Acked-by: Sony Chacko <sony.chacko@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05net/mlx4_en: Wake TX queues only when there's enough roomIdo Shamay
[ Upstream commit 488a9b48e398b157703766e2cd91ea45ac6997c5 ] Indication of a single completed packet, marked by txbbs_skipped being bigger then zero, in not enough in order to wake up a stopped TX queue. The completed packet may contain a single TXBB, while next packet to be sent (after the wake up) may have multiple TXBBs (LSO/TSO packets for example), causing overflow in queue followed by WQE corruption and TX queue timeout. Instead, wake the stopped queue only when there's enough room for the worst case (maximum sized WQE) packet that we should need to handle after the queue is opened again. Also created an helper routine - mlx4_en_is_tx_ring_full, which checks if the current TX ring is full or not. It provides better code readability and removes code duplication. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05net: mvneta: disable IP checksum with jumbo frames for Armada 370Simon Guinot
[ Upstream commit b65657fc240ae6c1d2a1e62db9a0e61ac9631d7a ] The Ethernet controller found in the Armada 370, 380 and 385 SoCs don't support TCP/IP checksumming with frame sizes larger than 1600 bytes. This patch fixes the issue by disabling the features NETIF_F_IP_CSUM and NETIF_F_TSO for the Armada 370 and compatibles SoCs when the MTU is set to a value greater than 1600 bytes. Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Cc: <stable@vger.kernel.org> # v3.8+ Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05net: mvneta: introduce compatible string "marvell, armada-xp-neta"Simon Guinot
[ Upstream commit f522a975a8101895a85354b9c143f41b8248e71a ] The mvneta driver supports the Ethernet IP found in the Armada 370, XP, 380 and 385 SoCs. Since at least one more hardware feature is available for the Armada XP SoCs then a way to identify them is needed. This patch introduces a new compatible string "marvell,armada-xp-neta". Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Cc: <stable@vger.kernel.org> # v3.8+ Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05stmmac: troubleshoot unexpected bits in des0 & des1Alexey Brodkin
[ Upstream commit f1590670ce069eefeb93916391a67643e6ad1630 ] Current implementation of descriptor init procedure only takes care about setting/clearing ownership flag in "des0"/"des1" fields while it is perfectly possible to get unexpected bits set because of the following factors: [1] On driver probe underlying memory allocated with dma_alloc_coherent() might not be zeroed and so it will be filled with garbage. [2] During driver operation some bits could be set by SD/MMC controller (for example error flags etc). And unexpected and/or randomly set flags in "des0"/"des1" fields may lead to unpredictable behavior of GMAC DMA block. This change addresses both items above with: [1] Use of dma_zalloc_coherent() instead of simple dma_alloc_coherent() to make sure allocated memory is zeroed. That shouldn't affect performance because this allocation only happens once on driver probe. [2] Do explicit zeroing of both "des0" and "des1" fields of all buffer descriptors during initialization of DMA transfer. And while at it fixed identation of dma_free_coherent() counterpart as well. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: arc-linux-dev@synopsys.com Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit ↵Honggang LI
architectures [ Upstream commit 59d2d18cc4e9ba30b370db18d0e02d792699da96 ] If CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for x86 systems and physical memory is more than 4GB, dma_map_page may return a valid memory address which greater than 0xffffffff. As a result, the mlx5 device page allocator RB tree will be initialized with valid addresses greater than 0xfffffff. However, (addr & PAGE_MASK) set the high four bytes to zeros. So, it's impossible for the function, free_4k, to release the pages whose addresses greater than 4GB. Memory leaks. And mlx5_ib module can't release the pages when user try to remove the module, as a result, system hang. [root@rdma05 root]# dmesg | grep addr | head addr = 3fe384000 addr & PAGE_MASK = fe384000 [root@rdma05 root]# rmmod mlx5_ib <---- hang on ---------------------- cosnole log ----------------- mlx5_ib 0000:04:00.0: irq 138 for MSI/MSI-X alloc irq_desc for 139 on node -1 alloc kstat_irqs on node -1 mlx5_ib 0000:04:00.0: irq 139 for MSI/MSI-X 0000:04:00.0:free_4k:221:(pid 1519): page not found 0000:04:00.0:free_4k:221:(pid 1519): page not found 0000:04:00.0:free_4k:221:(pid 1519): page not found 0000:04:00.0:free_4k:221:(pid 1519): page not found ---------------------- cosnole log ----------------- Fixes: bf0bf77f6519 ('mlx5: Support communicating arbitrary host page size to firmware') Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-15be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent()Sriharsha Basavapatna
[ Upstream commit e51000db4c880165eab06ec0990605f24e75203f ] There are several places in the driver (all in control paths) where coherent dma memory is being allocated using either dma_alloc_coherent() or the deprecated pci_alloc_consistent(). All these calls should be changed to use dma_zalloc_coherent() to avoid uninitialized fields in data structures backed by this memory. Reported-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17e1000: add dummy allocator to fix race condition between mtu change and netpollSabrina Dubroca
[ Upstream commit 08e8331654d1d7b2c58045e549005bc356aa7810 ] There is a race condition between e1000_change_mtu's cleanups and netpoll, when we change the MTU across jumbo size: Changing MTU frees all the rx buffers: e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings -> e1000_clean_rx_ring Then, close to the end of e1000_change_mtu: pr_info -> ... -> netpoll_poll_dev -> e1000_clean -> e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag And when we come back to do the rest of the MTU change: e1000_up -> e1000_configure -> e1000_configure_rx -> e1000_alloc_jumbo_rx_buffers alloc_jumbo finds the buffers already != NULL, since data (shared with page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage, or at least not what is expected when in jumbo state. This results in an unusable adapter (packets don't get through), and a NULL pointer dereference on the next call to e1000_clean_rx_ring (other mtu change, link down, shutdown): BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81194d6e>] put_compound_page+0x7e/0x330 [...] Call Trace: [<ffffffff81195445>] put_page+0x55/0x60 [<ffffffff815d9f44>] e1000_clean_rx_ring+0x134/0x200 [<ffffffff815da055>] e1000_clean_all_rx_rings+0x45/0x60 [<ffffffff815df5e0>] e1000_down+0x1c0/0x1d0 [<ffffffff811e2260>] ? deactivate_slab+0x7f0/0x840 [<ffffffff815e21bc>] e1000_change_mtu+0xdc/0x170 [<ffffffff81647050>] dev_set_mtu+0xa0/0x140 [<ffffffff81664218>] do_setlink+0x218/0xac0 [<ffffffff814459e9>] ? nla_parse+0xb9/0x120 [<ffffffff816652d0>] rtnl_newlink+0x6d0/0x890 [<ffffffff8104f000>] ? kvm_clock_read+0x20/0x40 [<ffffffff810a2068>] ? sched_clock_cpu+0xa8/0x100 [<ffffffff81663802>] rtnetlink_rcv_msg+0x92/0x260 By setting the allocator to a dummy version, netpoll can't mess up our rx buffers. The allocator is set back to a sane value in e1000_configure_rx. Fixes: edbbb3ca1077 ("e1000: implement jumbo receive with partial descriptors") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11net/mlx4_en: Schedule napi when RX buffers allocation failsIdo Shamay
[ Upstream commit 07841f9d94c11afe00c0498cf242edf4075729f4 ] When system is out of memory, refilling of RX buffers fails while the driver continue to pass the received packets to the kernel stack. At some point, when all RX buffers deplete, driver may fall into a sleep, and not recover when memory for new RX buffers is once again availible. This is because hardware does not have valid descriptors, so no interrupt will be generated for the driver to return to work in napi context. Fix it by schedule the napi poll function from stats_task delayed workqueue, as long as the allocations fail. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11mlx4: Fix tx ring affinity_mask creationBenjamin Poirier
[ Upstream commit 42eab005a5dd5d7ea2b0328aecc4d6cc0c23c9c2 ] By default, the number of tx queues is limited by the number of online cpus in mlx4_en_get_profile(). However, this limit no longer holds after the ethtool .set_channels method has been called. In that situation, the driver may access invalid bits of certain cpumask variables when queue_index >= nr_cpu_ids. Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Acked-by: Ido Shamay <idos@mellanox.com> Fixes: d03a68f ("net/mlx4_en: Configure the XPS queue mapping on driver load") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()Jun'ichi Nomura \\\\(NEC\\\\)
[ Upstream commit d0af71a3573f1217b140c60b66f1a9b335fb058b ] tg3_init_one() calls tg3_halt() without tp->lock despite its assumption and causes deadlock. If lockdep is enabled, a warning like this shows up before the stall: [ BUG: bad unlock balance detected! ] 3.19.0test #3 Tainted: G E ------------------------------------- insmod/369 is trying to release lock (&(&tp->lock)->rlock) at: [<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3] but there are no more locks to release! tg3_init_one() doesn't call tg3_halt() under normal situation but during kexec kdump I hit this problem. Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: rename __vlan_put_tag to vlan_insert_tag_set_protoJiri Pirko
[ Upstream commit 62749e2cb3c4a7da3eaa5c01a7e787aebeff8536 ] Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27bnx2x: Fix busy_poll vs netpollEric Dumazet
[ Upstream commit 074975d0374333f656c48487aa046a21a9b9d7a1 ] Commit 9a2620c877454 ("bnx2x: prevent WARN during driver unload") switched the napi/busy_lock locking mechanism from spin_lock() into spin_lock_bh(), breaking inter-operability with netconsole, as netpoll disables interrupts prior to calling our napi mechanism. This switches the driver into using atomic assignments instead of the spinlock mechanisms previously employed. Based on initial patch from Yuval Mintz & Ariel Elior I basically added softirq starvation avoidance, and mixture of atomic operations, plain writes and barriers. Note this slightly reduces the overhead for this driver when no busy_poll sockets are in use. Fixes: 9a2620c877454 ("bnx2x: prevent WARN during driver unload") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27net/mlx4_core: Fix error message deprecation for ConnectX-2 cardsJack Morgenstein
[ Upstream commit fde913e25496761a4e2a4c81230c913aba6289a2 ] Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") did the deprecation only for port 1 of the card. Need to deprecate for port 2 as well. Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27net/mlx4_en: Call register_netdevice in the proper locationIdo Shamay
[ Upstream commit e5eda89d97ec256ba14e7e861387cc0468259c18 ] Netdevice registration should be performed a the end of the driver initialization flow. If we don't do that, after calling register_netdevice, device callbacks may be issued by higher layers of the stack before final configuration of the device is done. For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued after the register_netdev command. System network scripts may configure the interface (UP) right after the registration, which also attach unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called, causing the firmware to fail the rule attachment. Fixes: 837052d0ccc5 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling") Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}Markos Chandras
[ Upstream commit 87f966d97b89774162df04d2106c6350c8fe4cb3 ] On a MIPS Malta board, tons of fifo underflow errors have been observed when using u-boot as bootloader instead of YAMON. The reason for that is that YAMON used to set the pcnet device to SRAM mode but u-boot does not. As a result, the default Tx threshold (64 bytes) is now too small to keep the fifo relatively used and it can result to Tx fifo underflow errors. As a result of which, it's best to setup the SRAM on supported controllers so we can always use the NOUFLO bit. Cc: <netdev@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Don Fry <pcnet32@frontier.com> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28bnx2x: Force fundamental reset for EEH recoveryBrian King
[ Upstream commit da293700568ed3d96fcf062ac15d7d7c41377f11 ] EEH recovery for bnx2x based adapters is not reliable on all Power systems using the default hot reset, which can result in an unrecoverable EEH error. Forcing the use of fundamental reset during EEH recovery fixes this. Cc: stable<stable@vger.kernel.org> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-24net: fec: fix receive VLAN CTAG HW acceleration issueNimrod Andy
[ Upstream commit af5cbc9822f6bbe399925760a4d5ee82c21f258c ] The current driver support receive VLAN CTAG HW acceleration feature (NETIF_F_HW_VLAN_CTAG_RX) through software simulation. There calls the api .skb_copy_to_linear_data_offset() to skip the VLAN tag, but there have overlap between the two memory data point range. The patch just fix the issue. V2: Michael Grzeschik suggest to use memmove() instead of skb_copy_to_linear_data_offset(). Reported-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Fixes: 1b7bde6d659d ("net: fec: implement rx_copybreak to improve rx performance") Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-14net: bcmgenet: fix throughtput regressionJaedon Shin
[ Upstream commit 4092e6acf5cb16f56154e2dd22d647023dc3d646 ] This patch adds bcmgenet_tx_poll for the tx_rings. This can reduce the interrupt load and send xmit in network stack on time. This also separated for the completion of tx_ring16 from bcmgenet_poll. The bcmgenet_tx_reclaim of tx_ring[{0,1,2,3}] operative by an interrupt is to be not more than a certain number TxBDs. It is caused by too slowly reclaiming the transmitted skb. Therefore, performance degradation of xmit after 605ad7f ("tcp: refine TSO autosizing"). Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-14Revert "r8169: add support for Byte Queue Limits"David S. Miller
This reverts commit 1e918876853aa85435e0f17fd8b4a92dcfff53d6. Revert BQL support in r8169 driver as several regressions point to this commit and we cannot figure out the real cause yet. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-02-26bnx2x: fix napi poll return value for repollGovindarajulu Varadarajan
[ Upstream commit 24e579c8898aa641ede3149234906982290934e5 ] With the commit d75b1ade567ffab ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. When in busy_poll is we return 0 in napi_poll. We should return budget. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26netxen: fix netxen_nic_poll() logicEric Dumazet
[ Upstream commit 6088beef3f7517717bd21d90b379714dd0837079 ] NAPI poll logic now enforces that a poller returns exactly the budget when it wants to be called again. If a driver limits TX completion, it has to return budget as well when the limit is hit, not the number of received packets. Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: d75b1ade567f ("net: less interrupt masking in NAPI") Cc: Manish Chopra <manish.chopra@qlogic.com> Acked-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05drivers: net: cpsw: discard dual emac default vlan configurationMugunthan V N
commit 02a54164c52ed6eca3089a0d402170fbf34d6cf5 upstream. In Dual EMAC, the default VLANs are used to segregate Rx packets between the ports, so adding the same default VLAN to the switch will affect the normal packet transfers. So returning error on addition of dual EMAC default VLANs. Even if EMAC 0 default port VLAN is added to EMAC 1, it will lead to break dual EMAC port separations. Fixes: d9ba8f9e6298 (driver: net: ethernet: cpsw: dual emac interface implementation) Reported-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27drivers: net: cpsw: fix multicast flush in dual emac modeMugunthan V N
commit 25906052d953d3fbdb7e19480b9de5e6bb949f3f upstream. Since ALE table is a common resource for both the interfaces in Dual EMAC mode and while bringing up the second interface in cpsw_ndo_set_rx_mode() all the multicast entries added by the first interface is flushed out and only second interface multicast addresses are added. Fixing this by flushing multicast addresses based on dual EMAC port vlans which will not affect the other emac port multicast addresses. Fixes: d9ba8f9 (driver: net: ethernet: cpsw: dual emac interface implementation) Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27i40e: adds FCoE configure optionVasu Dev
commit 776d4e9f5c0229037707f692b386b1f2a5bac054 upstream. Adds FCoE config option I40E_FCOE, so that FCoE can be enabled as needed but otherwise have it disabled by default. This also eliminate multiple FCoE config checks, instead now just one config check for CONFIG_I40E_FCOE. The I40E FCoE was added with 3.17 kernel and therefore this patch shall be applied to stable 3.17 kernel also. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27net: ethernet: cpsw: fix hangs with interruptsFelipe Balbi
commit 7ce67a38f799d1fb332f672b117efbadedaa5352 upstream. The CPSW IP implements pulse-signaled interrupts. Due to that we must write a correct, pre-defined value to the CPDMA_MACEOIVECTOR register so the controller generates a pulse on the correct IRQ line to signal the End Of Interrupt. The way the driver is written today, all four IRQ lines are requested using the same IRQ handler and, because of that, we could fall into situations where a TX IRQ fires but we tell the controller that we ended an RX IRQ (or vice-versa). This situation triggers an IRQ storm on the reserved IRQ 127 of INTC which will in turn call ack_bad_irq() which will, then, print a ton of: unexpected IRQ trap at vector 00 In order to fix the problem, we are moving all calls to cpdma_ctlr_eoi() inside the IRQ handler and making sure we *always* write the correct value to the CPDMA_MACEOIVECTOR register. Note that the algorithm assumes that IRQ numbers and value-to-be-written-to-EOI are proportional, meaning that a write of value 0 would trigger an EOI pulse for the RX_THRESHOLD Interrupt and that's the IRQ number sitting in the 0-th index of our irqs_table array. This, however, is safe at least for current implementations of CPSW so we will refrain from making the check smarter (and, as a side-effect, slower) until we actually have a platform where IRQ lines are swapped. This patch has been tested for several days with AM335x- and AM437x-based platforms. AM57x was left out because there are still pending patches to enable ethernet in mainline for that platform. A read of the TRM confirms the statement on previous paragraph. Reported-by: Yegor Yefremov <yegorslists@googlemail.com> Fixes: 510a1e7 (drivers: net: davinci_cpdma: acknowledge interrupt properly) Signed-off-by: Felipe Balbi <balbi@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27enic: fix rx skb checksumGovindarajulu Varadarajan
[ Upstream commit 17e96834fd35997ca7cdfbf15413bcd5a36ad448 ] Hardware always provides compliment of IP pseudo checksum. Stack expects whole packet checksum without pseudo checksum if CHECKSUM_COMPLETE is set. This causes checksum error in nf & ovs. kernel: qg-19546f09-f2: hw csum failure kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF O-------------- 3.10.0-123.8.1.el7.x86_64 #1 kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014 kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0 kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00 kernel: Call Trace: kernel: <IRQ> [<ffffffff815e237b>] dump_stack+0x19/0x1b kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40 kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70 kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20 kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100 kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4] kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0 kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch] kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack] kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40 kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4] kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0 kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40 kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140 kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40 kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380 Hardware verifies IP & tcp/udp header checksum but does not provide payload checksum, use CHECKSUM_UNNECESSARY. Set it only if its valid IP tcp/udp packet. Cc: Jiri Benc <jbenc@redhat.com> Cc: Stefan Assmann <sassmann@redhat.com> Reported-by: Sunil Choudhary <schoudha@redhat.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27alx: fix alx_poll()Eric Dumazet
[ Upstream commit 7a05dc64e2e4c611d89007b125b20c0d2a4d31a5 ] Commit d75b1ade567f ("net: less interrupt masking in NAPI") uncovered wrong alx_poll() behavior. A NAPI poll() handler is supposed to return exactly the budget when/if napi_complete() has not been called. It is also supposed to return number of frames that were received, so that netdev_budget can have a meaning. Also, in case of TX pressure, we still have to dequeue received packets : alx_clean_rx_irq() has to be called even if alx_clean_tx_irq(alx) returns false, otherwise device is half duplex. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: d75b1ade567f ("net: less interrupt masking in NAPI") Reported-by: Oded Gabbay <oded.gabbay@amd.com> Bisected-by: Oded Gabbay <oded.gabbay@amd.com> Tested-by: Oded Gabbay <oded.gabbay@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flowMaor Gottlieb
[ Upstream commit a51e0df4c1e06afd7aba84496c14238e6b363caa ] Previously, mlx4_mt_rereg_write filled the MPT's entity_size with the old MTT's page shift, which could result in using an incorrect offset. Fix the initialization to be after we calculate the new MTT offset. In addition, assign mtt order to -1 after calling mlx4_mtt_cleanup. This is necessary in order to mark the MTT as invalid and avoid freeing it later. Fixes: e630664 ('mlx4_core: Add helper functions to support MR re-registration') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27net: Generalize ndo_gso_check to ndo_features_checkJesse Gross
[ Upstream commit 5f35227ea34bb616c436d9da47fc325866c428f3 ] GSO isn't the only offload feature with restrictions that potentially can't be expressed with the current features mechanism. Checksum is another although it's a general issue that could in theory apply to anything. Even if it may be possible to implement these restrictions in other ways, it can result in duplicate code or inefficient per-packet behavior. This generalizes ndo_gso_check so that drivers can remove any features that don't make sense for a given packet, similar to netif_skb_features(). It also converts existing driver restrictions to the new format, completing the work that was done to support tunnel protocols since the issues apply to checksums as well. By actually removing features from the set that are used to do offloading, it solves another problem with the existing interface. In these cases, GSO would run with the original set of features and not do anything because it appears that segmentation is not required. CC: Tom Herbert <therbert@google.com> CC: Joe Stringer <joestringer@nicira.com> CC: Eric Dumazet <edumazet@google.com> CC: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Tom Herbert <therbert@google.com> Fixes: 04ffcb255f22 ("net: Add ndo_gso_check") Tested-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27net/mlx4_en: Doorbell is byteswapped in Little Endian archsAmir Vadai
[ Upstream commit 492f5add4be84652bbe13da8a250d60c6856a5c5 ] iowrite32() will byteswap it's argument on big endian archs. iowrite32be() will byteswap on little endian archs. Since we don't want to do this unnecessary byteswap on the fast path, doorbell is stored in the NIC's native endianness. Using the right iowrite() according to the arch endianness. CC: Wei Yang <weiyang@linux.vnet.ibm.com> CC: David Laight <david.laight@aculab.com> Fixes: 6a4e812 ("net/mlx4_en: Avoid calling bswap in tx fast path") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27tg3: tg3_disable_ints using uninitialized mailbox value to disable interruptsPrashant Sreedharan
[ Upstream commit 05b0aa579397b734f127af58e401a30784a1e315 ] During driver load in tg3_init_one, if the driver detects DMA activity before intializing the chip tg3_halt is called. As part of tg3_halt interrupts are disabled using routine tg3_disable_ints. This routine was using mailbox value which was not initialized (default value is 0). As a result driver was writing 0x00000001 to pci config space register 0, which is the vendor id / device id. This driver bug was exposed because of the commit a7877b17a667 (PCI: Check only the Vendor ID to identify Configuration Request Retry). Also this issue is only seen in older generation chipsets like 5722 because config space write to offset 0 from driver is possible. The newer generation chips ignore writes to offset 0. Also without commit a7877b17a667, for these older chips when a GRC reset is issued the Bootcode would reprogram the vendor id/device id, which is the reason this bug was masked earlier. Fixed by initializing the interrupt mailbox registers before calling tg3_halt. Please queue for -stable. Reported-by: Nils Holland <nholland@tisys.org> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27net/mlx4: Cache line CQE/EQE stride fixesIdo Shamay
[ Upstream commit c3f2511feac088030055012cc8f64ebd84c87dbc ] This commit contains 2 fixes for the 128B CQE/EQE stride feaure. Wei found that mlx4_QUERY_HCA function marked the wrong capability in flags (64B CQE/EQE), when CQE/EQE stride feature was enabled. Also added small fix in initial CQE ownership bit assignment, when CQE is size is not default 32B. Fixes: 77507aa24 (net/mlx4: Enable CQE/EQE stride support) Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16net: mvneta: fix race condition in mvneta_tx()Eric Dumazet
[ Upstream commit 5f478b41033606d325e420df693162e2524c2b94 ] mvneta_tx() dereferences skb to get skb->len too late, as hardware might have completed the transmit and TX completion could have freed the skb from another cpu. Fixes: 71f6d1b31fb1 ("net: mvneta: replace Tx timer with a real interrupt") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16net: mvneta: fix Tx interrupt delaywilly tarreau
[ Upstream commit aebea2ba0f7495e1a1c9ea5e753d146cb2f6b845 ] The mvneta driver sets the amount of Tx coalesce packets to 16 by default. Normally that does not cause any trouble since the driver uses a much larger Tx ring size (532 packets). But some sockets might run with very small buffers, much smaller than the equivalent of 16 packets. This is what ping is doing for example, by setting SNDBUF to 324 bytes rounded up to 2kB by the kernel. The problem is that there is no documented method to force a specific packet to emit an interrupt (eg: the last of the ring) nor is it possible to make the NIC emit an interrupt after a given delay. In this case, it causes trouble, because when ping sends packets over its raw socket, the few first packets leave the system, and the first 15 packets will be emitted without an IRQ being generated, so without the skbs being freed. And since the socket's buffer is small, there's no way to reach that amount of packets, and the ping ends up with "send: no buffer available" after sending 6 packets. Running with 3 instances of ping in parallel is enough to hide the problem, because with 6 packets per instance, that's 18 packets total, which is enough to grant a Tx interrupt before all are sent. The original driver in the LSP kernel worked around this design flaw by using a software timer to clean up the Tx descriptors. This timer was slow and caused terrible network performance on some Tx-bound workloads (such as routing) but was enough to make tools like ping work correctly. Instead here, we simply set the packet counts before interrupt to 1. This ensures that each packet sent will produce an interrupt. NAPI takes care of coalescing interrupts since the interrupt is disabled once generated. No measurable performance impact nor CPU usage were observed on small nor large packets, including when saturating the link on Tx, and this fixes tools like ping which rely on too small a send buffer. If one wants to increase this value for certain workloads where it is safe to do so, "ethtool -C $dev tx-frames" will override this default setting. This fix needs to be applied to stable kernels starting with 3.10. Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-02cxgb4: Fill in supported link mode for SFP modulesHariprasad Shenai
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29sh_eth: Fix sleeping function called from invalid contextMitsuhiro Kimura
This resolves the following bug which can be reproduced by building the kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y and reading network statistics while the network interface is down. e.g.: ifconfig eth0 down cat /sys/class/net/eth0/statistics/tx_errors ---- [ 1238.161349] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:952 [ 1238.188279] in_atomic(): 1, irqs_disabled(): 0, pid: 1388, name: cat [ 1238.207425] CPU: 0 PID: 1388 Comm: cat Not tainted 3.10.31-ltsi-00046-gefa0b46 #1087 [ 1238.230737] Backtrace: [ 1238.238123] [<c0012e64>] (dump_backtrace+0x0/0x10c) from [<c0013000>] (show_stack+0x18/0x1c) [ 1238.263499] r6:000003b8 r5:c06160c0 r4:c0669e00 r3:00404000 [ 1238.280583] [<c0012fe8>] (show_stack+0x0/0x1c) from [<c04515a4>] (dump_stack+0x20/0x28) [ 1238.304631] [<c0451584>] (dump_stack+0x0/0x28) from [<c004970c>] (__might_sleep+0xf8/0x118) [ 1238.329734] [<c0049614>] (__might_sleep+0x0/0x118) from [<c02465ac>] (__pm_runtime_resume+0x38/0x90) [ 1238.357170] r7:d616f000 r6:c049c458 r5:00000004 r4:d6a17210 [ 1238.374251] [<c0246574>] (__pm_runtime_resume+0x0/0x90) from [<c029b1c4>] (sh_eth_get_stats+0x44/0x280) [ 1238.402468] r7:d616f000 r6:c049c458 r5:d5c21000 r4:d5c21000 [ 1238.419552] [<c029b180>] (sh_eth_get_stats+0x0/0x280) from [<c03ae39c>] (dev_get_stats+0x54/0x88) [ 1238.446204] r5:d5c21000 r4:d5ed7e08 [ 1238.456980] [<c03ae348>] (dev_get_stats+0x0/0x88) from [<c03c677c>] (netstat_show.isra.15+0x54/0x9c) [ 1238.484413] r6:d5c21000 r5:d5c21238 r4:00000028 r3:00000001 [ 1238.501495] [<c03c6728>] (netstat_show.isra.15+0x0/0x9c) from [<c03c69b8>] (show_tx_errors+0x18/0x1c) [ 1238.529196] r7:d5f945d8 r6:d5f945c0 r5:c049716c r4:c0650e7c [ 1238.546279] [<c03c69a0>] (show_tx_errors+0x0/0x1c) from [<c023963c>] (dev_attr_show+0x24/0x50) [ 1238.572157] [<c0239618>] (dev_attr_show+0x0/0x50) from [<c010c148>] (sysfs_read_file+0xb0/0x140) [ 1238.598554] r5:c049716c r4:d5c21240 [ 1238.609326] [<c010c098>] (sysfs_read_file+0x0/0x140) from [<c00b9ee4>] (vfs_read+0xb0/0x13c) [ 1238.634679] [<c00b9e34>] (vfs_read+0x0/0x13c) from [<c00ba0ac>] (SyS_read+0x44/0x74) [ 1238.657944] r8:bef45bf0 r7:00000000 r6:d6ac0600 r5:00000000 r4:00000000 [ 1238.678172] [<c00ba068>] (SyS_read+0x0/0x74) from [<c000eec0>] (ret_fast_syscall+0x0/0x30) ---- Signed-off-by: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29stmmac: platform: Move plat_dat checking earlierHuacai Chen
Original code only check/alloc plat_dat for the CONFIG_OF case, this patch check/alloc it earlier and unconditionally to avoid kernel build warnings: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:275 stmmac_pltfr_probe() warn: variable dereferenced before check 'plat_dat' V2: Fix coding style. Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29sh_eth: Fix skb alloc size and alignment adjust rule.Mitsuhiro Kimura
In the current driver, allocation size of skb does not care the alignment adjust after allocation. And also, in the current implementation, buffer alignment method by sh_eth_set_receive_align function has a bug that this function displace buffer start address forcedly when the alignment is corrected. In the result, tail of the skb will exceed allocated area and kernel panic will be occurred. This patch fix this issue. Signed-off-by: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26net: Check for presence of IFLA_AF_SPECThomas Graf
ndo_bridge_setlink() is currently only called on the slave if IFLA_AF_SPEC is set but this is a very fragile assumption and may change in the future. Cc: Ajit Khaparde <ajit.khaparde@emulex.com> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>