summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2015-08-09bpf: Implement function bpf_perf_event_read() that get the selected hardware ↵Kaixu Xia
PMU conuter According to the perf_event_map_fd and index, the function bpf_perf_event_read() can convert the corresponding map value to the pointer to struct perf_event and return the Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09bpf: Add new bpf map type to store the pointer to struct perf_eventKaixu Xia
Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'. This map only stores the pointer to struct perf_event. The user space event FDs from perf_event_open() syscall are converted to the pointer to struct perf_event and stored in map. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09bpf: Make the bpf_prog_array_map more genericWang Nan
All the map backends are of generic nature. In order to avoid adding much special code into the eBPF core, rewrite part of the bpf_prog_array map code and make it more generic. So the new perf_event_array map type can reuse most of code with bpf_prog_array map and add fewer lines of special code. Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09perf: add the necessary core perf APIs when accessing events counters in ↵Kaixu Xia
eBPF programs This patch add three core perf APIs: - perf_event_attrs(): export the struct perf_event_attr from struct perf_event; - perf_event_get(): get the struct perf_event from the given fd; - perf_event_read_local(): read the events counters active on the current CPU; These APIs are needed when accessing events counters in eBPF programs. The API perf_event_read_local() comes from Peter and I add the corresponding SOB. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06net_dbg_ratelimited: turn into no-op when !DEBUGJason A. Donenfeld
The pr_debug family of functions turns into a no-op when -DDEBUG is not specified, opting instead to call "no_printk", which gets compiled to a no-op (but retains gcc's nice warnings about printf-style arguments). The problem with net_dbg_ratelimited is that it is defined to be a variant of net_ratelimited_function, which expands to essentially: if (net_ratelimit()) pr_debug(fmt, ...); When DEBUG is not defined, then this becomes, if (net_ratelimit()) ; This seems benign, except it isn't. Firstly, there's the obvious overhead of calling net_ratelimit needlessly, which does quite some book keeping for the rate limiting. Given that the pr_debug and net_dbg_ratelimited family of functions are sprinkled liberally through performance critical code, with developers assuming they'll be compiled out to a no-op most of the time, we certainly do not want this needless book keeping. Secondly, and most visibly, even though no debug message is printed when DEBUG is not defined, if there is a flood of invocations, dmesg winds up peppered with messages such as "net_ratelimit: 320 callbacks suppressed". This is because our aforementioned net_ratelimit() function actually prints this text in some circumstances. It's especially odd to see this when there isn't any other accompanying debug message. So, in sum, it doesn't make sense to have this function's current behavior, and instead it should match what every other debug family of functions in the kernel does with !DEBUG -- nothing. This patch replaces calls to net_dbg_ratelimited when !DEBUG with no_printk, keeping with the idiom of all the other debug print helpers. Also, though not strictly neccessary, it guards the call with an if (0) so that all evaluation of any arguments are sure to be compiled out. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06net/mlx5_core: Support physical port countersGal Pressman
Added physical port counters in the following standard formats to ethtool statistics: - IEEE 802.3 - RFC2863 - RFC2819 Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06net/mlx5e: Light-weight netdev open/stopAchiad Shochat
Create/destroy TIRs, TISs and flow tables upon PCI probe/remove rather than upon the netdev ndo_open/stop. Upon ndo_stop(), redirect all RX traffic to the (lately introduced) "Drop RQ" and then close only the RX/TX rings, leaving the TIRs, TISs and flow tables alive. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06net/mlx5_core: Introduce access function to modify RSS/LRO paramsAchiad Shochat
To be used by the mlx5 Eth driver in following commit. This is in preparation for netdev "light-weight" open/stop flow change described in previous commit. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) A couple of cleanups for the netfilter core hook from Eric Biederman. 2) Net namespace hook registration, also from Eric. This adds a dependency with the rtnl_lock. This should be fine by now but we have to keep an eye on this because if we ever get the per-subsys nfnl_lock before rtnl we have may problems in the future. But we have room to remove this in the future by propagating the complexity to the clients, by registering hooks for the init netns functions. 3) Update nf_tables to use the new net namespace hook infrastructure, also from Eric. 4) Three patches to refine and to address problems from the new net namespace hook infrastructure. 5) Switch to alternate jumpstack in xtables iff the packet is reentering. This only applies to a very special case, the TEE target, but Eric Dumazet reports that this is slowing down things for everyone else. So let's only switch to the alternate jumpstack if the tee target is in used through a static key. This batch also comes with offline precalculation of the jumpstack based on the callchain depth. From Florian Westphal. 6) Minimal SCTP multihoming support for our conntrack helper, from Michal Kubecek. 7) Reduce nf_bridge_info per skbuff scratchpad area to 32 bytes, from Florian Westphal. 8) Fix several checkpatch errors in bridge netfilter, from Bernhard Thaler. 9) Get rid of useless debug message in ip6t_REJECT, from Subash Abhinov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: arch/s390/net/bpf_jit_comp.c drivers/net/ethernet/ti/netcp_ethss.c net/bridge/br_multicast.c net/ipv4/ip_fragment.c All four conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Must teardown SR-IOV before unregistering netdev in igb driver, from Alex Williamson. 2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell. 3) Default route selection in ipv4 should take the prefix length, table ID, and TOS into account, from Julian Anastasov. 4) sch_plug must have a reset method in order to purge all buffered packets when the qdisc is reset, likewise for sch_choke, from WANG Cong. 5) Fix deadlock and races in slave_changelink/br_setport in bridging. From Nikolay Aleksandrov. 6) mlx4 bug fixes (wrong index in port even propagation to VFs, overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack Morgenstein, and Or Gerlitz. 7) Turn off klog message about SCTP userspace interface compat that makes no sense at all, from Daniel Borkmann. 8) Fix unbounded restarts of inet frag eviction process, causing NMI watchdog soft lockup messages, from Florian Westphal. 9) Suspend/resume fixes for r8152 from Hayes Wang. 10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from Sabrina Dubroca. 11) Fix performance regression when removing a lot of routes from the ipv4 routing tables, from Alexander Duyck. 12) Fix device leak in AF_PACKET, from Lars Westerhoff. 13) AF_PACKET also has a header length comparison bug due to signedness, from Alexander Drozdov. 14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann. 15) Memory leaks, TSO stats, watchdog timeout and other fixes to thunderx driver from Sunil Goutham and Thanneeru Srinivasulu. 16) act_bpf can leak memory when replacing programs, from Daniel Borkmann. 17) WOL packet fixes in gianfar driver, from Claudiu Manoil. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits) stmmac: fix missing MODULE_LICENSE in stmmac_platform gianfar: Enable device wakeup when appropriate gianfar: Fix suspend/resume for wol magic packet gianfar: Fix warning when CONFIG_PM off act_pedit: check binding before calling tcf_hash_release() net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket net: sched: fix refcount imbalance in actions r8152: reset device when tx timeout r8152: add pre_reset and post_reset qlcnic: Fix corruption while copying act_bpf: fix memory leaks when replacing bpf programs net: thunderx: Fix for crash while BGX teardown net: thunderx: Add PCI driver shutdown routine net: thunderx: Fix crash when changing rss with mutliple traffic flows net: thunderx: Set watchdog timeout value net: thunderx: Wakeup TXQ only if CQE_TX are processed net: thunderx: Suppress alloc_pages() failure warnings net: thunderx: Fix TSO packet statistic net: thunderx: Fix memory leak when changing queue count net: thunderx: Fix RQ_DROP miscalculation ...
2015-07-31net: Add functions to get skb->hash based on flow structuresTom Herbert
Add skb_get_hash_flowi6 and skb_get_hash_flowi4 which derive an sk_buff hash from flowi6 and flowi4 structures respectively. These functions can be called when creating a packet in the output path where the new sk_buff does not yet contain a fully formed packet that is parsable by flow dissector. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net/ipv6: add sysctl option accept_ra_min_hop_limitHangbin Liu
Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface") disabled accept hop limit from RA if it is smaller than the current hop limit for security stuff. But this behavior kind of break the RFC definition. RFC 4861, 6.3.4. Processing Received Router Advertisements A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time, and Retrans Timer) may contain a value denoting that it is unspecified. In such cases, the parameter should be ignored and the host should continue using whatever value it is already using. If the received Cur Hop Limit value is non-zero, the host SHOULD set its CurHopLimit variable to the received value. So add sysctl option accept_ra_min_hop_limit to let user choose the minimum hop limit value they can accept from RA. And set default to 1 to meet RFC standards. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30bpf: also show process name/pid in bpf_jit_dumpDaniel Borkmann
It can be useful for testing to see the actual process/pid who is loading a given filter. I was running some BPF test program and noticed unusual filter loads from time to time, triggered by some other application in the background. bpf_jit_disasm is still working after this change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30bpf: provide helper that indicates eBPF was migratedDaniel Borkmann
During recent discussions we had with Michael, we found that it would be useful to have an indicator that tells the JIT that an eBPF program had been migrated from classic instructions into eBPF instructions, as only in that case A and X need to be cleared in the prologue. Such eBPF programs do not set a particular type, but all have BPF_PROG_TYPE_UNSPEC. Thus, introduce a small helper for cde66c2d88da ("s390/bpf: Only clear A and X for converted BPF programs") and possibly others in future. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30netfilter: bridge: reduce nf_bridge_info to 32 bytes againFlorian Westphal
We can use union for most of the temporary cruft (original ipv4/ipv6 address, source mac, physoutdev) since they're used during different stages of br netfilter traversal. Also get rid of the last two ->mask users. Shrinks struct from 48 to 32 on 64bit arch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-29Merge tag 'pm+acpi-4.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These fix three regressions, two recent ones (cpufreq core and ACPI device power management) and one introduced during the 4.1 cycle (intel_pstate). Specifics: - Fix a recently introduced issue in the cpufreq core causing it to attempt to create duplicate symbolic links to the policy directory in sysfs for CPUs that are offline when the cpufreq driver is being registered (Rafael J Wysocki) - Fix a recently introduced problem in the ACPI device power management core code causing it to store an incorrect value in the device object's power.state field in some cases which in turn leads to attempts to turn power resources off while they should still be on going forward (Mika Westerberg) - Fix an intel_pstate driver issue introduced during the 4.1 cycle which leads to kernel panics on boot on Knights Landing chips due to incomplete support for them in that driver (Lukasz Anaczkowski)" * tag 'pm+acpi-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid attempts to create duplicate symbolic links ACPI / PM: Use target_state to set the device power state intel_pstate: Add get_scaling cpu_defaults param to Knights Landing
2015-07-29stmmac: remove setup/free glue callbacksJoachim Eastwood
As all dwmac-* drivers have been converted to have a proper probe function the setup callback can now be removed. Also remove the free callback that wasn't used by any driver. New dwmac-* drivers should implement standard probe and remove functions to preform any needed setup and teardown. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29stmmac: remove unused stmmac_of_data structJoachim Eastwood
As dwmac-* drivers that need OF match have been converted to use their own internal OF match data structure this can now be removed. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28Merge tag 'devicetree-fixes-for-4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: "A handful of DT related fixes for 4.2-rc" * tag 'devicetree-fixes-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: Drop owner assignment from platform and i2c driver DEVICETREE: Misc fix for the AR7100 SPI controller binding of: constify drv arg of of_driver_match_device stub of: add HAS_IOMEM depends to OF_ADDRESS
2015-07-28Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes" * tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: Fix an oops caused by using other thread's stack space in ASYNC mode nfs: plug memory leak when ->prepare_layoutcommit fails SUNRPC: Report TCP errors to the caller sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. NFSv4.2: handle NFS-specific llseek errors NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() NFS: Fix a memory leak in nfs_do_recoalesce NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised NFS: Don't revalidate the mapping if both size and change attr are up to date NFSv4/pnfs: Ensure we don't miss a file extension NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked SUNRPC: xprt_complete_bc_request must also decrement the free slot count SUNRPC: Fix a backchannel deadlock pNFS: Don't throw out valid layout segments pNFS: pnfs_roc_drain() fix a race with open pNFS: Fix races between return-on-close and layoutreturn. pNFS: pnfs_roc_drain should return 'true' when sleeping pNFS: Layoutreturn must invalidate all existing layout segments. ...
2015-07-28cpufreq: Avoid attempts to create duplicate symbolic linksRafael J. Wysocki
After commit 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug) there is a problem with CPUs that share cpufreq policy objects with other CPUs and are initially offline. Say CPU1 shares a policy with CPU0 which is online and is registered first. As part of the registration process, cpufreq_add_dev() is called for it. It creates the policy object and a symbolic link to it from the CPU1's sysfs directory. If CPU1 is registered subsequently and it is offline at that time, cpufreq_add_dev() will attempt to create a symbolic link to the policy object for it, but that link is present already, so a warning about that will be triggered. To avoid that warning, make cpufreq use an additional CPU mask containing related CPUs that are actually present for each policy object. That mask is initialized when the policy object is populated after its creation (for the first online CPU using it) and it includes CPUs from the "policy CPUs" mask returned by the cpufreq driver's ->init() callback that are physically present at that time. Symbolic links to the policy are created only for the CPUs in that mask. If cpufreq_add_dev() is invoked for an offline CPU, it checks the new mask and only creates the symlink if the CPU was not in it (the CPU is added to the mask at the same time). In turn, cpufreq_remove_dev() drops the given CPU from the new mask, removes its symlink to the policy object and returns, unless it is the CPU owning the policy object. In that case, the policy object is moved to a new CPU's sysfs directory or deleted if the CPU being removed was the last user of the policy. While at it, notice that cpufreq_remove_dev() can't fail, because its return value is ignored, so make it ignore return values from __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish() and prevent these functions from aborting on errors returned by __cpufreq_governor(). Also drop the now unused sif argument from them. Fixes: 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Russell King <linux@arm.linux.org.uk> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2015-07-27net/mlx4_en: Add support for hardware accelerated 802.1ad vlanHadar Hen Zion
To enable device support in accelerated 802.1ad vlan, the port capability "packet has vlan enable" (phv_en) should be set. Firmware won't work properly, in case phv_en is not set. The user can enable "phv_en" port capability with the new ethtool private flag phv-bit. The phv-bit private flag default value is OFF, users who are interested in 802.1ad hardware acceleration should turn ON the phv-bit private flag: $ ethtool --set-priv-flags eth1 phv-bit on Once the private flag is set, the device is ready for 802.1ad vlan acceleration. The user should also change the interface device features and turn on "tx-vlan-stag-hw-insert" which is off by default: $ ethtool -K eth1 tx-vlan-stag-hw-insert on "phv-bit" private flag setting is available only for Physical Functions(PF), the Virtual Function (VF) will be able to use the feature by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the feature was enabled by the Hypervisor. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27net/mlx4: Prepare VLAN macros for 802.1ad Hardware accelerated supportHadar Hen Zion
To add Hardware accelerated support in 802.1ad vlan, replace Current VLAN macros to CVLAN. Replace: MLX4_WQE_CTRL_INS_VLAN MLX4_CQE_VLAN_PRESENT_MASK With: MLX4_WQE_CTRL_INS_CVLAN MLX4_CQE_CVLAN_PRESENT_MASK Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27net/mlx4_core: Preparations for 802.1ad VLAN supportHadar Hen Zion
mlx4_core preparation to support hardware accelerated 802.1ad VLAN device. To allow 802.1ad accelerated device, "packet has vlan" (phv) Firmware capability should be available. Firmware without the phv capability won't behave properly and can't support 802.1ad device acceleration. The driver checks the Firmware capability and sets the phv bit accordingly in SET_PORT command. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27of: constify drv arg of of_driver_match_device stubTomeu Vizoso
With this change the stub has the same signature as the actual function, preventing this compiler warning when building without CONFIG_OF: drivers/base/property.c: In function 'fwnode_driver_match_device': >> drivers/base/property.c:608:38: warning: passing argument 2 of 'of_driver_match_device' discards 'const' qualifier from pointer target type return of_driver_match_device(dev, drv); ^ In file included from drivers/base/property.c:18:0: include/linux/of_device.h:61:19: note: expected 'struct device_driver *' but argument is of type 'const struct device_driver *' static inline int of_driver_match_device(struct device *dev, ^ Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org>
2015-07-27net/macb: convert to kernel docAndy Shevchenko
This patch coverts struct description to the kernel doc format. There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27net/mlx5e: TX latency optimization to save DMA readsAchiad Shochat
A regular TX WQE execution involves two or more DMA reads - one to fetch the WQE, and another one per WQE gather entry. These DMA reads obviously increase the TX latency. There are two mlx5 mechanisms to bypass these DMA reads: 1) Inline WQE 2) Blue Flame (BF) An inline WQE contains a whole packet, thus saves the DMA read/s of the regular WQE gather entry/s. Inline WQE support was already added in the previous commit. A BF WQE is written directly to the device I/O mapped memory, thus enables saving the DMA read that fetches the WQE. The BF WQE I/O write must be in cache line granularity, thus uses the CPU write combining mechanism. A BF WQE I/O write acts also as a TX doorbell for notifying the device of new TX WQEs. A BF WQE is written to the same I/O mapped address as the regular TX doorbell, thus this address is being mapped twice - once by ioremap() and once by io_mapping_map_wc(). While both mechanisms reduce the TX latency, they both consume more CPU cycles than a regular WQE: - A BF WQE must still be written to host memory, in addition to being written directly to the device I/O mapped memory. - An inline WQE involves copying the SKB data into it. To handle this tradeoff, we introduce here a heuristic algorithm that strives to avoid using these two mechanisms in case the TX queue is being back-pressured by the device, and limit their usage rate otherwise. An inline WQE will always be "Blue Flamed" (written directly to the device I/O mapped memory) while a BF WQE may not be inlined (may contain gather entries). Preliminary testing using netperf UDP_RR shows that the latency goes down from 17.5us to 16.9us, while the message rate (tested with pktgen) stays the same. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27net/mlx5e: Allocate DMA coherent memory on reader NUMA nodeSaeed Mahameed
By affinity hints and XPS, each mlx5e channel is assigned a CPU core. Channel DMA coherent memory that is written by the NIC and read by SW (e.g CQ buffer) is allocated on the NUMA node of the CPU core assigned for the channel. Channel DMA coherent memory that is written by SW and read by the NIC (e.g SQ/RQ buffer) is allocated on the NUMA node of the NIC. Doorbell record (written by SW and read by the NIC) is an exception since it is accessed by SW more frequently. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27net/mlx5e: Support ETH_RSS_HASH_XORSaeed Mahameed
The ConnectX-4 HW implements inverted XOR8. To make it act as XOR we re-order the HW RSS indirection table. Set XOR to be the default RSS hash function and add ethtool API to control it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26lwtunnel: export linux/lwtunnel.h to userspaceNicolas Dichtel
Note also that include/linux/lwtunnel.h is not needed. CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 499a24256862 ("lwtunnel: infrastructure for handling light weight tunnels like mpls") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This update contains: - the manual revert of the SYSCALL32 changes which caused a regression - a fix for the MPX vma handling - three fixes for the ioremap 'is ram' checks. - PAT warning fixes - a trivial fix for the size calculation of TLB tracepoints - handle old EFI structures gracefully This also contains a PAT fix from Jan plus a revert thereof. Toshi explained why the code is correct" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/pat: Revert 'Adjust default caching mode translation tables' x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit x86/mm: Fix newly introduced printk format warnings mm: Fix bugs in region_is_ram() x86/mm: Remove region_is_ram() call from ioremap x86/mm: Move warning from __ioremap_check_ram() to the call site x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn() x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn() x86/mm/pat: Adjust default caching mode translation tables x86/fpu: Disable dependent CPU features on "noxsave" x86/mpx: Do not set ->vm_ops on MPX VMAs x86/mm: Add parenthesis for TLB tracepoint size calculation efi: Handle memory error structures produced based on old versions of standard
2015-07-25Merge tag 'trace-v4.2-rc2-fix3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Back in 3.16 the ftrace code was redesigned and cleaned up to remove the double iteration list (one for registered ftrace ops, and one for registered "global" ops), to just use one list. That simplified the code but also broke the function tracing filtering on pid. This updates the code to handle the filtering again with the new logic" * tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix breakage of set_ftrace_pid
2015-07-25Merge tag 'for-linus-20150724' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fixes from Brian Norris: "Two trivial updates. I meant to send these much earlier, but I've been preoccupied. - Add MAINTAINERS entry for diskonchip g3 driver - Fix an overlooked conflict in bitfield value assignments The latter update is a bit overdue, but there's no reason to wait any longer" * tag 'for-linus-20150724' of git://git.infradead.org/linux-mtd: mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict MAINTAINERS: mtd: docg3: add docg3 maintainer
2015-07-24Merge branch 'for-4.2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "A couple important fixes. - A block layer change which removed restriction on max transfer size led to silent data corruption on some devices. A new quirk is added to restore the old size limit for the reported device. If it gets reported on more devices, we might have to consider restoring the restriction for ATA devices by default. - There finally is a SSD which is confirmed to cause data corruption on TRIM regardless of which flavor is used. A new quirk is added and the device is blacklisted - Other device-specific workarounds" * 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: Do not blacklist M510DC libata: increase the timeout when setting transfer mode libata: add ATA_HORKAGE_MAX_SEC_1024 to revert back to previous max_sectors limit libata: force disable trim for SuperSSpeed S238 libata: add ATA_HORKAGE_NOTRIM libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for HP 250GB SATA disk VB0250EAVER ata: pmp: add quirk for Marvell 4140 SATA PMP
2015-07-24ftrace: Fix breakage of set_ftrace_pidSteven Rostedt (Red Hat)
Commit 4104d326b670 ("ftrace: Remove global function list and call function directly") simplified the ftrace code by removing the global_ops list with a new design. But this cleanup also broke the filtering of PIDs that are added to the set_ftrace_pid file. Add back the proper hooks to have pid filtering working once again. Cc: stable@vger.kernel.org # 3.16+ Reported-by: Matt Fleming <matt@console-pimps.org> Reported-by: Richard Weinberger <richard.weinberger@gmail.com> Tested-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-24mmc: sdhci-esdhc-imx: clear f_max in boarddataDong Aisheng
After commit 8d86e4fcccf6 ("mmc: sdhci-esdhc-imx: Call mmc_of_parse()"), it's not used anymore. Signed-off-by: Dong Aisheng <aisheng.dong@freescale.com> Reviewed-by: Johan Derycke <johan.derycke@barco.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-07-23netfilter: rename local nf_hook_list to hook_listPablo Neira Ayuso
085db2c04557 ("netfilter: Per network namespace netfilter hooks.") introduced a new nf_hook_list that is global, so let's avoid this overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bridge/br_mdb.c br_mdb.c conflict was a function call being removed to fix a bug in 'net' but whose signature was changed in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHETrond Myklebust
I'm not aware of any existing bugs around this, but the expectation is that nfs_mark_for_revalidate() should always force a revalidation of the cached metadata. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22NFS: Remove the "NFS_CAP_CHANGE_ATTR" capabilityTrond Myklebust
Setting the change attribute has been mandatory for all NFS versions, since commit 3a1556e8662c ("NFSv2/v3: Simulate the change attribute"). We should therefore not have anything be conditional on it being set/unset. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22ipv6: sysctl to restrict candidate source addressesErik Kline
Per RFC 6724, section 4, "Candidate Source Addresses": It is RECOMMENDED that the candidate source addresses be the set of unicast addresses assigned to the interface that will be used to send to the destination (the "outgoing" interface). Add a sysctl to enable this behaviour. Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21vxlan: Flow based tunnelingThomas Graf
Allows putting a VXLAN device into a new flow-based mode in which skbs with a ip_tunnel_info dst metadata attached will be encapsulated according to the instructions stored in there with the VXLAN device defaults taken into consideration. Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is set, the packet processing will populate a ip_tunnel_info struct for each packet received and attach it to the skb using the new metadata dst. The metadata structure will contain the outer header and tunnel header fields which have been stripped off. Layers further up in the stack such as routing, tc or netfitler can later match on these fields and perform forwarding. It is the responsibility of upper layers to ensure that the flag is set if the metadata is needed. The flag limits the additional cost of metadata collecting based on demand. This prepares the VXLAN device to be steered by the routing and other subsystems which allows to support encapsulation for a large number of tunnel endpoints and tunnel ids through a single net_device which improves the scalability. It also allows for OVS to leverage this mode which in turn allows for the removal of the OVS specific VXLAN code. Because the skb is currently scrubed in vxlan_rcv(), the attachment of the new dst metadata is postponed until after scrubing which requires the temporary addition of a new member to vxlan_metadata. This member is removed again in a later commit after the indirect VXLAN receive API has been removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21mpls: ip tunnel supportRoopa Prabhu
This implementation uses lwtunnel infrastructure to register hooks for mpls tunnel encaps. It picks cues from iptunnel_encaps infrastructure and previous mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21lwtunnel: infrastructure for handling light weight tunnels like mplsRoopa Prabhu
Provides infrastructure to parse/dump/store encap information for light weight tunnels like mpls. Encap information for such tunnels is associated with fib routes. This infrastructure is based on previous suggestions from Eric Biederman to follow the xfrm infrastructure. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21Merge tag 'efi-urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull an EFI fix from Matt Fleming: - Fix a bug in the Common Platform Error Record (CPER) driver that caused old UEFI spec (< 2.3) versions of the memory error record structure to be declared invalid. (Tony Luck) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-20bpf: introduce bpf_skb_vlan_push/pop() helpersAlexei Starovoitov
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via helper functions. These functions may change skb->data/hlen which are cached by some JITs to improve performance of ld_abs/ld_ind instructions. Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls, re-compute header len and re-cache skb->data/hlen back into cpu registers. Note, skb->data/hlen are not directly accessible from the programs, so any changes to skb->data done either by these helpers or by other TC actions are safe. eBPF JIT supported by three architectures: - arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is. - x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls (experiments showed that conditional re-caching is slower). - s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present in the program (re-caching is tbd). These helpers allow more scalable handling of vlan from the programs. Instead of creating thousands of vlan netdevs on top of eth0 and attaching TC+ingress+bpf to all of them, the program can be attached to eth0 directly and manipulate vlans as necessary. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20stmmac: drop custom_* fields from plat_stmmacenet_dataJoachim Eastwood
Both of these fields are unused and has been unused since they were added 3 and 5 years ago. Drop them since they are clearly not very useful. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20net: remove skb_frag_add_headJiri Benc
It's not used anywhere. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20net: add phys ID compare helper to test if two IDs are the sameScott Feldman
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>