summaryrefslogtreecommitdiff
path: root/Documentation/networking
AgeCommit message (Collapse)Author
2019-06-17tcp: add tcp_min_snd_mss sysctlEric Dumazet
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream. Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ipv4: set the tcp_min_rtt_wlen range from 0 to one dayZhangXiaoxu
[ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ] There is a UBSAN report as below: UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56 signed integer overflow: 2147483647 * 1000 cannot be represented in type 'int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1 Call Trace: <IRQ> dump_stack+0x8c/0xba ubsan_epilogue+0x11/0x60 handle_overflow+0x12d/0x170 ? ttwu_do_wakeup+0x21/0x320 __ubsan_handle_mul_overflow+0x12/0x20 tcp_ack_update_rtt+0x76c/0x780 tcp_clean_rtx_queue+0x499/0x14d0 tcp_ack+0x69e/0x1240 ? __wake_up_sync_key+0x2c/0x50 ? update_group_capacity+0x50/0x680 tcp_rcv_established+0x4e2/0xe10 tcp_v4_do_rcv+0x22b/0x420 tcp_v4_rcv+0xfe8/0x1190 ip_protocol_deliver_rcu+0x36/0x180 ip_local_deliver+0x15b/0x1a0 ip_rcv+0xac/0xd0 __netif_receive_skb_one_core+0x7f/0xb0 __netif_receive_skb+0x33/0xc0 netif_receive_skb_internal+0x84/0x1c0 napi_gro_receive+0x2a0/0x300 receive_buf+0x3d4/0x2350 ? detach_buf_split+0x159/0x390 virtnet_poll+0x198/0x840 ? reweight_entity+0x243/0x4b0 net_rx_action+0x25c/0x770 __do_softirq+0x19b/0x66d irq_exit+0x1eb/0x230 do_IRQ+0x7a/0x150 common_interrupt+0xf/0xf </IRQ> It can be reproduced by: echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18inet: frags: break the 2GB limit for frags storageEric Dumazet
Some users are willing to provision huge amounts of memory to be able to perform reassembly reasonnably well under pressure. Current memory tracking is using one atomic_t and integers. Switch to atomic_long_t so that 64bit arches can use more than 2GB, without any cost for 32bit arches. Note that this patch avoids an overflow error, if high_thresh was set to ~2GB, since this test in inet_frag_alloc() was never true : if (... || frag_mem_limit(nf) > nf->high_thresh) Tested: $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh <frag DDOS> $ grep FRAG /proc/net/sockstat FRAG: inuse 14705885 memory 16000002880 $ nstat -n ; sleep 1 ; nstat | grep Reas IpReasmReqds 3317150 0.0 IpReasmFails 3317112 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18inet: frags: use rhashtables for reassembly unitsEric Dumazet
Some applications still rely on IP fragmentation, and to be fair linux reassembly unit is not working under any serious load. It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!) A work queue is supposed to garbage collect items when host is under memory pressure, and doing a hash rebuild, changing seed used in hash computations. This work queue blocks softirqs for up to 25 ms when doing a hash rebuild, occurring every 5 seconds if host is under fire. Then there is the problem of sharing this hash table for all netns. It is time to switch to rhashtables, and allocate one of them per netns to speedup netns dismantle, since this is a critical metric these days. Lookup is now using RCU. A followup patch will even remove the refcount hold/release left from prior implementation and save a couple of atomic operations. Before this patch, 16 cpus (16 RX queue NIC) could not handle more than 1 Mpps frags DDOS. After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB of storage for the fragments (exact number depends on frags being evicted after timeout) $ grep FRAG /proc/net/sockstat FRAG: inuse 1966916 memory 2140004608 A followup patch will change the limits for 64bit arches. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Florian Westphal <fw@strlen.de> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-13netdev-FAQ: clarify DaveM's position for stable backportsCong Wang
[ Upstream commit 75d4e704fa8d2cf33ff295e5b441317603d7f9fd ] Per discussion with David at netconf 2018, let's clarify DaveM's position of handling stable backports in netdev-FAQ. This is important for people relying on upstream -stable releases. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-24netfilter: fix nf_conntrack_helper documentationFlorian Westphal
Since kernel 4.7 this defaults to off. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-07Documentation: networking: dsa: Update tagging protocolsFabian Mewes
Add Qualcomm QCA tagging introduced in cafdc45c9 to the list of supported protocols. Signed-off-by: Fabian Mewes <architekt@coding4coffee.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of fixes, mostly drivers as is usually the case. 1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey Khoroshilov. 2) Fix element timeouts in netfilter's nft_dynset, from Anders K. Pedersen. 3) Don't put aead_req crypto struct on the stack in mac80211, from Ard Biesheuvel. 4) Several uninitialized variable warning fixes from Arnd Bergmann. 5) Fix memory leak in cxgb4, from Colin Ian King. 6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann. 7) Several VRF semantic fixes from David Ahern. 8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper. 9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet. 10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev. 11) Fix stale link state during failover in NCSCI driver, from Gavin Shan. 12) Fix netdev lower adjacency list traversal, from Ido Schimmel. 13) Propvide proper handle when emitting notifications of filter deletes, from Jamal Hadi Salim. 14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen. 15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac. 16) Several routing offload fixes in mlxsw driver, from Jiri Pirko. 17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy. 18) Validate chunk len before using it in SCTP, from Marcelo Ricardo Leitner. 19) Revert a netns locking change that causes regressions, from Paul Moore. 20) Add recursion limit to GRO handling, from Sabrina Dubroca. 21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon. 22) Avoid accessing stale vxlan/geneve socket in data path, from Pravin Shelar" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits) geneve: avoid using stale geneve socket. vxlan: avoid using stale vxlan socket. qede: Fix out-of-bound fastpath memory access net: phy: dp83848: add dp83822 PHY support enic: fix rq disable tipc: fix broadcast link synchronization problem ibmvnic: Fix missing brackets in init_sub_crq_irqs ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context" arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold net/mlx4_en: Save slave ethtool stats command net/mlx4_en: Fix potential deadlock in port statistics flow net/mlx4: Fix firmware command timeout during interrupt test net/mlx4_core: Do not access comm channel if it has not yet been initialized net/mlx4_en: Fix panic during reboot net/mlx4_en: Process all completions in RX rings after port goes up net/mlx4_en: Resolve dividing by zero in 32-bit system net/mlx4_core: Change the default value of enable_qos net/mlx4_core: Avoid setting ports to auto when only one port type is supported net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec ...
2016-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from Geert Uytterhoeven. 2) Fix wrong timeout in set elements added from packet path via nft_dynset, from Anders K. Pedersen. 3) Remove obsolete nf_conntrack_events_retry_timeout sysctl documentation, from Nicolas Dichtel. 4) Ensure proper initialization of log flags via xt_LOG, from Liping Zhang. 5) Missing alias to autoload ipcomp, also from Liping Zhang. 6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping. 7) Wrong integer type in the new nft_parse_u32_check() function, from Dan Carpenter. 8) Another wrong integer type declaration in nft_exthdr_init, also from Dan Carpenter. 9) Fix insufficient mode validation in nft_range. 10) Fix compilation warning in nft_range due to possible uninitialized value, from Arnd Bergmann. 11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to calm down kmemcheck, from Florian Westphal. 12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached, from Nicolas Dichtel. 13) Fix nf_queue() after conversion to single-linked hook list, related to incorrect bypass flag handling and incorrect hook point of reinjection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17netfilter: conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)Nicolas Dichtel
This entry has been removed in commit 9500507c6138. Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-14Merge tag 'linux-kselftest-4.9-rc1-update' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest updates from Shuah Khan: "This update consists of: - Fixes and improvements to existing tests - Moving code from Documentation to selftests, samples, and tools: * Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and networking tests from Documentation to selftests. * Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay, and blackfin examples from Documentation to samples. * Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from Documentation to tools. * Deletes BUILD_DOCSRC and its dependencies" * tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits) selftests/futex: Check ANSI terminal color support Doc: update 00-INDEX files to reflect the runnable code move samples: move blackfin gptimers-example from Documentation tools: move pcmcia crc32hash tool from Documentation tools: move laptops dslm tool from Documentation tools: move accounting tool from Documentation samples: move auxdisplay example code from Documentation samples: move watchdog example code from Documentation samples: move timers example code from Documentation samples: move misc-devices/mei example code from Documentation samples: move mic/mpssd example code from Documentation selftests: Move networking/timestamping from Documentation selftests: move watchdog tests from Documentation/watchdog selftests: move ia64 tests from Documentation/ia64 selftests: move vDSO tests from Documentation/vDSO selftests: move ptp tests from Documentation/ptp selftests: move prctl tests from Documentation/prctl selftests: move dnotify_test from Documentation/filesystems selftests/timers: Add missing error code assignment before test selftests/zram: replace ZRAM_LZ4_COMPRESS ...
2016-10-14Documentation/networking: update git urls to use https over httpAlexander Alemayhu
This fixes the following errors when trying to clone the urls: Cloning into 'net'... fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/' not found Cloning into 'net-next'... fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/' not found Cloning into 'linux'... fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/' not found Cloning into 'stable-queue'... fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/' not found Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-10Doc: update 00-INDEX files to reflect the runnable code moveShuah Khan
Update 00-INDEX files with the current file list to reflect the runnable code move. Acked-by: Michal Marek <mmarek@suse.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2016-09-28doc: update switchdev L3 sectionJiri Pirko
This is to reflect the change of FIB offload infrastructure from switchdev objects to FIB notifier. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20selftests: Move networking/timestamping from DocumentationShuah Khan
Remove networking from Documentation Makefile to move the test to selftests. Update networking/timestamping Makefile to work under selftests. These tests will not be run as part of selftests suite and will not be included in install targets. They can be built and run separately for now. This is part of the effort to move runnable code from Documentation. Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2016-09-19ipvlan: Introduce l3s modeMahesh Bandewar
In a typical IPvlan L3 setup where master is in default-ns and each slave is into different (slave) ns. In this setup egress packet processing for traffic originating from slave-ns will hit all NF_HOOKs in slave-ns as well as default-ns. However same is not true for ingress processing. All these NF_HOOKs are hit only in the slave-ns skipping them in the default-ns. IPvlan in L3 mode is restrictive and if admins want to deploy iptables rules in default-ns, this asymmetric data path makes it impossible to do so. This patch makes use of the l3_rcv() (added as part of l3mdev enhancements) to perform input route lookup on RX packets without changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN to change the skb->dev just before handing over skb to L4. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01rxrpc: Don't expose skbs to in-kernel users [ver #2]David Howells
Don't expose skbs to in-kernel users, such as the AFS filesystem, but instead provide a notification hook the indicates that a call needs attention and another that indicates that there's a new call to be collected. This makes the following possibilities more achievable: (1) Call refcounting can be made simpler if skbs don't hold refs to calls. (2) skbs referring to non-data events will be able to be freed much sooner rather than being queued for AFS to pick up as rxrpc_kernel_recv_data will be able to consult the call state. (3) We can shortcut the receive phase when a call is remotely aborted because we don't have to go through all the packets to get to the one cancelling the operation. (4) It makes it easier to do encryption/decryption directly between AFS's buffers and sk_buffs. (5) Encryption/decryption can more easily be done in the AFS's thread contexts - usually that of the userspace process that issued a syscall - rather than in one of rxrpc's background threads on a workqueue. (6) AFS will be able to wait synchronously on a call inside AF_RXRPC. To make this work, the following interface function has been added: int rxrpc_kernel_recv_data( struct socket *sock, struct rxrpc_call *call, void *buffer, size_t bufsize, size_t *_offset, bool want_more, u32 *_abort_code); This is the recvmsg equivalent. It allows the caller to find out about the state of a specific call and to transfer received data into a buffer piecemeal. afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction logic between them. They don't wait synchronously yet because the socket lock needs to be dealt with. Five interface functions have been removed: rxrpc_kernel_is_data_last() rxrpc_kernel_get_abort_code() rxrpc_kernel_get_error_number() rxrpc_kernel_free_skb() rxrpc_kernel_data_consumed() As a temporary hack, sk_buffs going to an in-kernel call are queued on the rxrpc_call struct (->knlrecv_queue) rather than being handed over to the in-kernel user. To process the queue internally, a temporary function, temp_deliver_data() has been added. This will be replaced with common code between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-31net: dsa: add MDB supportVivien Didelot
Add SWITCHDEV_OBJ_ID_PORT_MDB support to the DSA layer. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30rxrpc: Pass struct socket * to more rxrpc kernel interface functionsDavid Howells
Pass struct socket * to more rxrpc kernel interface functions. They should be starting from this rather than the socket pointer in the rxrpc_call struct if they need to access the socket. I have left: rxrpc_kernel_is_data_last() rxrpc_kernel_get_abort_code() rxrpc_kernel_get_error_number() rxrpc_kernel_free_skb() rxrpc_kernel_data_consumed() unmodified as they're all about to be removed (and, in any case, don't touch the socket). Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30rxrpc: Provide a way for AFS to ask for the peer address of a callDavid Howells
Provide a function so that kernel users, such as AFS, can ask for the peer address of a call: void rxrpc_kernel_get_peer(struct rxrpc_call *call, struct sockaddr_rxrpc *_srx); In the future the kernel service won't get sk_buffs to look inside. Further, this allows us to hide any canonicalisation inside AF_RXRPC for when IPv6 support is added. Also propagate this through to afs_find_server() and issue a warning if we can't handle the address family yet. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28Documentation: networking: dsa: Remove platform device TODOFlorian Fainelli
Since commit 83c0afaec7b7 ("net: dsa: Add new binding implementation"), the shortcomings of the dsa platform device have been addressed, remove that TODO item. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26bridge: switchdev: Add forward mark support for stacked devicesIdo Schimmel
switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of port netdevs so that packets being flooded by the device won't be flooded twice. It works by assigning a unique identifier (the ifindex of the first bridge port) to bridge ports sharing the same parent ID. This prevents packets from being flooded twice by the same switch, but will flood packets through bridge ports belonging to a different switch. This method is problematic when stacked devices are taken into account, such as VLANs. In such cases, a physical port netdev can have upper devices being members in two different bridges, thus requiring two different 'offload_fwd_mark's to be configured on the port netdev, which is impossible. The main problem is that packet and netdev marking is performed at the physical netdev level, whereas flooding occurs between bridge ports, which are not necessarily port netdevs. Instead, packet and netdev marking should really be done in the bridge driver with the switch driver only telling it which packets it already forwarded. The bridge driver will mark such packets using the mark assigned to the ingress bridge port and will prevent the packet from being forwarded through any bridge port sharing the same mark (i.e. having the same parent ID). Remove the current switchdev 'offload_fwd_mark' implementation and instead implement the proposed method. In addition, make rocker - the sole user of the mark - use the proposed method. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24net: dsa: rename switch operations structureVivien Didelot
Now that the dsa_switch_driver structure contains only function pointers as it is supposed to, rename it to the more appropriate dsa_switch_ops, uniformly to any other operations structure in the kernel. No functional changes here, basically just the result of something like: s/dsa_switch_driver *drv/dsa_switch_ops *ops/g However keep the {un,}register_switch_driver functions and their dsa_switch_drivers list as is, since they represent the -- likely to be deprecated soon -- legacy DSA registration framework. In the meantime, also fix the following checks from checkpatch.pl to make it happy with this patch: CHECK: Comparison to NULL could be written "!ops" #403: FILE: net/dsa/dsa.c:470: + if (ops == NULL) { CHECK: Comparison to NULL could be written "ds->ops->get_strings" #773: FILE: net/dsa/slave.c:697: + if (ds->ops->get_strings != NULL) CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats" #824: FILE: net/dsa/slave.c:785: + if (ds->ops->get_ethtool_stats != NULL) CHECK: Comparison to NULL could be written "ds->ops->get_sset_count" #835: FILE: net/dsa/slave.c:798: + if (ds->ops->get_sset_count != NULL) total: 0 errors, 0 warnings, 4 checks, 784 lines checked Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23net-tcp: retire TFO_SERVER_WO_SOCKOPT2 configYuchung Cheng
TFO_SERVER_WO_SOCKOPT2 was intended for debugging purposes during Fast Open development. Remove this config option and also update/clean-up the documentation of the Fast Open sysctl. Reported-by: Piotr Jurkiewicz <piotr.jerzy.jurkiewicz@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17strparser: DocumentationTom Herbert
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12Merge tag 'batadv-next-for-davem-20160812' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes (mostly chronological order): - bump version strings, by Simon Wunderlich - kerneldoc clean up, by Sven Eckelmann - enable RTNL automatic loading and according documentation changes, by Sven Eckelmann (2 patches) - fix/improve interface removal and associated locking, by Sven Eckelmann (3 patches) - clean up unused variables, by Linus Luessing - implement Gateway selection code for B.A.T.M.A.N. V by Antonio Quartulli (4 patches) - rewrite TQ comparison by Markus Pargmann - fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels kbuild test robot) and bitwise arithmetic operations (by Linus Luessing) - rewrite packet creation for forwarding for readability and to avoid reference count mistakes, by Linus Luessing - use kmem_cache for translation table, which results in more efficient storing of translation table entries, by Sven Eckelmann - rewrite/clarify reference handling for send_skb_unicast, by Sven Eckelmann - fix debug messages when updating routes, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)Netanel Belgazal
This is a driver for the ENA family of networking devices. Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09batman-adv: Use rtnl link in device creation exampleSven Eckelmann
The standard kernel API to add new virtual interfaces and attach other interfaces to it is rtnl-link. batman-adv supports it since v3.10. This functionality should be used instead of the legacy batman-adv-only sysfs interface. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-08-06rxrpc: Fix races between skb free, ACK generation and replyingDavid Howells
Inside the kafs filesystem it is possible to occasionally have a call processed and terminated before we've had a chance to check whether we need to clean up the rx queue for that call because afs_send_simple_reply() ends the call when it is done, but this is done in a workqueue item that might happen to run to completion before afs_deliver_to_call() completes. Further, it is possible for rxrpc_kernel_send_data() to be called to send a reply before the last request-phase data skb is released. The rxrpc skb destructor is where the ACK processing is done and the call state is advanced upon release of the last skb. ACK generation is also deferred to a work item because it's possible that the skb destructor is not called in a context where kernel_sendmsg() can be invoked. To this end, the following changes are made: (1) kernel_rxrpc_data_consumed() is added. This should be called whenever an skb is emptied so as to crank the ACK and call states. This does not release the skb, however. kernel_rxrpc_free_skb() must now be called to achieve that. These together replace rxrpc_kernel_data_delivered(). (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed(). This makes afs_deliver_to_call() easier to work as the skb can simply be discarded unconditionally here without trying to work out what the return value of the ->deliver() function means. The ->deliver() functions can, via afs_data_complete(), afs_transfer_reply() and afs_extract_data() mark that an skb has been consumed (thereby cranking the state) without the need to conditionally free the skb to make sure the state is correct on an incoming call for when the call processor tries to send the reply. (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it has finished with a packet and MSG_PEEK isn't set. (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data(). Because of this, we no longer need to clear the destructor and put the call before we free the skb in cases where we don't want the ACK/call state to be cranked. (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather than 0 if they expect more data (afs_extract_data() returns -EAGAIN to the delivery function already), and the caller is now responsible for producing an abort if that was the last packet. (6) There are many bits of unmarshalling code where: ret = afs_extract_data(call, skb, last, ...); switch (ret) { case 0: break; case -EAGAIN: return 0; default: return ret; } is to be found. As -EAGAIN can now be passed back to the caller, we now just return if ret < 0: ret = afs_extract_data(call, skb, last, ...); if (ret < 0) return ret; (7) Checks for trailing data and empty final data packets has been consolidated as afs_data_complete(). So: if (skb->len > 0) return -EBADMSG; if (!last) return 0; becomes: ret = afs_data_complete(call, skb, last); if (ret < 0) return ret; (8) afs_transfer_reply() now checks the amount of data it has against the amount of data desired and the amount of data in the skb and returns an error to induce an abort if we don't get exactly what we want. Without these changes, the following oops can occasionally be observed, particularly if some printks are inserted into the delivery path: general protection fault: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: kafsd afs_async_workfn [kafs] task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000 RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1 RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002 RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710 RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0 Stack: 0000000000000006 000000000be04930 0000000000000000 ffff880400000000 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38 Call Trace: [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff814c928f>] skb_dequeue+0x18/0x61 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs] [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs] [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs] [<ffffffff81063a3a>] process_one_work+0x29d/0x57c [<ffffffff81064ac2>] worker_thread+0x24a/0x385 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0 [<ffffffff810696f5>] kthread+0xf3/0xfb [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15Documentation: RDS: Document Multipath RDS (mprds)Sowmini Varadhan
Document the design of mprds, covering a brief description of the motivation, data-structures and modifications to the RDS control plane. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15Documentation: RDS: updates for SO_RDS_TRANSPORT socket optionSowmini Varadhan
Update the documentation to describe the changes added by commit 8ba38460f363 ("net/rds Add getsockopt support for SO_RDS_TRANSPORT") Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13net: vrf: Address comments from last documentation updateDavid Ahern
Comments from Frank Kellerman on last doc update: - extra whitespace in front of a neigh show command - convert the brief link example to 'vrf red' Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13net: vrf: Documentation updateDavid Ahern
Update vrf documentation for changes made to 4.4 - 4.8 kernels and iproute2 support for vrf keyword. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Don't use userspace datatypes in bridge netfilter code, from Tobin Harding. 2) Iterate only once over the expectation table when removing the helper module, instead of once per-netns, from Florian Westphal. 3) Extra sanitization in xt_hook_ops_alloc() to return error in case we ever pass zero hooks, xt_hook_ops_alloc(): 4) Handle NFPROTO_INET from the logging core infrastructure, from Liping Zhang. 5) Autoload loggers when TRACE target is used from rules, this doesn't change the behaviour in case the user already selected nfnetlink_log as preferred way to print tracing logs, also from Liping Zhang. 6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields by cache lines, increases the size of entries in 11% per entry. From Florian Westphal. 7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian. 8) Remove useless defensive check in nf_logger_find_get() from Shivani Bhardwaj. 9) Remove zone extension as place it in the conntrack object, this is always include in the hashing and we expect more intensive use of zones since containers are in place. Also from Florian Westphal. 10) Owner match now works from any namespace, from Eric Bierdeman. 11) Make sure we only reply with TCP reset to TCP traffic from nf_reject_ipv4, patch from Liping Zhang. 12) Introduce --nflog-size to indicate amount of network packet bytes that are copied to userspace via log message, from Vishwanath Pai. This obsoletes --nflog-range that has never worked, it was designed to achieve this but it has never worked. 13) Introduce generic macros for nf_tables object generation masks. 14) Use generation mask in table, chain and set objects in nf_tables. This allows fixes interferences with ongoing preparation phase of the commit protocol and object listings going on at the same time. This update is introduced in three patches, one per object. 15) Check if the object is active in the next generation for element deactivation in the rbtree implementation, given that deactivation happens from the commit phase path we have to observe the future status of the object. 16) Support for deletion of just added elements in the hash set type. 17) Allow to resize hashtable from /proc entry, not only from the obscure /sys entry that maps to the module parameter, from Florian Westphal. 18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised anymore since we tear down the ruleset whenever the netdevice goes away. 19) Support for matching inverted set lookups, from Arturo Borrero. 20) Simplify the iptables_mangle_hook() by removing a superfluous extra branch. 21) Introduce ether_addr_equal_masked() and use it from the netfilter codebase, from Joe Perches. 22) Remove references to "Use netfilter MARK value as routing key" from the Netfilter Kconfig description given that this toggle doesn't exists already for 10 years, from Moritz Sichert. 23) Introduce generic NF_INVF() and use it from the xtables codebase, from Joe Perches. 24) Setting logger to NONE via /proc was not working unless explicit nul-termination was included in the string. This fixes seems to leave the former behaviour there, so we don't break backward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28drivers: net: stmmac: reworking the PCS code.Giuseppe CAVALLARO
The 3.xx and 4.xx synopsys gmacs have a very similar PCS embedded module and they share almost the same registers: for example: AN_Control, AN_Status, AN_Advertisement, AN_Link_Partner_Ability, AN_Expansion, TBI_Extended_Status. Just the RGMII/SMII Control/Status register differs. So This patch aims to reorganize and enhance the PCS support. It removes the existent support from the dwmac1000/dwmac4_core.c moving basic PCS functions inside a new file called: stmmac_pcs.h. The patch also reviews the available APIs to be better shared among different hardware and easily enhanced to support new features. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-24netfilter: conntrack: allow increasing bucket size via sysctl tooFlorian Westphal
No need to restrict this to module parameter. We export a copy of the real hash size -- when user alters the value we allocate the new table, copy entries etc before we update the real size to the requested one. This is also needed because the real size is used by concurrent readers and cannot be changed without synchronizing the conntrack generation seqcnt. We only allow changing this value from the initial net namespace. Tested using http-client-benchmark vs. httpterm with concurrent while true;do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-17can: bcm: add documentation for CAN FD supportOliver Hartkopp
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-07net: sched: do not acquire qdisc spinlock in qdisc/class stats dumpEric Dumazet
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host agent [1] are problematic at scale : For each qdisc/class found in the dump, we currently lock the root qdisc spinlock in order to get stats. Sampling stats every 5 seconds from thousands of HTB classes is a challenge when the root qdisc spinlock is under high pressure. Not only the dumps take time, they also slow down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases. An audit of existing qdiscs showed that sch_fq_codel is the only qdisc that might need the qdisc lock in fq_codel_dump_stats() and fq_codel_dump_class_stats() In v2 of this patch, I now use the Qdisc running seqcount to provide consistent reads of packets/bytes counters, regardless of 32/64 bit arches. I also changed rate estimators to use the same infrastructure so that they no longer need to lock root qdisc lock. [1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kevin Athey <kda@google.com> Cc: Xiaotian Pei <xiaotian@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29Documentation: ip-sysctl.txt: clarify secure_redirectsEric Garver
Clarify how secure_redirects works. Mention that RFC1122 always applies. Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25Documentation: networking: dsa: Describe port_vlan_filteringFlorian Fainelli
Described what the port_vlan_filtering function is supposed to accomplish. Fixes: fb2dabad69f0 ("net: dsa: support VLAN filtering switchdev attr") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25Documentation: networking: dsa: Remove priv_size descriptionFlorian Fainelli
We no longer have a priv_size structure member since 5feebd0a8a79 ("net: dsa: Remove allocation of driver private memory") Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25Documentation: networking: dsa: Remove poll_link descriptionFlorian Fainelli
This function has been removed in 4baee937b8d5 ("net: dsa: remove DSA link polling") in favor of using the PHYLIB polling mechanism. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-19Merge tag 'docs-for-linus' of git://git.lwn.net/linuxLinus Torvalds
Pull Documentation updates from Jon Corbet: "A bit busier this time around. The most interesting thing (IMO) this time around is some beginning infrastructural work to allow documents to be written using restructured text. Maybe someday, in a galaxy far far away, we'll be able to eliminate the DocBook dependency and have a much better integrated set of kernel docs. Someday. Beyond that, there's a new document on security hardening from Kees, the movement of some sample code over to samples/, a number of improvements to the serial docs from Geert, and the usual collection of corrections, typo fixes, etc" * tag 'docs-for-linus' of git://git.lwn.net/linux: (55 commits) doc: self-protection: provide initial details serial: doc: Use port->state instead of info serial: doc: Always refer to tty_port->mutex Documentation: vm: Spelling s/paltform/platform/g Documentation/memcg: update kmem limit doc as codes behavior docproc: print a comment about autogeneration for rst output docproc: add support for reStructuredText format via --rst option docproc: abstract terminating lines at first space docproc: abstract docproc directive detection docproc: reduce unnecessary indentation docproc: add variables for subcommand and filename kernel-doc: use rst C domain directives and references for types kernel-doc: produce RestructuredText output kernel-doc: rewrite usage description, remove duplicated comments Doc: correct the location of sysrq.c Documentation: fix common spelling mistakes samples: v4l: from Documentation to samples directory samples: connector: from Documentation to samples directory Documentation: xillybus: fix spelling mistake Documentation: x86: fix spelling mistakes ...
2016-05-16bpf, doc: fix typo on bpf_asm descriptionsDaniel Borkmann
Fix description of some of the bpf_asm tool related jump instructions and generally move them to format A <op> k. Reported-by: Sebastian Amend <sebastian.amend@googlemail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08Documentation/networking: more accurate LCO explanationShmulik Ladkani
In few places the term "ones-complement sum" was used but the actual meaning is "the complement of the ones-complement sum". Also, avoid enclosing long statements with underscore, to ease readability. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: add documentation for 'direct packet access'Alexei Starovoitov
explain how verifier checks safety of packet access and update email addresses. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04bonding: update documentation section after dev->trans_start removalFlorian Westphal
Drivers that use LLTX need to update trans_start of the netdev_queue. (Most drivers don't use LLTX; stack does this update if .ndo_start_xmit returned TX_OK). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>