From e7096c131e5161fa3b8e52a650d7719d2857adfd Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Mon, 9 Dec 2019 00:27:34 +0100 Subject: net: WireGuard secure network tunnel WireGuard is a layer 3 secure networking tunnel made specifically for the kernel, that aims to be much simpler and easier to audit than IPsec. Extensive documentation and description of the protocol and considerations, along with formal proofs of the cryptography, are available at: * https://www.wireguard.com/ * https://www.wireguard.com/papers/wireguard.pdf This commit implements WireGuard as a simple network device driver, accessible in the usual RTNL way used by virtual network drivers. It makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of networking subsystem APIs. It has a somewhat novel multicore queueing system designed for maximum throughput and minimal latency of encryption operations, but it is implemented modestly using workqueues and NAPI. Configuration is done via generic Netlink, and following a review from the Netlink maintainer a year ago, several high profile userspace tools have already implemented the API. This commit also comes with several different tests, both in-kernel tests and out-of-kernel tests based on network namespaces, taking profit of the fact that sockets used by WireGuard intentionally stay in the namespace the WireGuard interface was originally created, exactly like the semantics of userspace tun devices. See wireguard.com/netns/ for pictures and examples. The source code is fairly short, but rather than combining everything into a single file, WireGuard is developed as cleanly separable files, making auditing and comprehension easier. Things are laid out as follows: * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the cryptographic aspects of the protocol, and are mostly data-only in nature, taking in buffers of bytes and spitting out buffers of bytes. They also handle reference counting for their various shared pieces of data, like keys and key lists. * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for ratelimiting certain types of cryptographic operations in accordance with particular WireGuard semantics. * allowedips.[ch], peerlookup.[ch]: The main lookup structures of WireGuard, the former being trie-like with particular semantics, an integral part of the design of the protocol, and the latter just being nice helper functions around the various hashtables we use. * device.[ch]: Implementation of functions for the netdevice and for rtnl, responsible for maintaining the life of a given interface and wiring it up to the rest of WireGuard. * peer.[ch]: Each interface has a list of peers, with helper functions available here for creation, destruction, and reference counting. * socket.[ch]: Implementation of functions related to udp_socket and the general set of kernel socket APIs, for sending and receiving ciphertext UDP packets, and taking care of WireGuard-specific sticky socket routing semantics for the automatic roaming. * netlink.[ch]: Userspace API entry point for configuring WireGuard peers and devices. The API has been implemented by several userspace tools and network management utility, and the WireGuard project distributes the basic wg(8) tool. * queueing.[ch]: Shared function on the rx and tx path for handling the various queues used in the multicore algorithms. * send.c: Handles encrypting outgoing packets in parallel on multiple cores, before sending them in order on a single core, via workqueues and ring buffers. Also handles sending handshake and cookie messages as part of the protocol, in parallel. * receive.c: Handles decrypting incoming packets in parallel on multiple cores, before passing them off in order to be ingested via the rest of the networking subsystem with GRO via the typical NAPI poll function. Also handles receiving handshake and cookie messages as part of the protocol, in parallel. * timers.[ch]: Uses the timer wheel to implement protocol particular event timeouts, and gives a set of very simple event-driven entry point functions for callers. * main.c, version.h: Initialization and deinitialization of the module. * selftest/*.h: Runtime unit tests for some of the most security sensitive functions. * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing script using network namespaces. This commit aims to be as self-contained as possible, implementing WireGuard as a standalone module not needing much special handling or coordination from the network subsystem. I expect for future optimizations to the network stack to positively improve WireGuard, and vice-versa, but for the time being, this exists as intentionally standalone. We introduce a menu option for CONFIG_WIREGUARD, as well as providing a verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG. Signed-off-by: Jason A. Donenfeld Cc: David Miller Cc: Greg KH Cc: Linus Torvalds Cc: Herbert Xu Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller --- include/uapi/linux/wireguard.h | 196 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 196 insertions(+) create mode 100644 include/uapi/linux/wireguard.h (limited to 'include') diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h new file mode 100644 index 000000000000..dd8a47c4ad11 --- /dev/null +++ b/include/uapi/linux/wireguard.h @@ -0,0 +1,196 @@ +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */ +/* + * Copyright (C) 2015-2019 Jason A. Donenfeld . All Rights Reserved. + * + * Documentation + * ============= + * + * The below enums and macros are for interfacing with WireGuard, using generic + * netlink, with family WG_GENL_NAME and version WG_GENL_VERSION. It defines two + * methods: get and set. Note that while they share many common attributes, + * these two functions actually accept a slightly different set of inputs and + * outputs. + * + * WG_CMD_GET_DEVICE + * ----------------- + * + * May only be called via NLM_F_REQUEST | NLM_F_DUMP. The command should contain + * one but not both of: + * + * WGDEVICE_A_IFINDEX: NLA_U32 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * + * The kernel will then return several messages (NLM_F_MULTI) containing the + * following tree of nested items: + * + * WGDEVICE_A_IFINDEX: NLA_U32 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN + * WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN + * WGDEVICE_A_LISTEN_PORT: NLA_U16 + * WGDEVICE_A_FWMARK: NLA_U32 + * WGDEVICE_A_PEERS: NLA_NESTED + * 0: NLA_NESTED + * WGPEER_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN + * WGPEER_A_PRESHARED_KEY: NLA_EXACT_LEN, len WG_KEY_LEN + * WGPEER_A_ENDPOINT: NLA_MIN_LEN(struct sockaddr), struct sockaddr_in or struct sockaddr_in6 + * WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16 + * WGPEER_A_LAST_HANDSHAKE_TIME: NLA_EXACT_LEN, struct __kernel_timespec + * WGPEER_A_RX_BYTES: NLA_U64 + * WGPEER_A_TX_BYTES: NLA_U64 + * WGPEER_A_ALLOWEDIPS: NLA_NESTED + * 0: NLA_NESTED + * WGALLOWEDIP_A_FAMILY: NLA_U16 + * WGALLOWEDIP_A_IPADDR: NLA_MIN_LEN(struct in_addr), struct in_addr or struct in6_addr + * WGALLOWEDIP_A_CIDR_MASK: NLA_U8 + * 0: NLA_NESTED + * ... + * 0: NLA_NESTED + * ... + * ... + * WGPEER_A_PROTOCOL_VERSION: NLA_U32 + * 0: NLA_NESTED + * ... + * ... + * + * It is possible that all of the allowed IPs of a single peer will not + * fit within a single netlink message. In that case, the same peer will + * be written in the following message, except it will only contain + * WGPEER_A_PUBLIC_KEY and WGPEER_A_ALLOWEDIPS. This may occur several + * times in a row for the same peer. It is then up to the receiver to + * coalesce adjacent peers. Likewise, it is possible that all peers will + * not fit within a single message. So, subsequent peers will be sent + * in following messages, except those will only contain WGDEVICE_A_IFNAME + * and WGDEVICE_A_PEERS. It is then up to the receiver to coalesce these + * messages to form the complete list of peers. + * + * Since this is an NLA_F_DUMP command, the final message will always be + * NLMSG_DONE, even if an error occurs. However, this NLMSG_DONE message + * contains an integer error code. It is either zero or a negative error + * code corresponding to the errno. + * + * WG_CMD_SET_DEVICE + * ----------------- + * + * May only be called via NLM_F_REQUEST. The command should contain the + * following tree of nested items, containing one but not both of + * WGDEVICE_A_IFINDEX and WGDEVICE_A_IFNAME: + * + * WGDEVICE_A_IFINDEX: NLA_U32 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_FLAGS: NLA_U32, 0 or WGDEVICE_F_REPLACE_PEERS if all current + * peers should be removed prior to adding the list below. + * WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove + * WGDEVICE_A_LISTEN_PORT: NLA_U16, 0 to choose randomly + * WGDEVICE_A_FWMARK: NLA_U32, 0 to disable + * WGDEVICE_A_PEERS: NLA_NESTED + * 0: NLA_NESTED + * WGPEER_A_PUBLIC_KEY: len WG_KEY_LEN + * WGPEER_A_FLAGS: NLA_U32, 0 and/or WGPEER_F_REMOVE_ME if the + * specified peer should not exist at the end of the + * operation, rather than added/updated and/or + * WGPEER_F_REPLACE_ALLOWEDIPS if all current allowed + * IPs of this peer should be removed prior to adding + * the list below and/or WGPEER_F_UPDATE_ONLY if the + * peer should only be set if it already exists. + * WGPEER_A_PRESHARED_KEY: len WG_KEY_LEN, all zeros to remove + * WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6 + * WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16, 0 to disable + * WGPEER_A_ALLOWEDIPS: NLA_NESTED + * 0: NLA_NESTED + * WGALLOWEDIP_A_FAMILY: NLA_U16 + * WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr + * WGALLOWEDIP_A_CIDR_MASK: NLA_U8 + * 0: NLA_NESTED + * ... + * 0: NLA_NESTED + * ... + * ... + * WGPEER_A_PROTOCOL_VERSION: NLA_U32, should not be set or used at + * all by most users of this API, as the + * most recent protocol will be used when + * this is unset. Otherwise, must be set + * to 1. + * 0: NLA_NESTED + * ... + * ... + * + * It is possible that the amount of configuration data exceeds that of + * the maximum message length accepted by the kernel. In that case, several + * messages should be sent one after another, with each successive one + * filling in information not contained in the prior. Note that if + * WGDEVICE_F_REPLACE_PEERS is specified in the first message, it probably + * should not be specified in fragments that come after, so that the list + * of peers is only cleared the first time but appened after. Likewise for + * peers, if WGPEER_F_REPLACE_ALLOWEDIPS is specified in the first message + * of a peer, it likely should not be specified in subsequent fragments. + * + * If an error occurs, NLMSG_ERROR will reply containing an errno. + */ + +#ifndef _WG_UAPI_WIREGUARD_H +#define _WG_UAPI_WIREGUARD_H + +#define WG_GENL_NAME "wireguard" +#define WG_GENL_VERSION 1 + +#define WG_KEY_LEN 32 + +enum wg_cmd { + WG_CMD_GET_DEVICE, + WG_CMD_SET_DEVICE, + __WG_CMD_MAX +}; +#define WG_CMD_MAX (__WG_CMD_MAX - 1) + +enum wgdevice_flag { + WGDEVICE_F_REPLACE_PEERS = 1U << 0, + __WGDEVICE_F_ALL = WGDEVICE_F_REPLACE_PEERS +}; +enum wgdevice_attribute { + WGDEVICE_A_UNSPEC, + WGDEVICE_A_IFINDEX, + WGDEVICE_A_IFNAME, + WGDEVICE_A_PRIVATE_KEY, + WGDEVICE_A_PUBLIC_KEY, + WGDEVICE_A_FLAGS, + WGDEVICE_A_LISTEN_PORT, + WGDEVICE_A_FWMARK, + WGDEVICE_A_PEERS, + __WGDEVICE_A_LAST +}; +#define WGDEVICE_A_MAX (__WGDEVICE_A_LAST - 1) + +enum wgpeer_flag { + WGPEER_F_REMOVE_ME = 1U << 0, + WGPEER_F_REPLACE_ALLOWEDIPS = 1U << 1, + WGPEER_F_UPDATE_ONLY = 1U << 2, + __WGPEER_F_ALL = WGPEER_F_REMOVE_ME | WGPEER_F_REPLACE_ALLOWEDIPS | + WGPEER_F_UPDATE_ONLY +}; +enum wgpeer_attribute { + WGPEER_A_UNSPEC, + WGPEER_A_PUBLIC_KEY, + WGPEER_A_PRESHARED_KEY, + WGPEER_A_FLAGS, + WGPEER_A_ENDPOINT, + WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, + WGPEER_A_LAST_HANDSHAKE_TIME, + WGPEER_A_RX_BYTES, + WGPEER_A_TX_BYTES, + WGPEER_A_ALLOWEDIPS, + WGPEER_A_PROTOCOL_VERSION, + __WGPEER_A_LAST +}; +#define WGPEER_A_MAX (__WGPEER_A_LAST - 1) + +enum wgallowedip_attribute { + WGALLOWEDIP_A_UNSPEC, + WGALLOWEDIP_A_FAMILY, + WGALLOWEDIP_A_IPADDR, + WGALLOWEDIP_A_CIDR_MASK, + __WGALLOWEDIP_A_LAST +}; +#define WGALLOWEDIP_A_MAX (__WGALLOWEDIP_A_LAST - 1) + +#endif /* _WG_UAPI_WIREGUARD_H */ -- cgit v1.2.3 From b50b0580d27bc45a0637aefc8bac4d31aa85771a Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Mon, 25 Nov 2019 14:48:57 +0100 Subject: net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram This will be used by ESP over TCP to handle the queue of IKE messages. Signed-off-by: Sabrina Dubroca Acked-by: David S. Miller Signed-off-by: Steffen Klassert --- include/linux/skbuff.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index e9133bcf0544..49a10f9cc538 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3459,7 +3459,8 @@ static inline void skb_frag_list_init(struct sk_buff *skb) for (iter = skb_shinfo(skb)->frag_list; iter; iter = iter->next) -int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p, +int __skb_wait_for_more_packets(struct sock *sk, struct sk_buff_head *queue, + int *err, long *timeo_p, const struct sk_buff *skb); struct sk_buff *__skb_try_recv_from_queue(struct sock *sk, struct sk_buff_head *queue, @@ -3468,12 +3469,16 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock *sk, struct sk_buff *skb), int *off, int *err, struct sk_buff **last); -struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned flags, +struct sk_buff *__skb_try_recv_datagram(struct sock *sk, + struct sk_buff_head *queue, + unsigned int flags, void (*destructor)(struct sock *sk, struct sk_buff *skb), int *off, int *err, struct sk_buff **last); -struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags, +struct sk_buff *__skb_recv_datagram(struct sock *sk, + struct sk_buff_head *sk_queue, + unsigned int flags, void (*destructor)(struct sock *sk, struct sk_buff *skb), int *off, int *err); -- cgit v1.2.3 From 7b3801927e52f8621de311277f7fc727635019e7 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Mon, 25 Nov 2019 14:48:58 +0100 Subject: xfrm: introduce xfrm_trans_queue_net This will be used by TCP encapsulation to write packets to the encap socket without holding the user socket's lock. Without this reinjection, we're already holding the lock of the user socket, and then try to lock the encap socket as well when we enqueue the encrypted packet. While at it, add a BUILD_BUG_ON like we usually do for skb->cb, since it's missing for struct xfrm_trans_cb. Co-developed-by: Herbert Xu Signed-off-by: Herbert Xu Signed-off-by: Sabrina Dubroca Acked-by: David S. Miller Signed-off-by: Steffen Klassert --- include/net/xfrm.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/xfrm.h b/include/net/xfrm.h index dda3c025452e..56ff86621bb4 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -1547,6 +1547,9 @@ int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload); int xfrm_init_state(struct xfrm_state *x); int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type); int xfrm_input_resume(struct sk_buff *skb, int nexthdr); +int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb, + int (*finish)(struct net *, struct sock *, + struct sk_buff *)); int xfrm_trans_queue(struct sk_buff *skb, int (*finish)(struct net *, struct sock *, struct sk_buff *)); -- cgit v1.2.3 From e27cca96cd68fa2c6814c90f9a1cfd36bb68c593 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Mon, 25 Nov 2019 14:49:02 +0100 Subject: xfrm: add espintcp (RFC 8229) TCP encapsulation of IKE and IPsec messages (RFC 8229) is implemented as a TCP ULP, overriding in particular the sendmsg and recvmsg operations. A Stream Parser is used to extract messages out of the TCP stream using the first 2 bytes as length marker. Received IKE messages are put on "ike_queue", waiting to be dequeued by the custom recvmsg implementation. Received ESP messages are sent to XFRM, like with UDP encapsulation. Some of this code is taken from the original submission by Herbert Xu. Currently, only IPv4 is supported, like for UDP encapsulation. Co-developed-by: Herbert Xu Signed-off-by: Herbert Xu Signed-off-by: Sabrina Dubroca Acked-by: David S. Miller Signed-off-by: Steffen Klassert --- include/net/espintcp.h | 39 +++++++++++++++++++++++++++++++++++++++ include/net/xfrm.h | 1 + include/uapi/linux/udp.h | 1 + 3 files changed, 41 insertions(+) create mode 100644 include/net/espintcp.h (limited to 'include') diff --git a/include/net/espintcp.h b/include/net/espintcp.h new file mode 100644 index 000000000000..dd7026a00066 --- /dev/null +++ b/include/net/espintcp.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _NET_ESPINTCP_H +#define _NET_ESPINTCP_H + +#include +#include + +void __init espintcp_init(void); + +int espintcp_push_skb(struct sock *sk, struct sk_buff *skb); +int espintcp_queue_out(struct sock *sk, struct sk_buff *skb); +bool tcp_is_ulp_esp(struct sock *sk); + +struct espintcp_msg { + struct sk_buff *skb; + struct sk_msg skmsg; + int offset; + int len; +}; + +struct espintcp_ctx { + struct strparser strp; + struct sk_buff_head ike_queue; + struct sk_buff_head out_queue; + struct espintcp_msg partial; + void (*saved_data_ready)(struct sock *sk); + void (*saved_write_space)(struct sock *sk); + struct work_struct work; + bool tx_running; +}; + +static inline struct espintcp_ctx *espintcp_getctx(const struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + /* RCU is only needed for diag */ + return (__force void *)icsk->icsk_ulp_data; +} +#endif diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 56ff86621bb4..8f71c111e65a 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -193,6 +193,7 @@ struct xfrm_state { /* Data for encapsulator */ struct xfrm_encap_tmpl *encap; + struct sock __rcu *encap_sk; /* Data for care-of address */ xfrm_address_t *coaddr; diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h index 30baccb6c9c4..4828794efcf8 100644 --- a/include/uapi/linux/udp.h +++ b/include/uapi/linux/udp.h @@ -42,5 +42,6 @@ struct udphdr { #define UDP_ENCAP_GTP0 4 /* GSM TS 09.60 */ #define UDP_ENCAP_GTP1U 5 /* 3GPP TS 29.060 */ #define UDP_ENCAP_RXRPC 6 +#define TCP_ENCAP_ESPINTCP 7 /* Yikes, this is really xfrm encap types. */ #endif /* _UAPI_LINUX_UDP_H */ -- cgit v1.2.3 From 65e6d90168f3593df0ae598502bcbf20d78ff0fb Mon Sep 17 00:00:00 2001 From: "Kevin(Yudong) Yang" Date: Mon, 9 Dec 2019 14:19:59 -0500 Subject: net-tcp: Disable TCP ssthresh metrics cache by default This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save" that disables TCP ssthresh metrics cache by default. Other parts of TCP metrics cache, e.g. rtt, cwnd, remain unchanged. As modern networks becoming more and more dynamic, TCP metrics cache today often causes more harm than benefits. For example, the same IP address is often shared by different subscribers behind NAT in residential networks. Even if the IP address is not shared by different users, caching the slow-start threshold of a previous short flow using loss-based congestion control (e.g. cubic) often causes the future longer flows of the same network path to exit slow-start prematurely with abysmal throughput. Caching ssthresh is very risky and can lead to terrible performance. Therefore it makes sense to make disabling ssthresh caching by default and opt-in for specific networks by the administrators. This practice also has worked well for several years of deployment with CUBIC congestion control at Google. Acked-by: Eric Dumazet Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Signed-off-by: Kevin(Yudong) Yang Signed-off-by: David S. Miller --- include/net/netns/ipv4.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index c0c0791b1912..08b98414d94e 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -154,6 +154,7 @@ struct netns_ipv4 { int sysctl_tcp_adv_win_scale; int sysctl_tcp_frto; int sysctl_tcp_nometrics_save; + int sysctl_tcp_no_ssthresh_metrics_save; int sysctl_tcp_moderate_rcvbuf; int sysctl_tcp_tso_win_divisor; int sysctl_tcp_workaround_signed_windows; -- cgit v1.2.3 From bae141f54be83b06652c1d47e50e4e75ed4e9c7e Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 6 Dec 2019 22:49:34 +0100 Subject: bpf: Emit audit messages upon successful prog load and unload Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event. The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core. Raw example output: # auditctl -D # auditctl -a always,exit -F arch=x86_64 -S bpf # ausearch --start recent -m 1334 ... ---- time->Wed Nov 27 16:04:13 2019 type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf" type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321 \ success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477 \ pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001 \ egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf" \ exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf" \ subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD ---- time->Wed Nov 27 16:04:13 2019 type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD ... Signed-off-by: Daniel Borkmann Co-developed-by: Jiri Olsa Signed-off-by: Jiri Olsa Acked-by: Paul Moore Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org --- include/uapi/linux/audit.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index 3ad935527177..a534d71e689a 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -116,6 +116,7 @@ #define AUDIT_FANOTIFY 1331 /* Fanotify access decision */ #define AUDIT_TIME_INJOFFSET 1332 /* Timekeeping offset injected */ #define AUDIT_TIME_ADJNTPVAL 1333 /* NTP value adjustment */ +#define AUDIT_BPF 1334 /* BPF subsystem */ #define AUDIT_AVC 1400 /* SE Linux avc denial or grant */ #define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */ -- cgit v1.2.3 From a4516c7053b96fed98a0439a9226983b5275474b Mon Sep 17 00:00:00 2001 From: Russell King Date: Wed, 11 Dec 2019 10:55:59 +0000 Subject: net: sfp: derive interface mode from ethtool link modes We don't need the EEPROM ID to derive the phy interface mode as we can derive it merely from the ethtool link modes. Remove the EEPROM ID argument to sfp_select_interface(). Reviewed-by: Andrew Lunn Signed-off-by: Russell King Signed-off-by: David S. Miller --- include/linux/sfp.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/sfp.h b/include/linux/sfp.h index 487fd9412d10..8d7b98c214d7 100644 --- a/include/linux/sfp.h +++ b/include/linux/sfp.h @@ -504,7 +504,6 @@ int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id, void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support); phy_interface_t sfp_select_interface(struct sfp_bus *bus, - const struct sfp_eeprom_id *id, unsigned long *link_modes); int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo); @@ -532,7 +531,6 @@ static inline void sfp_parse_support(struct sfp_bus *bus, } static inline phy_interface_t sfp_select_interface(struct sfp_bus *bus, - const struct sfp_eeprom_id *id, unsigned long *link_modes) { return PHY_INTERFACE_MODE_NA; -- cgit v1.2.3 From 0fbd26a9fb6875b98fcfff523831fec47bc5e9a2 Mon Sep 17 00:00:00 2001 From: Russell King Date: Wed, 11 Dec 2019 10:56:04 +0000 Subject: net: sfp: add more extended compliance codes SFF-8024 is used to define various constants re-used in several SFF SFP-related specifications. Split these constants from the enum, and rename them to indicate that they're defined by SFF-8024. Add and use updated SFF-8024 extended compliance code definitions for 10GBASE-T, 5GBASE-T and 2.5GBASE-T modules. Reviewed-by: Andrew Lunn Signed-off-by: Russell King Signed-off-by: David S. Miller --- include/linux/sfp.h | 82 +++++++++++++++++++++++++++++++++++------------------ 1 file changed, 55 insertions(+), 27 deletions(-) (limited to 'include') diff --git a/include/linux/sfp.h b/include/linux/sfp.h index 8d7b98c214d7..373d8b67ea86 100644 --- a/include/linux/sfp.h +++ b/include/linux/sfp.h @@ -275,6 +275,61 @@ struct sfp_diag { __be16 cal_v_offset; } __packed; +/* SFF8024 defined constants */ +enum { + SFF8024_ID_UNK = 0x00, + SFF8024_ID_SFF_8472 = 0x02, + SFF8024_ID_SFP = 0x03, + SFF8024_ID_DWDM_SFP = 0x0b, + SFF8024_ID_QSFP_8438 = 0x0c, + SFF8024_ID_QSFP_8436_8636 = 0x0d, + SFF8024_ID_QSFP28_8636 = 0x11, + + SFF8024_ENCODING_UNSPEC = 0x00, + SFF8024_ENCODING_8B10B = 0x01, + SFF8024_ENCODING_4B5B = 0x02, + SFF8024_ENCODING_NRZ = 0x03, + SFF8024_ENCODING_8472_MANCHESTER= 0x04, + SFF8024_ENCODING_8472_SONET = 0x05, + SFF8024_ENCODING_8472_64B66B = 0x06, + SFF8024_ENCODING_8436_MANCHESTER= 0x06, + SFF8024_ENCODING_8436_SONET = 0x04, + SFF8024_ENCODING_8436_64B66B = 0x05, + SFF8024_ENCODING_256B257B = 0x07, + SFF8024_ENCODING_PAM4 = 0x08, + + SFF8024_CONNECTOR_UNSPEC = 0x00, + /* codes 01-05 not supportable on SFP, but some modules have single SC */ + SFF8024_CONNECTOR_SC = 0x01, + SFF8024_CONNECTOR_FIBERJACK = 0x06, + SFF8024_CONNECTOR_LC = 0x07, + SFF8024_CONNECTOR_MT_RJ = 0x08, + SFF8024_CONNECTOR_MU = 0x09, + SFF8024_CONNECTOR_SG = 0x0a, + SFF8024_CONNECTOR_OPTICAL_PIGTAIL= 0x0b, + SFF8024_CONNECTOR_MPO_1X12 = 0x0c, + SFF8024_CONNECTOR_MPO_2X16 = 0x0d, + SFF8024_CONNECTOR_HSSDC_II = 0x20, + SFF8024_CONNECTOR_COPPER_PIGTAIL= 0x21, + SFF8024_CONNECTOR_RJ45 = 0x22, + SFF8024_CONNECTOR_NOSEPARATE = 0x23, + SFF8024_CONNECTOR_MXC_2X16 = 0x24, + + SFF8024_ECC_UNSPEC = 0x00, + SFF8024_ECC_100G_25GAUI_C2M_AOC = 0x01, + SFF8024_ECC_100GBASE_SR4_25GBASE_SR = 0x02, + SFF8024_ECC_100GBASE_LR4_25GBASE_LR = 0x03, + SFF8024_ECC_100GBASE_ER4_25GBASE_ER = 0x04, + SFF8024_ECC_100GBASE_SR10 = 0x05, + SFF8024_ECC_100GBASE_CR4 = 0x0b, + SFF8024_ECC_25GBASE_CR_S = 0x0c, + SFF8024_ECC_25GBASE_CR_N = 0x0d, + SFF8024_ECC_10GBASE_T_SFI = 0x16, + SFF8024_ECC_10GBASE_T_SR = 0x1c, + SFF8024_ECC_5GBASE_T = 0x1d, + SFF8024_ECC_2_5GBASE_T = 0x1e, +}; + /* SFP EEPROM registers */ enum { SFP_PHYS_ID = 0x00, @@ -309,34 +364,7 @@ enum { SFP_SFF8472_COMPLIANCE = 0x5e, SFP_CC_EXT = 0x5f, - SFP_PHYS_ID_SFF = 0x02, - SFP_PHYS_ID_SFP = 0x03, SFP_PHYS_EXT_ID_SFP = 0x04, - SFP_CONNECTOR_UNSPEC = 0x00, - /* codes 01-05 not supportable on SFP, but some modules have single SC */ - SFP_CONNECTOR_SC = 0x01, - SFP_CONNECTOR_FIBERJACK = 0x06, - SFP_CONNECTOR_LC = 0x07, - SFP_CONNECTOR_MT_RJ = 0x08, - SFP_CONNECTOR_MU = 0x09, - SFP_CONNECTOR_SG = 0x0a, - SFP_CONNECTOR_OPTICAL_PIGTAIL = 0x0b, - SFP_CONNECTOR_MPO_1X12 = 0x0c, - SFP_CONNECTOR_MPO_2X16 = 0x0d, - SFP_CONNECTOR_HSSDC_II = 0x20, - SFP_CONNECTOR_COPPER_PIGTAIL = 0x21, - SFP_CONNECTOR_RJ45 = 0x22, - SFP_CONNECTOR_NOSEPARATE = 0x23, - SFP_CONNECTOR_MXC_2X16 = 0x24, - SFP_ENCODING_UNSPEC = 0x00, - SFP_ENCODING_8B10B = 0x01, - SFP_ENCODING_4B5B = 0x02, - SFP_ENCODING_NRZ = 0x03, - SFP_ENCODING_8472_MANCHESTER = 0x04, - SFP_ENCODING_8472_SONET = 0x05, - SFP_ENCODING_8472_64B66B = 0x06, - SFP_ENCODING_256B257B = 0x07, - SFP_ENCODING_PAM4 = 0x08, SFP_OPTIONS_HIGH_POWER_LEVEL = BIT(13), SFP_OPTIONS_PAGING_A2 = BIT(12), SFP_OPTIONS_RETIMER = BIT(11), -- cgit v1.2.3 From 74c551ca5a0edcc9cf66a3b73fd95b9a8615bfd0 Mon Sep 17 00:00:00 2001 From: Russell King Date: Wed, 11 Dec 2019 10:56:09 +0000 Subject: net: sfp: add module start/stop upstream notifications When dealing with some copper modules, we can't positively know the module capabilities are until we have probed the PHY. Without the full capabilities, we may end up failing a module that we could otherwise drive with a restricted set of capabilities. An example of this would be a module with a NBASE-T PHY plugged into a host that supports phy interface modes 2500BASE-X and SGMII. The PHY supports 10GBASE-R, 5000BASE-X, 2500BASE-X, SGMII interface modes, which means a subset of the capabilities are compatible with the host. However, reading the module EEPROM leads us to believe that the module only supports ethtool link mode 10GBASE-T, which is incompatible with the host - and thus results in the module being rejected. This patch adds an extra notification which are triggered after the SFP module's PHY probe, and a corresponding notification just before the PHY is removed. Reviewed-by: Andrew Lunn Signed-off-by: Russell King Signed-off-by: David S. Miller --- include/linux/sfp.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/sfp.h b/include/linux/sfp.h index 373d8b67ea86..66a56396e8e3 100644 --- a/include/linux/sfp.h +++ b/include/linux/sfp.h @@ -507,6 +507,8 @@ struct sfp_bus; * @module_insert: called after a module has been detected to determine * whether the module is supported for the upstream device. * @module_remove: called after the module has been removed. + * @module_start: called after the PHY probe step + * @module_stop: called before the PHY is removed * @link_down: called when the link is non-operational for whatever * reason. * @link_up: called when the link is operational. @@ -520,6 +522,8 @@ struct sfp_upstream_ops { void (*detach)(void *priv, struct sfp_bus *bus); int (*module_insert)(void *priv, const struct sfp_eeprom_id *id); void (*module_remove)(void *priv); + int (*module_start)(void *priv); + void (*module_stop)(void *priv); void (*link_down)(void *priv); void (*link_up)(void *priv); int (*connect_phy)(void *priv, struct phy_device *); -- cgit v1.2.3 From 52c956003a9d5bcae1f445f9dfd42b624adb6e87 Mon Sep 17 00:00:00 2001 From: Russell King Date: Wed, 11 Dec 2019 10:56:45 +0000 Subject: net: phylink: delay MAC configuration for copper SFP modules Knowing whether we need to delay the MAC configuration because a module may have a PHY is useful to phylink to allow NBASE-T modules to work on systems supporting no more than 2.5G speeds. This commit allows us to delay such configuration until after the PHY has been probed by recording the parsed capabilities, and if the module may have a PHY, doing no more until the module_start() notification is called. At that point, we either have a PHY, or we don't. We move the PHY-based setup a little later, and use the PHYs support capabilities rather than the EEPROM parsed capabilities to determine whether we can support the PHY. Reviewed-by: Andrew Lunn Signed-off-by: Russell King Signed-off-by: David S. Miller --- include/linux/sfp.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/sfp.h b/include/linux/sfp.h index 66a56396e8e3..38893e4dd0f0 100644 --- a/include/linux/sfp.h +++ b/include/linux/sfp.h @@ -533,6 +533,7 @@ struct sfp_upstream_ops { #if IS_ENABLED(CONFIG_SFP) int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support); +bool sfp_may_have_phy(struct sfp_bus *bus, const struct sfp_eeprom_id *id); void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support); phy_interface_t sfp_select_interface(struct sfp_bus *bus, @@ -556,6 +557,12 @@ static inline int sfp_parse_port(struct sfp_bus *bus, return PORT_OTHER; } +static inline bool sfp_may_have_phy(struct sfp_bus *bus, + const struct sfp_eeprom_id *id) +{ + return false; +} + static inline void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support) -- cgit v1.2.3 From ef343b35d46667668a099655fca4a5b2e43a5dfe Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Tue, 10 Dec 2019 11:43:03 +0100 Subject: vsock: add VMADDR_CID_LOCAL definition The VMADDR_CID_RESERVED (1) was used by VMCI, but now it is not used anymore, so we can reuse it for local communication (loopback) adding the new well-know CID: VMADDR_CID_LOCAL. Cc: Jorgen Hansen Reviewed-by: Stefan Hajnoczi Reviewed-by: Jorgen Hansen Signed-off-by: Stefano Garzarella Signed-off-by: David S. Miller --- include/uapi/linux/vm_sockets.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index 68d57c5e99bc..fd0ed7221645 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -99,11 +99,13 @@ #define VMADDR_CID_HYPERVISOR 0 -/* This CID is specific to VMCI and can be considered reserved (even VMCI - * doesn't use it anymore, it's a legacy value from an older release). +/* Use this as the destination CID in an address when referring to the + * local communication (loopback). + * (This was VMADDR_CID_RESERVED, but even VMCI doesn't use it anymore, + * it was a legacy value from an older release). */ -#define VMADDR_CID_RESERVED 1 +#define VMADDR_CID_LOCAL 1 /* Use this as the destination CID in an address when referring to the host * (any process other than the hypervisor). VMCI relies on it being 2, but -- cgit v1.2.3 From 0e12190578d081d4fe54d41635ec6e5a6eb0d01a Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Tue, 10 Dec 2019 11:43:04 +0100 Subject: vsock: add local transport support in the vsock core This patch allows to register a transport able to handle local communication (loopback). Reviewed-by: Stefan Hajnoczi Signed-off-by: Stefano Garzarella Signed-off-by: David S. Miller --- include/net/af_vsock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 4206dc6d813f..b1c717286993 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -98,6 +98,8 @@ struct vsock_transport_send_notify_data { #define VSOCK_TRANSPORT_F_G2H 0x00000002 /* Transport provides DGRAM communication */ #define VSOCK_TRANSPORT_F_DGRAM 0x00000004 +/* Transport provides local (loopback) communication */ +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008 struct vsock_transport { struct module *module; -- cgit v1.2.3 From b4653342b1514cb11f25b727c689451aff02996d Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Mon, 9 Dec 2019 13:03:40 +0300 Subject: net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd] This adds .show_fdinfo to socket_file_ops, so protocols will be able to print their specific data in fdinfo. Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller --- include/linux/net.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/net.h b/include/linux/net.h index 9cafb5f353a9..6451425e828f 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -171,6 +171,7 @@ struct proto_ops { int (*compat_getsockopt)(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); #endif + void (*show_fdinfo)(struct seq_file *m, struct socket *sock); int (*sendmsg) (struct socket *sock, struct msghdr *m, size_t total_len); /* Notes for implementing recvmsg: -- cgit v1.2.3 From 3c32da19a858fb1ae8a76bf899160be49f338506 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Mon, 9 Dec 2019 13:03:46 +0300 Subject: unix: Show number of pending scm files of receive queue in fdinfo Unix sockets like a block box. You never know what is stored there: there may be a file descriptor holding a mount or a block device, or there may be whole universes with namespaces, sockets with receive queues full of sockets etc. The patch adds a little debug and accounts number of files (not recursive), which is in receive queue of a unix socket. Sometimes this is useful to determine, that socket should be investigated or which task should be killed to put reference counter on a resourse. v2: Pass correct argument to lockdep Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller --- include/net/af_unix.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 3426d6dacc45..17e10fba2152 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -41,6 +41,10 @@ struct unix_skb_parms { u32 consumed; } __randomize_layout; +struct scm_stat { + u32 nr_fds; +}; + #define UNIXCB(skb) (*(struct unix_skb_parms *)&((skb)->cb)) #define unix_state_lock(s) spin_lock(&unix_sk(s)->lock) @@ -65,6 +69,7 @@ struct unix_sock { #define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; wait_queue_entry_t peer_wake; + struct scm_stat scm_stat; }; static inline struct unix_sock *unix_sk(const struct sock *sk) -- cgit v1.2.3 From f74877a5457d34d604dba6dbbb13c4c05bac8b93 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Wed, 11 Dec 2019 10:58:14 +0100 Subject: rtnetlink: provide permanent hardware address in RTM_NEWLINK Permanent hardware address of a network device was traditionally provided via ethtool ioctl interface but as Jiri Pirko pointed out in a review of ethtool netlink interface, rtnetlink is much more suitable for it so let's add it to the RTM_NEWLINK message. Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the permanent address is all zeros (i.e. device driver did not fill it). As permanent address is not modifiable, reject userspace requests containing IFLA_PERM_ADDRESS attribute. Note: we already provide permanent hardware address for bond slaves; unfortunately we cannot drop that attribute for backward compatibility reasons. v5 -> v6: only add the attribute if permanent address is not zero Signed-off-by: Michal Kubecek Acked-by: Jiri Pirko Acked-by: Stephen Hemminger Signed-off-by: David S. Miller --- include/uapi/linux/if_link.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8aec8769d944..1d69f637c5d6 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -169,6 +169,7 @@ enum { IFLA_MAX_MTU, IFLA_PROP_LIST, IFLA_ALT_IFNAME, /* Alternative ifname */ + IFLA_PERM_ADDRESS, __IFLA_MAX }; -- cgit v1.2.3 From 32d5109a9d864aea3981f0b5ea736eee4e11b42a Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Wed, 11 Dec 2019 10:58:19 +0100 Subject: netlink: rename nl80211_validate_nested() to nla_validate_nested() Function nl80211_validate_nested() is not specific to nl80211, it's a counterpart to nla_validate_nested_deprecated() with strict validation. For consistency with other validation and parse functions, rename it to nla_validate_nested(). Signed-off-by: Michal Kubecek Acked-by: Jiri Pirko Reviewed-by: Johannes Berg Signed-off-by: David S. Miller --- include/net/netlink.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/net/netlink.h b/include/net/netlink.h index b140c8f1be22..56c365dc6dc7 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -1735,7 +1735,7 @@ static inline void nla_nest_cancel(struct sk_buff *skb, struct nlattr *start) } /** - * nla_validate_nested - Validate a stream of nested attributes + * __nla_validate_nested - Validate a stream of nested attributes * @start: container attribute * @maxtype: maximum attribute type to be expected * @policy: validation policy @@ -1758,9 +1758,9 @@ static inline int __nla_validate_nested(const struct nlattr *start, int maxtype, } static inline int -nl80211_validate_nested(const struct nlattr *start, int maxtype, - const struct nla_policy *policy, - struct netlink_ext_ack *extack) +nla_validate_nested(const struct nlattr *start, int maxtype, + const struct nla_policy *policy, + struct netlink_ext_ack *extack) { return __nla_validate_nested(start, maxtype, policy, NL_VALIDATE_STRICT, extack); -- cgit v1.2.3 From 428c122f5f6b54f40bd51c47495104b534b5a57c Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Wed, 11 Dec 2019 10:58:34 +0100 Subject: ethtool: provide link mode names as a string set Unlike e.g. netdev features, the ethtool ioctl interface requires link mode table to be in sync between kernel and userspace for userspace to be able to display and set all link modes supported by kernel. The way arbitrary length bitsets are implemented in netlink interface, this will be no longer needed. To allow userspace to access all link modes running kernel supports, add table of ethernet link mode names and make it available as a string set to userspace GET_STRSET requests. Add build time check to make sure names are defined for all modes declared in enum ethtool_link_mode_bit_indices. Once the string set is available, make it also accessible via ioctl. Signed-off-by: Michal Kubecek Reviewed-by: Andrew Lunn Reviewed-by: Jiri Pirko Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index d4591792f0b4..f44155840b07 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -593,6 +593,7 @@ struct ethtool_pauseparam { * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS * @ETH_SS_PHY_TUNABLES: PHY tunable names + * @ETH_SS_LINK_MODES: link mode names */ enum ethtool_stringset { ETH_SS_TEST = 0, @@ -604,6 +605,7 @@ enum ethtool_stringset { ETH_SS_TUNABLES, ETH_SS_PHY_STATS, ETH_SS_PHY_TUNABLES, + ETH_SS_LINK_MODES, }; /** -- cgit v1.2.3 From 0290bd291cc0e0488e35e66bf39efcd7d9d9122b Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Tue, 10 Dec 2019 09:23:51 -0500 Subject: netdev: pass the stuck queue to the timeout handler This allows incrementing the correct timeout statistic without any mess. Down the road, devices can learn to reset just the specific queue. The patch was generated with the following script: use strict; use warnings; our $^I = '.bak'; my @work = ( ["arch/m68k/emu/nfeth.c", "nfeth_tx_timeout"], ["arch/um/drivers/net_kern.c", "uml_net_tx_timeout"], ["arch/um/drivers/vector_kern.c", "vector_net_tx_timeout"], ["arch/xtensa/platforms/iss/network.c", "iss_net_tx_timeout"], ["drivers/char/pcmcia/synclink_cs.c", "hdlcdev_tx_timeout"], ["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"], ["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"], ["drivers/message/fusion/mptlan.c", "mpt_lan_tx_timeout"], ["drivers/misc/sgi-xp/xpnet.c", "xpnet_dev_tx_timeout"], ["drivers/net/appletalk/cops.c", "cops_timeout"], ["drivers/net/arcnet/arcdevice.h", "arcnet_timeout"], ["drivers/net/arcnet/arcnet.c", "arcnet_timeout"], ["drivers/net/arcnet/com20020.c", "arcnet_timeout"], ["drivers/net/ethernet/3com/3c509.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c515.c", "corkscrew_timeout"], ["drivers/net/ethernet/3com/3c574_cs.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c589_cs.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"], ["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"], ["drivers/net/ethernet/3com/typhoon.c", "typhoon_tx_timeout"], ["drivers/net/ethernet/8390/8390.h", "ei_tx_timeout"], ["drivers/net/ethernet/8390/8390.h", "eip_tx_timeout"], ["drivers/net/ethernet/8390/8390.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/8390p.c", "eip_tx_timeout"], ["drivers/net/ethernet/8390/ax88796.c", "ax_ei_tx_timeout"], ["drivers/net/ethernet/8390/axnet_cs.c", "axnet_tx_timeout"], ["drivers/net/ethernet/8390/etherh.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/hydra.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/mac8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/mcf8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/lib8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/ne2k-pci.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/pcnet_cs.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/smc-ultra.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/wd.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/zorro8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/adaptec/starfire.c", "tx_timeout"], ["drivers/net/ethernet/agere/et131x.c", "et131x_tx_timeout"], ["drivers/net/ethernet/allwinner/sun4i-emac.c", "emac_timeout"], ["drivers/net/ethernet/alteon/acenic.c", "ace_watchdog"], ["drivers/net/ethernet/amazon/ena/ena_netdev.c", "ena_tx_timeout"], ["drivers/net/ethernet/amd/7990.h", "lance_tx_timeout"], ["drivers/net/ethernet/amd/7990.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/a2065.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/am79c961a.c", "am79c961_timeout"], ["drivers/net/ethernet/amd/amd8111e.c", "amd8111e_tx_timeout"], ["drivers/net/ethernet/amd/ariadne.c", "ariadne_tx_timeout"], ["drivers/net/ethernet/amd/atarilance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/au1000_eth.c", "au1000_tx_timeout"], ["drivers/net/ethernet/amd/declance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/lance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/mvme147.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/ni65.c", "ni65_timeout"], ["drivers/net/ethernet/amd/nmclan_cs.c", "mace_tx_timeout"], ["drivers/net/ethernet/amd/pcnet32.c", "pcnet32_tx_timeout"], ["drivers/net/ethernet/amd/sunlance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/xgbe/xgbe-drv.c", "xgbe_tx_timeout"], ["drivers/net/ethernet/apm/xgene-v2/main.c", "xge_timeout"], ["drivers/net/ethernet/apm/xgene/xgene_enet_main.c", "xgene_enet_timeout"], ["drivers/net/ethernet/apple/macmace.c", "mace_tx_timeout"], ["drivers/net/ethernet/atheros/ag71xx.c", "ag71xx_tx_timeout"], ["drivers/net/ethernet/atheros/alx/main.c", "alx_tx_timeout"], ["drivers/net/ethernet/atheros/atl1c/atl1c_main.c", "atl1c_tx_timeout"], ["drivers/net/ethernet/atheros/atl1e/atl1e_main.c", "atl1e_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl.c", "atlx_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl1.c", "atlx_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl2.c", "atl2_tx_timeout"], ["drivers/net/ethernet/broadcom/b44.c", "b44_tx_timeout"], ["drivers/net/ethernet/broadcom/bcmsysport.c", "bcm_sysport_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2.c", "bnx2_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnxt/bnxt.c", "bnxt_tx_timeout"], ["drivers/net/ethernet/broadcom/genet/bcmgenet.c", "bcmgenet_timeout"], ["drivers/net/ethernet/broadcom/sb1250-mac.c", "sbmac_tx_timeout"], ["drivers/net/ethernet/broadcom/tg3.c", "tg3_tx_timeout"], ["drivers/net/ethernet/calxeda/xgmac.c", "xgmac_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_main.c", "liquidio_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_vf_main.c", "liquidio_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c", "lio_vf_rep_tx_timeout"], ["drivers/net/ethernet/cavium/thunder/nicvf_main.c", "nicvf_tx_timeout"], ["drivers/net/ethernet/cirrus/cs89x0.c", "net_timeout"], ["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"], ["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"], ["drivers/net/ethernet/cortina/gemini.c", "gmac_tx_timeout"], ["drivers/net/ethernet/davicom/dm9000.c", "dm9000_timeout"], ["drivers/net/ethernet/dec/tulip/de2104x.c", "de_tx_timeout"], ["drivers/net/ethernet/dec/tulip/tulip_core.c", "tulip_tx_timeout"], ["drivers/net/ethernet/dec/tulip/winbond-840.c", "tx_timeout"], ["drivers/net/ethernet/dlink/dl2k.c", "rio_tx_timeout"], ["drivers/net/ethernet/dlink/sundance.c", "tx_timeout"], ["drivers/net/ethernet/emulex/benet/be_main.c", "be_tx_timeout"], ["drivers/net/ethernet/ethoc.c", "ethoc_tx_timeout"], ["drivers/net/ethernet/faraday/ftgmac100.c", "ftgmac100_tx_timeout"], ["drivers/net/ethernet/fealnx.c", "fealnx_tx_timeout"], ["drivers/net/ethernet/freescale/dpaa/dpaa_eth.c", "dpaa_tx_timeout"], ["drivers/net/ethernet/freescale/fec_main.c", "fec_timeout"], ["drivers/net/ethernet/freescale/fec_mpc52xx.c", "mpc52xx_fec_tx_timeout"], ["drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c", "fs_timeout"], ["drivers/net/ethernet/freescale/gianfar.c", "gfar_timeout"], ["drivers/net/ethernet/freescale/ucc_geth.c", "ucc_geth_timeout"], ["drivers/net/ethernet/fujitsu/fmvj18x_cs.c", "fjn_tx_timeout"], ["drivers/net/ethernet/google/gve/gve_main.c", "gve_tx_timeout"], ["drivers/net/ethernet/hisilicon/hip04_eth.c", "hip04_timeout"], ["drivers/net/ethernet/hisilicon/hix5hd2_gmac.c", "hix5hd2_net_timeout"], ["drivers/net/ethernet/hisilicon/hns/hns_enet.c", "hns_nic_net_timeout"], ["drivers/net/ethernet/hisilicon/hns3/hns3_enet.c", "hns3_nic_net_timeout"], ["drivers/net/ethernet/huawei/hinic/hinic_main.c", "hinic_tx_timeout"], ["drivers/net/ethernet/i825xx/82596.c", "i596_tx_timeout"], ["drivers/net/ethernet/i825xx/ether1.c", "ether1_timeout"], ["drivers/net/ethernet/i825xx/lib82596.c", "i596_tx_timeout"], ["drivers/net/ethernet/i825xx/sun3_82586.c", "sun3_82586_timeout"], ["drivers/net/ethernet/ibm/ehea/ehea_main.c", "ehea_tx_watchdog"], ["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"], ["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"], ["drivers/net/ethernet/ibm/ibmvnic.c", "ibmvnic_tx_timeout"], ["drivers/net/ethernet/intel/e100.c", "e100_tx_timeout"], ["drivers/net/ethernet/intel/e1000/e1000_main.c", "e1000_tx_timeout"], ["drivers/net/ethernet/intel/e1000e/netdev.c", "e1000_tx_timeout"], ["drivers/net/ethernet/intel/fm10k/fm10k_netdev.c", "fm10k_tx_timeout"], ["drivers/net/ethernet/intel/i40e/i40e_main.c", "i40e_tx_timeout"], ["drivers/net/ethernet/intel/iavf/iavf_main.c", "iavf_tx_timeout"], ["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"], ["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"], ["drivers/net/ethernet/intel/igb/igb_main.c", "igb_tx_timeout"], ["drivers/net/ethernet/intel/igbvf/netdev.c", "igbvf_tx_timeout"], ["drivers/net/ethernet/intel/ixgb/ixgb_main.c", "ixgb_tx_timeout"], ["drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c", "adapter->netdev->netdev_ops->ndo_tx_timeout(adapter->netdev);"], ["drivers/net/ethernet/intel/ixgbe/ixgbe_main.c", "ixgbe_tx_timeout"], ["drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c", "ixgbevf_tx_timeout"], ["drivers/net/ethernet/jme.c", "jme_tx_timeout"], ["drivers/net/ethernet/korina.c", "korina_tx_timeout"], ["drivers/net/ethernet/lantiq_etop.c", "ltq_etop_tx_timeout"], ["drivers/net/ethernet/marvell/mv643xx_eth.c", "mv643xx_eth_tx_timeout"], ["drivers/net/ethernet/marvell/pxa168_eth.c", "pxa168_eth_tx_timeout"], ["drivers/net/ethernet/marvell/skge.c", "skge_tx_timeout"], ["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"], ["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"], ["drivers/net/ethernet/mediatek/mtk_eth_soc.c", "mtk_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx5/core/en_main.c", "mlx5e_tx_timeout"], ["drivers/net/ethernet/micrel/ks8842.c", "ks8842_tx_timeout"], ["drivers/net/ethernet/micrel/ksz884x.c", "netdev_tx_timeout"], ["drivers/net/ethernet/microchip/enc28j60.c", "enc28j60_tx_timeout"], ["drivers/net/ethernet/microchip/encx24j600.c", "encx24j600_tx_timeout"], ["drivers/net/ethernet/natsemi/sonic.h", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/sonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/jazzsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/macsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/natsemi.c", "ns_tx_timeout"], ["drivers/net/ethernet/natsemi/ns83820.c", "ns83820_tx_timeout"], ["drivers/net/ethernet/natsemi/xtsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/neterion/s2io.h", "s2io_tx_watchdog"], ["drivers/net/ethernet/neterion/s2io.c", "s2io_tx_watchdog"], ["drivers/net/ethernet/neterion/vxge/vxge-main.c", "vxge_tx_watchdog"], ["drivers/net/ethernet/netronome/nfp/nfp_net_common.c", "nfp_net_tx_timeout"], ["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"], ["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"], ["drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c", "pch_gbe_tx_timeout"], ["drivers/net/ethernet/packetengines/hamachi.c", "hamachi_tx_timeout"], ["drivers/net/ethernet/packetengines/yellowfin.c", "yellowfin_tx_timeout"], ["drivers/net/ethernet/pensando/ionic/ionic_lif.c", "ionic_tx_timeout"], ["drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c", "netxen_tx_timeout"], ["drivers/net/ethernet/qlogic/qla3xxx.c", "ql3xxx_tx_timeout"], ["drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c", "qlcnic_tx_timeout"], ["drivers/net/ethernet/qualcomm/emac/emac.c", "emac_tx_timeout"], ["drivers/net/ethernet/qualcomm/qca_spi.c", "qcaspi_netdev_tx_timeout"], ["drivers/net/ethernet/qualcomm/qca_uart.c", "qcauart_netdev_tx_timeout"], ["drivers/net/ethernet/rdc/r6040.c", "r6040_tx_timeout"], ["drivers/net/ethernet/realtek/8139cp.c", "cp_tx_timeout"], ["drivers/net/ethernet/realtek/8139too.c", "rtl8139_tx_timeout"], ["drivers/net/ethernet/realtek/atp.c", "tx_timeout"], ["drivers/net/ethernet/realtek/r8169_main.c", "rtl8169_tx_timeout"], ["drivers/net/ethernet/renesas/ravb_main.c", "ravb_tx_timeout"], ["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"], ["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"], ["drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c", "sxgbe_tx_timeout"], ["drivers/net/ethernet/seeq/ether3.c", "ether3_timeout"], ["drivers/net/ethernet/seeq/sgiseeq.c", "timeout"], ["drivers/net/ethernet/sfc/efx.c", "efx_watchdog"], ["drivers/net/ethernet/sfc/falcon/efx.c", "ef4_watchdog"], ["drivers/net/ethernet/sgi/ioc3-eth.c", "ioc3_timeout"], ["drivers/net/ethernet/sgi/meth.c", "meth_tx_timeout"], ["drivers/net/ethernet/silan/sc92031.c", "sc92031_tx_timeout"], ["drivers/net/ethernet/sis/sis190.c", "sis190_tx_timeout"], ["drivers/net/ethernet/sis/sis900.c", "sis900_tx_timeout"], ["drivers/net/ethernet/smsc/epic100.c", "epic_tx_timeout"], ["drivers/net/ethernet/smsc/smc911x.c", "smc911x_timeout"], ["drivers/net/ethernet/smsc/smc9194.c", "smc_timeout"], ["drivers/net/ethernet/smsc/smc91c92_cs.c", "smc_tx_timeout"], ["drivers/net/ethernet/smsc/smc91x.c", "smc_timeout"], ["drivers/net/ethernet/stmicro/stmmac/stmmac_main.c", "stmmac_tx_timeout"], ["drivers/net/ethernet/sun/cassini.c", "cas_tx_timeout"], ["drivers/net/ethernet/sun/ldmvsw.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/niu.c", "niu_tx_timeout"], ["drivers/net/ethernet/sun/sunbmac.c", "bigmac_tx_timeout"], ["drivers/net/ethernet/sun/sungem.c", "gem_tx_timeout"], ["drivers/net/ethernet/sun/sunhme.c", "happy_meal_tx_timeout"], ["drivers/net/ethernet/sun/sunqe.c", "qe_tx_timeout"], ["drivers/net/ethernet/sun/sunvnet.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/sunvnet_common.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/sunvnet_common.h", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/synopsys/dwc-xlgmac-net.c", "xlgmac_tx_timeout"], ["drivers/net/ethernet/ti/cpmac.c", "cpmac_tx_timeout"], ["drivers/net/ethernet/ti/cpsw.c", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/cpsw_priv.c", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/cpsw_priv.h", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/davinci_emac.c", "emac_dev_tx_timeout"], ["drivers/net/ethernet/ti/netcp_core.c", "netcp_ndo_tx_timeout"], ["drivers/net/ethernet/ti/tlan.c", "tlan_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_net.h", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_net.c", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_wireless.c", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/spider_net.c", "spider_net_tx_timeout"], ["drivers/net/ethernet/toshiba/tc35815.c", "tc35815_tx_timeout"], ["drivers/net/ethernet/via/via-rhine.c", "rhine_tx_timeout"], ["drivers/net/ethernet/wiznet/w5100.c", "w5100_tx_timeout"], ["drivers/net/ethernet/wiznet/w5300.c", "w5300_tx_timeout"], ["drivers/net/ethernet/xilinx/xilinx_emaclite.c", "xemaclite_tx_timeout"], ["drivers/net/ethernet/xircom/xirc2ps_cs.c", "xirc_tx_timeout"], ["drivers/net/fjes/fjes_main.c", "fjes_tx_retry"], ["drivers/net/slip/slip.c", "sl_tx_timeout"], ["include/linux/usb/usbnet.h", "usbnet_tx_timeout"], ["drivers/net/usb/aqc111.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/ax88172a.c", "usbnet_tx_timeout"], ["drivers/net/usb/ax88179_178a.c", "usbnet_tx_timeout"], ["drivers/net/usb/catc.c", "catc_tx_timeout"], ["drivers/net/usb/cdc_mbim.c", "usbnet_tx_timeout"], ["drivers/net/usb/cdc_ncm.c", "usbnet_tx_timeout"], ["drivers/net/usb/dm9601.c", "usbnet_tx_timeout"], ["drivers/net/usb/hso.c", "hso_net_tx_timeout"], ["drivers/net/usb/int51x1.c", "usbnet_tx_timeout"], ["drivers/net/usb/ipheth.c", "ipheth_tx_timeout"], ["drivers/net/usb/kaweth.c", "kaweth_tx_timeout"], ["drivers/net/usb/lan78xx.c", "lan78xx_tx_timeout"], ["drivers/net/usb/mcs7830.c", "usbnet_tx_timeout"], ["drivers/net/usb/pegasus.c", "pegasus_tx_timeout"], ["drivers/net/usb/qmi_wwan.c", "usbnet_tx_timeout"], ["drivers/net/usb/r8152.c", "rtl8152_tx_timeout"], ["drivers/net/usb/rndis_host.c", "usbnet_tx_timeout"], ["drivers/net/usb/rtl8150.c", "rtl8150_tx_timeout"], ["drivers/net/usb/sierra_net.c", "usbnet_tx_timeout"], ["drivers/net/usb/smsc75xx.c", "usbnet_tx_timeout"], ["drivers/net/usb/smsc95xx.c", "usbnet_tx_timeout"], ["drivers/net/usb/sr9700.c", "usbnet_tx_timeout"], ["drivers/net/usb/sr9800.c", "usbnet_tx_timeout"], ["drivers/net/usb/usbnet.c", "usbnet_tx_timeout"], ["drivers/net/vmxnet3/vmxnet3_drv.c", "vmxnet3_tx_timeout"], ["drivers/net/wan/cosa.c", "cosa_net_timeout"], ["drivers/net/wan/farsync.c", "fst_tx_timeout"], ["drivers/net/wan/fsl_ucc_hdlc.c", "uhdlc_tx_timeout"], ["drivers/net/wan/lmc/lmc_main.c", "lmc_driver_timeout"], ["drivers/net/wan/x25_asy.c", "x25_asy_timeout"], ["drivers/net/wimax/i2400m/netdev.c", "i2400m_tx_timeout"], ["drivers/net/wireless/intel/ipw2x00/ipw2100.c", "ipw2100_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/main.c", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/orinoco_usb.c", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/orinoco.h", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_dev.c", "islpci_eth_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_eth.c", "islpci_eth_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_eth.h", "islpci_eth_tx_timeout"], ["drivers/net/wireless/marvell/mwifiex/main.c", "mwifiex_tx_timeout"], ["drivers/net/wireless/quantenna/qtnfmac/core.c", "qtnf_netdev_tx_timeout"], ["drivers/net/wireless/quantenna/qtnfmac/core.h", "qtnf_netdev_tx_timeout"], ["drivers/net/wireless/rndis_wlan.c", "usbnet_tx_timeout"], ["drivers/net/wireless/wl3501_cs.c", "wl3501_tx_timeout"], ["drivers/net/wireless/zydas/zd1201.c", "zd1201_tx_timeout"], ["drivers/s390/net/qeth_core.h", "qeth_tx_timeout"], ["drivers/s390/net/qeth_core_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"], ["drivers/staging/ks7010/ks_wlan_net.c", "ks_wlan_tx_timeout"], ["drivers/staging/qlge/qlge_main.c", "qlge_tx_timeout"], ["drivers/staging/rtl8192e/rtl8192e/rtl_core.c", "_rtl92e_tx_timeout"], ["drivers/staging/rtl8192u/r8192U_core.c", "tx_timeout"], ["drivers/staging/unisys/visornic/visornic_main.c", "visornic_xmit_timeout"], ["drivers/staging/wlan-ng/p80211netdev.c", "p80211knetdev_tx_timeout"], ["drivers/tty/n_gsm.c", "gsm_mux_net_tx_timeout"], ["drivers/tty/synclink.c", "hdlcdev_tx_timeout"], ["drivers/tty/synclink_gt.c", "hdlcdev_tx_timeout"], ["drivers/tty/synclinkmp.c", "hdlcdev_tx_timeout"], ["net/atm/lec.c", "lec_tx_timeout"], ["net/bluetooth/bnep/netdev.c", "bnep_net_timeout"] ); for my $p (@work) { my @pair = @$p; my $file = $pair[0]; my $func = $pair[1]; print STDERR $file , ": ", $func,"\n"; our @ARGV = ($file); while () { if (m/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/) { print STDERR "found $1+$2 in $file\n"; } if (s/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/$1, unsigned int txqueue$2/) { print STDERR "$func found in $file\n"; } print; } } where the list of files and functions is simply from: git grep ndo_tx_timeout, with manual addition of headers in the rare cases where the function is from a header, then manually changing the few places which actually call ndo_tx_timeout. Signed-off-by: Michael S. Tsirkin Acked-by: Heiner Kallweit Acked-by: Jakub Kicinski Acked-by: Shannon Nelson Reviewed-by: Martin Habets changes from v9: fixup a forward declaration changes from v9: more leftovers from v3 change changes from v8: fix up a missing direct call to timeout rebased on net-next changes from v7: fixup leftovers from v3 change changes from v6: fix typo in rtl driver changes from v5: add missing files (allow any net device argument name) changes from v4: add a missing driver header changes from v3: change queue # to unsigned Changes from v2: added headers Changes from v1: Fix errors found by kbuild: generalize the pattern a bit, to pick up a couple of instances missed by the previous version. Signed-off-by: David S. Miller --- include/linux/netdevice.h | 5 +++-- include/linux/usb/usbnet.h | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9ef20389622d..30745068fb39 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1014,7 +1014,7 @@ int netdev_name_node_alt_destroy(struct net_device *dev, const char *name); * Called when a user wants to change the Maximum Transfer Unit * of a device. * - * void (*ndo_tx_timeout)(struct net_device *dev); + * void (*ndo_tx_timeout)(struct net_device *dev, unsigned int txqueue); * Callback used when the transmitter has not made any progress * for dev->watchdog ticks. * @@ -1281,7 +1281,8 @@ struct net_device_ops { int new_mtu); int (*ndo_neigh_setup)(struct net_device *dev, struct neigh_parms *); - void (*ndo_tx_timeout) (struct net_device *dev); + void (*ndo_tx_timeout) (struct net_device *dev, + unsigned int txqueue); void (*ndo_get_stats64)(struct net_device *dev, struct rtnl_link_stats64 *storage); diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index d8860f2d0976..b0bff3083278 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -253,7 +253,7 @@ extern int usbnet_open(struct net_device *net); extern int usbnet_stop(struct net_device *net); extern netdev_tx_t usbnet_start_xmit(struct sk_buff *skb, struct net_device *net); -extern void usbnet_tx_timeout(struct net_device *net); +extern void usbnet_tx_timeout(struct net_device *net, unsigned int txqueue); extern int usbnet_change_mtu(struct net_device *net, int new_mtu); extern int usbnet_get_endpoints(struct usbnet *, struct usb_interface *); -- cgit v1.2.3 From 98e8627efcada18ac043a77b9101b4b4c768090b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Fri, 13 Dec 2019 18:51:07 +0100 Subject: bpf: Move trampoline JIT image allocation to a function MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Refactor the image allocation in the BPF trampoline code into a separate function, so it can be shared with the BPF dispatcher in upcoming commits. Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20191213175112.30208-2-bjorn.topel@gmail.com --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 35903f148be5..5d744828b399 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -475,6 +475,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key); int bpf_trampoline_link_prog(struct bpf_prog *prog); int bpf_trampoline_unlink_prog(struct bpf_prog *prog); void bpf_trampoline_put(struct bpf_trampoline *tr); +void *bpf_jit_alloc_exec_page(void); #else static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key) { -- cgit v1.2.3 From 75ccbef6369e94ecac696a152a998a978d41376b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Fri, 13 Dec 2019 18:51:08 +0100 Subject: bpf: Introduce BPF dispatcher MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The BPF dispatcher is a multi-way branch code generator, mainly targeted for XDP programs. When an XDP program is executed via the bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect call has a substantial performance impact, when retpolines are enabled. The dispatcher transform indirect calls to direct calls, and therefore avoids the retpoline. The dispatcher is generated using the BPF JIT, and relies on text poking provided by bpf_arch_text_poke(). The dispatcher hijacks a trampoline function it via the __fentry__ nop of the trampoline. One dispatcher instance currently supports up to 64 dispatch points. A user creates a dispatcher with its corresponding trampoline with the DEFINE_BPF_DISPATCHER macro. Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com --- include/linux/bpf.h | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5d744828b399..53ae4a50abe4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -470,12 +470,61 @@ struct bpf_trampoline { void *image; u64 selector; }; + +#define BPF_DISPATCHER_MAX 64 /* Fits in 2048B */ + +struct bpf_dispatcher_prog { + struct bpf_prog *prog; + refcount_t users; +}; + +struct bpf_dispatcher { + /* dispatcher mutex */ + struct mutex mutex; + void *func; + struct bpf_dispatcher_prog progs[BPF_DISPATCHER_MAX]; + int num_progs; + void *image; + u32 image_off; +}; + #ifdef CONFIG_BPF_JIT struct bpf_trampoline *bpf_trampoline_lookup(u64 key); int bpf_trampoline_link_prog(struct bpf_prog *prog); int bpf_trampoline_unlink_prog(struct bpf_prog *prog); void bpf_trampoline_put(struct bpf_trampoline *tr); void *bpf_jit_alloc_exec_page(void); +#define BPF_DISPATCHER_INIT(name) { \ + .mutex = __MUTEX_INITIALIZER(name.mutex), \ + .func = &name##func, \ + .progs = {}, \ + .num_progs = 0, \ + .image = NULL, \ + .image_off = 0 \ +} + +#define DEFINE_BPF_DISPATCHER(name) \ + noinline unsigned int name##func( \ + const void *ctx, \ + const struct bpf_insn *insnsi, \ + unsigned int (*bpf_func)(const void *, \ + const struct bpf_insn *)) \ + { \ + return bpf_func(ctx, insnsi); \ + } \ + EXPORT_SYMBOL(name##func); \ + struct bpf_dispatcher name = BPF_DISPATCHER_INIT(name); +#define DECLARE_BPF_DISPATCHER(name) \ + unsigned int name##func( \ + const void *ctx, \ + const struct bpf_insn *insnsi, \ + unsigned int (*bpf_func)(const void *, \ + const struct bpf_insn *)); \ + extern struct bpf_dispatcher name; +#define BPF_DISPATCHER_FUNC(name) name##func +#define BPF_DISPATCHER_PTR(name) (&name) +void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, + struct bpf_prog *to); #else static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key) { @@ -490,6 +539,13 @@ static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog) return -ENOTSUPP; } static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {} +#define DEFINE_BPF_DISPATCHER(name) +#define DECLARE_BPF_DISPATCHER(name) +#define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_nopfunc +#define BPF_DISPATCHER_PTR(name) NULL +static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, + struct bpf_prog *from, + struct bpf_prog *to) {} #endif struct bpf_func_info_aux { -- cgit v1.2.3 From 7e6897f95935973c3253fd756135b5ea58043dc8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Fri, 13 Dec 2019 18:51:09 +0100 Subject: bpf, xdp: Start using the BPF dispatcher for XDP MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit adds a BPF dispatcher for XDP. The dispatcher is updated from the XDP control-path, dev_xdp_install(), and used when an XDP program is run via bpf_prog_run_xdp(). Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com --- include/linux/bpf.h | 15 +++++++++++++++ include/linux/filter.h | 40 ++++++++++++++++++++++++---------------- 2 files changed, 39 insertions(+), 16 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 53ae4a50abe4..5970989b99d1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -488,6 +488,14 @@ struct bpf_dispatcher { u32 image_off; }; +static __always_inline unsigned int bpf_dispatcher_nopfunc( + const void *ctx, + const struct bpf_insn *insnsi, + unsigned int (*bpf_func)(const void *, + const struct bpf_insn *)) +{ + return bpf_func(ctx, insnsi); +} #ifdef CONFIG_BPF_JIT struct bpf_trampoline *bpf_trampoline_lookup(u64 key); int bpf_trampoline_link_prog(struct bpf_prog *prog); @@ -997,6 +1005,8 @@ int btf_distill_func_proto(struct bpf_verifier_log *log, int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog); +struct bpf_prog *bpf_prog_by_id(u32 id); + #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { @@ -1128,6 +1138,11 @@ static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, static inline void bpf_map_put(struct bpf_map *map) { } + +static inline struct bpf_prog *bpf_prog_by_id(u32 id) +{ + return ERR_PTR(-ENOTSUPP); +} #endif /* CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd, diff --git a/include/linux/filter.h b/include/linux/filter.h index a141cb07e76a..37ac7025031d 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -559,23 +559,26 @@ struct sk_filter { DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); -#define BPF_PROG_RUN(prog, ctx) ({ \ - u32 ret; \ - cant_sleep(); \ - if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ - struct bpf_prog_stats *stats; \ - u64 start = sched_clock(); \ - ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \ - stats = this_cpu_ptr(prog->aux->stats); \ - u64_stats_update_begin(&stats->syncp); \ - stats->cnt++; \ - stats->nsecs += sched_clock() - start; \ - u64_stats_update_end(&stats->syncp); \ - } else { \ - ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \ - } \ +#define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \ + u32 ret; \ + cant_sleep(); \ + if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ + struct bpf_prog_stats *stats; \ + u64 start = sched_clock(); \ + ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ + stats = this_cpu_ptr(prog->aux->stats); \ + u64_stats_update_begin(&stats->syncp); \ + stats->cnt++; \ + stats->nsecs += sched_clock() - start; \ + u64_stats_update_end(&stats->syncp); \ + } else { \ + ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ + } \ ret; }) +#define BPF_PROG_RUN(prog, ctx) __BPF_PROG_RUN(prog, ctx, \ + bpf_dispatcher_nopfunc) + #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN struct bpf_skb_data_end { @@ -699,6 +702,8 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, return res; } +DECLARE_BPF_DISPATCHER(bpf_dispatcher_xdp) + static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, struct xdp_buff *xdp) { @@ -708,9 +713,12 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, * already takes rcu_read_lock() when fetching the program, so * it's not necessary here anymore. */ - return BPF_PROG_RUN(prog, xdp); + return __BPF_PROG_RUN(prog, xdp, + BPF_DISPATCHER_FUNC(bpf_dispatcher_xdp)); } +void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog); + static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog) { return prog->len * sizeof(struct bpf_insn); -- cgit v1.2.3 From 116eb788f57c9c35c40b29cfaa2607020de99a84 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Fri, 13 Dec 2019 18:51:12 +0100 Subject: bpf, x86: Align dispatcher branch targets to 16B MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit >From Intel 64 and IA-32 Architectures Optimization Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch targets should be 16-byte aligned. This commits aligns branch targets according to the Intel manual. The nops used to align branch targets make the dispatcher larger, and therefore the number of supported dispatch points/programs are descreased from 64 to 48. Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com --- include/linux/bpf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5970989b99d1..d467983e61bb 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -471,7 +471,7 @@ struct bpf_trampoline { u64 selector; }; -#define BPF_DISPATCHER_MAX 64 /* Fits in 2048B */ +#define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */ struct bpf_dispatcher_prog { struct bpf_prog *prog; -- cgit v1.2.3 From 826f66b30c2e3d4df533e4224cb8c734afa0a17c Mon Sep 17 00:00:00 2001 From: Andy Roulin Date: Wed, 11 Dec 2019 14:30:58 -0800 Subject: bonding: move 802.3ad port state flags to uapi The bond slave actor/partner operating state is exported as bitfield to userspace, which lacks a way to interpret it, e.g., iproute2 only prints the state as a number: ad_actor_oper_port_state 15 For userspace to interpret the bitfield, the bitfield definitions should be part of the uapi. The bitfield itself is defined in the 802.3ad standard. This commit moves the 802.3ad bitfield definitions to uapi. Related iproute2 patches, soon to be posted upstream, use the new uapi headers to pretty-print bond slave state, e.g., with ip -d link show ad_actor_oper_port_state_str Signed-off-by: Andy Roulin Acked-by: Roopa Prabhu Acked-by: Jay Vosburgh Signed-off-by: Jakub Kicinski --- include/uapi/linux/if_bonding.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h index 790585f0e61b..6829213a54c5 100644 --- a/include/uapi/linux/if_bonding.h +++ b/include/uapi/linux/if_bonding.h @@ -95,6 +95,16 @@ #define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */ #define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */ +/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */ +#define AD_STATE_LACP_ACTIVITY 0x1 +#define AD_STATE_LACP_TIMEOUT 0x2 +#define AD_STATE_AGGREGATION 0x4 +#define AD_STATE_SYNCHRONIZATION 0x8 +#define AD_STATE_COLLECTING 0x10 +#define AD_STATE_DISTRIBUTING 0x20 +#define AD_STATE_DEFAULTED 0x40 +#define AD_STATE_EXPIRED 0x80 + typedef struct ifbond { __s32 bond_mode; __s32 num_slaves; -- cgit v1.2.3 From de1799667b002331609e8ee6b924ccfd51968d99 Mon Sep 17 00:00:00 2001 From: Vivien Didelot Date: Wed, 11 Dec 2019 20:07:10 -0500 Subject: net: bridge: add STP xstats This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk, transition_fwd xstats counters to the bridge ports copied over via netlink, providing useful information for STP. Signed-off-by: Vivien Didelot Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- include/uapi/linux/if_bridge.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index 1b3c2b643a02..4a58e3d7de46 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -156,6 +156,15 @@ struct bridge_vlan_xstats { __u32 pad2; }; +struct bridge_stp_xstats { + __u64 transition_blk; + __u64 transition_fwd; + __u64 rx_bpdu; + __u64 tx_bpdu; + __u64 rx_tcn; + __u64 tx_tcn; +}; + /* Bridge multicast database attributes * [MDBA_MDB] = { * [MDBA_MDB_ENTRY] = { @@ -262,6 +271,7 @@ enum { BRIDGE_XSTATS_VLAN, BRIDGE_XSTATS_MCAST, BRIDGE_XSTATS_PAD, + BRIDGE_XSTATS_STP, __BRIDGE_XSTATS_MAX }; #define BRIDGE_XSTATS_MAX (__BRIDGE_XSTATS_MAX - 1) -- cgit v1.2.3 From 166750bc1dd256b2184b22588fb9fe6d3fbb93ae Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Fri, 13 Dec 2019 17:47:08 -0800 Subject: libbpf: Support libbpf-provided extern variables Add support for extern variables, provided to BPF program by libbpf. Currently the following extern variables are supported: - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte long; - CONFIG_xxx values; a set of values of actual kernel config. Tristate, boolean, strings, and integer values are supported. Set of possible values is determined by declared type of extern variable. Supported types of variables are: - Tristate values. Are represented as `enum libbpf_tristate`. Accepted values are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO, or TRI_MODULE, respectively. - Boolean values. Are represented as bool (_Bool) types. Accepted values are 'y' and 'n' only, turning into true/false values, respectively. - Single-character values. Can be used both as a substritute for bool/tristate, or as a small-range integer: - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm'; - integers in a range [-128, 127] or [0, 255] (depending on signedness of char in target architecture) are recognized and represented with respective values of char type. - Strings. String values are declared as fixed-length char arrays. String of up to that length will be accepted and put in first N bytes of char array, with the rest of bytes zeroed out. If config string value is longer than space alloted, it will be truncated and warning message emitted. Char array is always zero terminated. String literals in config have to be enclosed in double quotes, just like C-style string literals. - Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and unsigned variants. Libbpf enforces parsed config value to be in the supported range of corresponding integer type. Integers values in config can be: - decimal integers, with optional + and - signs; - hexadecimal integers, prefixed with 0x or 0X; - octal integers, starting with 0. Config file itself is searched in /boot/config-$(uname -r) location with fallback to /proc/config.gz, unless config path is specified explicitly through bpf_object_open_opts' kernel_config_path option. Both gzipped and plain text formats are supported. Libbpf adds explicit dependency on zlib because of this, but this shouldn't be a problem, given libelf already depends on zlib. All detected extern variables, are put into a separate .extern internal map. It, similarly to .rodata map, is marked as read-only from BPF program side, as well as is frozen on load. This allows BPF verifier to track extern values as constants and perform enhanced branch prediction and dead code elimination. This can be relied upon for doing kernel version/feature detection and using potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF program, while still having a single version of BPF program running on old and new kernels. Selftests are validating this explicitly for unexisting BPF helper. Signed-off-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com --- include/uapi/linux/btf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h index c02dec97e1ce..1a2898c482ee 100644 --- a/include/uapi/linux/btf.h +++ b/include/uapi/linux/btf.h @@ -142,7 +142,8 @@ struct btf_param { enum { BTF_VAR_STATIC = 0, - BTF_VAR_GLOBAL_ALLOCATED, + BTF_VAR_GLOBAL_ALLOCATED = 1, + BTF_VAR_GLOBAL_EXTERN = 2, }; /* BTF_KIND_VAR is followed by a single "struct btf_var" to describe -- cgit v1.2.3 From 9429439f59cd3b82a3e2732ead5363578de97a84 Mon Sep 17 00:00:00 2001 From: Yangbo Lu Date: Thu, 12 Dec 2019 18:08:05 +0800 Subject: ptp_qoriq: export extts_clean_up() function Export extts_clean_up() function so that dpaa2-ptp driver is able to reuse it. Signed-off-by: Yangbo Lu Signed-off-by: David S. Miller --- include/linux/fsl/ptp_qoriq.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/fsl/ptp_qoriq.h b/include/linux/fsl/ptp_qoriq.h index 992bf9fa1729..b0b743563f43 100644 --- a/include/linux/fsl/ptp_qoriq.h +++ b/include/linux/fsl/ptp_qoriq.h @@ -192,6 +192,7 @@ int ptp_qoriq_settime(struct ptp_clock_info *ptp, const struct timespec64 *ts); int ptp_qoriq_enable(struct ptp_clock_info *ptp, struct ptp_clock_request *rq, int on); +int extts_clean_up(struct ptp_qoriq *ptp_qoriq, int index, bool update_event); #ifdef CONFIG_DEBUG_FS void ptp_qoriq_create_debugfs(struct ptp_qoriq *ptp_qoriq); void ptp_qoriq_remove_debugfs(struct ptp_qoriq *ptp_qoriq); -- cgit v1.2.3 From 1f1c1d7c89ee538f3e36b43098e95973f8fa37db Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Fri, 13 Dec 2019 21:24:27 +0100 Subject: ipv6: Annotate bitwise IPv6 dsfield pointer cast The sparse commit 6002ded74587 ("add a flag to warn on casts to/from bitwise pointers") introduced a check for non-direct casts from/to restricted datatypes (when -Wbitwise-pointer is enabled). This triggered a warning in ipv6_get_dsfield() because sparse doesn't know that the buffer already points to some data in the correct bitwise integer format. This was already fixed in ipv6_change_dsfield() by the __force attribute and can be fixed here the same way. Signed-off-by: Sven Eckelmann Signed-off-by: David S. Miller --- include/net/dsfield.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/dsfield.h b/include/net/dsfield.h index 1a245ee10c95..a59a57ffc546 100644 --- a/include/net/dsfield.h +++ b/include/net/dsfield.h @@ -21,7 +21,7 @@ static inline __u8 ipv4_get_dsfield(const struct iphdr *iph) static inline __u8 ipv6_get_dsfield(const struct ipv6hdr *ipv6h) { - return ntohs(*(const __be16 *)ipv6h) >> 4; + return ntohs(*(__force const __be16 *)ipv6h) >> 4; } -- cgit v1.2.3 From 54e1f08bddbe63a3c0ae44f65df2c8b895003ef4 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Fri, 13 Dec 2019 21:24:28 +0100 Subject: ipv6: Annotate ipv6_addr_is_* bitwise pointer casts The sparse commit 6002ded74587 ("add a flag to warn on casts to/from bitwise pointers") introduced a check for non-direct casts from/to restricted datatypes (when -Wbitwise-pointer is enabled). This triggered a warning in the 64 bit optimized ipv6_addr_is_*() functions because sparse doesn't know that the buffer already points to some data in the correct bitwise integer format. But these were correct and can therefore be marked with __force to signalize sparse an intended cast to a specific bitwise type. Signed-off-by: Sven Eckelmann Signed-off-by: David S. Miller --- include/net/addrconf.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 1bab88184d3c..a088349dd94f 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -437,7 +437,7 @@ static inline void addrconf_addr_solict_mult(const struct in6_addr *addr, static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 - __be64 *p = (__be64 *)addr; + __be64 *p = (__force __be64 *)addr; return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) | (p[1] ^ cpu_to_be64(1))) == 0UL; #else return ((addr->s6_addr32[0] ^ htonl(0xff020000)) | @@ -449,7 +449,7 @@ static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr) static inline bool ipv6_addr_is_ll_all_routers(const struct in6_addr *addr) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 - __be64 *p = (__be64 *)addr; + __be64 *p = (__force __be64 *)addr; return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) | (p[1] ^ cpu_to_be64(2))) == 0UL; #else return ((addr->s6_addr32[0] ^ htonl(0xff020000)) | @@ -466,7 +466,7 @@ static inline bool ipv6_addr_is_isatap(const struct in6_addr *addr) static inline bool ipv6_addr_is_solict_mult(const struct in6_addr *addr) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 - __be64 *p = (__be64 *)addr; + __be64 *p = (__force __be64 *)addr; return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) | ((p[1] ^ cpu_to_be64(0x00000001ff000000UL)) & cpu_to_be64(0xffffffffff000000UL))) == 0UL; @@ -481,7 +481,7 @@ static inline bool ipv6_addr_is_solict_mult(const struct in6_addr *addr) static inline bool ipv6_addr_is_all_snoopers(const struct in6_addr *addr) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 - __be64 *p = (__be64 *)addr; + __be64 *p = (__force __be64 *)addr; return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) | (p[1] ^ cpu_to_be64(0x6a))) == 0UL; -- cgit v1.2.3 From a2ec8b5706944d228181c8b91d815f41d6dd8e7b Mon Sep 17 00:00:00 2001 From: Josh Soref Date: Sun, 15 Dec 2019 22:08:02 +0100 Subject: wireguard: global: fix spelling mistakes in comments This fixes two spelling errors in source code comments. Signed-off-by: Josh Soref [Jason: rewrote commit message] Signed-off-by: Jason A. Donenfeld Signed-off-by: David S. Miller --- include/uapi/linux/wireguard.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h index dd8a47c4ad11..ae88be14c947 100644 --- a/include/uapi/linux/wireguard.h +++ b/include/uapi/linux/wireguard.h @@ -18,13 +18,13 @@ * one but not both of: * * WGDEVICE_A_IFINDEX: NLA_U32 - * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1 * * The kernel will then return several messages (NLM_F_MULTI) containing the * following tree of nested items: * * WGDEVICE_A_IFINDEX: NLA_U32 - * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1 * WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN * WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN * WGDEVICE_A_LISTEN_PORT: NLA_U16 @@ -77,7 +77,7 @@ * WGDEVICE_A_IFINDEX and WGDEVICE_A_IFNAME: * * WGDEVICE_A_IFINDEX: NLA_U32 - * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1 * WGDEVICE_A_FLAGS: NLA_U32, 0 or WGDEVICE_F_REPLACE_PEERS if all current * peers should be removed prior to adding the list below. * WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove @@ -121,7 +121,7 @@ * filling in information not contained in the prior. Note that if * WGDEVICE_F_REPLACE_PEERS is specified in the first message, it probably * should not be specified in fragments that come after, so that the list - * of peers is only cleared the first time but appened after. Likewise for + * of peers is only cleared the first time but appended after. Likewise for * peers, if WGPEER_F_REPLACE_ALLOWEDIPS is specified in the first message * of a peer, it likely should not be specified in subsequent fragments. * -- cgit v1.2.3 From 2f5e70c8ce47396bfa8f5c437574b569c02597bb Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Wed, 20 Nov 2019 12:33:59 +0100 Subject: netfilter: Document ingress hook Amend kerneldoc of struct net_device to fix a "make htmldocs" warning: include/linux/netdevice.h:2045: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device' Reported-by: kbuild test robot Signed-off-by: Lukas Wunner Cc: Daniel Borkmann Signed-off-by: Pablo Neira Ayuso --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 30745068fb39..0b097bbd3663 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1708,6 +1708,7 @@ enum netdev_priv_flags { * @miniq_ingress: ingress/clsact qdisc specific data for * ingress processing * @ingress_queue: XXX: need comments on this one + * @nf_hooks_ingress: netfilter hooks executed for ingress packets * @broadcast: hw bcast address * * @rx_cpu_rmap: CPU reverse-mapping for RX completion interrupts, -- cgit v1.2.3 From d4aef159394d5940bd7158ab789969dab82f7c76 Mon Sep 17 00:00:00 2001 From: Soeren Moch Date: Thu, 12 Dec 2019 00:52:49 +0100 Subject: brcmfmac: add support for BCM4359 SDIO chipset BCM4359 is a 2x2 802.11 abgn+ac Dual-Band HT80 combo chip and it supports Real Simultaneous Dual Band feature. Based on a similar patch by: Wright Feng Signed-off-by: Soeren Moch Acked-by: Chi-Hsien Lin Acked-by: Ulf Hansson Signed-off-by: Kalle Valo --- include/linux/mmc/sdio_ids.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/mmc/sdio_ids.h b/include/linux/mmc/sdio_ids.h index 08b25c02b5a1..2e9a6e4634eb 100644 --- a/include/linux/mmc/sdio_ids.h +++ b/include/linux/mmc/sdio_ids.h @@ -41,8 +41,10 @@ #define SDIO_DEVICE_ID_BROADCOM_43455 0xa9bf #define SDIO_DEVICE_ID_BROADCOM_4354 0x4354 #define SDIO_DEVICE_ID_BROADCOM_4356 0x4356 +#define SDIO_DEVICE_ID_BROADCOM_4359 0x4359 #define SDIO_DEVICE_ID_CYPRESS_4373 0x4373 #define SDIO_DEVICE_ID_CYPRESS_43012 43012 +#define SDIO_DEVICE_ID_CYPRESS_89359 0x4355 #define SDIO_VENDOR_ID_INTEL 0x0089 #define SDIO_DEVICE_ID_INTEL_IWMC3200WIMAX 0x1402 -- cgit v1.2.3 From 504723af0d85434be5fb6f2dde0b62644a7f1ead Mon Sep 17 00:00:00 2001 From: Jose Abreu Date: Wed, 18 Dec 2019 11:33:05 +0100 Subject: net: stmmac: Add basic EST support for GMAC5+ Adds the support for EST in GMAC5+ cores. This feature allows to offload scheduling of queues opening time to the IP. Signed-off-by: Jose Abreu Signed-off-by: David S. Miller --- include/linux/stmmac.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index d4bcd9387136..0531afa9b21e 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -109,6 +109,18 @@ struct stmmac_axi { bool axi_rb; }; +#define EST_GCL 1024 +struct stmmac_est { + int enable; + u32 btr_offset[2]; + u32 btr[2]; + u32 ctr[2]; + u32 ter; + u32 gcl_unaligned[EST_GCL]; + u32 gcl[EST_GCL]; + u32 gcl_size; +}; + struct stmmac_rxq_cfg { u8 mode_to_use; u32 chan; @@ -139,6 +151,7 @@ struct plat_stmmacenet_data { struct device_node *phylink_node; struct device_node *mdio_node; struct stmmac_dma_cfg *dma_cfg; + struct stmmac_est *est; int clk_csr; int has_gmac; int enh_desc; -- cgit v1.2.3 From 9586a992fb752b14ac18fc86fe086b0e3372fff4 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Wed, 18 Dec 2019 14:55:08 +0000 Subject: net: pkt_cls: Clarify a comment The bit about negating HW backlog left me scratching my head. Clarify the comment. Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- include/net/pkt_cls.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index e553fc80eb23..a7c5d492bc04 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -791,9 +791,8 @@ enum tc_prio_command { struct tc_prio_qopt_offload_params { int bands; u8 priomap[TC_PRIO_MAX + 1]; - /* In case that a prio qdisc is offloaded and now is changed to a - * non-offloadedable config, it needs to update the backlog & qlen - * values to negate the HW backlog & qlen values (and only them). + /* At the point of un-offloading the Qdisc, the reported backlog and + * qlen need to be reduced by the portion that is in HW. */ struct gnet_stats_queue *qstats; }; -- cgit v1.2.3 From dcc68b4d8084e1ac9af0d4022d6b1aff6a139a33 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Wed, 18 Dec 2019 14:55:13 +0000 Subject: net: sch_ets: Add a new Qdisc Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is PRIO-like in how it is configured, meaning one needs to specify how many bands there are, how many are strict and how many are dwrr, quanta for the latter, and priomap. The new Qdisc operates like the PRIO / DRR combo would when configured as per the standard. The strict classes, if any, are tried for traffic first. When there's no traffic in any of the strict queues, the ETS ones (if any) are treated in the same way as in DRR. Signed-off-by: Petr Machata Acked-by: Jiri Pirko Signed-off-by: David S. Miller --- include/uapi/linux/pkt_sched.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index 9f1a72876212..bf5a5b1dfb0b 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -1187,4 +1187,21 @@ enum { #define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1) +/* ETS */ + +#define TCQ_ETS_MAX_BANDS 16 + +enum { + TCA_ETS_UNSPEC, + TCA_ETS_NBANDS, /* u8 */ + TCA_ETS_NSTRICT, /* u8 */ + TCA_ETS_QUANTA, /* nested TCA_ETS_QUANTA_BAND */ + TCA_ETS_QUANTA_BAND, /* u32 */ + TCA_ETS_PRIOMAP, /* nested TCA_ETS_PRIOMAP_BAND */ + TCA_ETS_PRIOMAP_BAND, /* u8 */ + __TCA_ETS_MAX, +}; + +#define TCA_ETS_MAX (__TCA_ETS_MAX - 1) + #endif -- cgit v1.2.3 From d35eb52bd2ac7557b62bda52668f2e64dde2cf90 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Wed, 18 Dec 2019 14:55:15 +0000 Subject: net: sch_ets: Make the ETS qdisc offloadable Add hooks at appropriate points to make it possible to offload the ETS Qdisc. Signed-off-by: Petr Machata Acked-by: Jiri Pirko Reviewed-by: Ido Schimmel Signed-off-by: David S. Miller --- include/linux/netdevice.h | 1 + include/net/pkt_cls.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 30745068fb39..7a8ed11f5d45 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -849,6 +849,7 @@ enum tc_setup_type { TC_SETUP_QDISC_GRED, TC_SETUP_QDISC_TAPRIO, TC_SETUP_FT, + TC_SETUP_QDISC_ETS, }; /* These structures hold the attributes of bpf state that are being passed diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index a7c5d492bc04..47b115e2012a 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -823,4 +823,35 @@ struct tc_root_qopt_offload { bool ingress; }; +enum tc_ets_command { + TC_ETS_REPLACE, + TC_ETS_DESTROY, + TC_ETS_STATS, + TC_ETS_GRAFT, +}; + +struct tc_ets_qopt_offload_replace_params { + unsigned int bands; + u8 priomap[TC_PRIO_MAX + 1]; + unsigned int quanta[TCQ_ETS_MAX_BANDS]; /* 0 for strict bands. */ + unsigned int weights[TCQ_ETS_MAX_BANDS]; + struct gnet_stats_queue *qstats; +}; + +struct tc_ets_qopt_offload_graft_params { + u8 band; + u32 child_handle; +}; + +struct tc_ets_qopt_offload { + enum tc_ets_command command; + u32 handle; + u32 parent; + union { + struct tc_ets_qopt_offload_replace_params replace_params; + struct tc_qopt_offload_stats stats; + struct tc_ets_qopt_offload_graft_params graft_params; + }; +}; + #endif -- cgit v1.2.3 From 2a10ab043ac5a658225ee77852db7942de9ac4c5 Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 17 Dec 2019 13:39:11 +0000 Subject: net: phy: add genphy_check_and_restart_aneg() Add a helper for restarting autonegotiation(), similar to the clause 45 variant. Use it in __genphy_config_aneg() Signed-off-by: Russell King Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 5032d453ac66..1c4f97d2631d 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1094,6 +1094,7 @@ void phy_attached_info(struct phy_device *phydev); int genphy_read_abilities(struct phy_device *phydev); int genphy_setup_forced(struct phy_device *phydev); int genphy_restart_aneg(struct phy_device *phydev); +int genphy_check_and_restart_aneg(struct phy_device *phydev, bool restart); int genphy_config_eee_advert(struct phy_device *phydev); int __genphy_config_aneg(struct phy_device *phydev, bool changed); int genphy_aneg_done(struct phy_device *phydev); -- cgit v1.2.3 From 0efc286a923874f0c243e5766cce54e9429ed949 Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 17 Dec 2019 13:39:16 +0000 Subject: net: phy: provide and use genphy_read_status_fixed() There are two drivers and generic code which contain exactly the same code to read the status of a PHY operating without autonegotiation enabled. Rather than duplicate this code, provide a helper to read this information. Signed-off-by: Russell King Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 1c4f97d2631d..b2105e0d72d3 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1100,6 +1100,7 @@ int __genphy_config_aneg(struct phy_device *phydev, bool changed); int genphy_aneg_done(struct phy_device *phydev); int genphy_update_link(struct phy_device *phydev); int genphy_read_lpa(struct phy_device *phydev); +int genphy_read_status_fixed(struct phy_device *phydev); int genphy_read_status(struct phy_device *phydev); int genphy_suspend(struct phy_device *phydev); int genphy_resume(struct phy_device *phydev); -- cgit v1.2.3 From 8d5a49e9e31ba1ddd34a54b2351d068a90c78707 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 17 Dec 2019 14:12:01 -0800 Subject: net/tls: add helper for testing if socket is RX offloaded There is currently no way for driver to reliably check that the socket it has looked up is in fact RX offloaded. Add a helper. This allows drivers to catch misbehaving firmware. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/net/tls.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/net/tls.h b/include/net/tls.h index df630f5fc723..bf9eb4823933 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -641,6 +641,7 @@ int tls_sw_fallback_init(struct sock *sk, #ifdef CONFIG_TLS_DEVICE void tls_device_init(void); void tls_device_cleanup(void); +void tls_device_sk_destruct(struct sock *sk); int tls_set_device_offload(struct sock *sk, struct tls_context *ctx); void tls_device_free_resources_tx(struct sock *sk); int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx); @@ -649,6 +650,14 @@ void tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq); void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq); int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx, struct sk_buff *skb, struct strp_msg *rxm); + +static inline bool tls_is_sk_rx_device_offloaded(struct sock *sk) +{ + if (!sk_fullsock(sk) || + smp_load_acquire(&sk->sk_destruct) != tls_device_sk_destruct) + return false; + return tls_get_ctx(sk)->rx_conf == TLS_HW; +} #else static inline void tls_device_init(void) {} static inline void tls_device_cleanup(void) {} -- cgit v1.2.3 From e312b9e706ed6d94f6cc9088fcd9fbd81de4525c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Thu, 19 Dec 2019 07:10:02 +0100 Subject: xsk: Make xskmap flush_list common for all map instances MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The xskmap flush list is used to track entries that need to flushed from via the xdp_do_flush_map() function. This list used to be per-map, but there is really no reason for that. Instead make the flush list global for all xskmaps, which simplifies __xsk_map_flush() and xsk_map_alloc(). Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/20191219061006.21980-5-bjorn.topel@gmail.com --- include/net/xdp_sock.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index e3780e4b74e1..48594740d67c 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -72,7 +72,6 @@ struct xdp_umem { struct xsk_map { struct bpf_map map; - struct list_head __percpu *flush_list; spinlock_t lock; /* Synchronize map updates */ struct xdp_sock *xsk_map[]; }; @@ -139,9 +138,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs, struct xdp_sock **map_entry); int xsk_map_inc(struct xsk_map *map); void xsk_map_put(struct xsk_map *map); -int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp, - struct xdp_sock *xs); -void __xsk_map_flush(struct bpf_map *map); +int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp); +void __xsk_map_flush(void); static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, u32 key) @@ -369,13 +367,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle, return 0; } -static inline int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp, - struct xdp_sock *xs) +static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp) { return -EOPNOTSUPP; } -static inline void __xsk_map_flush(struct bpf_map *map) +static inline void __xsk_map_flush(void) { } -- cgit v1.2.3 From 96360004b8628541f5d05a845ea213267db0b1a2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Thu, 19 Dec 2019 07:10:03 +0100 Subject: xdp: Make devmap flush_list common for all map instances MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The devmap flush list is used to track entries that need to flushed from via the xdp_do_flush_map() function. This list used to be per-map, but there is really no reason for that. Instead make the flush list global for all devmaps, which simplifies __dev_map_flush() and dev_map_init_map(). Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/20191219061006.21980-6-bjorn.topel@gmail.com --- include/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d467983e61bb..31191804ca09 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -959,7 +959,7 @@ struct sk_buff; struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key); struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key); -void __dev_map_flush(struct bpf_map *map); +void __dev_map_flush(void); int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, struct net_device *dev_rx); int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, @@ -1068,7 +1068,7 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map return NULL; } -static inline void __dev_map_flush(struct bpf_map *map) +static inline void __dev_map_flush(void) { } -- cgit v1.2.3 From cdfafe98cabefeedbbc65af5c191c59745c03298 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Thu, 19 Dec 2019 07:10:04 +0100 Subject: xdp: Make cpumap flush_list common for all map instances MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The cpumap flush list is used to track entries that need to flushed from via the xdp_do_flush_map() function. This list used to be per-map, but there is really no reason for that. Instead make the flush list global for all devmaps, which simplifies __cpu_map_flush() and cpu_map_alloc(). Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/20191219061006.21980-7-bjorn.topel@gmail.com --- include/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 31191804ca09..8f3e00c84f39 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -966,7 +966,7 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, struct bpf_prog *xdp_prog); struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key); -void __cpu_map_flush(struct bpf_map *map); +void __cpu_map_flush(void); int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx); @@ -1097,7 +1097,7 @@ struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key) return NULL; } -static inline void __cpu_map_flush(struct bpf_map *map) +static inline void __cpu_map_flush(void) { } -- cgit v1.2.3 From 332f22a60e4c3492d4953cd6f7aaa4e8bd0bba97 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Thu, 19 Dec 2019 07:10:05 +0100 Subject: xdp: Remove map_to_flush and map swap detection MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that all XDP maps that can be used with bpf_redirect_map() tracks entries to be flushed in a global fashion, there is not need to track that the map has changed and flush from xdp_do_generic_map() anymore. All entries will be flushed in xdp_do_flush_map(). This means that the map_to_flush can be removed, and the corresponding checks. Moving the flush logic to one place, xdp_do_flush_map(), give a bulking behavior and performance boost. Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/20191219061006.21980-8-bjorn.topel@gmail.com --- include/linux/filter.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index 37ac7025031d..69d6706fc889 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -592,7 +592,6 @@ struct bpf_redirect_info { u32 tgt_index; void *tgt_value; struct bpf_map *map; - struct bpf_map *map_to_flush; u32 kern_flags; }; -- cgit v1.2.3 From 7dd68b3279f1792103d12e69933db3128c6d416e Mon Sep 17 00:00:00 2001 From: Andrey Ignatov Date: Wed, 18 Dec 2019 23:44:35 -0800 Subject: bpf: Support replacing cgroup-bpf program in MULTI mode The common use-case in production is to have multiple cgroup-bpf programs per attach type that cover multiple use-cases. Such programs are attached with BPF_F_ALLOW_MULTI and can be maintained by different people. Order of programs usually matters, for example imagine two egress programs: the first one drops packets and the second one counts packets. If they're swapped the result of counting program will be different. It brings operational challenges with updating cgroup-bpf program(s) attached with BPF_F_ALLOW_MULTI since there is no way to replace a program: * One way to update is to detach all programs first and then attach the new version(s) again in the right order. This introduces an interruption in the work a program is doing and may not be acceptable (e.g. if it's egress firewall); * Another way is attach the new version of a program first and only then detach the old version. This introduces the time interval when two versions of same program are working, what may not be acceptable if a program is not idempotent. It also imposes additional burden on program developers to make sure that two versions of their program can co-exist. Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to replace That way user can replace any program among those attached with BPF_F_ALLOW_MULTI flag without the problems described above. Details of the new API: * If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid descriptor of BPF program, BPF_PROG_ATTACH will return corresponding error (EINVAL or EBADF). * If replace_bpf_fd has valid descriptor of BPF program but such a program is not attached to specified cgroup, BPF_PROG_ATTACH will return ENOENT. BPF_F_REPLACE is introduced to make the user intent clear, since replace_bpf_fd alone can't be used for this (its default value, 0, is a valid fd). BPF_F_REPLACE also makes it possible to extend the API in the future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed). Signed-off-by: Andrey Ignatov Signed-off-by: Alexei Starovoitov Acked-by: Martin KaFai Lau Acked-by: Andrii Narkyiko Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com --- include/linux/bpf-cgroup.h | 4 +++- include/uapi/linux/bpf.h | 10 ++++++++++ 2 files changed, 13 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 169fd25f6bc2..18f6a6da7c3c 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -85,6 +85,7 @@ int cgroup_bpf_inherit(struct cgroup *cgrp); void cgroup_bpf_offline(struct cgroup *cgrp); int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, + struct bpf_prog *replace_prog, enum bpf_attach_type type, u32 flags); int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, enum bpf_attach_type type); @@ -93,7 +94,8 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */ int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, - enum bpf_attach_type type, u32 flags); + struct bpf_prog *replace_prog, enum bpf_attach_type type, + u32 flags); int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, enum bpf_attach_type type, u32 flags); int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index dbbcf0b02970..7df436da542d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -231,6 +231,11 @@ enum bpf_attach_type { * When children program makes decision (like picking TCP CA or sock bind) * parent program has a chance to override it. * + * With BPF_F_ALLOW_MULTI a new program is added to the end of the list of + * programs for a cgroup. Though it's possible to replace an old program at + * any position by also specifying BPF_F_REPLACE flag and position itself in + * replace_bpf_fd attribute. Old program at this position will be released. + * * A cgroup with MULTI or OVERRIDE flag allows any attach flags in sub-cgroups. * A cgroup with NONE doesn't allow any programs in sub-cgroups. * Ex1: @@ -249,6 +254,7 @@ enum bpf_attach_type { */ #define BPF_F_ALLOW_OVERRIDE (1U << 0) #define BPF_F_ALLOW_MULTI (1U << 1) +#define BPF_F_REPLACE (1U << 2) /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the * verifier will perform strict alignment checking as if the kernel @@ -442,6 +448,10 @@ union bpf_attr { __u32 attach_bpf_fd; /* eBPF program to attach */ __u32 attach_type; __u32 attach_flags; + __u32 replace_bpf_fd; /* previously attached eBPF + * program to replace if + * BPF_F_REPLACE is used + */ }; struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */ -- cgit v1.2.3 From 03896ef1f0cb23d2742ddf486c531c700a2da7d6 Mon Sep 17 00:00:00 2001 From: Magnus Karlsson Date: Thu, 19 Dec 2019 13:39:27 +0100 Subject: xsk: Change names of validation functions Change the names of the validation functions to better reflect what they are doing. The uppermost ones are reading entries from the rings and only the bottom ones validate entries. So xskq_cons_read_ is a better prefix name. Also change the xskq_cons_read_ functions to return a bool as the the descriptor or address is already returned by reference in the parameters. Everyone is using the return value as a bool anyway. Signed-off-by: Magnus Karlsson Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/1576759171-28550-9-git-send-email-magnus.karlsson@intel.com --- include/net/xdp_sock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 48594740d67c..63f005830866 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -118,7 +118,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs); /* Used from netdev driver */ bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt); -u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr); +bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr); void xsk_umem_discard_addr(struct xdp_umem *umem); void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); @@ -197,7 +197,7 @@ static inline bool xsk_umem_has_addrs_rq(struct xdp_umem *umem, u32 cnt) return xsk_umem_has_addrs(umem, cnt - rq->length); } -static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr) +static inline bool xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr) { struct xdp_umem_fq_reuse *rq = umem->fq_reuse; -- cgit v1.2.3 From f8509aa078de0842ec1817e8026e58620cd05d3b Mon Sep 17 00:00:00 2001 From: Magnus Karlsson Date: Thu, 19 Dec 2019 13:39:28 +0100 Subject: xsk: ixgbe: i40e: ice: mlx5: Xsk_umem_discard_addr to xsk_umem_release_addr Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to better reflect the new naming of the AF_XDP queue manipulation functions. As this functions is used by drivers implementing support for AF_XDP zero-copy, it requires a name change to these drivers. The function xsk_umem_release_addr_rq has also changed name in the same fashion. Signed-off-by: Magnus Karlsson Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/1576759171-28550-10-git-send-email-magnus.karlsson@intel.com --- include/net/xdp_sock.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 63f005830866..e86ec48ef627 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -119,7 +119,7 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs); /* Used from netdev driver */ bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt); bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr); -void xsk_umem_discard_addr(struct xdp_umem *umem); +void xsk_umem_release_addr(struct xdp_umem *umem); void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); void xsk_umem_consume_tx_done(struct xdp_umem *umem); @@ -208,12 +208,12 @@ static inline bool xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr) return addr; } -static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem) +static inline void xsk_umem_release_addr_rq(struct xdp_umem *umem) { struct xdp_umem_fq_reuse *rq = umem->fq_reuse; if (!rq->length) - xsk_umem_discard_addr(umem); + xsk_umem_release_addr(umem); else rq->length--; } @@ -258,7 +258,7 @@ static inline u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr) return NULL; } -static inline void xsk_umem_discard_addr(struct xdp_umem *umem) +static inline void xsk_umem_release_addr(struct xdp_umem *umem) { } @@ -332,7 +332,7 @@ static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr) return NULL; } -static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem) +static inline void xsk_umem_release_addr_rq(struct xdp_umem *umem) { } -- cgit v1.2.3 From 48fda74f0a9377ce2145ac5f06b6db0274880826 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Wed, 18 Dec 2019 09:02:14 +0100 Subject: net: dsa: add support for Atheros AR9331 TAG format Add support for tag format used in Atheros AR9331 built-in switch. Reviewed-by: Vivien Didelot Reviewed-by: Andrew Lunn Signed-off-by: Oleksij Rempel Signed-off-by: David S. Miller --- include/net/dsa.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/dsa.h b/include/net/dsa.h index 6767dc3f66c0..da5578db228e 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -43,6 +43,7 @@ struct phylink_link_state; #define DSA_TAG_PROTO_SJA1105_VALUE 13 #define DSA_TAG_PROTO_KSZ8795_VALUE 14 #define DSA_TAG_PROTO_OCELOT_VALUE 15 +#define DSA_TAG_PROTO_AR9331_VALUE 16 enum dsa_tag_protocol { DSA_TAG_PROTO_NONE = DSA_TAG_PROTO_NONE_VALUE, @@ -61,6 +62,7 @@ enum dsa_tag_protocol { DSA_TAG_PROTO_SJA1105 = DSA_TAG_PROTO_SJA1105_VALUE, DSA_TAG_PROTO_KSZ8795 = DSA_TAG_PROTO_KSZ8795_VALUE, DSA_TAG_PROTO_OCELOT = DSA_TAG_PROTO_OCELOT_VALUE, + DSA_TAG_PROTO_AR9331 = DSA_TAG_PROTO_AR9331_VALUE, }; struct packet_type; -- cgit v1.2.3 From e1b5e598e5a51b453328879682b178b4acc15105 Mon Sep 17 00:00:00 2001 From: John Rutherford Date: Thu, 19 Dec 2019 16:03:57 +1100 Subject: tipc: make legacy address flag readable over netlink To enable iproute2/tipc to generate backwards compatible printouts and validate command parameters for nodes using a node address, it needs to be able to read the legacy address flag from the kernel. The legacy address flag records the way in which the node identity was originally specified. The legacy address flag is requested by the netlink message TIPC_NL_ADDR_LEGACY_GET. If the flag is set the attribute TIPC_NLA_NET_ADDR_LEGACY is set in the return message. Signed-off-by: John Rutherford Acked-by: Jon Maloy Signed-off-by: David S. Miller --- include/uapi/linux/tipc_netlink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h index 6c2194ab745b..dc0d23a50e69 100644 --- a/include/uapi/linux/tipc_netlink.h +++ b/include/uapi/linux/tipc_netlink.h @@ -65,6 +65,7 @@ enum { TIPC_NL_UDP_GET_REMOTEIP, TIPC_NL_KEY_SET, TIPC_NL_KEY_FLUSH, + TIPC_NL_ADDR_LEGACY_GET, __TIPC_NL_CMD_MAX, TIPC_NL_CMD_MAX = __TIPC_NL_CMD_MAX - 1 @@ -176,6 +177,7 @@ enum { TIPC_NLA_NET_ADDR, /* u32 */ TIPC_NLA_NET_NODEID, /* u64 */ TIPC_NLA_NET_NODEID_W1, /* u64 */ + TIPC_NLA_NET_ADDR_LEGACY, /* flag */ __TIPC_NLA_NET_MAX, TIPC_NLA_NET_MAX = __TIPC_NLA_NET_MAX - 1 -- cgit v1.2.3 From f66b53fdbb22ced1a323b22b9de84a61aacd8d18 Mon Sep 17 00:00:00 2001 From: Martin Varghese Date: Sat, 21 Dec 2019 08:50:46 +0530 Subject: openvswitch: New MPLS actions for layer 2 tunnelling The existing PUSH MPLS action inserts MPLS header between ethernet header and the IP header. Though this behaviour is fine for L3 VPN where an IP packet is encapsulated inside a MPLS tunnel, it does not suffice the L2 VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should encapsulate the ethernet packet. The new mpls action ADD_MPLS inserts MPLS header at the start of the packet or at the start of the l3 header depending on the value of l3 tunnel flag in the ADD_MPLS arguments. POP_MPLS action is extended to support ethertype 0x6558. Signed-off-by: Martin Varghese Acked-by: Pravin B Shelar Signed-off-by: David S. Miller --- include/uapi/linux/openvswitch.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index a87b44cd5590..ae2bff14e7e1 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -672,6 +672,32 @@ struct ovs_action_push_mpls { __be16 mpls_ethertype; /* Either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC */ }; +/** + * struct ovs_action_add_mpls - %OVS_ACTION_ATTR_ADD_MPLS action + * argument. + * @mpls_lse: MPLS label stack entry to push. + * @mpls_ethertype: Ethertype to set in the encapsulating ethernet frame. + * @tun_flags: MPLS tunnel attributes. + * + * The only values @mpls_ethertype should ever be given are %ETH_P_MPLS_UC and + * %ETH_P_MPLS_MC, indicating MPLS unicast or multicast. Other are rejected. + */ +struct ovs_action_add_mpls { + __be32 mpls_lse; + __be16 mpls_ethertype; /* Either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC */ + __u16 tun_flags; +}; + +#define OVS_MPLS_L3_TUNNEL_FLAG_MASK (1 << 0) /* Flag to specify the place of + * insertion of MPLS header. + * When false, the MPLS header + * will be inserted at the start + * of the packet. + * When true, the MPLS header + * will be inserted at the start + * of the l3 header. + */ + /** * struct ovs_action_push_vlan - %OVS_ACTION_ATTR_PUSH_VLAN action argument. * @vlan_tpid: Tag protocol identifier (TPID) to push. @@ -892,6 +918,10 @@ struct check_pkt_len_arg { * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the packet length and execute a set * of actions if greater than the specified packet length, else execute * another set of actions. + * @OVS_ACTION_ATTR_ADD_MPLS: Push a new MPLS label stack entry at the + * start of the packet or at the start of the l3 header depending on the value + * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS + * argument. * * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all * fields within a header are modifiable, e.g. the IPv4 protocol and fragment @@ -927,6 +957,7 @@ enum ovs_action_attr { OVS_ACTION_ATTR_METER, /* u32 meter ID. */ OVS_ACTION_ATTR_CLONE, /* Nested OVS_CLONE_ATTR_*. */ OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */ + OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ __OVS_ACTION_ATTR_MAX, /* Nothing past this will be accepted * from userspace. */ -- cgit v1.2.3 From 6b722237b656d045c0b9bab9966a5e46604258ba Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 23 Dec 2019 15:28:12 +0200 Subject: net: fib_notifier: Add temporary events to the FIB notification chain Subsequent patches are going to simplify the IPv6 route offload API, which will only use three events - replace, delete and append. Introduce a temporary version of replace and delete in order to make the conversion easier to review. Note that append does not need a temporary version, as it is currently not used. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/fib_notifier.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h index 6d59221ff05a..b3c54325caec 100644 --- a/include/net/fib_notifier.h +++ b/include/net/fib_notifier.h @@ -23,6 +23,8 @@ enum fib_event_type { FIB_EVENT_NH_DEL, FIB_EVENT_VIF_ADD, FIB_EVENT_VIF_DEL, + FIB_EVENT_ENTRY_REPLACE_TMP, + FIB_EVENT_ENTRY_DEL_TMP, }; struct fib_notifier_ops { -- cgit v1.2.3 From d2f0c9b11410f9c6a07c126f8a215b0b81cdcf6c Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 23 Dec 2019 15:28:17 +0200 Subject: ipv6: Handle route deletion notification For the purpose of route offload, when a single route is deleted, it is only of interest if it is the first route in the node or if it is sibling to such a route. In the first case, distinguish between several possibilities: 1. Route is the last route in the node. Emit a delete notification 2. Route is followed by a non-multipath route. Emit a replace notification for the non-multipath route. 3. Route is followed by a multipath route. Emit a replace notification for the multipath route. In the second case, only emit a delete notification to ensure the route is no longer used as a valid nexthop. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/ip6_fib.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index f1535f172935..b579faea41e9 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -487,6 +487,7 @@ int call_fib6_multipath_entry_notifiers(struct net *net, struct fib6_info *rt, unsigned int nsiblings, struct netlink_ext_ack *extack); +int call_fib6_entry_notifiers_replace(struct net *net, struct fib6_info *rt); void fib6_rt_update(struct net *net, struct fib6_info *rt, struct nl_info *info); void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info, -- cgit v1.2.3 From caafb2509fac1432849650826953dd88b7cbe374 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 23 Dec 2019 15:28:20 +0200 Subject: ipv6: Remove old route notifications and convert listeners Now that mlxsw is converted to use the new FIB notifications it is possible to delete the old ones and use the new replace / append / delete notifications. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/fib_notifier.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h index b3c54325caec..6d59221ff05a 100644 --- a/include/net/fib_notifier.h +++ b/include/net/fib_notifier.h @@ -23,8 +23,6 @@ enum fib_event_type { FIB_EVENT_NH_DEL, FIB_EVENT_VIF_ADD, FIB_EVENT_VIF_DEL, - FIB_EVENT_ENTRY_REPLACE_TMP, - FIB_EVENT_ENTRY_DEL_TMP, }; struct fib_notifier_ops { -- cgit v1.2.3 From 0e5dafc8a6e540c0145b61545c557c43be70af10 Mon Sep 17 00:00:00 2001 From: Richard Cochran Date: Wed, 25 Dec 2019 18:16:09 -0800 Subject: net: phy: Introduce helper functions for time stamping support. Some parts of the networking stack and at least one driver test fields within the 'struct phy_device' in order to query time stamping capabilities and to invoke time stamping methods. This patch adds a functional interface around the time stamping fields. This will allow insulating the callers from future changes to the details of the time stamping implemenation. Signed-off-by: Richard Cochran Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/phy.h | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 7d530b3f8855..0248f5e9939d 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -936,6 +936,66 @@ static inline bool phy_polling_mode(struct phy_device *phydev) return phydev->irq == PHY_POLL; } +/** + * phy_has_hwtstamp - Tests whether a PHY time stamp configuration. + * @phydev: the phy_device struct + */ +static inline bool phy_has_hwtstamp(struct phy_device *phydev) +{ + return phydev && phydev->drv && phydev->drv->hwtstamp; +} + +/** + * phy_has_rxtstamp - Tests whether a PHY supports receive time stamping. + * @phydev: the phy_device struct + */ +static inline bool phy_has_rxtstamp(struct phy_device *phydev) +{ + return phydev && phydev->drv && phydev->drv->rxtstamp; +} + +/** + * phy_has_tsinfo - Tests whether a PHY reports time stamping and/or + * PTP hardware clock capabilities. + * @phydev: the phy_device struct + */ +static inline bool phy_has_tsinfo(struct phy_device *phydev) +{ + return phydev && phydev->drv && phydev->drv->ts_info; +} + +/** + * phy_has_txtstamp - Tests whether a PHY supports transmit time stamping. + * @phydev: the phy_device struct + */ +static inline bool phy_has_txtstamp(struct phy_device *phydev) +{ + return phydev && phydev->drv && phydev->drv->txtstamp; +} + +static inline int phy_hwtstamp(struct phy_device *phydev, struct ifreq *ifr) +{ + return phydev->drv->hwtstamp(phydev, ifr); +} + +static inline bool phy_rxtstamp(struct phy_device *phydev, struct sk_buff *skb, + int type) +{ + return phydev->drv->rxtstamp(phydev, skb, type); +} + +static inline int phy_ts_info(struct phy_device *phydev, + struct ethtool_ts_info *tsinfo) +{ + return phydev->drv->ts_info(phydev, tsinfo); +} + +static inline void phy_txtstamp(struct phy_device *phydev, struct sk_buff *skb, + int type) +{ + phydev->drv->txtstamp(phydev, skb, type); +} + /** * phy_is_internal - Convenience function for testing if a PHY is internal * @phydev: the phy_device struct -- cgit v1.2.3 From 4715f65ffa0520af0680dbfbedbe349f175adaf4 Mon Sep 17 00:00:00 2001 From: Richard Cochran Date: Wed, 25 Dec 2019 18:16:15 -0800 Subject: net: Introduce a new MII time stamping interface. Currently the stack supports time stamping in PHY devices. However, there are newer, non-PHY devices that can snoop an MII bus and provide time stamps. In order to support such devices, this patch introduces a new interface to be used by both PHY and non-PHY devices. In addition, the one and only user of the old PHY time stamping API is converted to the new interface. Signed-off-by: Richard Cochran Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/mii_timestamper.h | 58 +++++++++++++++++++++++++++++++++++++++++ include/linux/phy.h | 41 +++++++---------------------- 2 files changed, 68 insertions(+), 31 deletions(-) create mode 100644 include/linux/mii_timestamper.h (limited to 'include') diff --git a/include/linux/mii_timestamper.h b/include/linux/mii_timestamper.h new file mode 100644 index 000000000000..36002386029c --- /dev/null +++ b/include/linux/mii_timestamper.h @@ -0,0 +1,58 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Support for generic time stamping devices on MII buses. + * Copyright (C) 2018 Richard Cochran + */ +#ifndef _LINUX_MII_TIMESTAMPER_H +#define _LINUX_MII_TIMESTAMPER_H + +#include +#include +#include + +struct phy_device; + +/** + * struct mii_timestamper - Callback interface to MII time stamping devices. + * + * @rxtstamp: Requests a Rx timestamp for 'skb'. If the skb is accepted, + * the MII time stamping device promises to deliver it using + * netif_rx() as soon as a timestamp becomes available. One of + * the PTP_CLASS_ values is passed in 'type'. The function + * must return true if the skb is accepted for delivery. + * + * @txtstamp: Requests a Tx timestamp for 'skb'. The MII time stamping + * device promises to deliver it using skb_complete_tx_timestamp() + * as soon as a timestamp becomes available. One of the PTP_CLASS_ + * values is passed in 'type'. + * + * @hwtstamp: Handles SIOCSHWTSTAMP ioctl for hardware time stamping. + * + * @link_state: Allows the device to respond to changes in the link + * state. The caller invokes this function while holding + * the phy_device mutex. + * + * @ts_info: Handles ethtool queries for hardware time stamping. + * + * Drivers for PHY time stamping devices should embed their + * mii_timestamper within a private structure, obtaining a reference + * to it using container_of(). + */ +struct mii_timestamper { + bool (*rxtstamp)(struct mii_timestamper *mii_ts, + struct sk_buff *skb, int type); + + void (*txtstamp)(struct mii_timestamper *mii_ts, + struct sk_buff *skb, int type); + + int (*hwtstamp)(struct mii_timestamper *mii_ts, + struct ifreq *ifreq); + + void (*link_state)(struct mii_timestamper *mii_ts, + struct phy_device *phydev); + + int (*ts_info)(struct mii_timestamper *mii_ts, + struct ethtool_ts_info *ts_info); +}; + +#endif diff --git a/include/linux/phy.h b/include/linux/phy.h index 0248f5e9939d..30e599c454db 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -441,6 +442,7 @@ struct phy_device { struct sfp_bus *sfp_bus; struct phylink *phylink; struct net_device *attached_dev; + struct mii_timestamper *mii_ts; u8 mdix; u8 mdix_ctrl; @@ -546,29 +548,6 @@ struct phy_driver { */ int (*match_phy_device)(struct phy_device *phydev); - /* Handles ethtool queries for hardware time stamping. */ - int (*ts_info)(struct phy_device *phydev, struct ethtool_ts_info *ti); - - /* Handles SIOCSHWTSTAMP ioctl for hardware time stamping. */ - int (*hwtstamp)(struct phy_device *phydev, struct ifreq *ifr); - - /* - * Requests a Rx timestamp for 'skb'. If the skb is accepted, - * the phy driver promises to deliver it using netif_rx() as - * soon as a timestamp becomes available. One of the - * PTP_CLASS_ values is passed in 'type'. The function must - * return true if the skb is accepted for delivery. - */ - bool (*rxtstamp)(struct phy_device *dev, struct sk_buff *skb, int type); - - /* - * Requests a Tx timestamp for 'skb'. The phy driver promises - * to deliver it using skb_complete_tx_timestamp() as soon as a - * timestamp becomes available. One of the PTP_CLASS_ values - * is passed in 'type'. - */ - void (*txtstamp)(struct phy_device *dev, struct sk_buff *skb, int type); - /* Some devices (e.g. qnap TS-119P II) require PHY register changes to * enable Wake on LAN, so set_wol is provided to be called in the * ethernet driver's set_wol function. */ @@ -942,7 +921,7 @@ static inline bool phy_polling_mode(struct phy_device *phydev) */ static inline bool phy_has_hwtstamp(struct phy_device *phydev) { - return phydev && phydev->drv && phydev->drv->hwtstamp; + return phydev && phydev->mii_ts && phydev->mii_ts->hwtstamp; } /** @@ -951,7 +930,7 @@ static inline bool phy_has_hwtstamp(struct phy_device *phydev) */ static inline bool phy_has_rxtstamp(struct phy_device *phydev) { - return phydev && phydev->drv && phydev->drv->rxtstamp; + return phydev && phydev->mii_ts && phydev->mii_ts->rxtstamp; } /** @@ -961,7 +940,7 @@ static inline bool phy_has_rxtstamp(struct phy_device *phydev) */ static inline bool phy_has_tsinfo(struct phy_device *phydev) { - return phydev && phydev->drv && phydev->drv->ts_info; + return phydev && phydev->mii_ts && phydev->mii_ts->ts_info; } /** @@ -970,30 +949,30 @@ static inline bool phy_has_tsinfo(struct phy_device *phydev) */ static inline bool phy_has_txtstamp(struct phy_device *phydev) { - return phydev && phydev->drv && phydev->drv->txtstamp; + return phydev && phydev->mii_ts && phydev->mii_ts->txtstamp; } static inline int phy_hwtstamp(struct phy_device *phydev, struct ifreq *ifr) { - return phydev->drv->hwtstamp(phydev, ifr); + return phydev->mii_ts->hwtstamp(phydev->mii_ts, ifr); } static inline bool phy_rxtstamp(struct phy_device *phydev, struct sk_buff *skb, int type) { - return phydev->drv->rxtstamp(phydev, skb, type); + return phydev->mii_ts->rxtstamp(phydev->mii_ts, skb, type); } static inline int phy_ts_info(struct phy_device *phydev, struct ethtool_ts_info *tsinfo) { - return phydev->drv->ts_info(phydev, tsinfo); + return phydev->mii_ts->ts_info(phydev->mii_ts, tsinfo); } static inline void phy_txtstamp(struct phy_device *phydev, struct sk_buff *skb, int type) { - phydev->drv->txtstamp(phydev, skb, type); + phydev->mii_ts->txtstamp(phydev->mii_ts, skb, type); } /** -- cgit v1.2.3 From 767ff483731502a0fc34f34a3a0851aca175eb71 Mon Sep 17 00:00:00 2001 From: Richard Cochran Date: Wed, 25 Dec 2019 18:16:16 -0800 Subject: net: Add a layer for non-PHY MII time stamping drivers. While PHY time stamping drivers can simply attach their interface directly to the PHY instance, stand alone drivers require support in order to manage their services. Non-PHY MII time stamping drivers have a control interface over another bus like I2C, SPI, UART, or via a memory mapped peripheral. The controller device will be associated with one or more time stamping channels, each of which sits snoops in on a MII bus. This patch provides a glue layer that will enable time stamping channels to find their controlling device. Signed-off-by: Richard Cochran Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/mii_timestamper.h | 63 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 63 insertions(+) (limited to 'include') diff --git a/include/linux/mii_timestamper.h b/include/linux/mii_timestamper.h index 36002386029c..fa940bbaf8ae 100644 --- a/include/linux/mii_timestamper.h +++ b/include/linux/mii_timestamper.h @@ -33,10 +33,15 @@ struct phy_device; * the phy_device mutex. * * @ts_info: Handles ethtool queries for hardware time stamping. + * @device: Remembers the device to which the instance belongs. * * Drivers for PHY time stamping devices should embed their * mii_timestamper within a private structure, obtaining a reference * to it using container_of(). + * + * Drivers for non-PHY time stamping devices should return a pointer + * to a mii_timestamper from the probe_channel() callback of their + * mii_timestamping_ctrl interface. */ struct mii_timestamper { bool (*rxtstamp)(struct mii_timestamper *mii_ts, @@ -53,6 +58,64 @@ struct mii_timestamper { int (*ts_info)(struct mii_timestamper *mii_ts, struct ethtool_ts_info *ts_info); + + struct device *device; +}; + +/** + * struct mii_timestamping_ctrl - MII time stamping controller interface. + * + * @probe_channel: Callback into the controller driver announcing the + * presence of the 'port' channel. The 'device' field + * had been passed to register_mii_tstamp_controller(). + * The driver must return either a pointer to a valid + * MII timestamper instance or PTR_ERR. + * + * @release_channel: Releases an instance obtained via .probe_channel. + */ +struct mii_timestamping_ctrl { + struct mii_timestamper *(*probe_channel)(struct device *device, + unsigned int port); + void (*release_channel)(struct device *device, + struct mii_timestamper *mii_ts); }; +#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING + +int register_mii_tstamp_controller(struct device *device, + struct mii_timestamping_ctrl *ctrl); + +void unregister_mii_tstamp_controller(struct device *device); + +struct mii_timestamper *register_mii_timestamper(struct device_node *node, + unsigned int port); + +void unregister_mii_timestamper(struct mii_timestamper *mii_ts); + +#else + +static inline +int register_mii_tstamp_controller(struct device *device, + struct mii_timestamping_ctrl *ctrl) +{ + return -EOPNOTSUPP; +} + +static inline void unregister_mii_tstamp_controller(struct device *device) +{ +} + +static inline +struct mii_timestamper *register_mii_timestamper(struct device_node *node, + unsigned int port) +{ + return NULL; +} + +static inline void unregister_mii_timestamper(struct mii_timestamper *mii_ts) +{ +} + +#endif + #endif -- cgit v1.2.3 From b6fd7b96366769651ab23988607ce9c5c9042cdb Mon Sep 17 00:00:00 2001 From: Richard Cochran Date: Wed, 25 Dec 2019 18:16:19 -0800 Subject: net: Introduce peer to peer one step PTP time stamping. The 1588 standard defines one step operation for both Sync and PDelay_Resp messages. Up until now, hardware with P2P one step has been rare, and kernel support was lacking. This patch adds support of the mode in anticipation of new hardware developments. Signed-off-by: Richard Cochran Signed-off-by: David S. Miller --- include/uapi/linux/net_tstamp.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h index e5b39721c6e4..f96e650d0af9 100644 --- a/include/uapi/linux/net_tstamp.h +++ b/include/uapi/linux/net_tstamp.h @@ -90,6 +90,14 @@ enum hwtstamp_tx_types { * queue. */ HWTSTAMP_TX_ONESTEP_SYNC, + + /* + * Same as HWTSTAMP_TX_ONESTEP_SYNC, but also enables time + * stamp insertion directly into PDelay_Resp packets. In this + * case, neither transmitted Sync nor PDelay_Resp packets will + * receive a time stamp via the socket error queue. + */ + HWTSTAMP_TX_ONESTEP_P2P, }; /* possible values for hwtstamp_config->rx_filter */ -- cgit v1.2.3 From c14ceb0ec727187f71a487a592ffa91767fed66e Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 18 Dec 2019 12:05:21 +0100 Subject: netfilter: nft_meta: add support for slave device ifindex matching Allow to match on vrf slave ifindex or name. In case there was no slave interface involved, store 0 in the destination register just like existing iif/oif matching. sdif(name) is restricted to the ipv4/ipv6 input and forward hooks, as it depends on ip(6) stack parsing/storing info in skb->cb[]. Cc: Martin Willi Cc: David Ahern Cc: Shrijeet Mukherjee Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index bb9b049310df..e237ecbdcd8a 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -805,6 +805,8 @@ enum nft_exthdr_attributes { * @NFT_META_TIME_NS: time since epoch (in nanoseconds) * @NFT_META_TIME_DAY: day of week (from 0 = Sunday to 6 = Saturday) * @NFT_META_TIME_HOUR: hour of day (in seconds) + * @NFT_META_SDIF: slave device interface index + * @NFT_META_SDIFNAME: slave device interface name */ enum nft_meta_keys { NFT_META_LEN, @@ -840,6 +842,8 @@ enum nft_meta_keys { NFT_META_TIME_NS, NFT_META_TIME_DAY, NFT_META_TIME_HOUR, + NFT_META_SDIF, + NFT_META_SDIFNAME, }; /** -- cgit v1.2.3 From f643ee295c1c63bc117fb052d4da681354d6f732 Mon Sep 17 00:00:00 2001 From: Kevin Kou Date: Thu, 26 Dec 2019 12:29:17 +0000 Subject: sctp: move trace_sctp_probe_path into sctp_outq_sack The original patch bringed in the "SCTP ACK tracking trace event" feature was committed at Dec.20, 2017, it replaced jprobe usage with trace events, and bringed in two trace events, one is TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path). The original patch intended to trigger the trace_sctp_probe_path in TRACE_EVENT(sctp_probe) as below code, +TRACE_EVENT(sctp_probe, + + TP_PROTO(const struct sctp_endpoint *ep, + const struct sctp_association *asoc, + struct sctp_chunk *chunk), + + TP_ARGS(ep, asoc, chunk), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, mark) + __field(__u16, bind_port) + __field(__u16, peer_port) + __field(__u32, pathmtu) + __field(__u32, rwnd) + __field(__u16, unack_data) + ), + + TP_fast_assign( + struct sk_buff *skb = chunk->skb; + + __entry->asoc = (unsigned long)asoc; + __entry->mark = skb->mark; + __entry->bind_port = ep->base.bind_addr.port; + __entry->peer_port = asoc->peer.port; + __entry->pathmtu = asoc->pathmtu; + __entry->rwnd = asoc->peer.rwnd; + __entry->unack_data = asoc->unack_data; + + if (trace_sctp_probe_path_enabled()) { + struct sctp_transport *sp; + + list_for_each_entry(sp, &asoc->peer.transport_addr_list, + transports) { + trace_sctp_probe_path(sp, asoc); + } + } + ), But I found it did not work when I did testing, and trace_sctp_probe_path had no output, I finally found that there is trace buffer lock operation(trace_event_buffer_reserve) in include/trace/trace_events.h: static notrace void \ trace_event_raw_event_##call(void *__data, proto) \ { \ struct trace_event_file *trace_file = __data; \ struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\ struct trace_event_buffer fbuffer; \ struct trace_event_raw_##call *entry; \ int __data_size; \ \ if (trace_trigger_soft_disabled(trace_file)) \ return; \ \ __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \ \ entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ \ trace_event_buffer_commit(&fbuffer); \ } The reason caused no output of trace_sctp_probe_path is that trace_sctp_probe_path written in TP_fast_assign part of TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the trace_event_buffer_reserve() when compiler expands Macro, entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ so trace_sctp_probe_path finally can not acquire trace_event_buffer and return no output, that is to say the nest of tracepoint entry function is not allowed. The function call flow is: trace_sctp_probe() -> trace_event_raw_event_sctp_probe() -> lock buffer -> trace_sctp_probe_path() -> trace_event_raw_event_sctp_probe_path() --nested -> buffer has been locked and return no output. This patch is to remove trace_sctp_probe_path from the TP_fast_assign part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function, and trigger sctp_probe_path_trace in sctp_outq_sack. After this patch, you can enable both events individually, # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Signed-off-by: Kevin Kou Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller --- include/trace/events/sctp.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include') diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h index 7475c7be165a..d4aac3436595 100644 --- a/include/trace/events/sctp.h +++ b/include/trace/events/sctp.h @@ -75,15 +75,6 @@ TRACE_EVENT(sctp_probe, __entry->pathmtu = asoc->pathmtu; __entry->rwnd = asoc->peer.rwnd; __entry->unack_data = asoc->unack_data; - - if (trace_sctp_probe_path_enabled()) { - struct sctp_transport *sp; - - list_for_each_entry(sp, &asoc->peer.transport_addr_list, - transports) { - trace_sctp_probe_path(sp, asoc); - } - } ), TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d " -- cgit v1.2.3 From c1e469902640ed0a82d0b8df5951199c17a49e3c Mon Sep 17 00:00:00 2001 From: Andy Roulin Date: Thu, 26 Dec 2019 05:41:57 -0800 Subject: bonding: rename AD_STATE_* to LACP_STATE_* As the LACP actor/partner state is now part of the uapi, rename the 3ad state defines with LACP prefix. The LACP prefix is preferred over BOND_3AD as the LACP standard moved to 802.1AX. Fixes: 826f66b30c2e3 ("bonding: move 802.3ad port state flags to uapi") Signed-off-by: Andy Roulin Signed-off-by: David S. Miller --- include/uapi/linux/if_bonding.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h index 6829213a54c5..45f3750aa861 100644 --- a/include/uapi/linux/if_bonding.h +++ b/include/uapi/linux/if_bonding.h @@ -96,14 +96,14 @@ #define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */ /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */ -#define AD_STATE_LACP_ACTIVITY 0x1 -#define AD_STATE_LACP_TIMEOUT 0x2 -#define AD_STATE_AGGREGATION 0x4 -#define AD_STATE_SYNCHRONIZATION 0x8 -#define AD_STATE_COLLECTING 0x10 -#define AD_STATE_DISTRIBUTING 0x20 -#define AD_STATE_DEFAULTED 0x40 -#define AD_STATE_EXPIRED 0x80 +#define LACP_STATE_LACP_ACTIVITY 0x1 +#define LACP_STATE_LACP_TIMEOUT 0x2 +#define LACP_STATE_AGGREGATION 0x4 +#define LACP_STATE_SYNCHRONIZATION 0x8 +#define LACP_STATE_COLLECTING 0x10 +#define LACP_STATE_DISTRIBUTING 0x20 +#define LACP_STATE_DEFAULTED 0x40 +#define LACP_STATE_EXPIRED 0x80 typedef struct ifbond { __s32 bond_mode; -- cgit v1.2.3 From 2b4a8990b7df55875745a80a609a1ceaaf51f322 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:18 +0100 Subject: ethtool: introduce ethtool netlink interface Basic genetlink and init infrastructure for the netlink interface, register genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to make the build optional. Add initial overall interface description into Documentation/networking/ethtool-netlink.rst, further patches will add more detailed information. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/ethtool_netlink.h | 9 +++++++++ include/uapi/linux/ethtool_netlink.h | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+) create mode 100644 include/linux/ethtool_netlink.h create mode 100644 include/uapi/linux/ethtool_netlink.h (limited to 'include') diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h new file mode 100644 index 000000000000..f27e92b5f344 --- /dev/null +++ b/include/linux/ethtool_netlink.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef _LINUX_ETHTOOL_NETLINK_H_ +#define _LINUX_ETHTOOL_NETLINK_H_ + +#include +#include + +#endif /* _LINUX_ETHTOOL_NETLINK_H_ */ diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h new file mode 100644 index 000000000000..3c93276ba066 --- /dev/null +++ b/include/uapi/linux/ethtool_netlink.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */ +/* + * include/uapi/linux/ethtool_netlink.h - netlink interface for ethtool + * + * See Documentation/networking/ethtool-netlink.txt in kernel source tree for + * doucumentation of the interface. + */ + +#ifndef _UAPI_LINUX_ETHTOOL_NETLINK_H_ +#define _UAPI_LINUX_ETHTOOL_NETLINK_H_ + +#include + +/* message types - userspace to kernel */ +enum { + ETHTOOL_MSG_USER_NONE, + + /* add new constants above here */ + __ETHTOOL_MSG_USER_CNT, + ETHTOOL_MSG_USER_MAX = __ETHTOOL_MSG_USER_CNT - 1 +}; + +/* message types - kernel to userspace */ +enum { + ETHTOOL_MSG_KERNEL_NONE, + + /* add new constants above here */ + __ETHTOOL_MSG_KERNEL_CNT, + ETHTOOL_MSG_KERNEL_MAX = __ETHTOOL_MSG_KERNEL_CNT - 1 +}; + +/* generic netlink info */ +#define ETHTOOL_GENL_NAME "ethtool" +#define ETHTOOL_GENL_VERSION 1 + +#endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */ -- cgit v1.2.3 From 041b1c5d4a53e97fc9e029ae32469552ca12cb9b Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:23 +0100 Subject: ethtool: helper functions for netlink interface Add common request/reply header definition and helpers to parse request header and fill reply header. Provide ethnl_update_* helpers to update structure members from request attributes (to be used for *_SET requests). Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 3c93276ba066..82fc3b5f41c9 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -29,6 +29,27 @@ enum { ETHTOOL_MSG_KERNEL_MAX = __ETHTOOL_MSG_KERNEL_CNT - 1 }; +/* request header */ + +/* use compact bitsets in reply */ +#define ETHTOOL_FLAG_COMPACT_BITSETS (1 << 0) +/* provide optional reply for SET or ACT requests */ +#define ETHTOOL_FLAG_OMIT_REPLY (1 << 1) + +#define ETHTOOL_FLAG_ALL (ETHTOOL_FLAG_COMPACT_BITSETS | \ + ETHTOOL_FLAG_OMIT_REPLY) + +enum { + ETHTOOL_A_HEADER_UNSPEC, + ETHTOOL_A_HEADER_DEV_INDEX, /* u32 */ + ETHTOOL_A_HEADER_DEV_NAME, /* string */ + ETHTOOL_A_HEADER_FLAGS, /* u32 - ETHTOOL_FLAG_* */ + + /* add new constants above here */ + __ETHTOOL_A_HEADER_CNT, + ETHTOOL_A_HEADER_MAX = __ETHTOOL_A_HEADER_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From 10b518d4e6dd5390e40f7d8de0f08753c1195a7e Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:28 +0100 Subject: ethtool: netlink bitset handling The ethtool netlink code uses common framework for passing arbitrary length bit sets to allow future extensions. A bitset can be a list (only one bitmap) or can consist of value and mask pair (used e.g. when client want to modify only some bits). A bitset can use one of two formats: verbose (bit by bit) or compact. Verbose format consists of bitset size (number of bits), list flag and an array of bit nests, telling which bits are part of the list or which bits are in the mask and which of them are to be set. In requests, bits can be identified by index (position) or by name. In replies, kernel provides both index and name. Verbose format is suitable for "one shot" applications like standard ethtool command as it avoids the need to either keep bit names (e.g. link modes) in sync with kernel or having to add an extra roundtrip for string set request (e.g. for private flags). Compact format uses one (list) or two (value/mask) arrays of 32-bit words to store the bitmap(s). It is more suitable for long running applications (ethtool in monitor mode or network management daemons) which can retrieve the names once and then pass only compact bitmaps to save space. Userspace requests can use either format; ETHTOOL_FLAG_COMPACT_BITSETS flag in request header tells kernel which format to use in reply. Notifications always use compact format. As some code uses arrays of unsigned long for internal representation and some arrays of u32 (or even a single u32), two sets of parse/compose helpers are introduced. To avoid code duplication, helpers for unsigned long arrays are implemented as wrappers around helpers for u32 arrays. There are two reasons for this choice: (1) u32 arrays are more frequent in ethtool code and (2) unsigned long array can be always interpreted as an u32 array on little endian 64-bit and all 32-bit architectures while we would need special handling for odd number of u32 words in the opposite direction. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 82fc3b5f41c9..951203049615 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -50,6 +50,41 @@ enum { ETHTOOL_A_HEADER_MAX = __ETHTOOL_A_HEADER_CNT - 1 }; +/* bit sets */ + +enum { + ETHTOOL_A_BITSET_BIT_UNSPEC, + ETHTOOL_A_BITSET_BIT_INDEX, /* u32 */ + ETHTOOL_A_BITSET_BIT_NAME, /* string */ + ETHTOOL_A_BITSET_BIT_VALUE, /* flag */ + + /* add new constants above here */ + __ETHTOOL_A_BITSET_BIT_CNT, + ETHTOOL_A_BITSET_BIT_MAX = __ETHTOOL_A_BITSET_BIT_CNT - 1 +}; + +enum { + ETHTOOL_A_BITSET_BITS_UNSPEC, + ETHTOOL_A_BITSET_BITS_BIT, /* nest - _A_BITSET_BIT_* */ + + /* add new constants above here */ + __ETHTOOL_A_BITSET_BITS_CNT, + ETHTOOL_A_BITSET_BITS_MAX = __ETHTOOL_A_BITSET_BITS_CNT - 1 +}; + +enum { + ETHTOOL_A_BITSET_UNSPEC, + ETHTOOL_A_BITSET_NOMASK, /* flag */ + ETHTOOL_A_BITSET_SIZE, /* u32 */ + ETHTOOL_A_BITSET_BITS, /* nest - _A_BITSET_BITS_* */ + ETHTOOL_A_BITSET_VALUE, /* binary */ + ETHTOOL_A_BITSET_MASK, /* binary */ + + /* add new constants above here */ + __ETHTOOL_A_BITSET_CNT, + ETHTOOL_A_BITSET_MAX = __ETHTOOL_A_BITSET_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From 6b08d6c146f4c5ed451c45339c10feb06d619db2 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:33 +0100 Subject: ethtool: support for netlink notifications Add infrastructure for ethtool netlink notifications. There is only one multicast group "monitor" which is used to notify userspace about changes and actions performed. Notification messages (types using suffix _NTF) share the format with replies to GET requests. Notifications are supposed to be broadcasted on every configuration change, whether it is done using the netlink interface or ioctl one. Netlink SET requests only trigger a notification if some data is actually changed. To trigger an ethtool notification, both ethtool netlink and external code use ethtool_notify() helper. This helper requires RTNL to be held and may sleep. Handlers sending messages for specific notification message types are registered in ethnl_notify_handlers array. As notifications can be triggered from other code, ethnl_ok flag is used to prevent an attempt to send notification before genetlink family is registered. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/ethtool_netlink.h | 5 +++++ include/linux/netdevice.h | 9 +++++++++ include/uapi/linux/ethtool_netlink.h | 2 ++ 3 files changed, 16 insertions(+) (limited to 'include') diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h index f27e92b5f344..c98f6852c8eb 100644 --- a/include/linux/ethtool_netlink.h +++ b/include/linux/ethtool_netlink.h @@ -5,5 +5,10 @@ #include #include +#include + +enum ethtool_multicast_groups { + ETHNL_MCGRP_MONITOR, +}; #endif /* _LINUX_ETHTOOL_NETLINK_H_ */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 469a297b58c0..f007155ae8f4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4393,6 +4393,15 @@ struct netdev_notifier_bonding_info { void netdev_bonding_info_change(struct net_device *dev, struct netdev_bonding_info *bonding_info); +#if IS_ENABLED(CONFIG_ETHTOOL_NETLINK) +void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data); +#else +static inline void ethtool_notify(struct net_device *dev, unsigned int cmd, + const void *data) +{ +} +#endif + static inline struct sk_buff *skb_gso_segment(struct sk_buff *skb, netdev_features_t features) { diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 951203049615..d530ccb6f7e6 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -89,4 +89,6 @@ enum { #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 +#define ETHTOOL_MCGRP_MONITOR_NAME "monitor" + #endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */ -- cgit v1.2.3 From 71921690f9745fef60a2bad425f30adf8cdc9da0 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:43 +0100 Subject: ethtool: provide string sets with STRSET_GET request Requests a contents of one or more string sets, i.e. indexed arrays of strings; this information is provided by ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all information can be retrieved with one request and mulitple string sets can be requested at once. There are three types of requests: - no NLM_F_DUMP, no device: get "global" stringsets - no NLM_F_DUMP, with device: get string sets related to the device - NLM_F_DUMP, no device: get device related string sets for all devices Client can request either all string sets of given type (global or device related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only set sizes (numbers of strings) are returned. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool.h | 3 ++ include/uapi/linux/ethtool_netlink.h | 56 ++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index f44155840b07..116bcbf09c74 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -606,6 +606,9 @@ enum ethtool_stringset { ETH_SS_PHY_STATS, ETH_SS_PHY_TUNABLES, ETH_SS_LINK_MODES, + + /* add new constants above here */ + ETH_SS_COUNT }; /** diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index d530ccb6f7e6..cabef1fec42a 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -14,6 +14,7 @@ /* message types - userspace to kernel */ enum { ETHTOOL_MSG_USER_NONE, + ETHTOOL_MSG_STRSET_GET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -23,6 +24,7 @@ enum { /* message types - kernel to userspace */ enum { ETHTOOL_MSG_KERNEL_NONE, + ETHTOOL_MSG_STRSET_GET_REPLY, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -85,6 +87,60 @@ enum { ETHTOOL_A_BITSET_MAX = __ETHTOOL_A_BITSET_CNT - 1 }; +/* string sets */ + +enum { + ETHTOOL_A_STRING_UNSPEC, + ETHTOOL_A_STRING_INDEX, /* u32 */ + ETHTOOL_A_STRING_VALUE, /* string */ + + /* add new constants above here */ + __ETHTOOL_A_STRING_CNT, + ETHTOOL_A_STRING_MAX = __ETHTOOL_A_STRING_CNT - 1 +}; + +enum { + ETHTOOL_A_STRINGS_UNSPEC, + ETHTOOL_A_STRINGS_STRING, /* nest - _A_STRINGS_* */ + + /* add new constants above here */ + __ETHTOOL_A_STRINGS_CNT, + ETHTOOL_A_STRINGS_MAX = __ETHTOOL_A_STRINGS_CNT - 1 +}; + +enum { + ETHTOOL_A_STRINGSET_UNSPEC, + ETHTOOL_A_STRINGSET_ID, /* u32 */ + ETHTOOL_A_STRINGSET_COUNT, /* u32 */ + ETHTOOL_A_STRINGSET_STRINGS, /* nest - _A_STRINGS_* */ + + /* add new constants above here */ + __ETHTOOL_A_STRINGSET_CNT, + ETHTOOL_A_STRINGSET_MAX = __ETHTOOL_A_STRINGSET_CNT - 1 +}; + +enum { + ETHTOOL_A_STRINGSETS_UNSPEC, + ETHTOOL_A_STRINGSETS_STRINGSET, /* nest - _A_STRINGSET_* */ + + /* add new constants above here */ + __ETHTOOL_A_STRINGSETS_CNT, + ETHTOOL_A_STRINGSETS_MAX = __ETHTOOL_A_STRINGSETS_CNT - 1 +}; + +/* STRSET */ + +enum { + ETHTOOL_A_STRSET_UNSPEC, + ETHTOOL_A_STRSET_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_STRSET_STRINGSETS, /* nest - _A_STRINGSETS_* */ + ETHTOOL_A_STRSET_COUNTS_ONLY, /* flag */ + + /* add new constants above here */ + __ETHTOOL_A_STRSET_CNT, + ETHTOOL_A_STRSET_MAX = __ETHTOOL_A_STRSET_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From 459e0b81b37043545d90629fdfb243444151e77d Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:48 +0100 Subject: ethtool: provide link settings with LINKINFO_GET request Implement LINKINFO_GET netlink request to get basic link settings provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides settings not directly related to autonegotiation and link mode selection: physical port, phy MDIO address, MDI(-X) status, MDI(-X) control and transceiver. LINKINFO_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index cabef1fec42a..1966532993e5 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -15,6 +15,7 @@ enum { ETHTOOL_MSG_USER_NONE, ETHTOOL_MSG_STRSET_GET, + ETHTOOL_MSG_LINKINFO_GET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -25,6 +26,7 @@ enum { enum { ETHTOOL_MSG_KERNEL_NONE, ETHTOOL_MSG_STRSET_GET_REPLY, + ETHTOOL_MSG_LINKINFO_GET_REPLY, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -141,6 +143,22 @@ enum { ETHTOOL_A_STRSET_MAX = __ETHTOOL_A_STRSET_CNT - 1 }; +/* LINKINFO */ + +enum { + ETHTOOL_A_LINKINFO_UNSPEC, + ETHTOOL_A_LINKINFO_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_LINKINFO_PORT, /* u8 */ + ETHTOOL_A_LINKINFO_PHYADDR, /* u8 */ + ETHTOOL_A_LINKINFO_TP_MDIX, /* u8 */ + ETHTOOL_A_LINKINFO_TP_MDIX_CTRL, /* u8 */ + ETHTOOL_A_LINKINFO_TRANSCEIVER, /* u8 */ + + /* add new constants above here */ + __ETHTOOL_A_LINKINFO_CNT, + ETHTOOL_A_LINKINFO_MAX = __ETHTOOL_A_LINKINFO_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From a53f3d41e4d3df46aba254ba51e7fbe45470fece Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:55:53 +0100 Subject: ethtool: set link settings with LINKINFO_SET request Implement LINKINFO_SET netlink request to set link settings queried by LINKINFO_GET message. Only physical port, phy MDIO address and MDI(-X) control can be set, attempt to modify MDI(-X) status and transceiver is rejected. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 1966532993e5..5b7806a5bef8 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -16,6 +16,7 @@ enum { ETHTOOL_MSG_USER_NONE, ETHTOOL_MSG_STRSET_GET, ETHTOOL_MSG_LINKINFO_GET, + ETHTOOL_MSG_LINKINFO_SET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, -- cgit v1.2.3 From 73286734c1b0d009fd317c2a74e173cb22233dad Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:56:03 +0100 Subject: ethtool: add LINKINFO_NTF notification Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKINFO_GET request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKINFO_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 5b7806a5bef8..d530fa30de36 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -28,6 +28,7 @@ enum { ETHTOOL_MSG_KERNEL_NONE, ETHTOOL_MSG_STRSET_GET_REPLY, ETHTOOL_MSG_LINKINFO_GET_REPLY, + ETHTOOL_MSG_LINKINFO_NTF, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, -- cgit v1.2.3 From f625aa9be8c10f2e4dc677837e240730a25feda7 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:56:08 +0100 Subject: ethtool: provide link mode information with LINKMODES_GET request Implement LINKMODES_GET netlink request to get link modes related information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides supported, advertised and peer advertised link modes, autonegotiation flag, speed and duplex. LINKMODES_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/ethtool_netlink.h | 3 +++ include/uapi/linux/ethtool_netlink.h | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+) (limited to 'include') diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h index c98f6852c8eb..d01b77887f82 100644 --- a/include/linux/ethtool_netlink.h +++ b/include/linux/ethtool_netlink.h @@ -7,6 +7,9 @@ #include #include +#define __ETHTOOL_LINK_MODE_MASK_NWORDS \ + DIV_ROUND_UP(__ETHTOOL_LINK_MODE_MASK_NBITS, 32) + enum ethtool_multicast_groups { ETHNL_MCGRP_MONITOR, }; diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index d530fa30de36..dc1cae052eee 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -17,6 +17,7 @@ enum { ETHTOOL_MSG_STRSET_GET, ETHTOOL_MSG_LINKINFO_GET, ETHTOOL_MSG_LINKINFO_SET, + ETHTOOL_MSG_LINKMODES_GET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -29,6 +30,7 @@ enum { ETHTOOL_MSG_STRSET_GET_REPLY, ETHTOOL_MSG_LINKINFO_GET_REPLY, ETHTOOL_MSG_LINKINFO_NTF, + ETHTOOL_MSG_LINKMODES_GET_REPLY, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -161,6 +163,22 @@ enum { ETHTOOL_A_LINKINFO_MAX = __ETHTOOL_A_LINKINFO_CNT - 1 }; +/* LINKMODES */ + +enum { + ETHTOOL_A_LINKMODES_UNSPEC, + ETHTOOL_A_LINKMODES_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_LINKMODES_AUTONEG, /* u8 */ + ETHTOOL_A_LINKMODES_OURS, /* bitset */ + ETHTOOL_A_LINKMODES_PEER, /* bitset */ + ETHTOOL_A_LINKMODES_SPEED, /* u32 */ + ETHTOOL_A_LINKMODES_DUPLEX, /* u8 */ + + /* add new constants above here */ + __ETHTOOL_A_LINKMODES_CNT, + ETHTOOL_A_LINKMODES_MAX = __ETHTOOL_A_LINKMODES_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From bfbcfe2032e70bd8598d680d39ac177d507e39ac Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:56:13 +0100 Subject: ethtool: set link modes related data with LINKMODES_SET request Implement LINKMODES_SET netlink request to set advertised linkmodes and related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do. The request allows setting autonegotiation flag, speed, duplex and advertised link modes. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index dc1cae052eee..cddf978b98df 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -18,6 +18,7 @@ enum { ETHTOOL_MSG_LINKINFO_GET, ETHTOOL_MSG_LINKINFO_SET, ETHTOOL_MSG_LINKMODES_GET, + ETHTOOL_MSG_LINKMODES_SET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, -- cgit v1.2.3 From 1b1b1847c8505df2e1dd8804838526bed22a8bd4 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:56:18 +0100 Subject: ethtool: add LINKMODES_NTF notification Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKMODES_GET request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKMODES_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index cddf978b98df..35948df6d6e3 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -32,6 +32,7 @@ enum { ETHTOOL_MSG_LINKINFO_GET_REPLY, ETHTOOL_MSG_LINKINFO_NTF, ETHTOOL_MSG_LINKMODES_GET_REPLY, + ETHTOOL_MSG_LINKMODES_NTF, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, -- cgit v1.2.3 From 3d2b847fb99cf2b28aa046e486636e555bc6ed1c Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Fri, 27 Dec 2019 15:56:23 +0100 Subject: ethtool: provide link state with LINKSTATE_GET request Implement LINKSTATE_GET netlink request to get link state information. At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command is returned. LINKSTATE_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 35948df6d6e3..02f82f42a889 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -19,6 +19,7 @@ enum { ETHTOOL_MSG_LINKINFO_SET, ETHTOOL_MSG_LINKMODES_GET, ETHTOOL_MSG_LINKMODES_SET, + ETHTOOL_MSG_LINKSTATE_GET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -33,6 +34,7 @@ enum { ETHTOOL_MSG_LINKINFO_NTF, ETHTOOL_MSG_LINKMODES_GET_REPLY, ETHTOOL_MSG_LINKMODES_NTF, + ETHTOOL_MSG_LINKSTATE_GET_REPLY, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -181,6 +183,18 @@ enum { ETHTOOL_A_LINKMODES_MAX = __ETHTOOL_A_LINKMODES_CNT - 1 }; +/* LINKSTATE */ + +enum { + ETHTOOL_A_LINKSTATE_UNSPEC, + ETHTOOL_A_LINKSTATE_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_LINKSTATE_LINK, /* u8 */ + + /* add new constants above here */ + __ETHTOOL_A_LINKSTATE_CNT, + ETHTOOL_A_LINKSTATE_MAX = __ETHTOOL_A_LINKSTATE_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From 544fed47af4d2174ac0b550e9c8da15c2dfdb117 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 27 Dec 2019 15:02:28 +0200 Subject: ptp: introduce ptp_cancel_worker_sync In order to effectively use the PTP kernel thread for tasks such as timestamping packets, allow the user control over stopping it, which is needed e.g. when the timestamping queues must be drained. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/ptp_clock_kernel.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h index 93cc4f1d444a..c64a1ef87240 100644 --- a/include/linux/ptp_clock_kernel.h +++ b/include/linux/ptp_clock_kernel.h @@ -243,6 +243,13 @@ int ptp_find_pin(struct ptp_clock *ptp, int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay); +/** + * ptp_cancel_worker_sync() - cancel ptp auxiliary clock + * + * @ptp: The clock obtained from ptp_clock_register(). + */ +void ptp_cancel_worker_sync(struct ptp_clock *ptp); + #else static inline struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info, struct device *parent) @@ -260,6 +267,8 @@ static inline int ptp_find_pin(struct ptp_clock *ptp, static inline int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) { return -EOPNOTSUPP; } +static inline void ptp_cancel_worker_sync(struct ptp_clock *ptp) +{ } #endif -- cgit v1.2.3 From 1e762bd278d2a70bc74b9cbee7f1e93bd4704fe2 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 27 Dec 2019 15:02:29 +0200 Subject: net: dsa: sja1105: Use PTP core's dedicated kernel thread for RX timestamping And move the queue of skb's waiting for RX timestamps into the ptp_data structure, since it isn't needed if PTP is not compiled. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/dsa/sja1105.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h index 897e799dbcb9..c0b6a603ea8c 100644 --- a/include/linux/dsa/sja1105.h +++ b/include/linux/dsa/sja1105.h @@ -37,8 +37,6 @@ * the structure defined in struct sja1105_private. */ struct sja1105_tagger_data { - struct sk_buff_head skb_rxtstamp_queue; - struct work_struct rxtstamp_work; struct sk_buff *stampable_skb; /* Protects concurrent access to the meta state machine * from taggers running on multiple ports on SMP systems -- cgit v1.2.3 From 68e039f966cb577c91649a02591646ac3919f8c9 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 1 Jan 2020 00:00:01 +0100 Subject: batman-adv: Update copyright years for 2020 Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich --- include/uapi/linux/batadv_packet.h | 2 +- include/uapi/linux/batman_adv.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/batadv_packet.h b/include/uapi/linux/batadv_packet.h index 2a15f01c2243..0ae34c85ef9e 100644 --- a/include/uapi/linux/batadv_packet.h +++ b/include/uapi/linux/batadv_packet.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */ -/* Copyright (C) 2007-2019 B.A.T.M.A.N. contributors: +/* Copyright (C) 2007-2020 B.A.T.M.A.N. contributors: * * Marek Lindner, Simon Wunderlich */ diff --git a/include/uapi/linux/batman_adv.h b/include/uapi/linux/batman_adv.h index 67f4636758af..617c180ff0c8 100644 --- a/include/uapi/linux/batman_adv.h +++ b/include/uapi/linux/batman_adv.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: MIT */ -/* Copyright (C) 2016-2019 B.A.T.M.A.N. contributors: +/* Copyright (C) 2016-2020 B.A.T.M.A.N. contributors: * * Matthias Schiffer */ -- cgit v1.2.3 From dea53bb80e07b9e1641b865493908c20cb8df2ac Mon Sep 17 00:00:00 2001 From: David Ahern Date: Mon, 30 Dec 2019 14:14:28 -0800 Subject: tcp: Add l3index to tcp_md5sig_key and md5 functions Add l3index to tcp_md5sig_key to represent the L3 domain of a key, and add l3index to tcp_md5_do_add and tcp_md5_do_del to fill in the key. With the key now based on an l3index, add the new parameter to the lookup functions and consider the l3index when looking for a match. The l3index comes from the skb when processing ingress packets leveraging the helpers created for socket lookups, tcp_v4_sdif and inet_iif (and the v6 variants). When the sdif index is set it means the packet ingressed a device that is part of an L3 domain and inet_iif points to the VRF device. For egress, the L3 domain is determined from the socket binding and sk_bound_dev_if. Signed-off-by: David Ahern Signed-off-by: David S. Miller --- include/net/tcp.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index e460ea7f767b..7df37e2fddca 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1532,8 +1532,9 @@ struct tcp_md5sig_key { struct hlist_node node; u8 keylen; u8 family; /* AF_INET or AF_INET6 */ - union tcp_md5_addr addr; u8 prefixlen; + union tcp_md5_addr addr; + int l3index; /* set if key added with L3 scope */ u8 key[TCP_MD5SIG_MAXKEYLEN]; struct rcu_head rcu; }; @@ -1577,34 +1578,33 @@ struct tcp_md5sig_pool { int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key, const struct sock *sk, const struct sk_buff *skb); int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr, - int family, u8 prefixlen, const u8 *newkey, u8 newkeylen, - gfp_t gfp); + int family, u8 prefixlen, int l3index, + const u8 *newkey, u8 newkeylen, gfp_t gfp); int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr, - int family, u8 prefixlen); + int family, u8 prefixlen, int l3index); struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk, const struct sock *addr_sk); #ifdef CONFIG_TCP_MD5SIG #include extern struct static_key_false tcp_md5_needed; -struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk, +struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk, int l3index, const union tcp_md5_addr *addr, int family); static inline struct tcp_md5sig_key * -tcp_md5_do_lookup(const struct sock *sk, - const union tcp_md5_addr *addr, - int family) +tcp_md5_do_lookup(const struct sock *sk, int l3index, + const union tcp_md5_addr *addr, int family) { if (!static_branch_unlikely(&tcp_md5_needed)) return NULL; - return __tcp_md5_do_lookup(sk, addr, family); + return __tcp_md5_do_lookup(sk, l3index, addr, family); } #define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key) #else -static inline struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk, - const union tcp_md5_addr *addr, - int family) +static inline struct tcp_md5sig_key * +tcp_md5_do_lookup(const struct sock *sk, int l3index, + const union tcp_md5_addr *addr, int family) { return NULL; } -- cgit v1.2.3 From 6b102db50cdde3ba2f78631ed21222edf3a5fb51 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Mon, 30 Dec 2019 14:14:29 -0800 Subject: net: Add device index to tcp_md5sig Add support for userspace to specify a device index to limit the scope of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad is renamed to tcpm_ifindex and the new field is only checked if the new TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index must point to an L3 master device (e.g., VRF). The API and error handling are setup to allow the constraint to be relaxed in the future to any device index. Signed-off-by: David Ahern Signed-off-by: David S. Miller --- include/uapi/linux/tcp.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 74af1f759cee..d87184e673ca 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -317,14 +317,15 @@ enum { #define TCP_MD5SIG_MAXKEYLEN 80 /* tcp_md5sig extension flags for TCP_MD5SIG_EXT */ -#define TCP_MD5SIG_FLAG_PREFIX 1 /* address prefix length */ +#define TCP_MD5SIG_FLAG_PREFIX 0x1 /* address prefix length */ +#define TCP_MD5SIG_FLAG_IFINDEX 0x2 /* ifindex set */ struct tcp_md5sig { struct __kernel_sockaddr_storage tcpm_addr; /* address associated */ __u8 tcpm_flags; /* extension flags */ __u8 tcpm_prefixlen; /* address prefix */ __u16 tcpm_keylen; /* key length */ - __u32 __tcpm_pad; /* zero */ + int tcpm_ifindex; /* device index for scope */ __u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */ }; -- cgit v1.2.3 From b39c78b2aa09cae05f3a48c11f67b3add0d604de Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Fri, 3 Jan 2020 11:51:00 +0800 Subject: net: remove the check argument from __skb_gro_checksum_convert The argument is always ignored, so remove it. Signed-off-by: Li RongQing Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/netdevice.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2fd19fb8826d..2741aa35bec6 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2826,16 +2826,16 @@ static inline bool __skb_gro_checksum_convert_check(struct sk_buff *skb) } static inline void __skb_gro_checksum_convert(struct sk_buff *skb, - __sum16 check, __wsum pseudo) + __wsum pseudo) { NAPI_GRO_CB(skb)->csum = ~pseudo; NAPI_GRO_CB(skb)->csum_valid = 1; } -#define skb_gro_checksum_try_convert(skb, proto, check, compute_pseudo) \ +#define skb_gro_checksum_try_convert(skb, proto, compute_pseudo) \ do { \ if (__skb_gro_checksum_convert_check(skb)) \ - __skb_gro_checksum_convert(skb, check, \ + __skb_gro_checksum_convert(skb, \ compute_pseudo(skb, proto)); \ } while (0) -- cgit v1.2.3 From 36278a5d4d354e5d5610aa728831db9e03cc3d8d Mon Sep 17 00:00:00 2001 From: Alain Michaud Date: Wed, 11 Dec 2019 01:54:43 +0000 Subject: Bluetooth: Adding a bt_dev_warn_ratelimited macro. The macro will be used to display rate limited warning messages in the log. Signed-off-by: Alain Michaud Signed-off-by: Marcel Holtmann --- include/net/bluetooth/bluetooth.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index fabee6db0abb..bd2675266859 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -129,6 +129,8 @@ void bt_warn(const char *fmt, ...); __printf(1, 2) void bt_err(const char *fmt, ...); __printf(1, 2) +void bt_warn_ratelimited(const char *fmt, ...); +__printf(1, 2) void bt_err_ratelimited(const char *fmt, ...); #define BT_INFO(fmt, ...) bt_info(fmt "\n", ##__VA_ARGS__) @@ -147,6 +149,8 @@ void bt_err_ratelimited(const char *fmt, ...); #define bt_dev_dbg(hdev, fmt, ...) \ BT_DBG("%s: " fmt, (hdev)->name, ##__VA_ARGS__) +#define bt_dev_warn_ratelimited(hdev, fmt, ...) \ + bt_warn_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) #define bt_dev_err_ratelimited(hdev, fmt, ...) \ BT_ERR_RATELIMITED("%s: " fmt, (hdev)->name, ##__VA_ARGS__) -- cgit v1.2.3 From 657cc646475b721f5c5bab82e7fd43302c7c8358 Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Wed, 11 Dec 2019 11:34:36 +0100 Subject: Bluetooth: Remove usage of BT_ERR_RATELIMITED macro The macro is really not needed and can be replaced with either usage of bt_err_ratelimited or bt_dev_err_ratelimited. Signed-off-by: Marcel Holtmann Signed-off-by: Johan Hedberg --- include/net/bluetooth/bluetooth.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index bd2675266859..e42bb8e03c09 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -138,8 +138,6 @@ void bt_err_ratelimited(const char *fmt, ...); #define BT_ERR(fmt, ...) bt_err(fmt "\n", ##__VA_ARGS__) #define BT_DBG(fmt, ...) pr_debug(fmt "\n", ##__VA_ARGS__) -#define BT_ERR_RATELIMITED(fmt, ...) bt_err_ratelimited(fmt "\n", ##__VA_ARGS__) - #define bt_dev_info(hdev, fmt, ...) \ BT_INFO("%s: " fmt, (hdev)->name, ##__VA_ARGS__) #define bt_dev_warn(hdev, fmt, ...) \ @@ -152,7 +150,7 @@ void bt_err_ratelimited(const char *fmt, ...); #define bt_dev_warn_ratelimited(hdev, fmt, ...) \ bt_warn_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) #define bt_dev_err_ratelimited(hdev, fmt, ...) \ - BT_ERR_RATELIMITED("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + bt_err_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) /* Connection and socket states */ enum { -- cgit v1.2.3 From 1efd927d660e6ab02a9cd32fbbe3c7dc47980132 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Thu, 2 Jan 2020 15:00:55 -0800 Subject: Bluetooth: Add support for LE PHY Update Complete event This handles LE PHY Update Complete event and store both tx_phy and rx_phy into hci_conn. Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Marcel Holtmann --- include/net/bluetooth/hci.h | 8 ++++++++ include/net/bluetooth/hci_core.h | 2 ++ 2 files changed, 10 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index 5bc1e30dedde..07b6ecedc6ce 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -2186,6 +2186,14 @@ struct hci_ev_le_direct_adv_info { __s8 rssi; } __packed; +#define HCI_EV_LE_PHY_UPDATE_COMPLETE 0x0c +struct hci_ev_le_phy_update_complete { + __u8 status; + __u16 handle; + __u8 tx_phy; + __u8 rx_phy; +} __packed; + #define HCI_EV_LE_EXT_ADV_REPORT 0x0d struct hci_ev_le_ext_adv_report { __le16 evt_type; diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index b689aceb636b..faebe3859931 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -493,6 +493,8 @@ struct hci_conn { __u16 le_supv_timeout; __u8 le_adv_data[HCI_MAX_AD_LENGTH]; __u8 le_adv_data_len; + __u8 le_tx_phy; + __u8 le_rx_phy; __s8 rssi; __s8 tx_power; __s8 max_tx_power; -- cgit v1.2.3 From 14a65084f9310ba6a4017c365f9c9820b099dde5 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Fri, 3 Jan 2020 18:11:30 +0100 Subject: net: ethernet: sxgbe: Rename Samsung to lowercase Fix up inconsistent usage of upper and lowercase letters in "Samsung" name. "SAMSUNG" is not an abbreviation but a regular trademarked name. Therefore it should be written with lowercase letters starting with capital letter. Although advertisement materials usually use uppercase "SAMSUNG", the lowercase version is used in all legal aspects (e.g. on Wikipedia and in privacy/legal statements on https://www.samsung.com/semiconductor/privacy-global/). Signed-off-by: Krzysztof Kozlowski Signed-off-by: David S. Miller --- include/linux/sxgbe_platform.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/sxgbe_platform.h b/include/linux/sxgbe_platform.h index 85ec745767bd..966146f7267a 100644 --- a/include/linux/sxgbe_platform.h +++ b/include/linux/sxgbe_platform.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * 10G controller driver for Samsung EXYNOS SoCs + * 10G controller driver for Samsung Exynos SoCs * * Copyright (C) 2013 Samsung Electronics Co., Ltd. * http://www.samsung.com -- cgit v1.2.3 From c114574ebfdf42f826776f717c8056a00fa94881 Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 3 Jan 2020 20:43:17 +0000 Subject: net: phy: add PHY_INTERFACE_MODE_10GBASER Recent discussion has revealed that the use of PHY_INTERFACE_MODE_10GKR is incorrect. Add a 10GBASE-R definition, document both the -R and -KR versions, and the fact that 10GKR was used incorrectly. Reviewed-by: Andrew Lunn Signed-off-by: Russell King Signed-off-by: David S. Miller --- include/linux/phy.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 30e599c454db..5932bb8e9c35 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -100,9 +100,11 @@ typedef enum { PHY_INTERFACE_MODE_2500BASEX, PHY_INTERFACE_MODE_RXAUI, PHY_INTERFACE_MODE_XAUI, - /* 10GBASE-KR, XFI, SFI - single lane 10G Serdes */ - PHY_INTERFACE_MODE_10GKR, + /* 10GBASE-R, XFI, SFI - single lane 10G Serdes */ + PHY_INTERFACE_MODE_10GBASER, PHY_INTERFACE_MODE_USXGMII, + /* 10GBASE-KR - with Clause 73 AN */ + PHY_INTERFACE_MODE_10GKR, PHY_INTERFACE_MODE_MAX, } phy_interface_t; @@ -176,10 +178,12 @@ static inline const char *phy_modes(phy_interface_t interface) return "rxaui"; case PHY_INTERFACE_MODE_XAUI: return "xaui"; - case PHY_INTERFACE_MODE_10GKR: - return "10gbase-kr"; + case PHY_INTERFACE_MODE_10GBASER: + return "10gbase-r"; case PHY_INTERFACE_MODE_USXGMII: return "usxgmii"; + case PHY_INTERFACE_MODE_10GKR: + return "10gbase-kr"; default: return "unknown"; } -- cgit v1.2.3 From 0a51826c6e05c5b6cc423b376b81c311e9e485b0 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Jan 2020 02:37:09 +0200 Subject: net: dsa: sja1105: Always send through management routes in slot 0 I finally found out how the 4 management route slots are supposed to be used, but.. it's not worth it. The description from the comment I've just deleted in this commit is still true: when more than 1 management slot is active at the same time, the switch will match frames incoming [from the CPU port] on the lowest numbered management slot that matches the frame's DMAC. My issue was that one was not supposed to statically assign each port a slot. Yes, there are 4 slots and also 4 non-CPU ports, but that is a mere coincidence. Instead, the switch can be used like this: every management frame gets a slot at the right of the most recently assigned slot: Send mgmt frame 1 through S0: S0 x x x Send mgmt frame 2 through S1: S0 S1 x x Send mgmt frame 3 through S2: S0 S1 S2 x Send mgmt frame 4 through S3: S0 S1 S2 S3 The difference compared to the old usage is that the transmission of frames 1-4 doesn't need to wait until the completion of the management route. It is safe to use a slot to the right of the most recently used one, because by protocol nobody will program a slot to your left and "steal" your route towards the correct egress port. So there is a potential throughput benefit here. But mgmt frame 5 has no more free slot to use, so it has to wait until _all_ of S0, S1, S2, S3 are full, in order to use S0 again. And that's actually exactly the problem: I was looking for something that would bring more predictable transmission latency, but this is exactly the opposite: 3 out of 4 frames would be transmitted quicker, but the 4th would draw the short straw and have a worse worst-case latency than before. Useless. Things are made even worse by PTP TX timestamping, which is something I won't go deeply into here. Suffice to say that the fact there is a driver-level lock on the SPI bus offsets any potential throughput gains that parallelism might bring. So there's no going back to the multi-slot scheme, remove the "mgmt_slot" variable from sja1105_port and the dummy static assignment made at probe time. While passing by, also remove the assignment to casc_port altogether. Don't pretend that we support cascaded setups. Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/dsa/sja1105.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h index c0b6a603ea8c..317e05b2584b 100644 --- a/include/linux/dsa/sja1105.h +++ b/include/linux/dsa/sja1105.h @@ -56,7 +56,6 @@ struct sja1105_port { struct sja1105_tagger_data *data; struct dsa_port *dp; bool hwts_tx_en; - int mgmt_slot; }; #endif /* _NET_DSA_SJA1105_H */ -- cgit v1.2.3 From a68578c20a9667463ee3000402b21644ea62d753 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Jan 2020 02:37:10 +0200 Subject: net: dsa: Make deferred_xmit private to sja1105 There are 3 things that are wrong with the DSA deferred xmit mechanism: 1. Its introduction has made the DSA hotpath ever so slightly more inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs to be initialized to false for every transmitted frame, in order to figure out whether the driver requested deferral or not (a very rare occasion, rare even for the only driver that does use this mechanism: sja1105). That was necessary to avoid kfree_skb from freeing the skb. 2. Because L2 PTP is a link-local protocol like STP, it requires management routes and deferred xmit with this switch. But as opposed to STP, the deferred work mechanism needs to schedule the packet rather quickly for the TX timstamp to be collected in time and sent to user space. But there is no provision for controlling the scheduling priority of this deferred xmit workqueue. Too bad this is a rather specific requirement for a feature that nobody else uses (more below). 3. Perhaps most importantly, it makes the DSA core adhere a bit too much to the NXP company-wide policy "Innovate Where It Doesn't Matter". The sja1105 is probably the only DSA switch that requires some frames sent from the CPU to be routed to the slave port via an out-of-band configuration (register write) rather than in-band (DSA tag). And there are indeed very good reasons to not want to do that: if that out-of-band register is at the other end of a slow bus such as SPI, then you limit that Ethernet flow's throughput to effectively the throughput of the SPI bus. So hardware vendors should definitely not be encouraged to design this way. We do _not_ want more widespread use of this mechanism. Luckily we have a solution for each of the 3 issues: For 1, we can just remove that variable in the skb->cb and counteract the effect of kfree_skb with skb_get, much to the same effect. The advantage, of course, being that anybody who doesn't use deferred xmit doesn't need to do any extra operation in the hotpath. For 2, we can create a kernel thread for each port's deferred xmit work. If the user switch ports are named swp0, swp1, swp2, the kernel threads will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15 character length limit on kernel thread names). With this, the user can change the scheduling priority with chrt $(pidof swp2_xmit). For 3, we can actually move the entire implementation to the sja1105 driver. So this patch deletes the generic implementation from the DSA core and adds a new one, more adequate to the requirements of PTP TX timestamping, in sja1105_main.c. Suggested-by: Florian Fainelli Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/dsa/sja1105.h | 3 +++ include/net/dsa.h | 9 --------- 2 files changed, 3 insertions(+), 9 deletions(-) (limited to 'include') diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h index 317e05b2584b..fa5735c353cd 100644 --- a/include/linux/dsa/sja1105.h +++ b/include/linux/dsa/sja1105.h @@ -53,6 +53,9 @@ struct sja1105_skb_cb { ((struct sja1105_skb_cb *)DSA_SKB_CB_PRIV(skb)) struct sja1105_port { + struct kthread_worker *xmit_worker; + struct kthread_work xmit_work; + struct sk_buff_head xmit_queue; struct sja1105_tagger_data *data; struct dsa_port *dp; bool hwts_tx_en; diff --git a/include/net/dsa.h b/include/net/dsa.h index da5578db228e..23b1c58656d4 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -90,7 +90,6 @@ struct dsa_device_ops { struct dsa_skb_cb { struct sk_buff *clone; - bool deferred_xmit; }; struct __dsa_skb_cb { @@ -192,9 +191,6 @@ struct dsa_port { struct phylink *pl; struct phylink_config pl_config; - struct work_struct xmit_work; - struct sk_buff_head xmit_queue; - struct list_head list; /* @@ -564,11 +560,6 @@ struct dsa_switch_ops { bool (*port_rxtstamp)(struct dsa_switch *ds, int port, struct sk_buff *skb, unsigned int type); - /* - * Deferred frame Tx - */ - netdev_tx_t (*port_deferred_xmit)(struct dsa_switch *ds, int port, - struct sk_buff *skb); /* Devlink parameters */ int (*devlink_param_get)(struct dsa_switch *ds, u32 id, struct devlink_param_gset_ctx *ctx); -- cgit v1.2.3 From 6c930994503d9b5bd34b1329427dd7d3d6d37cd4 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 6 Jan 2020 03:34:09 +0200 Subject: mii: Add helpers for parsing SGMII auto-negotiation Typically a MAC PCS auto-configures itself after it receives the negotiated copper-side link settings from the PHY, but some MAC devices are more special and need manual interpretation of the SGMII AN result. In other cases, the PCS exposes the entire tx_config_reg base page as it is transmitted on the wire during auto-negotiation, so it makes sense to be able to decode the equivalent lp_advertised bit mask from the raw u16 (of course, "lp" considering the PCS to be the local PHY). Therefore, add the bit definitions for the SGMII registers 4 and 5 (local device ability, link partner ability), as well as a link_mode conversion helper that can be used to feed the AN results into phy_resolve_aneg_linkmode. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/mii.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/mii.h | 12 ++++++++++++ 2 files changed, 62 insertions(+) (limited to 'include') diff --git a/include/linux/mii.h b/include/linux/mii.h index 4ce8901a1af6..18c6208f56fc 100644 --- a/include/linux/mii.h +++ b/include/linux/mii.h @@ -372,6 +372,56 @@ static inline u32 mii_lpa_to_ethtool_lpa_x(u32 lpa) return result | mii_adv_to_ethtool_adv_x(lpa); } +/** + * mii_lpa_mod_linkmode_adv_sgmii + * @lp_advertising: pointer to destination link mode. + * @lpa: value of the MII_LPA register + * + * A small helper function that translates MII_LPA bits to + * linkmode advertisement settings for SGMII. + * Leaves other bits unchanged. + */ +static inline void +mii_lpa_mod_linkmode_lpa_sgmii(unsigned long *lp_advertising, u32 lpa) +{ + u32 speed_duplex = lpa & LPA_SGMII_DPX_SPD_MASK; + + linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT, lp_advertising, + speed_duplex == LPA_SGMII_1000HALF); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, lp_advertising, + speed_duplex == LPA_SGMII_1000FULL); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, lp_advertising, + speed_duplex == LPA_SGMII_100HALF); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, lp_advertising, + speed_duplex == LPA_SGMII_100FULL); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, lp_advertising, + speed_duplex == LPA_SGMII_10HALF); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, lp_advertising, + speed_duplex == LPA_SGMII_10FULL); +} + +/** + * mii_lpa_to_linkmode_adv_sgmii + * @advertising: pointer to destination link mode. + * @lpa: value of the MII_LPA register + * + * A small helper function that translates MII_ADVERTISE bits + * to linkmode advertisement settings when in SGMII mode. + * Clears the old value of advertising. + */ +static inline void mii_lpa_to_linkmode_lpa_sgmii(unsigned long *lp_advertising, + u32 lpa) +{ + linkmode_zero(lp_advertising); + + mii_lpa_mod_linkmode_lpa_sgmii(lp_advertising, lpa); +} + /** * mii_adv_mod_linkmode_adv_t * @advertising:pointer to destination link mode. diff --git a/include/uapi/linux/mii.h b/include/uapi/linux/mii.h index 51b48e4be1f2..0b9c3beda345 100644 --- a/include/uapi/linux/mii.h +++ b/include/uapi/linux/mii.h @@ -131,6 +131,18 @@ #define NWAYTEST_LOOPBACK 0x0100 /* Enable loopback for N-way */ #define NWAYTEST_RESV2 0xfe00 /* Unused... */ +/* MAC and PHY tx_config_Reg[15:0] for SGMII in-band auto-negotiation.*/ +#define ADVERTISE_SGMII 0x0001 /* MAC can do SGMII */ +#define LPA_SGMII 0x0001 /* PHY can do SGMII */ +#define LPA_SGMII_DPX_SPD_MASK 0x1C00 /* SGMII duplex and speed bits */ +#define LPA_SGMII_10HALF 0x0000 /* Can do 10mbps half-duplex */ +#define LPA_SGMII_10FULL 0x1000 /* Can do 10mbps full-duplex */ +#define LPA_SGMII_100HALF 0x0400 /* Can do 100mbps half-duplex */ +#define LPA_SGMII_100FULL 0x1400 /* Can do 100mbps full-duplex */ +#define LPA_SGMII_1000HALF 0x0800 /* Can do 1000mbps half-duplex */ +#define LPA_SGMII_1000FULL 0x1800 /* Can do 1000mbps full-duplex */ +#define LPA_SGMII_LINK 0x8000 /* PHY link with copper-side partner */ + /* 1000BASE-T Control register */ #define ADVERTISE_1000FULL 0x0200 /* Advertise 1000BASE-T full duplex */ #define ADVERTISE_1000HALF 0x0100 /* Advertise 1000BASE-T half duplex */ -- cgit v1.2.3 From 1511ed0a0167f523a84b4e727372a5d2ce1b6c2f Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 6 Jan 2020 03:34:11 +0200 Subject: net: phylink: add support for polling MAC PCS Some MAC PCS blocks are unable to provide interrupts when their status changes. As we already have support in phylink for polling status, use this to provide a hook for MACs to enable polling mode. The patch idea was picked up from Russell King's suggestion on the macb phylink patch thread here [0] but the implementation was changed. Instead of introducing a new phylink_start_poll() function, which would make the implementation cumbersome for common PHYLINK implementations for multiple types of devices, like DSA, just add a boolean property to the phylink_config structure, which is just as backwards-compatible. https://lkml.org/lkml/2019/12/16/603 Suggested-by: Russell King Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/phylink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index fed5488e3c75..523209e70947 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -63,10 +63,12 @@ enum phylink_op_type { * struct phylink_config - PHYLINK configuration structure * @dev: a pointer to a struct device associated with the MAC * @type: operation type of PHYLINK instance + * @pcs_poll: MAC PCS cannot provide link change interrupt */ struct phylink_config { struct device *dev; enum phylink_op_type type; + bool pcs_poll; }; /** -- cgit v1.2.3 From 787cac3f5a650fd3184a41c5a27a2fe9ded833aa Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 6 Jan 2020 03:34:12 +0200 Subject: net: dsa: Pass pcs_poll flag from driver to PHYLINK The DSA drivers that implement .phylink_mac_link_state should normally register an interrupt for the PCS, from which they should call phylink_mac_change(). However not all switches implement this, and those who don't should set this flag in dsa_switch in the .setup callback, so that PHYLINK will poll for a few ms until the in-band AN link timer expires and the PCS state settles. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/net/dsa.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/net/dsa.h b/include/net/dsa.h index 23b1c58656d4..0c39fed8cd99 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -279,6 +279,11 @@ struct dsa_switch { */ bool vlan_filtering; + /* MAC PCS does not provide link state change interrupt, and requires + * polling. Flag passed on to PHYLINK. + */ + bool pcs_poll; + size_t num_ports; }; -- cgit v1.2.3 From 6517798dd3432a0002109809bf74e4fcf9bb0c7d Mon Sep 17 00:00:00 2001 From: Claudiu Manoil Date: Mon, 6 Jan 2020 03:34:13 +0200 Subject: enetc: Make MDIO accessors more generic and export to include/linux/fsl Within the LS1028A SoC, the register map for the ENETC MDIO controller is instantiated a few times: for the central (external) MDIO controller, for the internal bus of each standalone ENETC port, and for the internal bus of the Felix switch. Refactoring is needed to support multiple MDIO buses from multiple drivers. The enetc_hw structure is made an opaque type and a smaller enetc_mdio_priv is created. 'mdio_base' - MDIO registers base address - is being parameterized, to be able to work with different MDIO register bases. The ENETC MDIO bus operations are exported from the fsl-enetc-mdio kernel object, the same that registers the central MDIO controller (the dedicated PF). The ENETC main driver has been changed to select it, and use its exported helpers to further register its private MDIO bus. The DSA Felix driver will do the same. Signed-off-by: Claudiu Manoil Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/fsl/enetc_mdio.h | 55 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 include/linux/fsl/enetc_mdio.h (limited to 'include') diff --git a/include/linux/fsl/enetc_mdio.h b/include/linux/fsl/enetc_mdio.h new file mode 100644 index 000000000000..4875dd38af7e --- /dev/null +++ b/include/linux/fsl/enetc_mdio.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) */ +/* Copyright 2019 NXP */ + +#ifndef _FSL_ENETC_MDIO_H_ +#define _FSL_ENETC_MDIO_H_ + +#include + +/* PCS registers */ +#define ENETC_PCS_LINK_TIMER1 0x12 +#define ENETC_PCS_LINK_TIMER1_VAL 0x06a0 +#define ENETC_PCS_LINK_TIMER2 0x13 +#define ENETC_PCS_LINK_TIMER2_VAL 0x0003 +#define ENETC_PCS_IF_MODE 0x14 +#define ENETC_PCS_IF_MODE_SGMII_EN BIT(0) +#define ENETC_PCS_IF_MODE_USE_SGMII_AN BIT(1) +#define ENETC_PCS_IF_MODE_SGMII_SPEED(x) (((x) << 2) & GENMASK(3, 2)) + +/* Not a mistake, the SerDes PLL needs to be set at 3.125 GHz by Reset + * Configuration Word (RCW, outside Linux control) for 2.5G SGMII mode. The PCS + * still thinks it's at gigabit. + */ +enum enetc_pcs_speed { + ENETC_PCS_SPEED_10 = 0, + ENETC_PCS_SPEED_100 = 1, + ENETC_PCS_SPEED_1000 = 2, + ENETC_PCS_SPEED_2500 = 2, +}; + +struct enetc_hw; + +struct enetc_mdio_priv { + struct enetc_hw *hw; + int mdio_base; +}; + +#if IS_REACHABLE(CONFIG_FSL_ENETC_MDIO) + +int enetc_mdio_read(struct mii_bus *bus, int phy_id, int regnum); +int enetc_mdio_write(struct mii_bus *bus, int phy_id, int regnum, u16 value); +struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs); + +#else + +static inline int enetc_mdio_read(struct mii_bus *bus, int phy_id, int regnum) +{ return -EINVAL; } +static inline int enetc_mdio_write(struct mii_bus *bus, int phy_id, int regnum, + u16 value) +{ return -EINVAL; } +struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs) +{ return ERR_PTR(-EINVAL); } + +#endif + +#endif -- cgit v1.2.3 From ee50d07c9fc8155b5a3c6c29eae1459a12cf2fb4 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 6 Jan 2020 03:34:15 +0200 Subject: net: mscc: ocelot: make phy_mode a member of the common struct ocelot_port The Ocelot switchdev driver and the Felix DSA one need it for different reasons. Felix (or at least the VSC9959 instantiation in NXP LS1028A) is integrated with the traditional NXP Layerscape PCS design which does not support runtime configuration of SerDes protocol. So it needs to pre-validate the phy-mode from the device tree and prevent PHYLINK from attempting to change it. For this, it needs to cache it in a private variable. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/soc/mscc/ocelot.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 64cbbbe74a36..068f96b1a83e 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -420,6 +420,8 @@ struct ocelot_port { u8 ptp_cmd; struct sk_buff_head tx_skbs; u8 ts_id; + + phy_interface_t phy_mode; }; struct ocelot { -- cgit v1.2.3 From 964ee5c82b770c2d8a5ccefeee3384c1061ce3ae Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 6 Jan 2020 03:34:16 +0200 Subject: net: mscc: ocelot: export ANA, DEV and QSYS registers to include/soc/mscc Since the Felix DSA driver is implementing its own PHYLINK instance due to SoC differences, it needs access to the few registers that are common, mainly for flow control. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/soc/mscc/ocelot_ana.h | 625 +++++++++++++++++++++++++++++++++++++++++ include/soc/mscc/ocelot_dev.h | 275 ++++++++++++++++++ include/soc/mscc/ocelot_qsys.h | 270 ++++++++++++++++++ 3 files changed, 1170 insertions(+) create mode 100644 include/soc/mscc/ocelot_ana.h create mode 100644 include/soc/mscc/ocelot_dev.h create mode 100644 include/soc/mscc/ocelot_qsys.h (limited to 'include') diff --git a/include/soc/mscc/ocelot_ana.h b/include/soc/mscc/ocelot_ana.h new file mode 100644 index 000000000000..841c6ec22b64 --- /dev/null +++ b/include/soc/mscc/ocelot_ana.h @@ -0,0 +1,625 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */ +/* + * Microsemi Ocelot Switch driver + * + * Copyright (c) 2017 Microsemi Corporation + */ + +#ifndef _MSCC_OCELOT_ANA_H_ +#define _MSCC_OCELOT_ANA_H_ + +#define ANA_ANAGEFIL_B_DOM_EN BIT(22) +#define ANA_ANAGEFIL_B_DOM_VAL BIT(21) +#define ANA_ANAGEFIL_AGE_LOCKED BIT(20) +#define ANA_ANAGEFIL_PID_EN BIT(19) +#define ANA_ANAGEFIL_PID_VAL(x) (((x) << 14) & GENMASK(18, 14)) +#define ANA_ANAGEFIL_PID_VAL_M GENMASK(18, 14) +#define ANA_ANAGEFIL_PID_VAL_X(x) (((x) & GENMASK(18, 14)) >> 14) +#define ANA_ANAGEFIL_VID_EN BIT(13) +#define ANA_ANAGEFIL_VID_VAL(x) ((x) & GENMASK(12, 0)) +#define ANA_ANAGEFIL_VID_VAL_M GENMASK(12, 0) + +#define ANA_STORMLIMIT_CFG_RSZ 0x4 + +#define ANA_STORMLIMIT_CFG_STORM_RATE(x) (((x) << 3) & GENMASK(6, 3)) +#define ANA_STORMLIMIT_CFG_STORM_RATE_M GENMASK(6, 3) +#define ANA_STORMLIMIT_CFG_STORM_RATE_X(x) (((x) & GENMASK(6, 3)) >> 3) +#define ANA_STORMLIMIT_CFG_STORM_UNIT BIT(2) +#define ANA_STORMLIMIT_CFG_STORM_MODE(x) ((x) & GENMASK(1, 0)) +#define ANA_STORMLIMIT_CFG_STORM_MODE_M GENMASK(1, 0) + +#define ANA_AUTOAGE_AGE_FAST BIT(21) +#define ANA_AUTOAGE_AGE_PERIOD(x) (((x) << 1) & GENMASK(20, 1)) +#define ANA_AUTOAGE_AGE_PERIOD_M GENMASK(20, 1) +#define ANA_AUTOAGE_AGE_PERIOD_X(x) (((x) & GENMASK(20, 1)) >> 1) +#define ANA_AUTOAGE_AUTOAGE_LOCKED BIT(0) + +#define ANA_MACTOPTIONS_REDUCED_TABLE BIT(1) +#define ANA_MACTOPTIONS_SHADOW BIT(0) + +#define ANA_AGENCTRL_FID_MASK(x) (((x) << 12) & GENMASK(23, 12)) +#define ANA_AGENCTRL_FID_MASK_M GENMASK(23, 12) +#define ANA_AGENCTRL_FID_MASK_X(x) (((x) & GENMASK(23, 12)) >> 12) +#define ANA_AGENCTRL_IGNORE_DMAC_FLAGS BIT(11) +#define ANA_AGENCTRL_IGNORE_SMAC_FLAGS BIT(10) +#define ANA_AGENCTRL_FLOOD_SPECIAL BIT(9) +#define ANA_AGENCTRL_FLOOD_IGNORE_VLAN BIT(8) +#define ANA_AGENCTRL_MIRROR_CPU BIT(7) +#define ANA_AGENCTRL_LEARN_CPU_COPY BIT(6) +#define ANA_AGENCTRL_LEARN_FWD_KILL BIT(5) +#define ANA_AGENCTRL_LEARN_IGNORE_VLAN BIT(4) +#define ANA_AGENCTRL_CPU_CPU_KILL_ENA BIT(3) +#define ANA_AGENCTRL_GREEN_COUNT_MODE BIT(2) +#define ANA_AGENCTRL_YELLOW_COUNT_MODE BIT(1) +#define ANA_AGENCTRL_RED_COUNT_MODE BIT(0) + +#define ANA_FLOODING_RSZ 0x4 + +#define ANA_FLOODING_FLD_UNICAST(x) (((x) << 12) & GENMASK(17, 12)) +#define ANA_FLOODING_FLD_UNICAST_M GENMASK(17, 12) +#define ANA_FLOODING_FLD_UNICAST_X(x) (((x) & GENMASK(17, 12)) >> 12) +#define ANA_FLOODING_FLD_BROADCAST(x) (((x) << 6) & GENMASK(11, 6)) +#define ANA_FLOODING_FLD_BROADCAST_M GENMASK(11, 6) +#define ANA_FLOODING_FLD_BROADCAST_X(x) (((x) & GENMASK(11, 6)) >> 6) +#define ANA_FLOODING_FLD_MULTICAST(x) ((x) & GENMASK(5, 0)) +#define ANA_FLOODING_FLD_MULTICAST_M GENMASK(5, 0) + +#define ANA_FLOODING_IPMC_FLD_MC4_CTRL(x) (((x) << 18) & GENMASK(23, 18)) +#define ANA_FLOODING_IPMC_FLD_MC4_CTRL_M GENMASK(23, 18) +#define ANA_FLOODING_IPMC_FLD_MC4_CTRL_X(x) (((x) & GENMASK(23, 18)) >> 18) +#define ANA_FLOODING_IPMC_FLD_MC4_DATA(x) (((x) << 12) & GENMASK(17, 12)) +#define ANA_FLOODING_IPMC_FLD_MC4_DATA_M GENMASK(17, 12) +#define ANA_FLOODING_IPMC_FLD_MC4_DATA_X(x) (((x) & GENMASK(17, 12)) >> 12) +#define ANA_FLOODING_IPMC_FLD_MC6_CTRL(x) (((x) << 6) & GENMASK(11, 6)) +#define ANA_FLOODING_IPMC_FLD_MC6_CTRL_M GENMASK(11, 6) +#define ANA_FLOODING_IPMC_FLD_MC6_CTRL_X(x) (((x) & GENMASK(11, 6)) >> 6) +#define ANA_FLOODING_IPMC_FLD_MC6_DATA(x) ((x) & GENMASK(5, 0)) +#define ANA_FLOODING_IPMC_FLD_MC6_DATA_M GENMASK(5, 0) + +#define ANA_SFLOW_CFG_RSZ 0x4 + +#define ANA_SFLOW_CFG_SF_RATE(x) (((x) << 2) & GENMASK(13, 2)) +#define ANA_SFLOW_CFG_SF_RATE_M GENMASK(13, 2) +#define ANA_SFLOW_CFG_SF_RATE_X(x) (((x) & GENMASK(13, 2)) >> 2) +#define ANA_SFLOW_CFG_SF_SAMPLE_RX BIT(1) +#define ANA_SFLOW_CFG_SF_SAMPLE_TX BIT(0) + +#define ANA_PORT_MODE_RSZ 0x4 + +#define ANA_PORT_MODE_REDTAG_PARSE_CFG BIT(3) +#define ANA_PORT_MODE_VLAN_PARSE_CFG(x) (((x) << 1) & GENMASK(2, 1)) +#define ANA_PORT_MODE_VLAN_PARSE_CFG_M GENMASK(2, 1) +#define ANA_PORT_MODE_VLAN_PARSE_CFG_X(x) (((x) & GENMASK(2, 1)) >> 1) +#define ANA_PORT_MODE_L3_PARSE_CFG BIT(0) + +#define ANA_CUT_THRU_CFG_RSZ 0x4 + +#define ANA_PGID_PGID_RSZ 0x4 + +#define ANA_PGID_PGID_PGID(x) ((x) & GENMASK(11, 0)) +#define ANA_PGID_PGID_PGID_M GENMASK(11, 0) +#define ANA_PGID_PGID_CPUQ_DST_PGID(x) (((x) << 27) & GENMASK(29, 27)) +#define ANA_PGID_PGID_CPUQ_DST_PGID_M GENMASK(29, 27) +#define ANA_PGID_PGID_CPUQ_DST_PGID_X(x) (((x) & GENMASK(29, 27)) >> 27) + +#define ANA_TABLES_MACHDATA_VID(x) (((x) << 16) & GENMASK(28, 16)) +#define ANA_TABLES_MACHDATA_VID_M GENMASK(28, 16) +#define ANA_TABLES_MACHDATA_VID_X(x) (((x) & GENMASK(28, 16)) >> 16) +#define ANA_TABLES_MACHDATA_MACHDATA(x) ((x) & GENMASK(15, 0)) +#define ANA_TABLES_MACHDATA_MACHDATA_M GENMASK(15, 0) + +#define ANA_TABLES_STREAMDATA_SSID_VALID BIT(16) +#define ANA_TABLES_STREAMDATA_SSID(x) (((x) << 9) & GENMASK(15, 9)) +#define ANA_TABLES_STREAMDATA_SSID_M GENMASK(15, 9) +#define ANA_TABLES_STREAMDATA_SSID_X(x) (((x) & GENMASK(15, 9)) >> 9) +#define ANA_TABLES_STREAMDATA_SFID_VALID BIT(8) +#define ANA_TABLES_STREAMDATA_SFID(x) ((x) & GENMASK(7, 0)) +#define ANA_TABLES_STREAMDATA_SFID_M GENMASK(7, 0) + +#define ANA_TABLES_MACACCESS_MAC_CPU_COPY BIT(15) +#define ANA_TABLES_MACACCESS_SRC_KILL BIT(14) +#define ANA_TABLES_MACACCESS_IGNORE_VLAN BIT(13) +#define ANA_TABLES_MACACCESS_AGED_FLAG BIT(12) +#define ANA_TABLES_MACACCESS_VALID BIT(11) +#define ANA_TABLES_MACACCESS_ENTRYTYPE(x) (((x) << 9) & GENMASK(10, 9)) +#define ANA_TABLES_MACACCESS_ENTRYTYPE_M GENMASK(10, 9) +#define ANA_TABLES_MACACCESS_ENTRYTYPE_X(x) (((x) & GENMASK(10, 9)) >> 9) +#define ANA_TABLES_MACACCESS_DEST_IDX(x) (((x) << 3) & GENMASK(8, 3)) +#define ANA_TABLES_MACACCESS_DEST_IDX_M GENMASK(8, 3) +#define ANA_TABLES_MACACCESS_DEST_IDX_X(x) (((x) & GENMASK(8, 3)) >> 3) +#define ANA_TABLES_MACACCESS_MAC_TABLE_CMD(x) ((x) & GENMASK(2, 0)) +#define ANA_TABLES_MACACCESS_MAC_TABLE_CMD_M GENMASK(2, 0) +#define MACACCESS_CMD_IDLE 0 +#define MACACCESS_CMD_LEARN 1 +#define MACACCESS_CMD_FORGET 2 +#define MACACCESS_CMD_AGE 3 +#define MACACCESS_CMD_GET_NEXT 4 +#define MACACCESS_CMD_INIT 5 +#define MACACCESS_CMD_READ 6 +#define MACACCESS_CMD_WRITE 7 + +#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK(x) (((x) << 2) & GENMASK(13, 2)) +#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK_M GENMASK(13, 2) +#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK_X(x) (((x) & GENMASK(13, 2)) >> 2) +#define ANA_TABLES_VLANACCESS_VLAN_TBL_CMD(x) ((x) & GENMASK(1, 0)) +#define ANA_TABLES_VLANACCESS_VLAN_TBL_CMD_M GENMASK(1, 0) +#define ANA_TABLES_VLANACCESS_CMD_IDLE 0x0 +#define ANA_TABLES_VLANACCESS_CMD_WRITE 0x2 +#define ANA_TABLES_VLANACCESS_CMD_INIT 0x3 + +#define ANA_TABLES_VLANTIDX_VLAN_SEC_FWD_ENA BIT(17) +#define ANA_TABLES_VLANTIDX_VLAN_FLOOD_DIS BIT(16) +#define ANA_TABLES_VLANTIDX_VLAN_PRIV_VLAN BIT(15) +#define ANA_TABLES_VLANTIDX_VLAN_LEARN_DISABLED BIT(14) +#define ANA_TABLES_VLANTIDX_VLAN_MIRROR BIT(13) +#define ANA_TABLES_VLANTIDX_VLAN_SRC_CHK BIT(12) +#define ANA_TABLES_VLANTIDX_V_INDEX(x) ((x) & GENMASK(11, 0)) +#define ANA_TABLES_VLANTIDX_V_INDEX_M GENMASK(11, 0) + +#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK(x) (((x) << 2) & GENMASK(8, 2)) +#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK_M GENMASK(8, 2) +#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK_X(x) (((x) & GENMASK(8, 2)) >> 2) +#define ANA_TABLES_ISDXACCESS_ISDX_TBL_CMD(x) ((x) & GENMASK(1, 0)) +#define ANA_TABLES_ISDXACCESS_ISDX_TBL_CMD_M GENMASK(1, 0) + +#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI(x) (((x) << 21) & GENMASK(28, 21)) +#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI_M GENMASK(28, 21) +#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI_X(x) (((x) & GENMASK(28, 21)) >> 21) +#define ANA_TABLES_ISDXTIDX_ISDX_MSTI(x) (((x) << 15) & GENMASK(20, 15)) +#define ANA_TABLES_ISDXTIDX_ISDX_MSTI_M GENMASK(20, 15) +#define ANA_TABLES_ISDXTIDX_ISDX_MSTI_X(x) (((x) & GENMASK(20, 15)) >> 15) +#define ANA_TABLES_ISDXTIDX_ISDX_ES0_KEY_ENA BIT(14) +#define ANA_TABLES_ISDXTIDX_ISDX_FORCE_ENA BIT(10) +#define ANA_TABLES_ISDXTIDX_ISDX_INDEX(x) ((x) & GENMASK(7, 0)) +#define ANA_TABLES_ISDXTIDX_ISDX_INDEX_M GENMASK(7, 0) + +#define ANA_TABLES_ENTRYLIM_RSZ 0x4 + +#define ANA_TABLES_ENTRYLIM_ENTRYLIM(x) (((x) << 14) & GENMASK(17, 14)) +#define ANA_TABLES_ENTRYLIM_ENTRYLIM_M GENMASK(17, 14) +#define ANA_TABLES_ENTRYLIM_ENTRYLIM_X(x) (((x) & GENMASK(17, 14)) >> 14) +#define ANA_TABLES_ENTRYLIM_ENTRYSTAT(x) ((x) & GENMASK(13, 0)) +#define ANA_TABLES_ENTRYLIM_ENTRYSTAT_M GENMASK(13, 0) + +#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM(x) (((x) << 4) & GENMASK(31, 4)) +#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM_M GENMASK(31, 4) +#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM_X(x) (((x) & GENMASK(31, 4)) >> 4) +#define ANA_TABLES_STREAMACCESS_SEQ_GEN_REC_ENA BIT(3) +#define ANA_TABLES_STREAMACCESS_GEN_REC_TYPE BIT(2) +#define ANA_TABLES_STREAMACCESS_STREAM_TBL_CMD(x) ((x) & GENMASK(1, 0)) +#define ANA_TABLES_STREAMACCESS_STREAM_TBL_CMD_M GENMASK(1, 0) + +#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS(x) (((x) << 30) & GENMASK(31, 30)) +#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS_M GENMASK(31, 30) +#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS_X(x) (((x) & GENMASK(31, 30)) >> 30) +#define ANA_TABLES_STREAMTIDX_S_INDEX(x) (((x) << 16) & GENMASK(22, 16)) +#define ANA_TABLES_STREAMTIDX_S_INDEX_M GENMASK(22, 16) +#define ANA_TABLES_STREAMTIDX_S_INDEX_X(x) (((x) & GENMASK(22, 16)) >> 16) +#define ANA_TABLES_STREAMTIDX_FORCE_SF_BEHAVIOUR BIT(14) +#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN(x) (((x) << 8) & GENMASK(13, 8)) +#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN_M GENMASK(13, 8) +#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN_X(x) (((x) & GENMASK(13, 8)) >> 8) +#define ANA_TABLES_STREAMTIDX_RESET_ON_ROGUE BIT(7) +#define ANA_TABLES_STREAMTIDX_REDTAG_POP BIT(6) +#define ANA_TABLES_STREAMTIDX_STREAM_SPLIT BIT(5) +#define ANA_TABLES_STREAMTIDX_SEQ_SPACE_LOG2(x) ((x) & GENMASK(4, 0)) +#define ANA_TABLES_STREAMTIDX_SEQ_SPACE_LOG2_M GENMASK(4, 0) + +#define ANA_TABLES_SEQ_MASK_SPLIT_MASK(x) (((x) << 16) & GENMASK(22, 16)) +#define ANA_TABLES_SEQ_MASK_SPLIT_MASK_M GENMASK(22, 16) +#define ANA_TABLES_SEQ_MASK_SPLIT_MASK_X(x) (((x) & GENMASK(22, 16)) >> 16) +#define ANA_TABLES_SEQ_MASK_INPUT_PORT_MASK(x) ((x) & GENMASK(6, 0)) +#define ANA_TABLES_SEQ_MASK_INPUT_PORT_MASK_M GENMASK(6, 0) + +#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK(x) (((x) << 1) & GENMASK(7, 1)) +#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK_M GENMASK(7, 1) +#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK_X(x) (((x) & GENMASK(7, 1)) >> 1) +#define ANA_TABLES_SFID_MASK_IGR_SRCPORT_MATCH_ENA BIT(0) + +#define ANA_TABLES_SFIDACCESS_IGR_PRIO_MATCH_ENA BIT(22) +#define ANA_TABLES_SFIDACCESS_IGR_PRIO(x) (((x) << 19) & GENMASK(21, 19)) +#define ANA_TABLES_SFIDACCESS_IGR_PRIO_M GENMASK(21, 19) +#define ANA_TABLES_SFIDACCESS_IGR_PRIO_X(x) (((x) & GENMASK(21, 19)) >> 19) +#define ANA_TABLES_SFIDACCESS_FORCE_BLOCK BIT(18) +#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN(x) (((x) << 2) & GENMASK(17, 2)) +#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN_M GENMASK(17, 2) +#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN_X(x) (((x) & GENMASK(17, 2)) >> 2) +#define ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(x) ((x) & GENMASK(1, 0)) +#define ANA_TABLES_SFIDACCESS_SFID_TBL_CMD_M GENMASK(1, 0) + +#define ANA_TABLES_SFIDTIDX_SGID_VALID BIT(26) +#define ANA_TABLES_SFIDTIDX_SGID(x) (((x) << 18) & GENMASK(25, 18)) +#define ANA_TABLES_SFIDTIDX_SGID_M GENMASK(25, 18) +#define ANA_TABLES_SFIDTIDX_SGID_X(x) (((x) & GENMASK(25, 18)) >> 18) +#define ANA_TABLES_SFIDTIDX_POL_ENA BIT(17) +#define ANA_TABLES_SFIDTIDX_POL_IDX(x) (((x) << 8) & GENMASK(16, 8)) +#define ANA_TABLES_SFIDTIDX_POL_IDX_M GENMASK(16, 8) +#define ANA_TABLES_SFIDTIDX_POL_IDX_X(x) (((x) & GENMASK(16, 8)) >> 8) +#define ANA_TABLES_SFIDTIDX_SFID_INDEX(x) ((x) & GENMASK(7, 0)) +#define ANA_TABLES_SFIDTIDX_SFID_INDEX_M GENMASK(7, 0) + +#define ANA_MSTI_STATE_RSZ 0x4 + +#define ANA_OAM_UPM_LM_CNT_RSZ 0x4 + +#define ANA_SG_ACCESS_CTRL_SGID(x) ((x) & GENMASK(7, 0)) +#define ANA_SG_ACCESS_CTRL_SGID_M GENMASK(7, 0) +#define ANA_SG_ACCESS_CTRL_CONFIG_CHANGE BIT(28) + +#define ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0)) +#define ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB_M GENMASK(15, 0) +#define ANA_SG_CONFIG_REG_3_LIST_LENGTH(x) (((x) << 16) & GENMASK(18, 16)) +#define ANA_SG_CONFIG_REG_3_LIST_LENGTH_M GENMASK(18, 16) +#define ANA_SG_CONFIG_REG_3_LIST_LENGTH_X(x) (((x) & GENMASK(18, 16)) >> 16) +#define ANA_SG_CONFIG_REG_3_GATE_ENABLE BIT(20) +#define ANA_SG_CONFIG_REG_3_INIT_IPS(x) (((x) << 24) & GENMASK(27, 24)) +#define ANA_SG_CONFIG_REG_3_INIT_IPS_M GENMASK(27, 24) +#define ANA_SG_CONFIG_REG_3_INIT_IPS_X(x) (((x) & GENMASK(27, 24)) >> 24) +#define ANA_SG_CONFIG_REG_3_INIT_GATE_STATE BIT(28) + +#define ANA_SG_GCL_GS_CONFIG_RSZ 0x4 + +#define ANA_SG_GCL_GS_CONFIG_IPS(x) ((x) & GENMASK(3, 0)) +#define ANA_SG_GCL_GS_CONFIG_IPS_M GENMASK(3, 0) +#define ANA_SG_GCL_GS_CONFIG_GATE_STATE BIT(4) + +#define ANA_SG_GCL_TI_CONFIG_RSZ 0x4 + +#define ANA_SG_STATUS_REG_3_CFG_CHG_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0)) +#define ANA_SG_STATUS_REG_3_CFG_CHG_TIME_SEC_MSB_M GENMASK(15, 0) +#define ANA_SG_STATUS_REG_3_GATE_STATE BIT(16) +#define ANA_SG_STATUS_REG_3_IPS(x) (((x) << 20) & GENMASK(23, 20)) +#define ANA_SG_STATUS_REG_3_IPS_M GENMASK(23, 20) +#define ANA_SG_STATUS_REG_3_IPS_X(x) (((x) & GENMASK(23, 20)) >> 20) +#define ANA_SG_STATUS_REG_3_CONFIG_PENDING BIT(24) + +#define ANA_PORT_VLAN_CFG_GSZ 0x100 + +#define ANA_PORT_VLAN_CFG_VLAN_VID_AS_ISDX BIT(21) +#define ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA BIT(20) +#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT(x) (((x) << 18) & GENMASK(19, 18)) +#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT_M GENMASK(19, 18) +#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT_X(x) (((x) & GENMASK(19, 18)) >> 18) +#define ANA_PORT_VLAN_CFG_VLAN_INNER_TAG_ENA BIT(17) +#define ANA_PORT_VLAN_CFG_VLAN_TAG_TYPE BIT(16) +#define ANA_PORT_VLAN_CFG_VLAN_DEI BIT(15) +#define ANA_PORT_VLAN_CFG_VLAN_PCP(x) (((x) << 12) & GENMASK(14, 12)) +#define ANA_PORT_VLAN_CFG_VLAN_PCP_M GENMASK(14, 12) +#define ANA_PORT_VLAN_CFG_VLAN_PCP_X(x) (((x) & GENMASK(14, 12)) >> 12) +#define ANA_PORT_VLAN_CFG_VLAN_VID(x) ((x) & GENMASK(11, 0)) +#define ANA_PORT_VLAN_CFG_VLAN_VID_M GENMASK(11, 0) + +#define ANA_PORT_DROP_CFG_GSZ 0x100 + +#define ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA BIT(6) +#define ANA_PORT_DROP_CFG_DROP_S_TAGGED_ENA BIT(5) +#define ANA_PORT_DROP_CFG_DROP_C_TAGGED_ENA BIT(4) +#define ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA BIT(3) +#define ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA BIT(2) +#define ANA_PORT_DROP_CFG_DROP_NULL_MAC_ENA BIT(1) +#define ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA BIT(0) + +#define ANA_PORT_QOS_CFG_GSZ 0x100 + +#define ANA_PORT_QOS_CFG_DP_DEFAULT_VAL BIT(8) +#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL(x) (((x) << 5) & GENMASK(7, 5)) +#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL_M GENMASK(7, 5) +#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL_X(x) (((x) & GENMASK(7, 5)) >> 5) +#define ANA_PORT_QOS_CFG_QOS_DSCP_ENA BIT(4) +#define ANA_PORT_QOS_CFG_QOS_PCP_ENA BIT(3) +#define ANA_PORT_QOS_CFG_DSCP_TRANSLATE_ENA BIT(2) +#define ANA_PORT_QOS_CFG_DSCP_REWR_CFG(x) ((x) & GENMASK(1, 0)) +#define ANA_PORT_QOS_CFG_DSCP_REWR_CFG_M GENMASK(1, 0) + +#define ANA_PORT_VCAP_CFG_GSZ 0x100 + +#define ANA_PORT_VCAP_CFG_S1_ENA BIT(14) +#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA(x) (((x) << 11) & GENMASK(13, 11)) +#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA_M GENMASK(13, 11) +#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA_X(x) (((x) & GENMASK(13, 11)) >> 11) +#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA(x) (((x) << 8) & GENMASK(10, 8)) +#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA_M GENMASK(10, 8) +#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA_X(x) (((x) & GENMASK(10, 8)) >> 8) +#define ANA_PORT_VCAP_CFG_PAG_VAL(x) ((x) & GENMASK(7, 0)) +#define ANA_PORT_VCAP_CFG_PAG_VAL_M GENMASK(7, 0) + +#define ANA_PORT_VCAP_S1_KEY_CFG_GSZ 0x100 +#define ANA_PORT_VCAP_S1_KEY_CFG_RSZ 0x4 + +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG(x) (((x) << 4) & GENMASK(6, 4)) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG_M GENMASK(6, 4) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG_X(x) (((x) & GENMASK(6, 4)) >> 4) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG(x) (((x) << 2) & GENMASK(3, 2)) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG_M GENMASK(3, 2) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG_X(x) (((x) & GENMASK(3, 2)) >> 2) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_OTHER_CFG(x) ((x) & GENMASK(1, 0)) +#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_OTHER_CFG_M GENMASK(1, 0) + +#define ANA_PORT_VCAP_S2_CFG_GSZ 0x100 + +#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA(x) (((x) << 17) & GENMASK(18, 17)) +#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA_M GENMASK(18, 17) +#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA_X(x) (((x) & GENMASK(18, 17)) >> 17) +#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA(x) (((x) << 15) & GENMASK(16, 15)) +#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA_M GENMASK(16, 15) +#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA_X(x) (((x) & GENMASK(16, 15)) >> 15) +#define ANA_PORT_VCAP_S2_CFG_S2_ENA BIT(14) +#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS(x) (((x) << 12) & GENMASK(13, 12)) +#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS_M GENMASK(13, 12) +#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS_X(x) (((x) & GENMASK(13, 12)) >> 12) +#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS(x) (((x) << 10) & GENMASK(11, 10)) +#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS_M GENMASK(11, 10) +#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS_X(x) (((x) & GENMASK(11, 10)) >> 10) +#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS(x) (((x) << 8) & GENMASK(9, 8)) +#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS_M GENMASK(9, 8) +#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS_X(x) (((x) & GENMASK(9, 8)) >> 8) +#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS(x) (((x) << 6) & GENMASK(7, 6)) +#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS_M GENMASK(7, 6) +#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS_X(x) (((x) & GENMASK(7, 6)) >> 6) +#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG(x) (((x) << 2) & GENMASK(5, 2)) +#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG_M GENMASK(5, 2) +#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG_X(x) (((x) & GENMASK(5, 2)) >> 2) +#define ANA_PORT_VCAP_S2_CFG_S2_OAM_DIS(x) ((x) & GENMASK(1, 0)) +#define ANA_PORT_VCAP_S2_CFG_S2_OAM_DIS_M GENMASK(1, 0) + +#define ANA_PORT_PCP_DEI_MAP_GSZ 0x100 +#define ANA_PORT_PCP_DEI_MAP_RSZ 0x4 + +#define ANA_PORT_PCP_DEI_MAP_DP_PCP_DEI_VAL BIT(3) +#define ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL(x) ((x) & GENMASK(2, 0)) +#define ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL_M GENMASK(2, 0) + +#define ANA_PORT_CPU_FWD_CFG_GSZ 0x100 + +#define ANA_PORT_CPU_FWD_CFG_CPU_VRAP_REDIR_ENA BIT(7) +#define ANA_PORT_CPU_FWD_CFG_CPU_MLD_REDIR_ENA BIT(6) +#define ANA_PORT_CPU_FWD_CFG_CPU_IGMP_REDIR_ENA BIT(5) +#define ANA_PORT_CPU_FWD_CFG_CPU_IPMC_CTRL_COPY_ENA BIT(4) +#define ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA BIT(3) +#define ANA_PORT_CPU_FWD_CFG_CPU_ALLBRIDGE_DROP_ENA BIT(2) +#define ANA_PORT_CPU_FWD_CFG_CPU_ALLBRIDGE_REDIR_ENA BIT(1) +#define ANA_PORT_CPU_FWD_CFG_CPU_OAM_ENA BIT(0) + +#define ANA_PORT_CPU_FWD_BPDU_CFG_GSZ 0x100 + +#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA(x) (((x) << 16) & GENMASK(31, 16)) +#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA_M GENMASK(31, 16) +#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA(x) ((x) & GENMASK(15, 0)) +#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA_M GENMASK(15, 0) + +#define ANA_PORT_CPU_FWD_GARP_CFG_GSZ 0x100 + +#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA(x) (((x) << 16) & GENMASK(31, 16)) +#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA_M GENMASK(31, 16) +#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_REDIR_ENA(x) ((x) & GENMASK(15, 0)) +#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_REDIR_ENA_M GENMASK(15, 0) + +#define ANA_PORT_CPU_FWD_CCM_CFG_GSZ 0x100 + +#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA(x) (((x) << 16) & GENMASK(31, 16)) +#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA_M GENMASK(31, 16) +#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_REDIR_ENA(x) ((x) & GENMASK(15, 0)) +#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_REDIR_ENA_M GENMASK(15, 0) + +#define ANA_PORT_PORT_CFG_GSZ 0x100 + +#define ANA_PORT_PORT_CFG_SRC_MIRROR_ENA BIT(15) +#define ANA_PORT_PORT_CFG_LIMIT_DROP BIT(14) +#define ANA_PORT_PORT_CFG_LIMIT_CPU BIT(13) +#define ANA_PORT_PORT_CFG_LOCKED_PORTMOVE_DROP BIT(12) +#define ANA_PORT_PORT_CFG_LOCKED_PORTMOVE_CPU BIT(11) +#define ANA_PORT_PORT_CFG_LEARNDROP BIT(10) +#define ANA_PORT_PORT_CFG_LEARNCPU BIT(9) +#define ANA_PORT_PORT_CFG_LEARNAUTO BIT(8) +#define ANA_PORT_PORT_CFG_LEARN_ENA BIT(7) +#define ANA_PORT_PORT_CFG_RECV_ENA BIT(6) +#define ANA_PORT_PORT_CFG_PORTID_VAL(x) (((x) << 2) & GENMASK(5, 2)) +#define ANA_PORT_PORT_CFG_PORTID_VAL_M GENMASK(5, 2) +#define ANA_PORT_PORT_CFG_PORTID_VAL_X(x) (((x) & GENMASK(5, 2)) >> 2) +#define ANA_PORT_PORT_CFG_USE_B_DOM_TBL BIT(1) +#define ANA_PORT_PORT_CFG_LSR_MODE BIT(0) + +#define ANA_PORT_POL_CFG_GSZ 0x100 + +#define ANA_PORT_POL_CFG_POL_CPU_REDIR_8021 BIT(19) +#define ANA_PORT_POL_CFG_POL_CPU_REDIR_IP BIT(18) +#define ANA_PORT_POL_CFG_PORT_POL_ENA BIT(17) +#define ANA_PORT_POL_CFG_QUEUE_POL_ENA(x) (((x) << 9) & GENMASK(16, 9)) +#define ANA_PORT_POL_CFG_QUEUE_POL_ENA_M GENMASK(16, 9) +#define ANA_PORT_POL_CFG_QUEUE_POL_ENA_X(x) (((x) & GENMASK(16, 9)) >> 9) +#define ANA_PORT_POL_CFG_POL_ORDER(x) ((x) & GENMASK(8, 0)) +#define ANA_PORT_POL_CFG_POL_ORDER_M GENMASK(8, 0) + +#define ANA_PORT_PTP_CFG_GSZ 0x100 + +#define ANA_PORT_PTP_CFG_PTP_BACKPLANE_MODE BIT(0) + +#define ANA_PORT_PTP_DLY1_CFG_GSZ 0x100 + +#define ANA_PORT_PTP_DLY2_CFG_GSZ 0x100 + +#define ANA_PORT_SFID_CFG_GSZ 0x100 +#define ANA_PORT_SFID_CFG_RSZ 0x4 + +#define ANA_PORT_SFID_CFG_SFID_VALID BIT(8) +#define ANA_PORT_SFID_CFG_SFID(x) ((x) & GENMASK(7, 0)) +#define ANA_PORT_SFID_CFG_SFID_M GENMASK(7, 0) + +#define ANA_PFC_PFC_CFG_GSZ 0x40 + +#define ANA_PFC_PFC_CFG_RX_PFC_ENA(x) (((x) << 2) & GENMASK(9, 2)) +#define ANA_PFC_PFC_CFG_RX_PFC_ENA_M GENMASK(9, 2) +#define ANA_PFC_PFC_CFG_RX_PFC_ENA_X(x) (((x) & GENMASK(9, 2)) >> 2) +#define ANA_PFC_PFC_CFG_FC_LINK_SPEED(x) ((x) & GENMASK(1, 0)) +#define ANA_PFC_PFC_CFG_FC_LINK_SPEED_M GENMASK(1, 0) + +#define ANA_PFC_PFC_TIMER_GSZ 0x40 +#define ANA_PFC_PFC_TIMER_RSZ 0x4 + +#define ANA_IPT_OAM_MEP_CFG_GSZ 0x8 + +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P(x) (((x) << 6) & GENMASK(10, 6)) +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P_M GENMASK(10, 6) +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P_X(x) (((x) & GENMASK(10, 6)) >> 6) +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX(x) (((x) << 1) & GENMASK(5, 1)) +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_M GENMASK(5, 1) +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_X(x) (((x) & GENMASK(5, 1)) >> 1) +#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_ENA BIT(0) + +#define ANA_IPT_IPT_GSZ 0x8 + +#define ANA_IPT_IPT_IPT_CFG(x) (((x) << 15) & GENMASK(16, 15)) +#define ANA_IPT_IPT_IPT_CFG_M GENMASK(16, 15) +#define ANA_IPT_IPT_IPT_CFG_X(x) (((x) & GENMASK(16, 15)) >> 15) +#define ANA_IPT_IPT_ISDX_P(x) (((x) << 7) & GENMASK(14, 7)) +#define ANA_IPT_IPT_ISDX_P_M GENMASK(14, 7) +#define ANA_IPT_IPT_ISDX_P_X(x) (((x) & GENMASK(14, 7)) >> 7) +#define ANA_IPT_IPT_PPT_IDX(x) ((x) & GENMASK(6, 0)) +#define ANA_IPT_IPT_PPT_IDX_M GENMASK(6, 0) + +#define ANA_PPT_PPT_RSZ 0x4 + +#define ANA_FID_MAP_FID_MAP_RSZ 0x4 + +#define ANA_FID_MAP_FID_MAP_FID_C_VAL(x) (((x) << 6) & GENMASK(11, 6)) +#define ANA_FID_MAP_FID_MAP_FID_C_VAL_M GENMASK(11, 6) +#define ANA_FID_MAP_FID_MAP_FID_C_VAL_X(x) (((x) & GENMASK(11, 6)) >> 6) +#define ANA_FID_MAP_FID_MAP_FID_B_VAL(x) ((x) & GENMASK(5, 0)) +#define ANA_FID_MAP_FID_MAP_FID_B_VAL_M GENMASK(5, 0) + +#define ANA_AGGR_CFG_AC_RND_ENA BIT(7) +#define ANA_AGGR_CFG_AC_DMAC_ENA BIT(6) +#define ANA_AGGR_CFG_AC_SMAC_ENA BIT(5) +#define ANA_AGGR_CFG_AC_IP6_FLOW_LBL_ENA BIT(4) +#define ANA_AGGR_CFG_AC_IP6_TCPUDP_ENA BIT(3) +#define ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA BIT(2) +#define ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA BIT(1) +#define ANA_AGGR_CFG_AC_ISDX_ENA BIT(0) + +#define ANA_CPUQ_CFG_CPUQ_MLD(x) (((x) << 27) & GENMASK(29, 27)) +#define ANA_CPUQ_CFG_CPUQ_MLD_M GENMASK(29, 27) +#define ANA_CPUQ_CFG_CPUQ_MLD_X(x) (((x) & GENMASK(29, 27)) >> 27) +#define ANA_CPUQ_CFG_CPUQ_IGMP(x) (((x) << 24) & GENMASK(26, 24)) +#define ANA_CPUQ_CFG_CPUQ_IGMP_M GENMASK(26, 24) +#define ANA_CPUQ_CFG_CPUQ_IGMP_X(x) (((x) & GENMASK(26, 24)) >> 24) +#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL(x) (((x) << 21) & GENMASK(23, 21)) +#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL_M GENMASK(23, 21) +#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL_X(x) (((x) & GENMASK(23, 21)) >> 21) +#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE(x) (((x) << 18) & GENMASK(20, 18)) +#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE_M GENMASK(20, 18) +#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE_X(x) (((x) & GENMASK(20, 18)) >> 18) +#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE(x) (((x) << 15) & GENMASK(17, 15)) +#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE_M GENMASK(17, 15) +#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE_X(x) (((x) & GENMASK(17, 15)) >> 15) +#define ANA_CPUQ_CFG_CPUQ_SRC_COPY(x) (((x) << 12) & GENMASK(14, 12)) +#define ANA_CPUQ_CFG_CPUQ_SRC_COPY_M GENMASK(14, 12) +#define ANA_CPUQ_CFG_CPUQ_SRC_COPY_X(x) (((x) & GENMASK(14, 12)) >> 12) +#define ANA_CPUQ_CFG_CPUQ_MAC_COPY(x) (((x) << 9) & GENMASK(11, 9)) +#define ANA_CPUQ_CFG_CPUQ_MAC_COPY_M GENMASK(11, 9) +#define ANA_CPUQ_CFG_CPUQ_MAC_COPY_X(x) (((x) & GENMASK(11, 9)) >> 9) +#define ANA_CPUQ_CFG_CPUQ_LRN(x) (((x) << 6) & GENMASK(8, 6)) +#define ANA_CPUQ_CFG_CPUQ_LRN_M GENMASK(8, 6) +#define ANA_CPUQ_CFG_CPUQ_LRN_X(x) (((x) & GENMASK(8, 6)) >> 6) +#define ANA_CPUQ_CFG_CPUQ_MIRROR(x) (((x) << 3) & GENMASK(5, 3)) +#define ANA_CPUQ_CFG_CPUQ_MIRROR_M GENMASK(5, 3) +#define ANA_CPUQ_CFG_CPUQ_MIRROR_X(x) (((x) & GENMASK(5, 3)) >> 3) +#define ANA_CPUQ_CFG_CPUQ_SFLOW(x) ((x) & GENMASK(2, 0)) +#define ANA_CPUQ_CFG_CPUQ_SFLOW_M GENMASK(2, 0) + +#define ANA_CPUQ_8021_CFG_RSZ 0x4 + +#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL(x) (((x) << 6) & GENMASK(8, 6)) +#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL_M GENMASK(8, 6) +#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL_X(x) (((x) & GENMASK(8, 6)) >> 6) +#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL(x) (((x) << 3) & GENMASK(5, 3)) +#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL_M GENMASK(5, 3) +#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL_X(x) (((x) & GENMASK(5, 3)) >> 3) +#define ANA_CPUQ_8021_CFG_CPUQ_CCM_VAL(x) ((x) & GENMASK(2, 0)) +#define ANA_CPUQ_8021_CFG_CPUQ_CCM_VAL_M GENMASK(2, 0) + +#define ANA_DSCP_CFG_RSZ 0x4 + +#define ANA_DSCP_CFG_DP_DSCP_VAL BIT(11) +#define ANA_DSCP_CFG_QOS_DSCP_VAL(x) (((x) << 8) & GENMASK(10, 8)) +#define ANA_DSCP_CFG_QOS_DSCP_VAL_M GENMASK(10, 8) +#define ANA_DSCP_CFG_QOS_DSCP_VAL_X(x) (((x) & GENMASK(10, 8)) >> 8) +#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL(x) (((x) << 2) & GENMASK(7, 2)) +#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL_M GENMASK(7, 2) +#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL_X(x) (((x) & GENMASK(7, 2)) >> 2) +#define ANA_DSCP_CFG_DSCP_TRUST_ENA BIT(1) +#define ANA_DSCP_CFG_DSCP_REWR_ENA BIT(0) + +#define ANA_DSCP_REWR_CFG_RSZ 0x4 + +#define ANA_VCAP_RNG_TYPE_CFG_RSZ 0x4 + +#define ANA_VCAP_RNG_VAL_CFG_RSZ 0x4 + +#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL(x) (((x) << 16) & GENMASK(31, 16)) +#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL_M GENMASK(31, 16) +#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MAX_VAL(x) ((x) & GENMASK(15, 0)) +#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MAX_VAL_M GENMASK(15, 0) + +#define ANA_VRAP_CFG_VRAP_VLAN_AWARE_ENA BIT(12) +#define ANA_VRAP_CFG_VRAP_VID(x) ((x) & GENMASK(11, 0)) +#define ANA_VRAP_CFG_VRAP_VID_M GENMASK(11, 0) + +#define ANA_DISCARD_CFG_DROP_TAGGING_ISDX0 BIT(3) +#define ANA_DISCARD_CFG_DROP_CTRLPROT_ISDX0 BIT(2) +#define ANA_DISCARD_CFG_DROP_TAGGING_S2_ENA BIT(1) +#define ANA_DISCARD_CFG_DROP_CTRLPROT_S2_ENA BIT(0) + +#define ANA_FID_CFG_VID_MC_ENA BIT(0) + +#define ANA_POL_PIR_CFG_GSZ 0x20 + +#define ANA_POL_PIR_CFG_PIR_RATE(x) (((x) << 6) & GENMASK(20, 6)) +#define ANA_POL_PIR_CFG_PIR_RATE_M GENMASK(20, 6) +#define ANA_POL_PIR_CFG_PIR_RATE_X(x) (((x) & GENMASK(20, 6)) >> 6) +#define ANA_POL_PIR_CFG_PIR_BURST(x) ((x) & GENMASK(5, 0)) +#define ANA_POL_PIR_CFG_PIR_BURST_M GENMASK(5, 0) + +#define ANA_POL_CIR_CFG_GSZ 0x20 + +#define ANA_POL_CIR_CFG_CIR_RATE(x) (((x) << 6) & GENMASK(20, 6)) +#define ANA_POL_CIR_CFG_CIR_RATE_M GENMASK(20, 6) +#define ANA_POL_CIR_CFG_CIR_RATE_X(x) (((x) & GENMASK(20, 6)) >> 6) +#define ANA_POL_CIR_CFG_CIR_BURST(x) ((x) & GENMASK(5, 0)) +#define ANA_POL_CIR_CFG_CIR_BURST_M GENMASK(5, 0) + +#define ANA_POL_MODE_CFG_GSZ 0x20 + +#define ANA_POL_MODE_CFG_IPG_SIZE(x) (((x) << 5) & GENMASK(9, 5)) +#define ANA_POL_MODE_CFG_IPG_SIZE_M GENMASK(9, 5) +#define ANA_POL_MODE_CFG_IPG_SIZE_X(x) (((x) & GENMASK(9, 5)) >> 5) +#define ANA_POL_MODE_CFG_FRM_MODE(x) (((x) << 3) & GENMASK(4, 3)) +#define ANA_POL_MODE_CFG_FRM_MODE_M GENMASK(4, 3) +#define ANA_POL_MODE_CFG_FRM_MODE_X(x) (((x) & GENMASK(4, 3)) >> 3) +#define ANA_POL_MODE_CFG_DLB_COUPLED BIT(2) +#define ANA_POL_MODE_CFG_CIR_ENA BIT(1) +#define ANA_POL_MODE_CFG_OVERSHOOT_ENA BIT(0) + +#define ANA_POL_PIR_STATE_GSZ 0x20 + +#define ANA_POL_CIR_STATE_GSZ 0x20 + +#define ANA_POL_STATE_GSZ 0x20 + +#define ANA_POL_FLOWC_RSZ 0x4 + +#define ANA_POL_FLOWC_POL_FLOWC BIT(0) + +#define ANA_POL_HYST_POL_FC_HYST(x) (((x) << 4) & GENMASK(9, 4)) +#define ANA_POL_HYST_POL_FC_HYST_M GENMASK(9, 4) +#define ANA_POL_HYST_POL_FC_HYST_X(x) (((x) & GENMASK(9, 4)) >> 4) +#define ANA_POL_HYST_POL_STOP_HYST(x) ((x) & GENMASK(3, 0)) +#define ANA_POL_HYST_POL_STOP_HYST_M GENMASK(3, 0) + +#define ANA_POL_MISC_CFG_POL_CLOSE_ALL BIT(1) +#define ANA_POL_MISC_CFG_POL_LEAK_DIS BIT(0) + +#endif diff --git a/include/soc/mscc/ocelot_dev.h b/include/soc/mscc/ocelot_dev.h new file mode 100644 index 000000000000..0a50d53bbd3f --- /dev/null +++ b/include/soc/mscc/ocelot_dev.h @@ -0,0 +1,275 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */ +/* + * Microsemi Ocelot Switch driver + * + * Copyright (c) 2017 Microsemi Corporation + */ + +#ifndef _MSCC_OCELOT_DEV_H_ +#define _MSCC_OCELOT_DEV_H_ + +#define DEV_CLOCK_CFG 0x0 + +#define DEV_CLOCK_CFG_MAC_TX_RST BIT(7) +#define DEV_CLOCK_CFG_MAC_RX_RST BIT(6) +#define DEV_CLOCK_CFG_PCS_TX_RST BIT(5) +#define DEV_CLOCK_CFG_PCS_RX_RST BIT(4) +#define DEV_CLOCK_CFG_PORT_RST BIT(3) +#define DEV_CLOCK_CFG_PHY_RST BIT(2) +#define DEV_CLOCK_CFG_LINK_SPEED(x) ((x) & GENMASK(1, 0)) +#define DEV_CLOCK_CFG_LINK_SPEED_M GENMASK(1, 0) + +#define DEV_PORT_MISC 0x4 + +#define DEV_PORT_MISC_FWD_ERROR_ENA BIT(4) +#define DEV_PORT_MISC_FWD_PAUSE_ENA BIT(3) +#define DEV_PORT_MISC_FWD_CTRL_ENA BIT(2) +#define DEV_PORT_MISC_DEV_LOOP_ENA BIT(1) +#define DEV_PORT_MISC_HDX_FAST_DIS BIT(0) + +#define DEV_EVENTS 0x8 + +#define DEV_EEE_CFG 0xc + +#define DEV_EEE_CFG_EEE_ENA BIT(22) +#define DEV_EEE_CFG_EEE_TIMER_AGE(x) (((x) << 15) & GENMASK(21, 15)) +#define DEV_EEE_CFG_EEE_TIMER_AGE_M GENMASK(21, 15) +#define DEV_EEE_CFG_EEE_TIMER_AGE_X(x) (((x) & GENMASK(21, 15)) >> 15) +#define DEV_EEE_CFG_EEE_TIMER_WAKEUP(x) (((x) << 8) & GENMASK(14, 8)) +#define DEV_EEE_CFG_EEE_TIMER_WAKEUP_M GENMASK(14, 8) +#define DEV_EEE_CFG_EEE_TIMER_WAKEUP_X(x) (((x) & GENMASK(14, 8)) >> 8) +#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF(x) (((x) << 1) & GENMASK(7, 1)) +#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF_M GENMASK(7, 1) +#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF_X(x) (((x) & GENMASK(7, 1)) >> 1) +#define DEV_EEE_CFG_PORT_LPI BIT(0) + +#define DEV_RX_PATH_DELAY 0x10 + +#define DEV_TX_PATH_DELAY 0x14 + +#define DEV_PTP_PREDICT_CFG 0x18 + +#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG(x) (((x) << 4) & GENMASK(11, 4)) +#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG_M GENMASK(11, 4) +#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG_X(x) (((x) & GENMASK(11, 4)) >> 4) +#define DEV_PTP_PREDICT_CFG_PTP_PHASE_PREDICT_CFG(x) ((x) & GENMASK(3, 0)) +#define DEV_PTP_PREDICT_CFG_PTP_PHASE_PREDICT_CFG_M GENMASK(3, 0) + +#define DEV_MAC_ENA_CFG 0x1c + +#define DEV_MAC_ENA_CFG_RX_ENA BIT(4) +#define DEV_MAC_ENA_CFG_TX_ENA BIT(0) + +#define DEV_MAC_MODE_CFG 0x20 + +#define DEV_MAC_MODE_CFG_FC_WORD_SYNC_ENA BIT(8) +#define DEV_MAC_MODE_CFG_GIGA_MODE_ENA BIT(4) +#define DEV_MAC_MODE_CFG_FDX_ENA BIT(0) + +#define DEV_MAC_MAXLEN_CFG 0x24 + +#define DEV_MAC_TAGS_CFG 0x28 + +#define DEV_MAC_TAGS_CFG_TAG_ID(x) (((x) << 16) & GENMASK(31, 16)) +#define DEV_MAC_TAGS_CFG_TAG_ID_M GENMASK(31, 16) +#define DEV_MAC_TAGS_CFG_TAG_ID_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define DEV_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA BIT(2) +#define DEV_MAC_TAGS_CFG_PB_ENA BIT(1) +#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA BIT(0) + +#define DEV_MAC_ADV_CHK_CFG 0x2c + +#define DEV_MAC_ADV_CHK_CFG_LEN_DROP_ENA BIT(0) + +#define DEV_MAC_IFG_CFG 0x30 + +#define DEV_MAC_IFG_CFG_RESTORE_OLD_IPG_CHECK BIT(17) +#define DEV_MAC_IFG_CFG_REDUCED_TX_IFG BIT(16) +#define DEV_MAC_IFG_CFG_TX_IFG(x) (((x) << 8) & GENMASK(12, 8)) +#define DEV_MAC_IFG_CFG_TX_IFG_M GENMASK(12, 8) +#define DEV_MAC_IFG_CFG_TX_IFG_X(x) (((x) & GENMASK(12, 8)) >> 8) +#define DEV_MAC_IFG_CFG_RX_IFG2(x) (((x) << 4) & GENMASK(7, 4)) +#define DEV_MAC_IFG_CFG_RX_IFG2_M GENMASK(7, 4) +#define DEV_MAC_IFG_CFG_RX_IFG2_X(x) (((x) & GENMASK(7, 4)) >> 4) +#define DEV_MAC_IFG_CFG_RX_IFG1(x) ((x) & GENMASK(3, 0)) +#define DEV_MAC_IFG_CFG_RX_IFG1_M GENMASK(3, 0) + +#define DEV_MAC_HDX_CFG 0x34 + +#define DEV_MAC_HDX_CFG_BYPASS_COL_SYNC BIT(26) +#define DEV_MAC_HDX_CFG_OB_ENA BIT(25) +#define DEV_MAC_HDX_CFG_WEXC_DIS BIT(24) +#define DEV_MAC_HDX_CFG_SEED(x) (((x) << 16) & GENMASK(23, 16)) +#define DEV_MAC_HDX_CFG_SEED_M GENMASK(23, 16) +#define DEV_MAC_HDX_CFG_SEED_X(x) (((x) & GENMASK(23, 16)) >> 16) +#define DEV_MAC_HDX_CFG_SEED_LOAD BIT(12) +#define DEV_MAC_HDX_CFG_RETRY_AFTER_EXC_COL_ENA BIT(8) +#define DEV_MAC_HDX_CFG_LATE_COL_POS(x) ((x) & GENMASK(6, 0)) +#define DEV_MAC_HDX_CFG_LATE_COL_POS_M GENMASK(6, 0) + +#define DEV_MAC_DBG_CFG 0x38 + +#define DEV_MAC_DBG_CFG_TBI_MODE BIT(4) +#define DEV_MAC_DBG_CFG_IFG_CRS_EXT_CHK_ENA BIT(0) + +#define DEV_MAC_FC_MAC_LOW_CFG 0x3c + +#define DEV_MAC_FC_MAC_HIGH_CFG 0x40 + +#define DEV_MAC_STICKY 0x44 + +#define DEV_MAC_STICKY_RX_IPG_SHRINK_STICKY BIT(9) +#define DEV_MAC_STICKY_RX_PREAM_SHRINK_STICKY BIT(8) +#define DEV_MAC_STICKY_RX_CARRIER_EXT_STICKY BIT(7) +#define DEV_MAC_STICKY_RX_CARRIER_EXT_ERR_STICKY BIT(6) +#define DEV_MAC_STICKY_RX_JUNK_STICKY BIT(5) +#define DEV_MAC_STICKY_TX_RETRANSMIT_STICKY BIT(4) +#define DEV_MAC_STICKY_TX_JAM_STICKY BIT(3) +#define DEV_MAC_STICKY_TX_FIFO_OFLW_STICKY BIT(2) +#define DEV_MAC_STICKY_TX_FRM_LEN_OVR_STICKY BIT(1) +#define DEV_MAC_STICKY_TX_ABORT_STICKY BIT(0) + +#define PCS1G_CFG 0x48 + +#define PCS1G_CFG_LINK_STATUS_TYPE BIT(4) +#define PCS1G_CFG_AN_LINK_CTRL_ENA BIT(1) +#define PCS1G_CFG_PCS_ENA BIT(0) + +#define PCS1G_MODE_CFG 0x4c + +#define PCS1G_MODE_CFG_UNIDIR_MODE_ENA BIT(4) +#define PCS1G_MODE_CFG_SGMII_MODE_ENA BIT(0) + +#define PCS1G_SD_CFG 0x50 + +#define PCS1G_SD_CFG_SD_SEL BIT(8) +#define PCS1G_SD_CFG_SD_POL BIT(4) +#define PCS1G_SD_CFG_SD_ENA BIT(0) + +#define PCS1G_ANEG_CFG 0x54 + +#define PCS1G_ANEG_CFG_ADV_ABILITY(x) (((x) << 16) & GENMASK(31, 16)) +#define PCS1G_ANEG_CFG_ADV_ABILITY_M GENMASK(31, 16) +#define PCS1G_ANEG_CFG_ADV_ABILITY_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define PCS1G_ANEG_CFG_SW_RESOLVE_ENA BIT(8) +#define PCS1G_ANEG_CFG_ANEG_RESTART_ONE_SHOT BIT(1) +#define PCS1G_ANEG_CFG_ANEG_ENA BIT(0) + +#define PCS1G_ANEG_NP_CFG 0x58 + +#define PCS1G_ANEG_NP_CFG_NP_TX(x) (((x) << 16) & GENMASK(31, 16)) +#define PCS1G_ANEG_NP_CFG_NP_TX_M GENMASK(31, 16) +#define PCS1G_ANEG_NP_CFG_NP_TX_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define PCS1G_ANEG_NP_CFG_NP_LOADED_ONE_SHOT BIT(0) + +#define PCS1G_LB_CFG 0x5c + +#define PCS1G_LB_CFG_RA_ENA BIT(4) +#define PCS1G_LB_CFG_GMII_PHY_LB_ENA BIT(1) +#define PCS1G_LB_CFG_TBI_HOST_LB_ENA BIT(0) + +#define PCS1G_DBG_CFG 0x60 + +#define PCS1G_DBG_CFG_UDLT BIT(0) + +#define PCS1G_CDET_CFG 0x64 + +#define PCS1G_CDET_CFG_CDET_ENA BIT(0) + +#define PCS1G_ANEG_STATUS 0x68 + +#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY(x) (((x) << 16) & GENMASK(31, 16)) +#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY_M GENMASK(31, 16) +#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY_X(x) (((x) & GENMASK(31, 16)) >> 16) +#define PCS1G_ANEG_STATUS_PR BIT(4) +#define PCS1G_ANEG_STATUS_PAGE_RX_STICKY BIT(3) +#define PCS1G_ANEG_STATUS_ANEG_COMPLETE BIT(0) + +#define PCS1G_ANEG_NP_STATUS 0x6c + +#define PCS1G_LINK_STATUS 0x70 + +#define PCS1G_LINK_STATUS_DELAY_VAR(x) (((x) << 12) & GENMASK(15, 12)) +#define PCS1G_LINK_STATUS_DELAY_VAR_M GENMASK(15, 12) +#define PCS1G_LINK_STATUS_DELAY_VAR_X(x) (((x) & GENMASK(15, 12)) >> 12) +#define PCS1G_LINK_STATUS_SIGNAL_DETECT BIT(8) +#define PCS1G_LINK_STATUS_LINK_STATUS BIT(4) +#define PCS1G_LINK_STATUS_SYNC_STATUS BIT(0) + +#define PCS1G_LINK_DOWN_CNT 0x74 + +#define PCS1G_STICKY 0x78 + +#define PCS1G_STICKY_LINK_DOWN_STICKY BIT(4) +#define PCS1G_STICKY_OUT_OF_SYNC_STICKY BIT(0) + +#define PCS1G_DEBUG_STATUS 0x7c + +#define PCS1G_LPI_CFG 0x80 + +#define PCS1G_LPI_CFG_QSGMII_MS_SEL BIT(20) +#define PCS1G_LPI_CFG_RX_LPI_OUT_DIS BIT(17) +#define PCS1G_LPI_CFG_LPI_TESTMODE BIT(16) +#define PCS1G_LPI_CFG_LPI_RX_WTIM(x) (((x) << 4) & GENMASK(5, 4)) +#define PCS1G_LPI_CFG_LPI_RX_WTIM_M GENMASK(5, 4) +#define PCS1G_LPI_CFG_LPI_RX_WTIM_X(x) (((x) & GENMASK(5, 4)) >> 4) +#define PCS1G_LPI_CFG_TX_ASSERT_LPIDLE BIT(0) + +#define PCS1G_LPI_WAKE_ERROR_CNT 0x84 + +#define PCS1G_LPI_STATUS 0x88 + +#define PCS1G_LPI_STATUS_RX_LPI_FAIL BIT(16) +#define PCS1G_LPI_STATUS_RX_LPI_EVENT_STICKY BIT(12) +#define PCS1G_LPI_STATUS_RX_QUIET BIT(9) +#define PCS1G_LPI_STATUS_RX_LPI_MODE BIT(8) +#define PCS1G_LPI_STATUS_TX_LPI_EVENT_STICKY BIT(4) +#define PCS1G_LPI_STATUS_TX_QUIET BIT(1) +#define PCS1G_LPI_STATUS_TX_LPI_MODE BIT(0) + +#define PCS1G_TSTPAT_MODE_CFG 0x8c + +#define PCS1G_TSTPAT_STATUS 0x90 + +#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT(x) (((x) << 8) & GENMASK(15, 8)) +#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT_M GENMASK(15, 8) +#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT_X(x) (((x) & GENMASK(15, 8)) >> 8) +#define PCS1G_TSTPAT_STATUS_JTP_ERR BIT(4) +#define PCS1G_TSTPAT_STATUS_JTP_LOCK BIT(0) + +#define DEV_PCS_FX100_CFG 0x94 + +#define DEV_PCS_FX100_CFG_SD_SEL BIT(26) +#define DEV_PCS_FX100_CFG_SD_POL BIT(25) +#define DEV_PCS_FX100_CFG_SD_ENA BIT(24) +#define DEV_PCS_FX100_CFG_LOOPBACK_ENA BIT(20) +#define DEV_PCS_FX100_CFG_SWAP_MII_ENA BIT(16) +#define DEV_PCS_FX100_CFG_RXBITSEL(x) (((x) << 12) & GENMASK(15, 12)) +#define DEV_PCS_FX100_CFG_RXBITSEL_M GENMASK(15, 12) +#define DEV_PCS_FX100_CFG_RXBITSEL_X(x) (((x) & GENMASK(15, 12)) >> 12) +#define DEV_PCS_FX100_CFG_SIGDET_CFG(x) (((x) << 9) & GENMASK(10, 9)) +#define DEV_PCS_FX100_CFG_SIGDET_CFG_M GENMASK(10, 9) +#define DEV_PCS_FX100_CFG_SIGDET_CFG_X(x) (((x) & GENMASK(10, 9)) >> 9) +#define DEV_PCS_FX100_CFG_LINKHYST_TM_ENA BIT(8) +#define DEV_PCS_FX100_CFG_LINKHYSTTIMER(x) (((x) << 4) & GENMASK(7, 4)) +#define DEV_PCS_FX100_CFG_LINKHYSTTIMER_M GENMASK(7, 4) +#define DEV_PCS_FX100_CFG_LINKHYSTTIMER_X(x) (((x) & GENMASK(7, 4)) >> 4) +#define DEV_PCS_FX100_CFG_UNIDIR_MODE_ENA BIT(3) +#define DEV_PCS_FX100_CFG_FEFCHK_ENA BIT(2) +#define DEV_PCS_FX100_CFG_FEFGEN_ENA BIT(1) +#define DEV_PCS_FX100_CFG_PCS_ENA BIT(0) + +#define DEV_PCS_FX100_STATUS 0x98 + +#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP(x) (((x) << 8) & GENMASK(11, 8)) +#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP_M GENMASK(11, 8) +#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP_X(x) (((x) & GENMASK(11, 8)) >> 8) +#define DEV_PCS_FX100_STATUS_PCS_ERROR_STICKY BIT(7) +#define DEV_PCS_FX100_STATUS_FEF_FOUND_STICKY BIT(6) +#define DEV_PCS_FX100_STATUS_SSD_ERROR_STICKY BIT(5) +#define DEV_PCS_FX100_STATUS_SYNC_LOST_STICKY BIT(4) +#define DEV_PCS_FX100_STATUS_FEF_STATUS BIT(2) +#define DEV_PCS_FX100_STATUS_SIGNAL_DETECT BIT(1) +#define DEV_PCS_FX100_STATUS_SYNC_STATUS BIT(0) + +#endif diff --git a/include/soc/mscc/ocelot_qsys.h b/include/soc/mscc/ocelot_qsys.h new file mode 100644 index 000000000000..d8c63aa761be --- /dev/null +++ b/include/soc/mscc/ocelot_qsys.h @@ -0,0 +1,270 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */ +/* + * Microsemi Ocelot Switch driver + * + * Copyright (c) 2017 Microsemi Corporation + */ + +#ifndef _MSCC_OCELOT_QSYS_H_ +#define _MSCC_OCELOT_QSYS_H_ + +#define QSYS_PORT_MODE_RSZ 0x4 + +#define QSYS_PORT_MODE_DEQUEUE_DIS BIT(1) +#define QSYS_PORT_MODE_DEQUEUE_LATE BIT(0) + +#define QSYS_SWITCH_PORT_MODE_RSZ 0x4 + +#define QSYS_SWITCH_PORT_MODE_PORT_ENA BIT(14) +#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG(x) (((x) << 11) & GENMASK(13, 11)) +#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG_M GENMASK(13, 11) +#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG_X(x) (((x) & GENMASK(13, 11)) >> 11) +#define QSYS_SWITCH_PORT_MODE_YEL_RSRVD BIT(10) +#define QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE BIT(9) +#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA(x) (((x) << 1) & GENMASK(8, 1)) +#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA_M GENMASK(8, 1) +#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA_X(x) (((x) & GENMASK(8, 1)) >> 1) +#define QSYS_SWITCH_PORT_MODE_TX_PFC_MODE BIT(0) + +#define QSYS_STAT_CNT_CFG_TX_GREEN_CNT_MODE BIT(5) +#define QSYS_STAT_CNT_CFG_TX_YELLOW_CNT_MODE BIT(4) +#define QSYS_STAT_CNT_CFG_DROP_GREEN_CNT_MODE BIT(3) +#define QSYS_STAT_CNT_CFG_DROP_YELLOW_CNT_MODE BIT(2) +#define QSYS_STAT_CNT_CFG_DROP_COUNT_ONCE BIT(1) +#define QSYS_STAT_CNT_CFG_DROP_COUNT_EGRESS BIT(0) + +#define QSYS_EEE_CFG_RSZ 0x4 + +#define QSYS_EEE_THRES_EEE_HIGH_BYTES(x) (((x) << 8) & GENMASK(15, 8)) +#define QSYS_EEE_THRES_EEE_HIGH_BYTES_M GENMASK(15, 8) +#define QSYS_EEE_THRES_EEE_HIGH_BYTES_X(x) (((x) & GENMASK(15, 8)) >> 8) +#define QSYS_EEE_THRES_EEE_HIGH_FRAMES(x) ((x) & GENMASK(7, 0)) +#define QSYS_EEE_THRES_EEE_HIGH_FRAMES_M GENMASK(7, 0) + +#define QSYS_SW_STATUS_RSZ 0x4 + +#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT(x) (((x) << 8) & GENMASK(12, 8)) +#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT_M GENMASK(12, 8) +#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT_X(x) (((x) & GENMASK(12, 8)) >> 8) +#define QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK(x) ((x) & GENMASK(7, 0)) +#define QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK_M GENMASK(7, 0) + +#define QSYS_QMAP_GSZ 0x4 + +#define QSYS_QMAP_SE_BASE(x) (((x) << 5) & GENMASK(12, 5)) +#define QSYS_QMAP_SE_BASE_M GENMASK(12, 5) +#define QSYS_QMAP_SE_BASE_X(x) (((x) & GENMASK(12, 5)) >> 5) +#define QSYS_QMAP_SE_IDX_SEL(x) (((x) << 2) & GENMASK(4, 2)) +#define QSYS_QMAP_SE_IDX_SEL_M GENMASK(4, 2) +#define QSYS_QMAP_SE_IDX_SEL_X(x) (((x) & GENMASK(4, 2)) >> 2) +#define QSYS_QMAP_SE_INP_SEL(x) ((x) & GENMASK(1, 0)) +#define QSYS_QMAP_SE_INP_SEL_M GENMASK(1, 0) + +#define QSYS_ISDX_SGRP_GSZ 0x4 + +#define QSYS_TIMED_FRAME_ENTRY_GSZ 0x4 + +#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT(x) (((x) << 9) & GENMASK(18, 9)) +#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT_M GENMASK(18, 9) +#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT_X(x) (((x) & GENMASK(18, 9)) >> 9) +#define QSYS_TFRM_MISC_TIMED_CANCEL_1SHOT BIT(8) +#define QSYS_TFRM_MISC_TIMED_SLOT_MODE_MC BIT(7) +#define QSYS_TFRM_MISC_TIMED_ENTRY_FAST_CNT(x) ((x) & GENMASK(6, 0)) +#define QSYS_TFRM_MISC_TIMED_ENTRY_FAST_CNT_M GENMASK(6, 0) + +#define QSYS_RED_PROFILE_RSZ 0x4 + +#define QSYS_RED_PROFILE_WM_RED_LOW(x) (((x) << 8) & GENMASK(15, 8)) +#define QSYS_RED_PROFILE_WM_RED_LOW_M GENMASK(15, 8) +#define QSYS_RED_PROFILE_WM_RED_LOW_X(x) (((x) & GENMASK(15, 8)) >> 8) +#define QSYS_RED_PROFILE_WM_RED_HIGH(x) ((x) & GENMASK(7, 0)) +#define QSYS_RED_PROFILE_WM_RED_HIGH_M GENMASK(7, 0) + +#define QSYS_RES_CFG_GSZ 0x8 + +#define QSYS_RES_STAT_GSZ 0x8 + +#define QSYS_RES_STAT_INUSE(x) (((x) << 12) & GENMASK(23, 12)) +#define QSYS_RES_STAT_INUSE_M GENMASK(23, 12) +#define QSYS_RES_STAT_INUSE_X(x) (((x) & GENMASK(23, 12)) >> 12) +#define QSYS_RES_STAT_MAXUSE(x) ((x) & GENMASK(11, 0)) +#define QSYS_RES_STAT_MAXUSE_M GENMASK(11, 0) + +#define QSYS_EVENTS_CORE_EV_FDC(x) (((x) << 2) & GENMASK(4, 2)) +#define QSYS_EVENTS_CORE_EV_FDC_M GENMASK(4, 2) +#define QSYS_EVENTS_CORE_EV_FDC_X(x) (((x) & GENMASK(4, 2)) >> 2) +#define QSYS_EVENTS_CORE_EV_FRD(x) ((x) & GENMASK(1, 0)) +#define QSYS_EVENTS_CORE_EV_FRD_M GENMASK(1, 0) + +#define QSYS_QMAXSDU_CFG_0_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_1_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_2_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_3_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_4_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_5_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_6_RSZ 0x4 + +#define QSYS_QMAXSDU_CFG_7_RSZ 0x4 + +#define QSYS_PREEMPTION_CFG_RSZ 0x4 + +#define QSYS_PREEMPTION_CFG_P_QUEUES(x) ((x) & GENMASK(7, 0)) +#define QSYS_PREEMPTION_CFG_P_QUEUES_M GENMASK(7, 0) +#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE(x) (((x) << 8) & GENMASK(9, 8)) +#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE_M GENMASK(9, 8) +#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE_X(x) (((x) & GENMASK(9, 8)) >> 8) +#define QSYS_PREEMPTION_CFG_STRICT_IPG(x) (((x) << 12) & GENMASK(13, 12)) +#define QSYS_PREEMPTION_CFG_STRICT_IPG_M GENMASK(13, 12) +#define QSYS_PREEMPTION_CFG_STRICT_IPG_X(x) (((x) & GENMASK(13, 12)) >> 12) +#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE(x) (((x) << 16) & GENMASK(31, 16)) +#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE_M GENMASK(31, 16) +#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE_X(x) (((x) & GENMASK(31, 16)) >> 16) + +#define QSYS_CIR_CFG_GSZ 0x80 + +#define QSYS_CIR_CFG_CIR_RATE(x) (((x) << 6) & GENMASK(20, 6)) +#define QSYS_CIR_CFG_CIR_RATE_M GENMASK(20, 6) +#define QSYS_CIR_CFG_CIR_RATE_X(x) (((x) & GENMASK(20, 6)) >> 6) +#define QSYS_CIR_CFG_CIR_BURST(x) ((x) & GENMASK(5, 0)) +#define QSYS_CIR_CFG_CIR_BURST_M GENMASK(5, 0) + +#define QSYS_EIR_CFG_GSZ 0x80 + +#define QSYS_EIR_CFG_EIR_RATE(x) (((x) << 7) & GENMASK(21, 7)) +#define QSYS_EIR_CFG_EIR_RATE_M GENMASK(21, 7) +#define QSYS_EIR_CFG_EIR_RATE_X(x) (((x) & GENMASK(21, 7)) >> 7) +#define QSYS_EIR_CFG_EIR_BURST(x) (((x) << 1) & GENMASK(6, 1)) +#define QSYS_EIR_CFG_EIR_BURST_M GENMASK(6, 1) +#define QSYS_EIR_CFG_EIR_BURST_X(x) (((x) & GENMASK(6, 1)) >> 1) +#define QSYS_EIR_CFG_EIR_MARK_ENA BIT(0) + +#define QSYS_SE_CFG_GSZ 0x80 + +#define QSYS_SE_CFG_SE_DWRR_CNT(x) (((x) << 6) & GENMASK(9, 6)) +#define QSYS_SE_CFG_SE_DWRR_CNT_M GENMASK(9, 6) +#define QSYS_SE_CFG_SE_DWRR_CNT_X(x) (((x) & GENMASK(9, 6)) >> 6) +#define QSYS_SE_CFG_SE_RR_ENA BIT(5) +#define QSYS_SE_CFG_SE_AVB_ENA BIT(4) +#define QSYS_SE_CFG_SE_FRM_MODE(x) (((x) << 2) & GENMASK(3, 2)) +#define QSYS_SE_CFG_SE_FRM_MODE_M GENMASK(3, 2) +#define QSYS_SE_CFG_SE_FRM_MODE_X(x) (((x) & GENMASK(3, 2)) >> 2) +#define QSYS_SE_CFG_SE_EXC_ENA BIT(1) +#define QSYS_SE_CFG_SE_EXC_FWD BIT(0) + +#define QSYS_SE_DWRR_CFG_GSZ 0x80 +#define QSYS_SE_DWRR_CFG_RSZ 0x4 + +#define QSYS_SE_CONNECT_GSZ 0x80 + +#define QSYS_SE_CONNECT_SE_OUTP_IDX(x) (((x) << 17) & GENMASK(24, 17)) +#define QSYS_SE_CONNECT_SE_OUTP_IDX_M GENMASK(24, 17) +#define QSYS_SE_CONNECT_SE_OUTP_IDX_X(x) (((x) & GENMASK(24, 17)) >> 17) +#define QSYS_SE_CONNECT_SE_INP_IDX(x) (((x) << 9) & GENMASK(16, 9)) +#define QSYS_SE_CONNECT_SE_INP_IDX_M GENMASK(16, 9) +#define QSYS_SE_CONNECT_SE_INP_IDX_X(x) (((x) & GENMASK(16, 9)) >> 9) +#define QSYS_SE_CONNECT_SE_OUTP_CON(x) (((x) << 5) & GENMASK(8, 5)) +#define QSYS_SE_CONNECT_SE_OUTP_CON_M GENMASK(8, 5) +#define QSYS_SE_CONNECT_SE_OUTP_CON_X(x) (((x) & GENMASK(8, 5)) >> 5) +#define QSYS_SE_CONNECT_SE_INP_CNT(x) (((x) << 1) & GENMASK(4, 1)) +#define QSYS_SE_CONNECT_SE_INP_CNT_M GENMASK(4, 1) +#define QSYS_SE_CONNECT_SE_INP_CNT_X(x) (((x) & GENMASK(4, 1)) >> 1) +#define QSYS_SE_CONNECT_SE_TERMINAL BIT(0) + +#define QSYS_SE_DLB_SENSE_GSZ 0x80 + +#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO(x) (((x) << 11) & GENMASK(13, 11)) +#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_M GENMASK(13, 11) +#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_X(x) (((x) & GENMASK(13, 11)) >> 11) +#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT(x) (((x) << 7) & GENMASK(10, 7)) +#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_M GENMASK(10, 7) +#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_X(x) (((x) & GENMASK(10, 7)) >> 7) +#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT(x) (((x) << 3) & GENMASK(6, 3)) +#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_M GENMASK(6, 3) +#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_X(x) (((x) & GENMASK(6, 3)) >> 3) +#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_ENA BIT(2) +#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_ENA BIT(1) +#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_ENA BIT(0) + +#define QSYS_CIR_STATE_GSZ 0x80 + +#define QSYS_CIR_STATE_CIR_LVL(x) (((x) << 4) & GENMASK(25, 4)) +#define QSYS_CIR_STATE_CIR_LVL_M GENMASK(25, 4) +#define QSYS_CIR_STATE_CIR_LVL_X(x) (((x) & GENMASK(25, 4)) >> 4) +#define QSYS_CIR_STATE_SHP_TIME(x) ((x) & GENMASK(3, 0)) +#define QSYS_CIR_STATE_SHP_TIME_M GENMASK(3, 0) + +#define QSYS_EIR_STATE_GSZ 0x80 + +#define QSYS_SE_STATE_GSZ 0x80 + +#define QSYS_SE_STATE_SE_OUTP_LVL(x) (((x) << 1) & GENMASK(2, 1)) +#define QSYS_SE_STATE_SE_OUTP_LVL_M GENMASK(2, 1) +#define QSYS_SE_STATE_SE_OUTP_LVL_X(x) (((x) & GENMASK(2, 1)) >> 1) +#define QSYS_SE_STATE_SE_WAS_YEL BIT(0) + +#define QSYS_HSCH_MISC_CFG_SE_CONNECT_VLD BIT(8) +#define QSYS_HSCH_MISC_CFG_FRM_ADJ(x) (((x) << 3) & GENMASK(7, 3)) +#define QSYS_HSCH_MISC_CFG_FRM_ADJ_M GENMASK(7, 3) +#define QSYS_HSCH_MISC_CFG_FRM_ADJ_X(x) (((x) & GENMASK(7, 3)) >> 3) +#define QSYS_HSCH_MISC_CFG_LEAK_DIS BIT(2) +#define QSYS_HSCH_MISC_CFG_QSHP_EXC_ENA BIT(1) +#define QSYS_HSCH_MISC_CFG_PFC_BYP_UPD BIT(0) + +#define QSYS_TAG_CONFIG_RSZ 0x4 + +#define QSYS_TAG_CONFIG_ENABLE BIT(0) +#define QSYS_TAG_CONFIG_LINK_SPEED(x) (((x) << 4) & GENMASK(5, 4)) +#define QSYS_TAG_CONFIG_LINK_SPEED_M GENMASK(5, 4) +#define QSYS_TAG_CONFIG_LINK_SPEED_X(x) (((x) & GENMASK(5, 4)) >> 4) +#define QSYS_TAG_CONFIG_INIT_GATE_STATE(x) (((x) << 8) & GENMASK(15, 8)) +#define QSYS_TAG_CONFIG_INIT_GATE_STATE_M GENMASK(15, 8) +#define QSYS_TAG_CONFIG_INIT_GATE_STATE_X(x) (((x) & GENMASK(15, 8)) >> 8) +#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES(x) (((x) << 16) & GENMASK(23, 16)) +#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_M GENMASK(23, 16) +#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_X(x) (((x) & GENMASK(23, 16)) >> 16) + +#define QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM(x) ((x) & GENMASK(7, 0)) +#define QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM_M GENMASK(7, 0) +#define QSYS_TAS_PARAM_CFG_CTRL_ALWAYS_GUARD_BAND_SCH_Q BIT(8) +#define QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE BIT(16) + +#define QSYS_PORT_MAX_SDU_RSZ 0x4 + +#define QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0)) +#define QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB_M GENMASK(15, 0) +#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH(x) (((x) << 16) & GENMASK(31, 16)) +#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH_M GENMASK(31, 16) +#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH_X(x) (((x) & GENMASK(31, 16)) >> 16) + +#define QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM(x) ((x) & GENMASK(5, 0)) +#define QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM_M GENMASK(5, 0) +#define QSYS_GCL_CFG_REG_1_GATE_STATE(x) (((x) << 8) & GENMASK(15, 8)) +#define QSYS_GCL_CFG_REG_1_GATE_STATE_M GENMASK(15, 8) +#define QSYS_GCL_CFG_REG_1_GATE_STATE_X(x) (((x) & GENMASK(15, 8)) >> 8) + +#define QSYS_PARAM_STATUS_REG_3_BASE_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0)) +#define QSYS_PARAM_STATUS_REG_3_BASE_TIME_SEC_MSB_M GENMASK(15, 0) +#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH(x) (((x) << 16) & GENMASK(31, 16)) +#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_M GENMASK(31, 16) +#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_X(x) (((x) & GENMASK(31, 16)) >> 16) + +#define QSYS_PARAM_STATUS_REG_8_CFG_CHG_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0)) +#define QSYS_PARAM_STATUS_REG_8_CFG_CHG_TIME_SEC_MSB_M GENMASK(15, 0) +#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE(x) (((x) << 16) & GENMASK(23, 16)) +#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE_M GENMASK(23, 16) +#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE_X(x) (((x) & GENMASK(23, 16)) >> 16) +#define QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING BIT(24) + +#define QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM(x) ((x) & GENMASK(5, 0)) +#define QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM_M GENMASK(5, 0) +#define QSYS_GCL_STATUS_REG_1_GATE_STATE(x) (((x) << 8) & GENMASK(15, 8)) +#define QSYS_GCL_STATUS_REG_1_GATE_STATE_M GENMASK(15, 8) +#define QSYS_GCL_STATUS_REG_1_GATE_STATE_X(x) (((x) & GENMASK(15, 8)) >> 8) + +#endif -- cgit v1.2.3 From 8007880a2ca97c34e7ccd1fcf12daf854b792544 Mon Sep 17 00:00:00 2001 From: Zhu Yanjun Date: Sat, 14 Dec 2019 10:51:17 +0200 Subject: net/mlx5: limit the function in local scope The function mlx5_buf_alloc_node is only used by the function in the local scope. So it is appropriate to limit this function in the local scope. Signed-off-by: Zhu Yanjun Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 27200dea0297..59cff380f41a 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -928,8 +928,6 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev); void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health); void mlx5_drain_health_wq(struct mlx5_core_dev *dev); void mlx5_trigger_health_work(struct mlx5_core_dev *dev); -int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size, - struct mlx5_frag_buf *buf, int node); int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, struct mlx5_frag_buf *buf); void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf); -- cgit v1.2.3 From dcfea72e79b0aa7a057c8f6024169d86a1bbc84b Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 8 Jan 2020 16:59:02 -0500 Subject: net: introduce skb_list_walk_safe for skb segment walking As part of the continual effort to remove direct usage of skb->next and skb->prev, this patch adds a helper for iterating through the singly-linked variant of skb lists, which are used for lists of GSO packet. The name "skb_list_..." has been chosen to match the existing function, "kfree_skb_list, which also operates on these singly-linked lists, and the "..._walk_safe" part is the same idiom as elsewhere in the kernel. This patch removes the helper from wireguard and puts it into linux/skbuff.h, while making it a bit more robust for general usage. In particular, parenthesis are added around the macro argument usage, and it now accounts for trying to iterate through an already-null skb pointer, which will simply run the iteration zero times. This latter enhancement means it can be used to replace both do { ... } while and while (...) open-coded idioms. This should take care of these three possible usages, which match all current methods of iterations. skb_list_walk_safe(segs, skb, next) { ... } skb_list_walk_safe(skb, skb, next) { ... } skb_list_walk_safe(segs, skb, segs) { ... } Gcc appears to generate efficient code for each of these. Signed-off-by: Jason A. Donenfeld Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index e9133bcf0544..64e5b1be9ff5 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1478,6 +1478,11 @@ static inline void skb_mark_not_on_list(struct sk_buff *skb) skb->next = NULL; } +/* Iterate through singly-linked GSO fragments of an skb. */ +#define skb_list_walk_safe(first, skb, next) \ + for ((skb) = (first), (next) = (skb) ? (skb)->next : NULL; (skb); \ + (skb) = (next), (next) = (skb) ? (skb)->next : NULL) + static inline void skb_list_del_init(struct sk_buff *skb) { __list_del_entry(&skb->list); -- cgit v1.2.3 From 6181e5cb752e5de9f56fbcee3f0206a2c51f1478 Mon Sep 17 00:00:00 2001 From: Vikas Gupta Date: Thu, 2 Jan 2020 21:18:09 +0530 Subject: devlink: add support for reporter recovery completion It is possible that a reporter recovery completion do not finish successfully when recovery is triggered via devlink_health_reporter_recover as recovery could be processed in different context. In such scenario an error is returned by driver when recover hook is invoked and successful recovery completion is intimated later. Expose devlink recover done API to update recovery stats. Signed-off-by: Vikas Gupta Signed-off-by: David S. Miller --- include/net/devlink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 47f87b2fcf63..453f45cc1519 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1000,6 +1000,8 @@ int devlink_health_report(struct devlink_health_reporter *reporter, void devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, enum devlink_health_reporter_state state); +void +devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter); bool devlink_is_reload_failed(const struct devlink *devlink); -- cgit v1.2.3 From 4d776482ecc689bdd68627985ac4cb5a6f325953 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 7 Jan 2020 21:06:05 -0800 Subject: net: dsa: Get information about stacked DSA protocol It is possible to stack multiple DSA switches in a way that they are not part of the tree (disjoint) but the DSA master of a switch is a DSA slave of another. When that happens switch drivers may have to know this is the case so as to determine whether their tagging protocol has a remove chance of working. This is useful for specific switch drivers such as b53 where devices have been known to be stacked in the wild without the Broadcom tag protocol supporting that feature. This allows b53 to continue supporting those devices by forcing the disabling of Broadcom tags on the outermost switches if necessary. The get_tag_protocol() function is therefore updated to gain an additional enum dsa_tag_protocol argument which denotes the current tagging protocol used by the DSA master we are attached to, else DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- include/net/dsa.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/dsa.h b/include/net/dsa.h index 0c39fed8cd99..63495e3443ac 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -380,7 +380,8 @@ typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid, bool is_static, void *data); struct dsa_switch_ops { enum dsa_tag_protocol (*get_tag_protocol)(struct dsa_switch *ds, - int port); + int port, + enum dsa_tag_protocol mprot); int (*setup)(struct dsa_switch *ds); void (*teardown)(struct dsa_switch *ds); -- cgit v1.2.3 From 27ae7997a66174cb8afd6a75b3989f5e0c1b9e5a Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 8 Jan 2020 16:35:03 -0800 Subject: bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS This patch allows the kernel's struct ops (i.e. func ptr) to be implemented in BPF. The first use case in this series is the "struct tcp_congestion_ops" which will be introduced in a latter patch. This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS. The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular func ptr of a kernel struct. The attr->attach_btf_id is the btf id of a kernel struct. The attr->expected_attach_type is the member "index" of that kernel struct. The first member of a struct starts with member index 0. That will avoid ambiguity when a kernel struct has multiple func ptrs with the same func signature. For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written to implement the "init" func ptr of the "struct tcp_congestion_ops". The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops" of the _running_ kernel. The attr->expected_attach_type is 3. The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved by arch_prepare_bpf_trampoline that will be done in the next patch when introducing BPF_MAP_TYPE_STRUCT_OPS. "struct bpf_struct_ops" is introduced as a common interface for the kernel struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel struct will need to implement an instance of the "struct bpf_struct_ops". The supporting kernel struct also needs to implement a bpf_verifier_ops. During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right bpf_verifier_ops by searching the attr->attach_btf_id. A new "btf_struct_access" is also added to the bpf_verifier_ops such that the supporting kernel struct can optionally provide its own specific check on accessing the func arg (e.g. provide limited write access). After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called to initialize some values (e.g. the btf id of the supporting kernel struct) and it can only be done once the btf_vmlinux is available. The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog if the return type of the prog->aux->attach_func_proto is "void". Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Acked-by: Andrii Nakryiko Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com --- include/linux/bpf.h | 30 ++++++++++++++++++++++++++++++ include/linux/bpf_types.h | 4 ++++ include/linux/btf.h | 34 ++++++++++++++++++++++++++++++++++ include/uapi/linux/bpf.h | 1 + 4 files changed, 69 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b14e51d56a82..50f3b20ae284 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -349,6 +349,10 @@ struct bpf_verifier_ops { const struct bpf_insn *src, struct bpf_insn *dst, struct bpf_prog *prog, u32 *target_size); + int (*btf_struct_access)(struct bpf_verifier_log *log, + const struct btf_type *t, int off, int size, + enum bpf_access_type atype, + u32 *next_btf_id); }; struct bpf_prog_offload_ops { @@ -668,6 +672,32 @@ struct bpf_array_aux { struct work_struct work; }; +struct btf_type; +struct btf_member; + +#define BPF_STRUCT_OPS_MAX_NR_MEMBERS 64 +struct bpf_struct_ops { + const struct bpf_verifier_ops *verifier_ops; + int (*init)(struct btf *btf); + int (*check_member)(const struct btf_type *t, + const struct btf_member *member); + const struct btf_type *type; + const char *name; + struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; + u32 type_id; +}; + +#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) +const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id); +void bpf_struct_ops_init(struct btf *btf); +#else +static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) +{ + return NULL; +} +static inline void bpf_struct_ops_init(struct btf *btf) { } +#endif + struct bpf_array { struct bpf_map map; u32 elem_size; diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 93740b3614d7..fadd243ffa2d 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -65,6 +65,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2, BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport, struct sk_reuseport_md, struct sk_reuseport_kern) #endif +#if defined(CONFIG_BPF_JIT) +BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops, + void *, void *) +#endif BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops) diff --git a/include/linux/btf.h b/include/linux/btf.h index 79d4abc2556a..f74a09a7120b 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -53,6 +53,18 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, u32 expected_offset, u32 expected_size); int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t); bool btf_type_is_void(const struct btf_type *t); +s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind); +const struct btf_type *btf_type_skip_modifiers(const struct btf *btf, + u32 id, u32 *res_id); +const struct btf_type *btf_type_resolve_ptr(const struct btf *btf, + u32 id, u32 *res_id); +const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf, + u32 id, u32 *res_id); + +#define for_each_member(i, struct_type, member) \ + for (i = 0, member = btf_type_member(struct_type); \ + i < btf_type_vlen(struct_type); \ + i++, member++) static inline bool btf_type_is_ptr(const struct btf_type *t) { @@ -84,6 +96,28 @@ static inline bool btf_type_is_func_proto(const struct btf_type *t) return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO; } +static inline u16 btf_type_vlen(const struct btf_type *t) +{ + return BTF_INFO_VLEN(t->info); +} + +static inline bool btf_type_kflag(const struct btf_type *t) +{ + return BTF_INFO_KFLAG(t->info); +} + +static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type, + const struct btf_member *member) +{ + return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset) + : 0; +} + +static inline const struct btf_member *btf_type_member(const struct btf_type *t) +{ + return (const struct btf_member *)(t + 1); +} + #ifdef CONFIG_BPF_SYSCALL const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); const char *btf_name_by_offset(const struct btf *btf, u32 offset); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7df436da542d..c1eeb3e0e116 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -174,6 +174,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_PROG_TYPE_TRACING, + BPF_PROG_TYPE_STRUCT_OPS, }; enum bpf_attach_type { -- cgit v1.2.3 From 85d33df357b634649ddbe0a20fd2d0fc5732c3cb Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 8 Jan 2020 16:35:05 -0800 Subject: bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS The patch introduces BPF_MAP_TYPE_STRUCT_OPS. The map value is a kernel struct with its func ptr implemented in bpf prog. This new map is the interface to register/unregister/introspect a bpf implemented kernel struct. The kernel struct is actually embedded inside another new struct (or called the "value" struct in the code). For example, "struct tcp_congestion_ops" is embbeded in: struct bpf_struct_ops_tcp_congestion_ops { refcount_t refcnt; enum bpf_struct_ops_state state; struct tcp_congestion_ops data; /* <-- kernel subsystem struct here */ } The map value is "struct bpf_struct_ops_tcp_congestion_ops". The "bpftool map dump" will then be able to show the state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g. number of tcp_sock in the tcp_congestion_ops case). This "value" struct is created automatically by a macro. Having a separate "value" struct will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding "void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some initialization works before registering the struct_ops to the kernel subsystem). The libbpf will take care of finding and populating the "struct bpf_struct_ops_XYZ" from "struct XYZ". Register a struct_ops to a kernel subsystem: 1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s) 2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the running kernel. Instead of reusing the attr->btf_value_type_id, btf_vmlinux_value_type_id s added such that attr->btf_fd can still be used as the "user" btf which could store other useful sysadmin/debug info that may be introduced in the furture, e.g. creation-date/compiler-details/map-creator...etc. 3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described in the running kernel btf. Populate the value of this object. The function ptr should be populated with the prog fds. 4. Call BPF_MAP_UPDATE with the object created in (3) as the map value. The key is always "0". During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's args as an array of u64 is generated. BPF_MAP_UPDATE also allows the specific struct_ops to do some final checks in "st_ops->init_member()" (e.g. ensure all mandatory func ptrs are implemented). If everything looks good, it will register this kernel struct to the kernel subsystem. The map will not allow further update from this point. Unregister a struct_ops from the kernel subsystem: BPF_MAP_DELETE with key "0". Introspect a struct_ops: BPF_MAP_LOOKUP_ELEM with key "0". The map value returned will have the prog _id_ populated as the func ptr. The map value state (enum bpf_struct_ops_state) will transit from: INIT (map created) => INUSE (map updated, i.e. reg) => TOBEFREE (map value deleted, i.e. unreg) The kernel subsystem needs to call bpf_struct_ops_get() and bpf_struct_ops_put() to manage the "refcnt" in the "struct bpf_struct_ops_XYZ". This patch uses a separate refcnt for the purose of tracking the subsystem usage. Another approach is to reuse the map->refcnt and then "show" (i.e. during map_lookup) the subsystem's usage by doing map->refcnt - map->usercnt to filter out the map-fd/pinned-map usage. However, that will also tie down the future semantics of map->refcnt and map->usercnt. The very first subsystem's refcnt (during reg()) holds one count to map->refcnt. When the very last subsystem's refcnt is gone, it will also release the map->refcnt. All bpf_prog will be freed when the map->refcnt reaches 0 (i.e. during map_free()). Here is how the bpftool map command will look like: [root@arch-fb-vm1 bpf]# bpftool map show 6: struct_ops name dctcp flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 6 [root@arch-fb-vm1 bpf]# bpftool map dump id 6 [{ "value": { "refcnt": { "refs": { "counter": 1 } }, "state": 1, "data": { "list": { "next": 0, "prev": 0 }, "key": 0, "flags": 2, "init": 24, "release": 0, "ssthresh": 25, "cong_avoid": 30, "set_state": 27, "cwnd_event": 28, "in_ack_event": 26, "undo_cwnd": 29, "pkts_acked": 0, "min_tso_segs": 0, "sndbuf_expand": 0, "cong_control": 0, "get_info": 0, "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0 ], "owner": 0 } } } ] Misc Notes: * bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup. It does an inplace update on "*value" instead returning a pointer to syscall.c. Otherwise, it needs a separate copy of "zero" value for the BPF_STRUCT_OPS_STATE_INIT to avoid races. * The bpf_struct_ops_map_delete_elem() is also called without preempt_disable() from map_delete_elem(). It is because the "->unreg()" may requires sleepable context, e.g. the "tcp_unregister_congestion_control()". * "const" is added to some of the existing "struct btf_func_model *" function arg to avoid a compiler warning caused by this patch. Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Acked-by: Andrii Nakryiko Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com --- include/linux/bpf.h | 49 +++++++++++++++++++++++++++++++++++++++++++++-- include/linux/bpf_types.h | 3 +++ include/linux/btf.h | 13 +++++++++++++ include/uapi/linux/bpf.h | 7 ++++++- 4 files changed, 69 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 50f3b20ae284..a7bfe8a388c6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -17,6 +17,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -106,6 +107,7 @@ struct bpf_map { struct btf *btf; struct bpf_map_memory memory; char name[BPF_OBJ_NAME_LEN]; + u32 btf_vmlinux_value_type_id; bool unpriv_array; bool frozen; /* write-once; write-protected by freeze_mutex */ /* 22 bytes hole */ @@ -183,7 +185,8 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map) static inline bool bpf_map_support_seq_show(const struct bpf_map *map) { - return map->btf && map->ops->map_seq_show_elem; + return (map->btf_value_type_id || map->btf_vmlinux_value_type_id) && + map->ops->map_seq_show_elem; } int map_check_no_btf(const struct bpf_map *map, @@ -441,7 +444,8 @@ struct btf_func_model { * fentry = a set of program to run before calling original function * fexit = a set of program to run after original function */ -int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags, +int arch_prepare_bpf_trampoline(void *image, void *image_end, + const struct btf_func_model *m, u32 flags, struct bpf_prog **fentry_progs, int fentry_cnt, struct bpf_prog **fexit_progs, int fexit_cnt, void *orig_call); @@ -672,6 +676,7 @@ struct bpf_array_aux { struct work_struct work; }; +struct bpf_struct_ops_value; struct btf_type; struct btf_member; @@ -681,21 +686,61 @@ struct bpf_struct_ops { int (*init)(struct btf *btf); int (*check_member)(const struct btf_type *t, const struct btf_member *member); + int (*init_member)(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata); + int (*reg)(void *kdata); + void (*unreg)(void *kdata); const struct btf_type *type; + const struct btf_type *value_type; const char *name; struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; u32 type_id; + u32 value_id; }; #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) +#define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id); void bpf_struct_ops_init(struct btf *btf); +bool bpf_struct_ops_get(const void *kdata); +void bpf_struct_ops_put(const void *kdata); +int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, + void *value); +static inline bool bpf_try_module_get(const void *data, struct module *owner) +{ + if (owner == BPF_MODULE_OWNER) + return bpf_struct_ops_get(data); + else + return try_module_get(owner); +} +static inline void bpf_module_put(const void *data, struct module *owner) +{ + if (owner == BPF_MODULE_OWNER) + bpf_struct_ops_put(data); + else + module_put(owner); +} #else static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) { return NULL; } static inline void bpf_struct_ops_init(struct btf *btf) { } +static inline bool bpf_try_module_get(const void *data, struct module *owner) +{ + return try_module_get(owner); +} +static inline void bpf_module_put(const void *data, struct module *owner) +{ + module_put(owner); +} +static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, + void *key, + void *value) +{ + return -EINVAL; +} #endif struct bpf_array { diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index fadd243ffa2d..9f326e6ef885 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -109,3 +109,6 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops) +#if defined(CONFIG_BPF_JIT) +BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) +#endif diff --git a/include/linux/btf.h b/include/linux/btf.h index f74a09a7120b..881e9b76ef49 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -7,6 +7,8 @@ #include #include +#define BTF_TYPE_EMIT(type) ((void)(type *)0) + struct btf; struct btf_member; struct btf_type; @@ -60,6 +62,10 @@ const struct btf_type *btf_type_resolve_ptr(const struct btf *btf, u32 id, u32 *res_id); const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf, u32 id, u32 *res_id); +const struct btf_type * +btf_resolve_size(const struct btf *btf, const struct btf_type *type, + u32 *type_size, const struct btf_type **elem_type, + u32 *total_nelems); #define for_each_member(i, struct_type, member) \ for (i = 0, member = btf_type_member(struct_type); \ @@ -106,6 +112,13 @@ static inline bool btf_type_kflag(const struct btf_type *t) return BTF_INFO_KFLAG(t->info); } +static inline u32 btf_member_bit_offset(const struct btf_type *struct_type, + const struct btf_member *member) +{ + return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset) + : member->offset; +} + static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type, const struct btf_member *member) { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c1eeb3e0e116..38059880963e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -136,6 +136,7 @@ enum bpf_map_type { BPF_MAP_TYPE_STACK, BPF_MAP_TYPE_SK_STORAGE, BPF_MAP_TYPE_DEVMAP_HASH, + BPF_MAP_TYPE_STRUCT_OPS, }; /* Note that tracing related programs such as @@ -398,6 +399,10 @@ union bpf_attr { __u32 btf_fd; /* fd pointing to a BTF type data */ __u32 btf_key_type_id; /* BTF type_id of the key */ __u32 btf_value_type_id; /* BTF type_id of the value */ + __u32 btf_vmlinux_value_type_id;/* BTF type_id of a kernel- + * struct stored as the + * map value + */ }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ @@ -3350,7 +3355,7 @@ struct bpf_map_info { __u32 map_flags; char name[BPF_OBJ_NAME_LEN]; __u32 ifindex; - __u32 :32; + __u32 btf_vmlinux_value_type_id; __u64 netns_dev; __u64 netns_ino; __u32 btf_id; -- cgit v1.2.3 From 0baf26b0fcd74bbfcef53c5d5e8bad2b99c8d0d2 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 8 Jan 2020 16:35:08 -0800 Subject: bpf: tcp: Support tcp_congestion_ops in bpf This patch makes "struct tcp_congestion_ops" to be the first user of BPF STRUCT_OPS. It allows implementing a tcp_congestion_ops in bpf. The BPF implemented tcp_congestion_ops can be used like regular kernel tcp-cc through sysctl and setsockopt. e.g. [root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic net.ipv4.tcp_congestion_control = bpf_cubic There has been attempt to move the TCP CC to the user space (e.g. CCP in TCP). The common arguments are faster turn around, get away from long-tail kernel versions in production...etc, which are legit points. BPF has been the continuous effort to join both kernel and userspace upsides together (e.g. XDP to gain the performance advantage without bypassing the kernel). The recent BPF advancements (in particular BTF-aware verifier, BPF trampoline, BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc) possible in BPF. It allows a faster turnaround for testing algorithm in the production while leveraging the existing (and continue growing) BPF feature/framework instead of building one specifically for userspace TCP CC. This patch allows write access to a few fields in tcp-sock (in bpf_tcp_ca_btf_struct_access()). The optional "get_info" is unsupported now. It can be added later. One possible way is to output the info with a btf-id to describe the content. Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Acked-by: Andrii Nakryiko Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com --- include/linux/filter.h | 2 ++ include/net/tcp.h | 2 ++ 2 files changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index 70e6dd960bca..a366a0b64a57 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -843,6 +843,8 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog); int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog, bpf_aux_classic_check_t trans, bool save_orig); void bpf_prog_destroy(struct bpf_prog *fp); +const struct bpf_func_proto * +bpf_base_func_proto(enum bpf_func_id func_id); int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk); int sk_attach_bpf(u32 ufd, struct sock *sk); diff --git a/include/net/tcp.h b/include/net/tcp.h index 7df37e2fddca..9dd975be7fdf 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1007,6 +1007,7 @@ enum tcp_ca_ack_event_flags { #define TCP_CONG_NON_RESTRICTED 0x1 /* Requires ECN/ECT set on all packets */ #define TCP_CONG_NEEDS_ECN 0x2 +#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN) union tcp_cc_info; @@ -1101,6 +1102,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk); void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked); extern struct tcp_congestion_ops tcp_reno; +struct tcp_congestion_ops *tcp_ca_find(const char *name); struct tcp_congestion_ops *tcp_ca_find_key(u32 key); u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca); #ifdef CONFIG_INET -- cgit v1.2.3 From 206057fe020ac5c037d5e2dd6562a9bd216ec765 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 8 Jan 2020 16:45:51 -0800 Subject: bpf: Add BPF_FUNC_tcp_send_ack helper Add a helper to send out a tcp-ack. It will be used in the later bpf_dctcp implementation that requires to send out an ack when the CE state changed. Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com --- include/uapi/linux/bpf.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 38059880963e..2d6a2e572f56 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2837,6 +2837,14 @@ union bpf_attr { * Return * On success, the strictly positive length of the string, including * the trailing NUL character. On error, a negative value. + * + * int bpf_tcp_send_ack(void *tp, u32 rcv_nxt) + * Description + * Send out a tcp-ack. *tp* is the in-kernel struct tcp_sock. + * *rcv_nxt* is the ack_seq to be sent out. + * Return + * 0 on success, or a negative error in case of failure. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2954,7 +2962,8 @@ union bpf_attr { FN(probe_read_user), \ FN(probe_read_kernel), \ FN(probe_read_user_str), \ - FN(probe_read_kernel_str), + FN(probe_read_kernel_str), \ + FN(tcp_send_ack), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call -- cgit v1.2.3 From f5bfcd953d811dbb8913de36b96b38da6bb62135 Mon Sep 17 00:00:00 2001 From: Andrey Ignatov Date: Tue, 7 Jan 2020 17:40:06 -0800 Subject: bpf: Document BPF_F_QUERY_EFFECTIVE flag Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects attach_flags what may not be obvious and what may lead to confision. Specifically attach_flags is returned only for target_fd but if programs are inherited from an ancestor cgroup then returned attach_flags for current cgroup may be confusing. For example, two effective programs of same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in attach_flags. Simple repro: # bpftool c s /sys/fs/cgroup/path/to/task ID AttachType AttachFlags Name # bpftool c s /sys/fs/cgroup/path/to/task effective ID AttachType AttachFlags Name 95043 ingress tw_ipt_ingress 95048 ingress tw_ingress Signed-off-by: Andrey Ignatov Signed-off-by: Alexei Starovoitov Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com --- include/uapi/linux/bpf.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2d6a2e572f56..52966e758fe5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -359,7 +359,12 @@ enum bpf_attach_type { /* Enable memory-mapping BPF map */ #define BPF_F_MMAPABLE (1U << 10) -/* flags for BPF_PROG_QUERY */ +/* Flags for BPF_PROG_QUERY. */ + +/* Query effective (directly attached + inherited from ancestor cgroups) + * programs that will be executed for events within a cgroup. + * attach_flags with this flag are returned only for directly attached programs. + */ #define BPF_F_QUERY_EFFECTIVE (1U << 0) enum bpf_stack_build_id_status { -- cgit v1.2.3 From e9cdced78dc20c1592c1fb98ed064943007a46c5 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:14 -0800 Subject: net: Make sock protocol value checks more specific SK_PROTOCOL_MAX is only used in two places, for DECNet and AX.25. The limits have more to do with the those protocol definitions than they do with the data type of sk_protocol, so remove SK_PROTOCOL_MAX and use U8_MAX directly. Reviewed-by: Eric Dumazet Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/sock.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index 8dff68b4c316..091e55428415 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -458,7 +458,6 @@ struct sock { sk_userlocks : 4, sk_protocol : 8, sk_type : 16; -#define SK_PROTOCOL_MAX U8_MAX u16 sk_gso_max_segs; u8 sk_pacing_shift; unsigned long sk_lingertime; -- cgit v1.2.3 From bf9765145b856fa2e238a5b8a54453795ba30ad6 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:15 -0800 Subject: sock: Make sk_protocol a 16-bit value Match the 16-bit width of skbuff->protocol. Fills an 8-bit hole so sizeof(struct sock) does not change. Also take care of BPF field access for sk_type/sk_protocol. Both of them are now outside the bitfield, so we can use load instructions without further shifting/masking. v5 -> v6: - update eBPF accessors, too (Intel's kbuild test robot) v2 -> v3: - keep 'sk_type' 2 bytes aligned (Eric) v1 -> v2: - preserve sk_pacing_shift as bit field (Eric) Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: bpf@vger.kernel.org Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/sock.h | 25 +++++-------------------- include/trace/events/sock.h | 2 +- 2 files changed, 6 insertions(+), 21 deletions(-) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index 091e55428415..8766f9bc3e70 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -436,30 +436,15 @@ struct sock { * Because of non atomicity rules, all * changes are protected by socket lock. */ - unsigned int __sk_flags_offset[0]; -#ifdef __BIG_ENDIAN_BITFIELD -#define SK_FL_PROTO_SHIFT 16 -#define SK_FL_PROTO_MASK 0x00ff0000 - -#define SK_FL_TYPE_SHIFT 0 -#define SK_FL_TYPE_MASK 0x0000ffff -#else -#define SK_FL_PROTO_SHIFT 8 -#define SK_FL_PROTO_MASK 0x0000ff00 - -#define SK_FL_TYPE_SHIFT 16 -#define SK_FL_TYPE_MASK 0xffff0000 -#endif - - unsigned int sk_padding : 1, + u8 sk_padding : 1, sk_kern_sock : 1, sk_no_check_tx : 1, sk_no_check_rx : 1, - sk_userlocks : 4, - sk_protocol : 8, - sk_type : 16; - u16 sk_gso_max_segs; + sk_userlocks : 4; u8 sk_pacing_shift; + u16 sk_type; + u16 sk_protocol; + u16 sk_gso_max_segs; unsigned long sk_lingertime; struct proto *sk_prot_creator; rwlock_t sk_callback_lock; diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h index 51fe9f6719eb..3ff12b90048d 100644 --- a/include/trace/events/sock.h +++ b/include/trace/events/sock.h @@ -147,7 +147,7 @@ TRACE_EVENT(inet_sock_set_state, __field(__u16, sport) __field(__u16, dport) __field(__u16, family) - __field(__u8, protocol) + __field(__u16, protocol) __array(__u8, saddr, 4) __array(__u8, daddr, 4) __array(__u8, saddr_v6, 16) -- cgit v1.2.3 From faf391c3826cd29feae02078ca2022d2f912f7cc Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:16 -0800 Subject: tcp: Define IPPROTO_MPTCP To open a MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP), IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing IPPROTO numbers mostly map directly to IANA-specified protocol numbers. MPTCP does not have a protocol number allocated because MPTCP packets use the TCP protocol number. Use private number not used OTA. Reviewed-by: Eric Dumazet Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/trace/events/sock.h | 3 ++- include/uapi/linux/in.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h index 3ff12b90048d..a966d4b5ab37 100644 --- a/include/trace/events/sock.h +++ b/include/trace/events/sock.h @@ -19,7 +19,8 @@ #define inet_protocol_names \ EM(IPPROTO_TCP) \ EM(IPPROTO_DCCP) \ - EMe(IPPROTO_SCTP) + EM(IPPROTO_SCTP) \ + EMe(IPPROTO_MPTCP) #define tcp_state_names \ EM(TCP_ESTABLISHED) \ diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index e7ad9d350a28..1521073b6348 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -76,6 +76,8 @@ enum { #define IPPROTO_MPLS IPPROTO_MPLS IPPROTO_RAW = 255, /* Raw IP packets */ #define IPPROTO_RAW IPPROTO_RAW + IPPROTO_MPTCP = 262, /* Multipath TCP connection */ +#define IPPROTO_MPTCP IPPROTO_MPTCP IPPROTO_MAX }; #endif -- cgit v1.2.3 From c74a39c861aeaf6b789b7abdbb3256f54c9fb365 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:17 -0800 Subject: tcp: Add MPTCP option number TCP option 30 is allocated for MPTCP by the IANA. Reviewed-by: Eric Dumazet Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/tcp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 7df37e2fddca..85f1d7ff6e8b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -182,6 +182,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCPOPT_SACK 5 /* SACK Block */ #define TCPOPT_TIMESTAMP 8 /* Better RTT estimations/PAWS */ #define TCPOPT_MD5SIG 19 /* MD5 Signature (RFC2385) */ +#define TCPOPT_MPTCP 30 /* Multipath TCP (RFC6824) */ #define TCPOPT_FASTOPEN 34 /* Fast open (RFC7413) */ #define TCPOPT_EXP 254 /* Experimental */ /* Magic number to be after the option value for sharing TCP -- cgit v1.2.3 From 1323059301c8f36d933876233516245d882346a6 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:18 -0800 Subject: tcp, ulp: Add clone operation to tcp_ulp_ops If ULP is used on a listening socket, icsk_ulp_ops and icsk_ulp_data are copied when the listener is cloned. Sometimes the clone is immediately deleted, which will invoke the release op on the clone and likely corrupt the listening socket's icsk_ulp_data. The clone operation is invoked immediately after the clone is copied and gives the ULP type an opportunity to set up the clone socket and its icsk_ulp_data. The MPTCP ULP clone will silently fallback to plain TCP on allocation failure, so 'clone()' does not need to return an error code. v6 -> v7: - move and rename ulp clone helper to make it inline-friendly v5 -> v6: - clarified MPTCP clone usage in commit message Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/tcp.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 85f1d7ff6e8b..ac52633e7061 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2154,6 +2154,9 @@ struct tcp_ulp_ops { /* diagnostic */ int (*get_info)(const struct sock *sk, struct sk_buff *skb); size_t (*get_info_size)(const struct sock *sk); + /* clone ulp */ + void (*clone)(const struct request_sock *req, struct sock *newsk, + const gfp_t priority); char name[TCP_ULP_NAME_MAX]; struct module *owner; -- cgit v1.2.3 From 3ee17bc78e0f3fdeff9890993e8f3a9f5145163b Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:19 -0800 Subject: mptcp: Add MPTCP to skb extensions Add enum value for MPTCP and update config dependencies v5 -> v6: - fixed '__unused' field size Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 +++ include/net/mptcp.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+) create mode 100644 include/net/mptcp.h (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 64e5b1be9ff5..f5c27600b410 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4096,6 +4096,9 @@ enum skb_ext_id { #endif #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) TC_SKB_EXT, +#endif +#if IS_ENABLED(CONFIG_MPTCP) + SKB_EXT_MPTCP, #endif SKB_EXT_NUM, /* must be last */ }; diff --git a/include/net/mptcp.h b/include/net/mptcp.h new file mode 100644 index 000000000000..326043c29c0a --- /dev/null +++ b/include/net/mptcp.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Multipath TCP + * + * Copyright (c) 2017 - 2019, Intel Corporation. + */ + +#ifndef __NET_MPTCP_H +#define __NET_MPTCP_H + +#include + +/* MPTCP sk_buff extension data */ +struct mptcp_ext { + u64 data_ack; + u64 data_seq; + u32 subflow_seq; + u16 data_len; + u8 use_map:1, + dsn64:1, + data_fin:1, + use_ack:1, + ack64:1, + __unused:3; + /* one byte hole */ +}; + +#endif /* __NET_MPTCP_H */ -- cgit v1.2.3 From 85712484110df308215077be6ee21c4e57d7dec2 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:20 -0800 Subject: tcp: coalesce/collapse must respect MPTCP extensions Coalesce and collapse of packets carrying MPTCP extensions is allowed when the newer packet has no extension or the extensions carried by both packets are equal. This allows merging of TSO packet trains and even cross-TSO packets, and does not require any additional action when moving data into existing SKBs. v3 -> v4: - allow collapsing, under mptcp_skb_can_collapse() constraint v5 -> v6: - clarify MPTCP skb extensions must always be cleared at allocation time Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/mptcp.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++ include/net/tcp.h | 8 ++++++++ 2 files changed, 65 insertions(+) (limited to 'include') diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 326043c29c0a..0573ae75c3db 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -8,6 +8,7 @@ #ifndef __NET_MPTCP_H #define __NET_MPTCP_H +#include #include /* MPTCP sk_buff extension data */ @@ -25,4 +26,60 @@ struct mptcp_ext { /* one byte hole */ }; +#ifdef CONFIG_MPTCP + +/* move the skb extension owership, with the assumption that 'to' is + * newly allocated + */ +static inline void mptcp_skb_ext_move(struct sk_buff *to, + struct sk_buff *from) +{ + if (!skb_ext_exist(from, SKB_EXT_MPTCP)) + return; + + if (WARN_ON_ONCE(to->active_extensions)) + skb_ext_put(to); + + to->active_extensions = from->active_extensions; + to->extensions = from->extensions; + from->active_extensions = 0; +} + +static inline bool mptcp_ext_matches(const struct mptcp_ext *to_ext, + const struct mptcp_ext *from_ext) +{ + /* MPTCP always clears the ext when adding it to the skb, so + * holes do not bother us here + */ + return !from_ext || + (to_ext && from_ext && + !memcmp(from_ext, to_ext, sizeof(struct mptcp_ext))); +} + +/* check if skbs can be collapsed. + * MPTCP collapse is allowed if neither @to or @from carry an mptcp data + * mapping, or if the extension of @to is the same as @from. + * Collapsing is not possible if @to lacks an extension, but @from carries one. + */ +static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, + const struct sk_buff *from) +{ + return mptcp_ext_matches(skb_ext_find(to, SKB_EXT_MPTCP), + skb_ext_find(from, SKB_EXT_MPTCP)); +} + +#else + +static inline void mptcp_skb_ext_move(struct sk_buff *to, + const struct sk_buff *from) +{ +} + +static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, + const struct sk_buff *from) +{ + return true; +} + +#endif /* CONFIG_MPTCP */ #endif /* __NET_MPTCP_H */ diff --git a/include/net/tcp.h b/include/net/tcp.h index ac52633e7061..13bc83fab454 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -978,6 +979,13 @@ static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb) return likely(!TCP_SKB_CB(skb)->eor); } +static inline bool tcp_skb_can_collapse(const struct sk_buff *to, + const struct sk_buff *from) +{ + return likely(tcp_skb_can_collapse_to(to) && + mptcp_skb_can_collapse(to, from)); +} + /* Events passed to congestion control interface */ enum tcp_ca_event { CA_EVENT_TX_START, /* first transmit when no packets in flight */ -- cgit v1.2.3 From 35b2c32116091ef87a15c604cac363da8322a288 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Thu, 9 Jan 2020 07:59:21 -0800 Subject: tcp: Export TCP functions and ops struct MPTCP will make use of tcp_send_mss() and tcp_push() when sending data to specific TCP subflows. tcp_request_sock_ipvX_ops and ipvX_specific will be referenced during TCP subflow creation. Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Reviewed-by: Eric Dumazet Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/tcp.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 13bc83fab454..5e4133d09b9d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -330,6 +330,9 @@ int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset, size_t size, int flags); ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, size_t size, int flags); +int tcp_send_mss(struct sock *sk, int *size_goal, int flags); +void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle, + int size_goal); void tcp_release_cb(struct sock *sk); void tcp_wfree(struct sk_buff *skb); void tcp_write_timer_handler(struct sock *sk); @@ -2011,6 +2014,11 @@ struct tcp_request_sock_ops { enum tcp_synack_type synack_type); }; +extern const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops; +#if IS_ENABLED(CONFIG_IPV6) +extern const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops; +#endif + #ifdef CONFIG_SYN_COOKIES static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, const struct sock *sk, struct sk_buff *skb, -- cgit v1.2.3 From e66b2f31a068dd67172008459678821a79e4ea24 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 9 Jan 2020 07:59:23 -0800 Subject: tcp: clean ext on tx recycle Otherwise we will find stray/unexpected/old extensions value on next iteration. On tcp_write_xmit() we can end-up splitting an already queued skb in two parts, via tso_fragment(). The newly created skb can be allocated via the tx cache and an upper layer will not be aware of it, so that upper layer cannot set the ext properly. Resetting the ext on recycle ensures that stale data is not propagated in to packet headers or elsewhere. An alternative would be add an additional hook in tso_fragment() or in sk_stream_alloc_skb() to init the ext for upper layers that need it. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Paolo Abeni Reviewed-by: Eric Dumazet Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/sock.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index 8766f9bc3e70..432ff73d20f3 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1464,6 +1464,7 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) sk_mem_uncharge(sk, skb->truesize); if (static_branch_unlikely(&tcp_tx_skb_cache_key) && !sk->sk_tx_skb_cache && !skb_cloned(skb)) { + skb_ext_reset(skb); skb_zcopy_clear(skb, true); sk->sk_tx_skb_cache = skb; return; -- cgit v1.2.3 From 8b69a803814bb8b14155ea60df83f6d57527e69e Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 9 Jan 2020 07:59:24 -0800 Subject: skb: add helpers to allocate ext independently from sk_buff Currently we can allocate the extension only after the skb, this change allows the user to do the opposite, will simplify allocation failure handling from MPTCP. Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f5c27600b410..016b3c4ab99a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4120,6 +4120,9 @@ struct skb_ext { char data[0] __aligned(8); }; +struct skb_ext *__skb_ext_alloc(void); +void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id, + struct skb_ext *ext); void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id); void __skb_ext_del(struct sk_buff *skb, enum skb_ext_id id); void __skb_ext_put(struct skb_ext *ext); -- cgit v1.2.3 From 51c39bb1d5d105a02e29aa7960f0a395086e6342 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Thu, 9 Jan 2020 22:41:20 -0800 Subject: bpf: Introduce function-by-function verification New llvm and old llvm with libbpf help produce BTF that distinguish global and static functions. Unlike arguments of static function the arguments of global functions cannot be removed or optimized away by llvm. The compiler has to use exactly the arguments specified in a function prototype. The argument type information allows the verifier validate each global function independently. For now only supported argument types are pointer to context and scalars. In the future pointers to structures, sizes, pointer to packet data can be supported as well. Consider the following example: static int f1(int ...) { ... } int f3(int b); int f2(int a) { f1(a) + f3(a); } int f3(int b) { ... } int main(...) { f1(...) + f2(...) + f3(...); } The verifier will start its safety checks from the first global function f2(). It will recursively descend into f1() because it's static. Then it will check that arguments match for the f3() invocation inside f2(). It will not descend into f3(). It will finish f2() that has to be successfully verified for all possible values of 'a'. Then it will proceed with f3(). That function also has to be safe for all possible values of 'b'. Then it will start subprog 0 (which is main() function). It will recursively descend into f1() and will skip full check of f2() and f3(), since they are global. The order of processing global functions doesn't affect safety, since all global functions must be proven safe based on their arguments only. Such function by function verification can drastically improve speed of the verification and reduce complexity. Note that the stack limit of 512 still applies to the call chain regardless whether functions were static or global. The nested level of 8 also still applies. The same recursion prevention checks are in place as well. The type information and static/global kind is preserved after the verification hence in the above example global function f2() and f3() can be replaced later by equivalent functions with the same types that are loaded and verified later without affecting safety of this main() program. Such replacement (re-linking) of global functions is a subject of future patches. Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org --- include/linux/bpf.h | 7 ++++++- include/linux/bpf_verifier.h | 10 ++++++++-- include/uapi/linux/btf.h | 6 ++++++ 3 files changed, 20 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a7bfe8a388c6..aed2bc39d72b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -566,6 +566,7 @@ static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, #endif struct bpf_func_info_aux { + u16 linkage; bool unreliable; }; @@ -1081,7 +1082,11 @@ int btf_distill_func_proto(struct bpf_verifier_log *log, const char *func_name, struct btf_func_model *m); -int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog); +struct bpf_reg_state; +int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog, + struct bpf_reg_state *regs); +int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, + struct bpf_reg_state *reg); struct bpf_prog *bpf_prog_by_id(u32 id); diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 26e40de9ef55..5406e6e96585 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -304,11 +304,13 @@ struct bpf_insn_aux_data { u64 map_key_state; /* constant (32 bit) key tracking for maps */ int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ int sanitize_stack_off; /* stack slot to be cleared */ - bool seen; /* this insn was processed by the verifier */ + u32 seen; /* this insn was processed by the verifier at env->pass_cnt */ bool zext_dst; /* this insn zero extends dst reg */ u8 alu_state; /* used in combination with alu_limit */ - bool prune_point; + + /* below fields are initialized once */ unsigned int orig_idx; /* original instruction index */ + bool prune_point; }; #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ @@ -379,6 +381,7 @@ struct bpf_verifier_env { int *insn_stack; int cur_stack; } cfg; + u32 pass_cnt; /* number of times do_check() was called */ u32 subprog_cnt; /* number of instructions analyzed by the verifier */ u32 prev_insn_processed, insn_processed; @@ -428,4 +431,7 @@ bpf_prog_offload_replace_insn(struct bpf_verifier_env *env, u32 off, void bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt); +int check_ctx_reg(struct bpf_verifier_env *env, + const struct bpf_reg_state *reg, int regno); + #endif /* _LINUX_BPF_VERIFIER_H */ diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h index 1a2898c482ee..5a667107ad2c 100644 --- a/include/uapi/linux/btf.h +++ b/include/uapi/linux/btf.h @@ -146,6 +146,12 @@ enum { BTF_VAR_GLOBAL_EXTERN = 2, }; +enum btf_func_linkage { + BTF_FUNC_STATIC = 0, + BTF_FUNC_GLOBAL = 1, + BTF_FUNC_EXTERN = 2, +}; + /* BTF_KIND_VAR is followed by a single "struct btf_var" to describe * additional information related to the variable such as its linkage. */ -- cgit v1.2.3 From 90fbca5952436e7817910b33eb4464ddd77a8964 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 12 Dec 2019 13:09:24 +0200 Subject: net/mlx5: Add Virtio Emulation related device capabilities Add Virtio Emulation related fields to the device capabilities. It includes a general bit to indicate whether Virtio Emulation is supported and the capabilities structure itself. Signed-off-by: Yishai Hadas Reviewed-by: Shahaf Shuler Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 5d54fccf87fc..c6abaf4f1c55 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -87,6 +87,7 @@ enum { enum { MLX5_GENERAL_OBJ_TYPES_CAP_SW_ICM = (1ULL << MLX5_OBJ_TYPE_SW_ICM), MLX5_GENERAL_OBJ_TYPES_CAP_GENEVE_TLV_OPT = (1ULL << 11), + MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q = (1ULL << 13), }; enum { @@ -953,6 +954,19 @@ struct mlx5_ifc_device_event_cap_bits { u8 user_unaffiliated_events[4][0x40]; }; +struct mlx5_ifc_device_virtio_emulation_cap_bits { + u8 reserved_at_0[0x20]; + + u8 reserved_at_20[0x13]; + u8 log_doorbell_stride[0x5]; + u8 reserved_at_38[0x3]; + u8 log_doorbell_bar_size[0x5]; + + u8 doorbell_bar_offset[0x40]; + + u8 reserved_at_80[0x780]; +}; + enum { MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_1_BYTE = 0x0, MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_2_BYTES = 0x2, @@ -2751,6 +2765,7 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_fpga_cap_bits fpga_cap; struct mlx5_ifc_tls_cap_bits tls_cap; struct mlx5_ifc_device_mem_cap_bits device_mem_cap; + struct mlx5_ifc_device_virtio_emulation_cap_bits virtio_emulation_cap; u8 reserved_at_0[0x8000]; }; -- cgit v1.2.3 From ca1992c62cadb6c8e1e1b47e197b550f3cd89b76 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 12 Dec 2019 13:09:25 +0200 Subject: net/mlx5: Expose vDPA emulation device capabilities Expose vDPA emulation device capabilities from the core layer. It includes reading the capabilities from the firmware and exposing helper functions to access the data. Signed-off-by: Yishai Hadas Reviewed-by: Shahaf Shuler Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index cc1c230f10ee..1a1c53f0262d 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1105,6 +1105,7 @@ enum mlx5_cap_type { MLX5_CAP_DEV_MEM, MLX5_CAP_RESERVED_16, MLX5_CAP_TLS, + MLX5_CAP_VDPA_EMULATION = 0x13, MLX5_CAP_DEV_EVENT = 0x14, /* NUM OF CAP Types */ MLX5_CAP_NUM @@ -1297,6 +1298,14 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_DEV_EVENT(mdev, cap)\ MLX5_ADDR_OF(device_event_cap, (mdev)->caps.hca_cur[MLX5_CAP_DEV_EVENT], cap) +#define MLX5_CAP_DEV_VDPA_EMULATION(mdev, cap)\ + MLX5_GET(device_virtio_emulation_cap, \ + (mdev)->caps.hca_cur[MLX5_CAP_VDPA_EMULATION], cap) + +#define MLX5_CAP64_DEV_VDPA_EMULATION(mdev, cap)\ + MLX5_GET64(device_virtio_emulation_cap, \ + (mdev)->caps.hca_cur[MLX5_CAP_VDPA_EMULATION], cap) + enum { MLX5_CMD_STAT_OK = 0x0, MLX5_CMD_STAT_INT_ERR = 0x1, -- cgit v1.2.3 From 468672b24fbc1c018e192dcc90e887bc9a9b2595 Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Thu, 9 Jan 2020 14:46:09 -0800 Subject: devlink: add macro for "fw.psid" The "fw.psid" devlink info version is documented in devlink-info.rst, and used by one driver. However, there is no associated macro for this firmware version like there is for others. Add one now. Signed-off-by: Jacob Keller Signed-off-by: David S. Miller --- include/net/devlink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 453f45cc1519..4e80d9acdb86 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -485,6 +485,8 @@ enum devlink_param_generic_id { #define DEVLINK_INFO_VERSION_GENERIC_FW_UNDI "fw.undi" /* NCSI support/handler version */ #define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI "fw.ncsi" +/* FW parameter set id */ +#define DEVLINK_INFO_VERSION_GENERIC_FW_PSID "fw.psid" struct devlink_region; struct devlink_info_req; -- cgit v1.2.3 From f4bdd7103652fab5ac8b0ed75fa5cbc515b50b8b Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Thu, 9 Jan 2020 14:46:10 -0800 Subject: devlink: move devlink documentation to subfolder Combine the documentation for devlink into a subfolder, and provide an index.rst file that can be used to generally describe devlink. Signed-off-by: Jacob Keller Signed-off-by: David S. Miller --- include/net/devlink.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 4e80d9acdb86..a6856f1d5d1f 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -564,7 +564,7 @@ struct devlink_trap { }; /* All traps must be documented in - * Documentation/networking/devlink-trap.rst + * Documentation/networking/devlink/devlink-trap.rst */ enum devlink_trap_generic_id { DEVLINK_TRAP_GENERIC_ID_SMAC_MC, @@ -598,7 +598,7 @@ enum devlink_trap_generic_id { }; /* All trap groups must be documented in - * Documentation/networking/devlink-trap.rst + * Documentation/networking/devlink/devlink-trap.rst */ enum devlink_trap_group_generic_id { DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS, -- cgit v1.2.3 From a442c2c3850dc308ab972f3d10d1077e2c8fd035 Mon Sep 17 00:00:00 2001 From: Jonathan Lemon Date: Thu, 9 Jan 2020 11:23:17 -0800 Subject: mlx4: Bump up MAX_MSIX from 64 to 128 On modern hardware with a large number of cpus and using XDP, the current MSIX limit is insufficient. Bump the limit in order to allow more queues. Signed-off-by: Jonathan Lemon Reviewed-by: Jack Wang Reviewed-by: Tariq Toukan Signed-off-by: Jakub Kicinski --- include/linux/mlx4/device.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 36e412c3d657..20372de0b587 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -47,7 +47,7 @@ #define DEFAULT_UAR_PAGE_SHIFT 12 #define MAX_MSIX_P_PORT 17 -#define MAX_MSIX 64 +#define MAX_MSIX 128 #define MIN_MSIX_P_PORT 5 #define MLX4_IS_LEGACY_EQ_MODE(dev_cap) ((dev_cap).num_comp_vectors < \ (dev_cap).num_ports * MIN_MSIX_P_PORT) -- cgit v1.2.3 From c74f16b6034401b17bb1cf549871186a8ece5f92 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sun, 12 Jan 2020 13:04:43 +0100 Subject: wan: ixp4xx_hss: prepare compile testing The ixp4xx_hss driver needs the platform data definition and the system clock rate to be compiled. Move both into a new platform_data header file. This is a prerequisite for compile testing, but turning on compile testing requires further patches to isolate the SoC headers. Signed-off-by: Arnd Bergmann Signed-off-by: Linus Walleij Signed-off-by: Jakub Kicinski --- include/linux/platform_data/wan_ixp4xx_hss.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 include/linux/platform_data/wan_ixp4xx_hss.h (limited to 'include') diff --git a/include/linux/platform_data/wan_ixp4xx_hss.h b/include/linux/platform_data/wan_ixp4xx_hss.h new file mode 100644 index 000000000000..d525a0feb9e1 --- /dev/null +++ b/include/linux/platform_data/wan_ixp4xx_hss.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PLATFORM_DATA_WAN_IXP4XX_HSS_H +#define __PLATFORM_DATA_WAN_IXP4XX_HSS_H + +#include + +/* Information about built-in HSS (synchronous serial) interfaces */ +struct hss_plat_info { + int (*set_clock)(int port, unsigned int clock_type); + int (*open)(int port, void *pdev, + void (*set_carrier_cb)(void *pdev, int carrier)); + void (*close)(int port, void *pdev); + u8 txreadyq; + u32 timer_freq; +}; + +#endif -- cgit v1.2.3 From a41a5b26d29fb0123cd3290dca453857cd8c0c66 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sun, 12 Jan 2020 13:04:45 +0100 Subject: ixp4xx_eth: move platform_data definition The platform data is needed to compile the driver as standalone, so move it to a global location along with similar files. Signed-off-by: Arnd Bergmann Signed-off-by: Linus Walleij Signed-off-by: Jakub Kicinski --- include/linux/platform_data/eth_ixp4xx.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 include/linux/platform_data/eth_ixp4xx.h (limited to 'include') diff --git a/include/linux/platform_data/eth_ixp4xx.h b/include/linux/platform_data/eth_ixp4xx.h new file mode 100644 index 000000000000..6f652ea0c6ae --- /dev/null +++ b/include/linux/platform_data/eth_ixp4xx.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PLATFORM_DATA_ETH_IXP4XX +#define __PLATFORM_DATA_ETH_IXP4XX + +#include + +#define IXP4XX_ETH_NPEA 0x00 +#define IXP4XX_ETH_NPEB 0x10 +#define IXP4XX_ETH_NPEC 0x20 + +/* Information about built-in Ethernet MAC interfaces */ +struct eth_plat_info { + u8 phy; /* MII PHY ID, 0 - 31 */ + u8 rxq; /* configurable, currently 0 - 31 only */ + u8 txreadyq; + u8 hwaddr[6]; +}; + +#endif -- cgit v1.2.3 From 0eac8ce95bb386838121189b2aa2216cd070f143 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Mon, 13 Jan 2020 11:22:16 +0100 Subject: ptr_ring: add include of linux/mm.h Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") started to use kvmalloc_array and kvfree, which are defined in mm.h, the previous functions kcalloc and kfree, which are defined in slab.h. Add the missing include of linux/mm.h. This went unnoticed as other include files happened to include mm.h. Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails") Signed-off-by: Jesper Dangaard Brouer Acked-by: Michael S. Tsirkin Signed-off-by: Jakub Kicinski --- include/linux/ptr_ring.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h index 0abe9a4fc842..417db0a79a62 100644 --- a/include/linux/ptr_ring.h +++ b/include/linux/ptr_ring.h @@ -23,6 +23,7 @@ #include #include #include +#include #include #endif -- cgit v1.2.3 From 579a25a854d482bc9d0f9ab0e07ba32fb66bd9e3 Mon Sep 17 00:00:00 2001 From: Jose Abreu Date: Mon, 13 Jan 2020 17:24:09 +0100 Subject: net: stmmac: Initial support for TBS Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now. Changes from v2: - Use bitfield for TBS status / support (Jakub) - Remove unneeded cache alignment (Jakub) - Fix checkpatch issues Signed-off-by: Jose Abreu Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 0531afa9b21e..19190c609282 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -139,6 +139,7 @@ struct stmmac_txq_cfg { u32 low_credit; bool use_prio; u32 prio; + int tbs_en; }; struct plat_stmmacenet_data { -- cgit v1.2.3 From e27f178793de16ca1b421f2c3f4bc3497b2ce723 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Sun, 12 Jan 2020 09:35:38 -0800 Subject: net: phy: Added IRQ print to phylink_bringup_phy() The information about the PHY attached to the PHYLINK instance is useful but is missing the IRQ prints that phy_attached_info() adds. phy_attached_info() is a bit long and it would not be possible to use phylink_info() anyway. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/phy.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 5932bb8e9c35..3a70b756ac1a 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1131,6 +1131,8 @@ static inline void phy_unlock_mdio_bus(struct phy_device *phydev) void phy_attached_print(struct phy_device *phydev, const char *fmt, ...) __printf(2, 3); +char *phy_attached_info_irq(struct phy_device *phydev) + __malloc; void phy_attached_info(struct phy_device *phydev); /* Clause 22 PHY */ -- cgit v1.2.3 From c0e4eadfb8daf2e9557c7450f9b237c08b404419 Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Mon, 13 Jan 2020 23:31:39 +0100 Subject: net: macsec: move some definitions in a dedicated header This patch moves some structure, type and identifier definitions into a MACsec specific header. This patch does not modify how the MACsec code is running and only move things around. This is a preparation for the future MACsec hardware offloading support, which will re-use those definitions outside macsec.c. Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller --- include/net/macsec.h | 177 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 177 insertions(+) create mode 100644 include/net/macsec.h (limited to 'include') diff --git a/include/net/macsec.h b/include/net/macsec.h new file mode 100644 index 000000000000..e7b41c1043f6 --- /dev/null +++ b/include/net/macsec.h @@ -0,0 +1,177 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * MACsec netdev header, used for h/w accelerated implementations. + * + * Copyright (c) 2015 Sabrina Dubroca + */ +#ifndef _NET_MACSEC_H_ +#define _NET_MACSEC_H_ + +#include +#include +#include + +typedef u64 __bitwise sci_t; + +#define MACSEC_NUM_AN 4 /* 2 bits for the association number */ + +/** + * struct macsec_key - SA key + * @id: user-provided key identifier + * @tfm: crypto struct, key storage + */ +struct macsec_key { + u8 id[MACSEC_KEYID_LEN]; + struct crypto_aead *tfm; +}; + +struct macsec_rx_sc_stats { + __u64 InOctetsValidated; + __u64 InOctetsDecrypted; + __u64 InPktsUnchecked; + __u64 InPktsDelayed; + __u64 InPktsOK; + __u64 InPktsInvalid; + __u64 InPktsLate; + __u64 InPktsNotValid; + __u64 InPktsNotUsingSA; + __u64 InPktsUnusedSA; +}; + +struct macsec_rx_sa_stats { + __u32 InPktsOK; + __u32 InPktsInvalid; + __u32 InPktsNotValid; + __u32 InPktsNotUsingSA; + __u32 InPktsUnusedSA; +}; + +struct macsec_tx_sa_stats { + __u32 OutPktsProtected; + __u32 OutPktsEncrypted; +}; + +struct macsec_tx_sc_stats { + __u64 OutPktsProtected; + __u64 OutPktsEncrypted; + __u64 OutOctetsProtected; + __u64 OutOctetsEncrypted; +}; + +/** + * struct macsec_rx_sa - receive secure association + * @active: + * @next_pn: packet number expected for the next packet + * @lock: protects next_pn manipulations + * @key: key structure + * @stats: per-SA stats + */ +struct macsec_rx_sa { + struct macsec_key key; + spinlock_t lock; + u32 next_pn; + refcount_t refcnt; + bool active; + struct macsec_rx_sa_stats __percpu *stats; + struct macsec_rx_sc *sc; + struct rcu_head rcu; +}; + +struct pcpu_rx_sc_stats { + struct macsec_rx_sc_stats stats; + struct u64_stats_sync syncp; +}; + +struct pcpu_tx_sc_stats { + struct macsec_tx_sc_stats stats; + struct u64_stats_sync syncp; +}; + +/** + * struct macsec_rx_sc - receive secure channel + * @sci: secure channel identifier for this SC + * @active: channel is active + * @sa: array of secure associations + * @stats: per-SC stats + */ +struct macsec_rx_sc { + struct macsec_rx_sc __rcu *next; + sci_t sci; + bool active; + struct macsec_rx_sa __rcu *sa[MACSEC_NUM_AN]; + struct pcpu_rx_sc_stats __percpu *stats; + refcount_t refcnt; + struct rcu_head rcu_head; +}; + +/** + * struct macsec_tx_sa - transmit secure association + * @active: + * @next_pn: packet number to use for the next packet + * @lock: protects next_pn manipulations + * @key: key structure + * @stats: per-SA stats + */ +struct macsec_tx_sa { + struct macsec_key key; + spinlock_t lock; + u32 next_pn; + refcount_t refcnt; + bool active; + struct macsec_tx_sa_stats __percpu *stats; + struct rcu_head rcu; +}; + +/** + * struct macsec_tx_sc - transmit secure channel + * @active: + * @encoding_sa: association number of the SA currently in use + * @encrypt: encrypt packets on transmit, or authenticate only + * @send_sci: always include the SCI in the SecTAG + * @end_station: + * @scb: single copy broadcast flag + * @sa: array of secure associations + * @stats: stats for this TXSC + */ +struct macsec_tx_sc { + bool active; + u8 encoding_sa; + bool encrypt; + bool send_sci; + bool end_station; + bool scb; + struct macsec_tx_sa __rcu *sa[MACSEC_NUM_AN]; + struct pcpu_tx_sc_stats __percpu *stats; +}; + +/** + * struct macsec_secy - MACsec Security Entity + * @netdev: netdevice for this SecY + * @n_rx_sc: number of receive secure channels configured on this SecY + * @sci: secure channel identifier used for tx + * @key_len: length of keys used by the cipher suite + * @icv_len: length of ICV used by the cipher suite + * @validate_frames: validation mode + * @operational: MAC_Operational flag + * @protect_frames: enable protection for this SecY + * @replay_protect: enable packet number checks on receive + * @replay_window: size of the replay window + * @tx_sc: transmit secure channel + * @rx_sc: linked list of receive secure channels + */ +struct macsec_secy { + struct net_device *netdev; + unsigned int n_rx_sc; + sci_t sci; + u16 key_len; + u16 icv_len; + enum macsec_validation_type validate_frames; + bool operational; + bool protect_frames; + bool replay_protect; + u32 replay_window; + struct macsec_tx_sc tx_sc; + struct macsec_rx_sc __rcu *rx_sc; +}; + +#endif /* _NET_MACSEC_H_ */ -- cgit v1.2.3 From 76564261a7db80c5f5c624e0122a28787f266bdf Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Mon, 13 Jan 2020 23:31:40 +0100 Subject: net: macsec: introduce the macsec_context structure This patch introduces the macsec_context structure. It will be used in the kernel to exchange information between the common MACsec implementation (macsec.c) and the MACsec hardware offloading implementations. This structure contains pointers to MACsec specific structures which contain the actual MACsec configuration, and to the underlying device (phydev for now). Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller --- include/linux/phy.h | 2 ++ include/net/macsec.h | 21 +++++++++++++++++++++ include/uapi/linux/if_link.h | 7 +++++++ 3 files changed, 30 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 3a70b756ac1a..be079a7bb40a 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -332,6 +332,8 @@ struct phy_c45_device_ids { u32 device_ids[8]; }; +struct macsec_context; + /* phy_device: An instance of a PHY * * drv: Pointer to the driver for this PHY instance diff --git a/include/net/macsec.h b/include/net/macsec.h index e7b41c1043f6..0b98803f92ec 100644 --- a/include/net/macsec.h +++ b/include/net/macsec.h @@ -174,4 +174,25 @@ struct macsec_secy { struct macsec_rx_sc __rcu *rx_sc; }; +/** + * struct macsec_context - MACsec context for hardware offloading + */ +struct macsec_context { + struct phy_device *phydev; + enum macsec_offload offload; + + struct macsec_secy *secy; + struct macsec_rx_sc *rx_sc; + struct { + unsigned char assoc_num; + u8 key[MACSEC_KEYID_LEN]; + union { + struct macsec_rx_sa *rx_sa; + struct macsec_tx_sa *tx_sa; + }; + } sa; + + u8 prepare:1; +}; + #endif /* _NET_MACSEC_H_ */ diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 1d69f637c5d6..024af2d1d0af 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -486,6 +486,13 @@ enum macsec_validation_type { MACSEC_VALIDATE_MAX = __MACSEC_VALIDATE_END - 1, }; +enum macsec_offload { + MACSEC_OFFLOAD_OFF = 0, + MACSEC_OFFLOAD_PHY = 1, + __MACSEC_OFFLOAD_END, + MACSEC_OFFLOAD_MAX = __MACSEC_OFFLOAD_END - 1, +}; + /* IPVLAN section */ enum { IFLA_IPVLAN_UNSPEC, -- cgit v1.2.3 From 0830e20b62ad156f7df5ff5b9c4cea280ebe8fef Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Mon, 13 Jan 2020 23:31:41 +0100 Subject: net: macsec: introduce MACsec ops This patch introduces MACsec ops for drivers to support offloading MACsec operations. Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller --- include/net/macsec.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include') diff --git a/include/net/macsec.h b/include/net/macsec.h index 0b98803f92ec..16e7e5061178 100644 --- a/include/net/macsec.h +++ b/include/net/macsec.h @@ -195,4 +195,28 @@ struct macsec_context { u8 prepare:1; }; +/** + * struct macsec_ops - MACsec offloading operations + */ +struct macsec_ops { + /* Device wide */ + int (*mdo_dev_open)(struct macsec_context *ctx); + int (*mdo_dev_stop)(struct macsec_context *ctx); + /* SecY */ + int (*mdo_add_secy)(struct macsec_context *ctx); + int (*mdo_upd_secy)(struct macsec_context *ctx); + int (*mdo_del_secy)(struct macsec_context *ctx); + /* Security channels */ + int (*mdo_add_rxsc)(struct macsec_context *ctx); + int (*mdo_upd_rxsc)(struct macsec_context *ctx); + int (*mdo_del_rxsc)(struct macsec_context *ctx); + /* Security associations */ + int (*mdo_add_rxsa)(struct macsec_context *ctx); + int (*mdo_upd_rxsa)(struct macsec_context *ctx); + int (*mdo_del_rxsa)(struct macsec_context *ctx); + int (*mdo_add_txsa)(struct macsec_context *ctx); + int (*mdo_upd_txsa)(struct macsec_context *ctx); + int (*mdo_del_txsa)(struct macsec_context *ctx); +}; + #endif /* _NET_MACSEC_H_ */ -- cgit v1.2.3 From 2e18135845b359f26c37df38ba56565496517c10 Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Mon, 13 Jan 2020 23:31:42 +0100 Subject: net: phy: add MACsec ops in phy_device This patch adds a reference to MACsec ops in the phy_device, to allow PHYs to support offloading MACsec operations. The phydev lock will be held while calling those helpers. Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller --- include/linux/phy.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index be079a7bb40a..2929d0bc307f 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -333,6 +333,7 @@ struct phy_c45_device_ids { }; struct macsec_context; +struct macsec_ops; /* phy_device: An instance of a PHY * @@ -356,6 +357,7 @@ struct macsec_context; * attached_dev: The attached enet driver's device instance ptr * adjust_link: Callback for the enet controller to respond to * changes in the link state. + * macsec_ops: MACsec offloading ops. * * speed, duplex, pause, supported, advertising, lp_advertising, * and autoneg are used like in mii_if_info @@ -455,6 +457,11 @@ struct phy_device { void (*phy_link_change)(struct phy_device *, bool up, bool do_carrier); void (*adjust_link)(struct net_device *dev); + +#if IS_ENABLED(CONFIG_MACSEC) + /* MACsec management functions */ + const struct macsec_ops *macsec_ops; +#endif }; #define to_phy_device(d) container_of(to_mdio_device(d), \ struct phy_device, mdio) -- cgit v1.2.3 From dcb780fb279514f268826f2e9f4df3bc75610703 Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Mon, 13 Jan 2020 23:31:44 +0100 Subject: net: macsec: add nla support for changing the offloading selection MACsec offloading to underlying hardware devices is disabled by default (the software implementation is used). This patch adds support for changing this setting through the MACsec netlink interface. Many checks are done when enabling offloading on a given MACsec interface as there are limitations (it must be supported by the hardware, only a single interface can be offloaded on a given physical device at a time, rules can't be moved for now). Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller --- include/uapi/linux/if_macsec.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/if_macsec.h b/include/uapi/linux/if_macsec.h index 98e4d5d7c45c..1d63c43c38cc 100644 --- a/include/uapi/linux/if_macsec.h +++ b/include/uapi/linux/if_macsec.h @@ -45,6 +45,7 @@ enum macsec_attrs { MACSEC_ATTR_RXSC_LIST, /* dump, nested, macsec_rxsc_attrs for each RXSC */ MACSEC_ATTR_TXSC_STATS, /* dump, nested, macsec_txsc_stats_attr */ MACSEC_ATTR_SECY_STATS, /* dump, nested, macsec_secy_stats_attr */ + MACSEC_ATTR_OFFLOAD, /* config, nested, macsec_offload_attrs */ __MACSEC_ATTR_END, NUM_MACSEC_ATTR = __MACSEC_ATTR_END, MACSEC_ATTR_MAX = __MACSEC_ATTR_END - 1, @@ -97,6 +98,15 @@ enum macsec_sa_attrs { MACSEC_SA_ATTR_MAX = __MACSEC_SA_ATTR_END - 1, }; +enum macsec_offload_attrs { + MACSEC_OFFLOAD_ATTR_UNSPEC, + MACSEC_OFFLOAD_ATTR_TYPE, /* config/dump, u8 0..2 */ + MACSEC_OFFLOAD_ATTR_PAD, + __MACSEC_OFFLOAD_ATTR_END, + NUM_MACSEC_OFFLOAD_ATTR = __MACSEC_OFFLOAD_ATTR_END, + MACSEC_OFFLOAD_ATTR_MAX = __MACSEC_OFFLOAD_ATTR_END - 1, +}; + enum macsec_nl_commands { MACSEC_CMD_GET_TXSC, MACSEC_CMD_ADD_RXSC, @@ -108,6 +118,7 @@ enum macsec_nl_commands { MACSEC_CMD_ADD_RXSA, MACSEC_CMD_DEL_RXSA, MACSEC_CMD_UPD_RXSA, + MACSEC_CMD_UPD_OFFLOAD, }; /* u64 per-RXSC stats */ -- cgit v1.2.3 From 5c937de78b39e47ce9924fc4b863c5b727edc328 Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Mon, 13 Jan 2020 23:31:47 +0100 Subject: net: macsec: PN wrap callback Allow to call macsec_pn_wrapped from hardware drivers to notify when a PN rolls over. Some drivers might used an interrupt to implement this. Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller --- include/net/macsec.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/macsec.h b/include/net/macsec.h index 16e7e5061178..92e43db8b566 100644 --- a/include/net/macsec.h +++ b/include/net/macsec.h @@ -219,4 +219,6 @@ struct macsec_ops { int (*mdo_del_txsa)(struct macsec_context *ctx); }; +void macsec_pn_wrapped(struct macsec_secy *secy, struct macsec_tx_sa *tx_sa); + #endif /* _NET_MACSEC_H_ */ -- cgit v1.2.3 From 5eee7bd7e245914e4e050c413dfe864e31805207 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Mon, 13 Jan 2020 18:42:26 -0500 Subject: net: skbuff: disambiguate argument and member for skb_list_walk_safe helper This worked before, because we made all callers name their next pointer "next". But in trying to be more "drop-in" ready, the silliness here is revealed. This commit fixes the problem by making the macro argument and the member use different names. Signed-off-by: Jason A. Donenfeld Signed-off-by: David S. Miller --- include/linux/skbuff.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 016b3c4ab99a..aaf73b34f72f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1479,9 +1479,9 @@ static inline void skb_mark_not_on_list(struct sk_buff *skb) } /* Iterate through singly-linked GSO fragments of an skb. */ -#define skb_list_walk_safe(first, skb, next) \ - for ((skb) = (first), (next) = (skb) ? (skb)->next : NULL; (skb); \ - (skb) = (next), (next) = (skb) ? (skb)->next : NULL) +#define skb_list_walk_safe(first, skb, next_skb) \ + for ((skb) = (first), (next_skb) = (skb) ? (skb)->next : NULL; (skb); \ + (skb) = (next_skb), (next_skb) = (skb) ? (skb)->next : NULL) static inline void skb_list_del_init(struct sk_buff *skb) { -- cgit v1.2.3 From 1e301fd04eaaa5b1e3c202450d86864e6714d783 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 14 Jan 2020 13:23:10 +0200 Subject: ipv4: Encapsulate function arguments in a struct fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages using the passed arguments. Currently, the function takes 11 arguments, 6 of which are attributes of the route being dumped (e.g., prefix, TOS). The next patch will need the function to also dump to user space an indication if the route is present in hardware or not. Instead of passing yet another argument, change the function to take a struct containing the different route attributes. v2: * Name last argument of fib_dump_info() * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could later be passed to fib_alias_hw_flags_set() Signed-off-by: Ido Schimmel Reviewed-by: David Ahern Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- include/net/ip_fib.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index b9cba41c6d4f..0c071c820e33 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -204,6 +204,15 @@ __be32 fib_result_prefsrc(struct net *net, struct fib_result *res); #define FIB_RES_DEV(res) (FIB_RES_NHC(res)->nhc_dev) #define FIB_RES_OIF(res) (FIB_RES_NHC(res)->nhc_oif) +struct fib_rt_info { + struct fib_info *fi; + u32 tb_id; + __be32 dst; + int dst_len; + u8 tos; + u8 type; +}; + struct fib_entry_notifier_info { struct fib_notifier_info info; /* must be first */ u32 dst; -- cgit v1.2.3 From 90b93f1b31f86dcde2fa3c57f1ae33d28d87a1f8 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 14 Jan 2020 13:23:11 +0200 Subject: ipv4: Add "offload" and "trap" indications to routes When performing L3 offload, routes and nexthops are usually programmed into two different tables in the underlying device. Therefore, the fact that a nexthop resides in hardware does not necessarily mean that all the associated routes also reside in hardware and vice-versa. While the kernel can signal to user space the presence of a nexthop in hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag for routes. In addition, the fact that a route resides in hardware does not necessarily mean that the traffic is offloaded. For example, unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap packets to the CPU so that the kernel will be able to generate the appropriate ICMP error packet. This patch adds an "offload" and "trap" indications to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' is extended with two new fields that indicate if the route resides in hardware or not and if it is offloading traffic from the kernel or trapping packets to it. Note that the new fields are added in the 6 bytes hole and therefore the struct still fits in a single cache line [1]. Capable drivers are expected to invoke fib_alias_hw_flags_set() with the route's key in order to set the flags. The indications are dumped to user space via a new flags (i.e., 'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the ancillary header. v2: * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() [1] struct fib_alias { struct hlist_node fa_list; /* 0 16 */ struct fib_info * fa_info; /* 16 8 */ u8 fa_tos; /* 24 1 */ u8 fa_type; /* 25 1 */ u8 fa_state; /* 26 1 */ u8 fa_slen; /* 27 1 */ u32 tb_id; /* 28 4 */ s16 fa_default; /* 32 2 */ u8 offload:1; /* 34: 0 1 */ u8 trap:1; /* 34: 1 1 */ u8 unused:6; /* 34: 2 1 */ /* XXX 5 bytes hole, try to pack */ struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */ /* size: 56, cachelines: 1, members: 12 */ /* sum members: 50, holes: 1, sum holes: 5 */ /* sum bitfield members: 8 bits (1 bytes) */ /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ido Schimmel Reviewed-by: David Ahern Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- include/net/ip_fib.h | 4 ++++ include/uapi/linux/rtnetlink.h | 2 ++ 2 files changed, 6 insertions(+) (limited to 'include') diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 0c071c820e33..6a1ae49809de 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -211,6 +211,9 @@ struct fib_rt_info { int dst_len; u8 tos; u8 type; + u8 offload:1, + trap:1, + unused:6; }; struct fib_entry_notifier_info { @@ -473,6 +476,7 @@ int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap, void fib_nh_common_release(struct fib_nh_common *nhc); /* Exported by fib_trie.c */ +void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri); void fib_trie_init(void); struct fib_table *fib_trie_table(u32 id, struct fib_table *alias); diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 1418a8362bb7..cd43321d20dd 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -309,6 +309,8 @@ enum rt_scope_t { #define RTM_F_PREFIX 0x800 /* Prefix addresses */ #define RTM_F_LOOKUP_TABLE 0x1000 /* set rtm_table to FIB lookup result */ #define RTM_F_FIB_MATCH 0x2000 /* return full fib lookup match */ +#define RTM_F_OFFLOAD 0x4000 /* route is offloaded */ +#define RTM_F_TRAP 0x8000 /* route is trapping packets */ /* Reserved table identifiers */ -- cgit v1.2.3 From bb3c4ab93e44784c1e958bdbba7824bba40f23cd Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 14 Jan 2020 13:23:12 +0200 Subject: ipv6: Add "offload" and "trap" indications to routes In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko Reviewed-by: David Ahern Acked-by: Roopa Prabhu Signed-off-by: David S. Miller --- include/net/ip6_fib.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index b579faea41e9..fd60a8ac02ee 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -192,7 +192,9 @@ struct fib6_info { dst_nopolicy:1, dst_host:1, fib6_destroying:1, - unused:3; + offload:1, + trap:1, + unused:1; struct rcu_head rcu; struct nexthop *nh; @@ -329,6 +331,13 @@ static inline void fib6_info_release(struct fib6_info *f6i) call_rcu(&f6i->rcu, fib6_info_destroy_rcu); } +static inline void fib6_info_hw_flags_set(struct fib6_info *f6i, bool offload, + bool trap) +{ + f6i->offload = offload; + f6i->trap = trap; +} + enum fib6_walk_state { #ifdef CONFIG_IPV6_SUBTREES FWS_S, -- cgit v1.2.3 From 8dcea187088bce5d2f1149294ad109f022653547 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Tue, 14 Jan 2020 19:56:09 +0200 Subject: net: bridge: vlan: add rtm definitions and dump support This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/uapi/linux/if_bridge.h | 28 ++++++++++++++++++++++++++++ include/uapi/linux/rtnetlink.h | 7 +++++++ 2 files changed, 35 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index 4a58e3d7de46..4da04f77d9ee 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -165,6 +165,34 @@ struct bridge_stp_xstats { __u64 tx_tcn; }; +/* Bridge vlan RTM header */ +struct br_vlan_msg { + __u8 family; + __u8 reserved1; + __u16 reserved2; + __u32 ifindex; +}; + +/* Bridge vlan RTM attributes + * [BRIDGE_VLANDB_ENTRY] = { + * [BRIDGE_VLANDB_ENTRY_INFO] + * ... + * } + */ +enum { + BRIDGE_VLANDB_UNSPEC, + BRIDGE_VLANDB_ENTRY, + __BRIDGE_VLANDB_MAX, +}; +#define BRIDGE_VLANDB_MAX (__BRIDGE_VLANDB_MAX - 1) + +enum { + BRIDGE_VLANDB_ENTRY_UNSPEC, + BRIDGE_VLANDB_ENTRY_INFO, + __BRIDGE_VLANDB_ENTRY_MAX, +}; +#define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1) + /* Bridge multicast database attributes * [MDBA_MDB] = { * [MDBA_MDB_ENTRY] = { diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index cd43321d20dd..b333cff14071 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -171,6 +171,13 @@ enum { RTM_GETLINKPROP, #define RTM_GETLINKPROP RTM_GETLINKPROP + RTM_NEWVLAN = 112, +#define RTM_NEWNVLAN RTM_NEWVLAN + RTM_DELVLAN, +#define RTM_DELVLAN RTM_DELVLAN + RTM_GETVLAN, +#define RTM_GETVLAN RTM_GETVLAN + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; -- cgit v1.2.3 From 0ab558795184f87b532363e262357160d25f5d64 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Tue, 14 Jan 2020 19:56:12 +0200 Subject: net: bridge: vlan: add rtm range support Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will be also used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/uapi/linux/if_bridge.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index 4da04f77d9ee..ac38f0b674b8 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -189,6 +189,7 @@ enum { enum { BRIDGE_VLANDB_ENTRY_UNSPEC, BRIDGE_VLANDB_ENTRY_INFO, + BRIDGE_VLANDB_ENTRY_RANGE, __BRIDGE_VLANDB_ENTRY_MAX, }; #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1) -- cgit v1.2.3 From cf5bddb95cbe5340e486e84d46cf2f0bb324078c Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Tue, 14 Jan 2020 19:56:13 +0200 Subject: net: bridge: vlan: add rtnetlink group and notify support Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/uapi/linux/rtnetlink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index b333cff14071..4a8c5b745157 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -730,6 +730,8 @@ enum rtnetlink_groups { #define RTNLGRP_IPV6_MROUTE_R RTNLGRP_IPV6_MROUTE_R RTNLGRP_NEXTHOP, #define RTNLGRP_NEXTHOP RTNLGRP_NEXTHOP + RTNLGRP_BRVLAN, +#define RTNLGRP_BRVLAN RTNLGRP_BRVLAN __RTNLGRP_MAX }; #define RTNLGRP_MAX (__RTNLGRP_MAX - 1) -- cgit v1.2.3 From 8482941f09067da42f9c3362e15bfb3f3c19d610 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Tue, 14 Jan 2020 19:50:02 -0800 Subject: bpf: Add bpf_send_signal_thread() helper Commit 8b401f9ed244 ("bpf: implement bpf_send_signal() helper") added helper bpf_send_signal() which permits bpf program to send a signal to the current process. The signal may be delivered to any threads in the process. We found a use case where sending the signal to the current thread is more preferable. - A bpf program will collect the stack trace and then send signal to the user application. - The user application will add some thread specific information to the just collected stack trace for later analysis. If bpf_send_signal() is used, user application will need to check whether the thread receiving the signal matches the thread collecting the stack by checking thread id. If not, it will need to send signal to another thread through pthread_kill(). This patch proposed a new helper bpf_send_signal_thread(), which sends the signal to the thread corresponding to the current kernel task. This way, user space is guaranteed that bpf_program execution context and user space signal handling context are the same thread. Signed-off-by: Yonghong Song Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com --- include/uapi/linux/bpf.h | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 52966e758fe5..a88837d4dd18 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2714,7 +2714,8 @@ union bpf_attr { * * int bpf_send_signal(u32 sig) * Description - * Send signal *sig* to the current task. + * Send signal *sig* to the process of the current task. + * The signal may be delivered to any of this process's threads. * Return * 0 on success or successfully queued. * @@ -2850,6 +2851,19 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_send_signal_thread(u32 sig) + * Description + * Send signal *sig* to the thread corresponding to the current task. + * Return + * 0 on success or successfully queued. + * + * **-EBUSY** if work queue under nmi is full. + * + * **-EINVAL** if *sig* is invalid. + * + * **-EPERM** if no permission to send the *sig*. + * + * **-EAGAIN** if bpf program can try again. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2968,7 +2982,8 @@ union bpf_attr { FN(probe_read_kernel), \ FN(probe_read_user_str), \ FN(probe_read_kernel_str), \ - FN(tcp_send_ack), + FN(tcp_send_ack), \ + FN(send_signal_thread), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call -- cgit v1.2.3 From 600a87490ff9823d065fc15e86c709e707033ecc Mon Sep 17 00:00:00 2001 From: Alain Michaud Date: Tue, 7 Jan 2020 00:43:17 +0000 Subject: Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS. MGMT command is added to receive the list of blocked keys from user-space. The list is used to: 1) Block keys from being distributed by the device during the ke distribution phase of SMP. 2) Filter out any keys that were previously saved so they are no longer used. Signed-off-by: Alain Michaud Signed-off-by: Marcel Holtmann --- include/net/bluetooth/hci_core.h | 10 ++++++++++ include/net/bluetooth/mgmt.h | 17 +++++++++++++++++ 2 files changed, 27 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index faebe3859931..89ecf0a80aa1 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -118,6 +118,13 @@ struct bt_uuid { u8 svc_hint; }; +struct blocked_key { + struct list_head list; + struct rcu_head rcu; + u8 type; + u8 val[16]; +}; + struct smp_csrk { bdaddr_t bdaddr; u8 bdaddr_type; @@ -397,6 +404,7 @@ struct hci_dev { struct list_head le_conn_params; struct list_head pend_le_conns; struct list_head pend_le_reports; + struct list_head blocked_keys; struct hci_dev_stats stat; @@ -1123,6 +1131,8 @@ struct smp_irk *hci_find_irk_by_addr(struct hci_dev *hdev, bdaddr_t *bdaddr, struct smp_irk *hci_add_irk(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 addr_type, u8 val[16], bdaddr_t *rpa); void hci_remove_irk(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 addr_type); +bool hci_is_blocked_key(struct hci_dev *hdev, u8 type, u8 val[16]); +void hci_blocked_keys_clear(struct hci_dev *hdev); void hci_smp_irks_clear(struct hci_dev *hdev); bool hci_bdaddr_is_paired(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type); diff --git a/include/net/bluetooth/mgmt.h b/include/net/bluetooth/mgmt.h index 9cee7ddc6741..a90666af05bd 100644 --- a/include/net/bluetooth/mgmt.h +++ b/include/net/bluetooth/mgmt.h @@ -654,6 +654,23 @@ struct mgmt_cp_set_phy_confguration { } __packed; #define MGMT_SET_PHY_CONFIGURATION_SIZE 4 +#define MGMT_OP_SET_BLOCKED_KEYS 0x0046 + +#define HCI_BLOCKED_KEY_TYPE_LINKKEY 0x00 +#define HCI_BLOCKED_KEY_TYPE_LTK 0x01 +#define HCI_BLOCKED_KEY_TYPE_IRK 0x02 + +struct mgmt_blocked_key_info { + __u8 type; + __u8 val[16]; +} __packed; + +struct mgmt_cp_set_blocked_keys { + __le16 key_count; + struct mgmt_blocked_key_info keys[0]; +} __packed; +#define MGMT_OP_SET_BLOCKED_KEYS_SIZE 2 + #define MGMT_EV_CMD_COMPLETE 0x0001 struct mgmt_ev_cmd_complete { __le16 opcode; -- cgit v1.2.3 From 4de0fc599eb936d37542f819e931ba3fd8e435ca Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Wed, 15 Jan 2020 13:02:11 -0800 Subject: Bluetooth: Add definitions for CIS connections These adds the HCI definitions for handling CIS connections along with ISO data packets. Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Marcel Holtmann --- include/net/bluetooth/hci.h | 159 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 158 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index 07b6ecedc6ce..6293bdd7d862 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -27,6 +27,7 @@ #define HCI_MAX_ACL_SIZE 1024 #define HCI_MAX_SCO_SIZE 255 +#define HCI_MAX_ISO_SIZE 251 #define HCI_MAX_EVENT_SIZE 260 #define HCI_MAX_FRAME_SIZE (HCI_MAX_ACL_SIZE + 4) @@ -303,6 +304,7 @@ enum { #define HCI_ACLDATA_PKT 0x02 #define HCI_SCODATA_PKT 0x03 #define HCI_EVENT_PKT 0x04 +#define HCI_ISODATA_PKT 0x05 #define HCI_DIAG_PKT 0xf0 #define HCI_VENDOR_PKT 0xff @@ -352,6 +354,15 @@ enum { #define ACL_ACTIVE_BCAST 0x04 #define ACL_PICO_BCAST 0x08 +/* ISO PB flags */ +#define ISO_START 0x00 +#define ISO_CONT 0x01 +#define ISO_SINGLE 0x02 +#define ISO_END 0x03 + +/* ISO TS flags */ +#define ISO_TS 0x01 + /* Baseband links */ #define SCO_LINK 0x00 #define ACL_LINK 0x01 @@ -359,6 +370,7 @@ enum { /* Low Energy links do not have defined link type. Use invented one */ #define LE_LINK 0x80 #define AMP_LINK 0x81 +#define ISO_LINK 0x82 #define INVALID_LINK 0xff /* LMP features */ @@ -440,6 +452,8 @@ enum { #define HCI_LE_PHY_2M 0x01 #define HCI_LE_PHY_CODED 0x08 #define HCI_LE_CHAN_SEL_ALG2 0x40 +#define HCI_LE_CIS_MASTER 0x10 +#define HCI_LE_CIS_SLAVE 0x20 /* Connection modes */ #define HCI_CM_ACTIVE 0x0000 @@ -1718,6 +1732,86 @@ struct hci_cp_le_set_adv_set_rand_addr { bdaddr_t bdaddr; } __packed; +#define HCI_OP_LE_READ_BUFFER_SIZE_V2 0x2060 +struct hci_rp_le_read_buffer_size_v2 { + __u8 status; + __le16 acl_mtu; + __u8 acl_max_pkt; + __le16 iso_mtu; + __u8 iso_max_pkt; +} __packed; + +#define HCI_OP_LE_READ_ISO_TX_SYNC 0x2061 +struct hci_cp_le_read_iso_tx_sync { + __le16 handle; +} __packed; + +struct hci_rp_le_read_iso_tx_sync { + __u8 status; + __le16 handle; + __le16 seq; + __le32 imestamp; + __u8 offset[3]; +} __packed; + +#define HCI_OP_LE_SET_CIG_PARAMS 0x2062 +struct hci_cis_params { + __u8 cis_id; + __le16 m_sdu; + __le16 s_sdu; + __u8 m_phy; + __u8 s_phy; + __u8 m_rtn; + __u8 s_rtn; +} __packed; + +struct hci_cp_le_set_cig_params { + __u8 cig_id; + __u8 m_interval[3]; + __u8 s_interval[3]; + __u8 sca; + __u8 packing; + __u8 framing; + __le16 m_latency; + __le16 s_latency; + __u8 num_cis; + struct hci_cis_params cis[0]; +} __packed; + +struct hci_rp_le_set_cig_params { + __u8 status; + __u8 cig_id; + __u8 num_handles; + __le16 handle[0]; +} __packed; + +#define HCI_OP_LE_CREATE_CIS 0x2064 +struct hci_cis { + __le16 cis_handle; + __le16 acl_handle; +} __packed; + +struct hci_cp_le_create_cis { + __u8 num_cis; + struct hci_cis cis[0]; +} __packed; + +#define HCI_OP_LE_REMOVE_CIG 0x2065 +struct hci_cp_le_remove_cig { + __u8 cig_id; +} __packed; + +#define HCI_OP_LE_ACCEPT_CIS 0x2066 +struct hci_cp_le_accept_cis { + __le16 handle; +} __packed; + +#define HCI_OP_LE_REJECT_CIS 0x2067 +struct hci_cp_le_reject_cis { + __le16 handle; + __u8 reason; +} __packed; + /* ---- HCI Events ---- */ #define HCI_EV_INQUIRY_COMPLETE 0x01 @@ -2189,7 +2283,7 @@ struct hci_ev_le_direct_adv_info { #define HCI_EV_LE_PHY_UPDATE_COMPLETE 0x0c struct hci_ev_le_phy_update_complete { __u8 status; - __u16 handle; + __le16 handle; __u8 tx_phy; __u8 rx_phy; } __packed; @@ -2234,6 +2328,34 @@ struct hci_evt_le_ext_adv_set_term { __u8 num_evts; } __packed; +#define HCI_EVT_LE_CIS_ESTABLISHED 0x19 +struct hci_evt_le_cis_established { + __u8 status; + __le16 handle; + __u8 cig_sync_delay[3]; + __u8 cis_sync_delay[3]; + __u8 m_latency[3]; + __u8 s_latency[3]; + __u8 m_phy; + __u8 s_phy; + __u8 nse; + __u8 m_bn; + __u8 s_bn; + __u8 m_ft; + __u8 s_ft; + __le16 m_mtu; + __le16 s_mtu; + __le16 interval; +} __packed; + +#define HCI_EVT_LE_CIS_REQ 0x1a +struct hci_evt_le_cis_req { + __le16 acl_handle; + __le16 cis_handle; + __u8 cig_id; + __u8 cis_id; +} __packed; + #define HCI_EV_VENDOR 0xff /* Internal events generated by Bluetooth stack */ @@ -2262,6 +2384,7 @@ struct hci_ev_si_security { #define HCI_EVENT_HDR_SIZE 2 #define HCI_ACL_HDR_SIZE 4 #define HCI_SCO_HDR_SIZE 3 +#define HCI_ISO_HDR_SIZE 4 struct hci_command_hdr { __le16 opcode; /* OCF & OGF */ @@ -2283,6 +2406,30 @@ struct hci_sco_hdr { __u8 dlen; } __packed; +struct hci_iso_hdr { + __le16 handle; + __le16 dlen; + __u8 data[0]; +} __packed; + +/* ISO data packet status flags */ +#define HCI_ISO_STATUS_VALID 0x00 +#define HCI_ISO_STATUS_INVALID 0x01 +#define HCI_ISO_STATUS_NOP 0x02 + +#define HCI_ISO_DATA_HDR_SIZE 4 +struct hci_iso_data_hdr { + __le16 sn; + __le16 slen; +}; + +#define HCI_ISO_TS_DATA_HDR_SIZE 8 +struct hci_iso_ts_data_hdr { + __le32 ts; + __le16 sn; + __le16 slen; +}; + static inline struct hci_event_hdr *hci_event_hdr(const struct sk_buff *skb) { return (struct hci_event_hdr *) skb->data; @@ -2308,4 +2455,14 @@ static inline struct hci_sco_hdr *hci_sco_hdr(const struct sk_buff *skb) #define hci_handle(h) (h & 0x0fff) #define hci_flags(h) (h >> 12) +/* ISO handle and flags pack/unpack */ +#define hci_iso_flags_pb(f) (f & 0x0003) +#define hci_iso_flags_ts(f) ((f >> 2) & 0x0001) +#define hci_iso_flags_pack(pb, ts) ((pb & 0x03) | ((ts & 0x01) << 2)) + +/* ISO data length and flags pack/unpack */ +#define hci_iso_data_len_pack(h, f) ((__u16) ((h) | ((f) << 14))) +#define hci_iso_data_len(h) ((h) & 0x3fff) +#define hci_iso_data_flags(h) ((h) >> 14) + #endif /* __HCI_H */ -- cgit v1.2.3 From f9a619db7c137b7c2dec0414d8deb8ec762ae8f9 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Wed, 15 Jan 2020 13:02:17 -0800 Subject: Bluetooth: monitor: Add support for ISO packets This enables passing ISO packets to the monitor socket. Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Marcel Holtmann --- include/net/bluetooth/hci_mon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_mon.h b/include/net/bluetooth/hci_mon.h index 240786b04a46..2d5fcda1bcd0 100644 --- a/include/net/bluetooth/hci_mon.h +++ b/include/net/bluetooth/hci_mon.h @@ -49,6 +49,8 @@ struct hci_mon_hdr { #define HCI_MON_CTRL_CLOSE 15 #define HCI_MON_CTRL_COMMAND 16 #define HCI_MON_CTRL_EVENT 17 +#define HCI_MON_ISO_TX_PKT 18 +#define HCI_MON_ISO_RX_PKT 19 struct hci_mon_new_index { __u8 type; -- cgit v1.2.3 From cb4d03ab499d4c040f4ab6fd4389d2b49f42b5a5 Mon Sep 17 00:00:00 2001 From: Brian Vazquez Date: Wed, 15 Jan 2020 10:43:01 -0800 Subject: bpf: Add generic support for lookup batch op This commit introduces generic support for the bpf_map_lookup_batch. This implementation can be used by almost all the bpf maps since its core implementation is relying on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is: BPF_MAP_LOOKUP_BATCH The UAPI attribute is: struct { /* struct used by BPF_MAP_*_BATCH commands */ __aligned_u64 in_batch; /* start batch, * NULL to start from beginning */ __aligned_u64 out_batch; /* output: next start batch */ __aligned_u64 keys; __aligned_u64 values; __u32 count; /* input/output: * input: # of key/value * elements * output: # of filled elements */ __u32 map_fd; __u64 elem_flags; __u64 flags; } batch; in_batch/out_batch are opaque values use to communicate between user/kernel space, in_batch/out_batch must be of key_size length. To start iterating from the beginning in_batch must be null, count is the # of key/value elements to retrieve. Note that the 'keys' buffer must be a buffer of key_size * count size and the 'values' buffer must be value_size * count, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved. Note that 'count' is an input/output variable and it can contain a lower value after a call. If there's no more entries to retrieve, ENOENT will be returned. If error is ENOENT, count might be > 0 in case it copied some values but there were no more entries to retrieve. Note that if the return code is an error and not -EFAULT, count indicates the number of elements successfully processed. Suggested-by: Stanislav Fomichev Signed-off-by: Brian Vazquez Signed-off-by: Yonghong Song Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com --- include/linux/bpf.h | 5 +++++ include/uapi/linux/bpf.h | 18 ++++++++++++++++++ 2 files changed, 23 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index aed2bc39d72b..807744ecaa5a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -44,6 +44,8 @@ struct bpf_map_ops { int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key); void (*map_release_uref)(struct bpf_map *map); void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); + int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr, + union bpf_attr __user *uattr); /* funcs callable from userspace and from eBPF programs */ void *(*map_lookup_elem)(struct bpf_map *map, void *key); @@ -982,6 +984,9 @@ void *bpf_map_area_alloc(u64 size, int numa_node); void *bpf_map_area_mmapable_alloc(u64 size, int numa_node); void bpf_map_area_free(void *base); void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr); +int generic_map_lookup_batch(struct bpf_map *map, + const union bpf_attr *attr, + union bpf_attr __user *uattr); extern int sysctl_unprivileged_bpf_disabled; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a88837d4dd18..0721b2c6bb8c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -107,6 +107,7 @@ enum bpf_cmd { BPF_MAP_LOOKUP_AND_DELETE_ELEM, BPF_MAP_FREEZE, BPF_BTF_GET_NEXT_ID, + BPF_MAP_LOOKUP_BATCH, }; enum bpf_map_type { @@ -420,6 +421,23 @@ union bpf_attr { __u64 flags; }; + struct { /* struct used by BPF_MAP_*_BATCH commands */ + __aligned_u64 in_batch; /* start batch, + * NULL to start from beginning + */ + __aligned_u64 out_batch; /* output: next start batch */ + __aligned_u64 keys; + __aligned_u64 values; + __u32 count; /* input/output: + * input: # of key/value + * elements + * output: # of filled elements + */ + __u32 map_fd; + __u64 elem_flags; + __u64 flags; + } batch; + struct { /* anonymous struct used by BPF_PROG_LOAD command */ __u32 prog_type; /* one of enum bpf_prog_type */ __u32 insn_cnt; -- cgit v1.2.3 From aa2e93b8e58e18442edfb2427446732415bc215e Mon Sep 17 00:00:00 2001 From: Brian Vazquez Date: Wed, 15 Jan 2020 10:43:02 -0800 Subject: bpf: Add generic support for update and delete batch ops This commit adds generic support for update and delete batch ops that can be used for almost all the bpf maps. These commands share the same UAPI attr that lookup and lookup_and_delete batch ops use and the syscall commands are: BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH The main difference between update/delete and lookup batch ops is that for update/delete keys/values must be specified for userspace and because of that, neither in_batch nor out_batch are used. Suggested-by: Stanislav Fomichev Signed-off-by: Brian Vazquez Signed-off-by: Yonghong Song Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com --- include/linux/bpf.h | 10 ++++++++++ include/uapi/linux/bpf.h | 2 ++ 2 files changed, 12 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 807744ecaa5a..05466ad6cf1c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -46,6 +46,10 @@ struct bpf_map_ops { void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr); + int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr, + union bpf_attr __user *uattr); + int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr, + union bpf_attr __user *uattr); /* funcs callable from userspace and from eBPF programs */ void *(*map_lookup_elem)(struct bpf_map *map, void *key); @@ -987,6 +991,12 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr); int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr); +int generic_map_update_batch(struct bpf_map *map, + const union bpf_attr *attr, + union bpf_attr __user *uattr); +int generic_map_delete_batch(struct bpf_map *map, + const union bpf_attr *attr, + union bpf_attr __user *uattr); extern int sysctl_unprivileged_bpf_disabled; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 0721b2c6bb8c..d5320b0be459 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -108,6 +108,8 @@ enum bpf_cmd { BPF_MAP_FREEZE, BPF_BTF_GET_NEXT_ID, BPF_MAP_LOOKUP_BATCH, + BPF_MAP_UPDATE_BATCH, + BPF_MAP_DELETE_BATCH, }; enum bpf_map_type { -- cgit v1.2.3 From 057996380a42bb64ccc04383cfa9c0ace4ea11f0 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Wed, 15 Jan 2020 10:43:04 -0800 Subject: bpf: Add batch ops to all htab bpf map htab can't use generic batch support due some problematic behaviours inherent to the data structre, i.e. while iterating the bpf map a concurrent program might delete the next entry that batch was about to use, in that case there's no easy solution to retrieve the next entry, the issue has been discussed multiple times (see [1] and [2]). The only way hmap can be traversed without the problem previously exposed is by making sure that the map is traversing entire buckets. This commit implements those strict requirements for hmap, the implementation follows the same interaction that generic support with some exceptions: - If keys/values buffer are not big enough to traverse a bucket, ENOSPC will be returned. - out_batch contains the value of the next bucket in the iteration, not the next key, but this is transparent for the user since the user should never use out_batch for other than bpf batch syscalls. This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete batch ops it is possible to use the generic implementations. [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/ [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/ Signed-off-by: Yonghong Song Signed-off-by: Brian Vazquez Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com --- include/linux/bpf.h | 3 +++ include/uapi/linux/bpf.h | 1 + 2 files changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 05466ad6cf1c..3517e32149a4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -46,6 +46,9 @@ struct bpf_map_ops { void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr); + int (*map_lookup_and_delete_batch)(struct bpf_map *map, + const union bpf_attr *attr, + union bpf_attr __user *uattr); int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr); int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr, diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d5320b0be459..033d90a2282d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -108,6 +108,7 @@ enum bpf_cmd { BPF_MAP_FREEZE, BPF_BTF_GET_NEXT_ID, BPF_MAP_LOOKUP_BATCH, + BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH, }; -- cgit v1.2.3 From c320e527e1548305f31d95ec405140b04aed25f5 Mon Sep 17 00:00:00 2001 From: Moni Shoua Date: Wed, 15 Jan 2020 14:43:31 +0200 Subject: IB: Allow calls to ib_umem_get from kernel ULPs So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi Signed-off-by: Moni Shoua Signed-off-by: Leon Romanovsky --- include/rdma/ib_umem.h | 4 ++-- include/rdma/ib_umem_odp.h | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 753f54e17e0a..e3518fd6b95b 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -69,7 +69,7 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem) #ifdef CONFIG_INFINIBAND_USER_MEM -struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, +struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, size_t size, int access); void ib_umem_release(struct ib_umem *umem); int ib_umem_page_count(struct ib_umem *umem); @@ -83,7 +83,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem, #include -static inline struct ib_umem *ib_umem_get(struct ib_udata *udata, +static inline struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, size_t size, int access) { diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h index 81429acc8257..64314ff76612 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -114,9 +114,9 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp) #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING struct ib_umem_odp * -ib_umem_odp_get(struct ib_udata *udata, unsigned long addr, size_t size, +ib_umem_odp_get(struct ib_device *device, unsigned long addr, size_t size, int access, const struct mmu_interval_notifier_ops *ops); -struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_udata *udata, +struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device, int access); struct ib_umem_odp * ib_umem_odp_alloc_child(struct ib_umem_odp *root_umem, unsigned long addr, @@ -134,7 +134,7 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 start_offset, #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ static inline struct ib_umem_odp * -ib_umem_odp_get(struct ib_udata *udata, unsigned long addr, size_t size, +ib_umem_odp_get(struct ib_device *device, unsigned long addr, size_t size, int access, const struct mmu_interval_notifier_ops *ops) { return ERR_PTR(-EINVAL); -- cgit v1.2.3 From 33006bd4f37f7d2c3d1cf0268b4f327b5fdc2558 Mon Sep 17 00:00:00 2001 From: Moni Shoua Date: Wed, 15 Jan 2020 14:43:32 +0200 Subject: IB/core: Introduce ib_reg_user_mr Add ib_reg_user_mr() for kernel ULPs to register user MRs. The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP that acts as an agent for the userspace application. This function is intended to be used without a user context so vendor drivers need to be aware of calling reg_user_mr() device operation with udata equal to NULL. Among all drivers, i40iw is the only driver which relies on presence of udata, so check udata existence for that driver. Signed-off-by: Moni Shoua Reviewed-by: Guy Levi Signed-off-by: Leon Romanovsky --- include/rdma/ib_verbs.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 5608e14e3aad..170d5ec95b79 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -4153,6 +4153,12 @@ static inline void ib_dma_free_coherent(struct ib_device *dev, dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle); } +/* ib_reg_user_mr - register a memory region for virtual addresses from kernel + * space. This function should be called when 'current' is the owning MM. + */ +struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, + u64 virt_addr, int mr_access_flags); + /** * ib_dereg_mr_user - Deregisters a memory region and removes it from the * HCA translation table. -- cgit v1.2.3 From 87d8069f6b028793254ddd0a66df1d7b6d79b450 Mon Sep 17 00:00:00 2001 From: Moni Shoua Date: Wed, 15 Jan 2020 14:43:33 +0200 Subject: IB/core: Add interface to advise_mr for kernel users Allow ULPs to call advise_mr, so they can control ODP regions in the same way as user space applications. Signed-off-by: Moni Shoua Signed-off-by: Leon Romanovsky --- include/rdma/ib_verbs.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 170d5ec95b79..e2cc62217cc2 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -4159,6 +4159,9 @@ static inline void ib_dma_free_coherent(struct ib_device *dev, struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt_addr, int mr_access_flags); +/* ib_advise_mr - give an advice about an address range in a memory region */ +int ib_advise_mr(struct ib_pd *pd, enum ib_uverbs_advise_mr_advice advice, + u32 flags, struct ib_sge *sg_list, u32 num_sge); /** * ib_dereg_mr_user - Deregisters a memory region and removes it from the * HCA translation table. -- cgit v1.2.3 From 4a7faaf4add3108522512c16c9ee08fb2ad4525e Mon Sep 17 00:00:00 2001 From: Jeremy Sowden Date: Wed, 1 Jan 2020 13:41:32 +0000 Subject: netfilter: nft_bitwise: correct uapi header comment. The comment documenting how bitwise expressions work includes a table which summarizes the mask and xor arguments combined to express the supported boolean operations. However, the row for OR: mask xor 0 x is incorrect. dreg = (sreg & 0) ^ x is not equivalent to: dreg = sreg | x What the code actually does is: dreg = (sreg & ~x) ^ x Update the documentation to match. Signed-off-by: Jeremy Sowden Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index e237ecbdcd8a..dd4611767933 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -501,7 +501,7 @@ enum nft_immediate_attributes { * * mask xor * NOT: 1 1 - * OR: 0 x + * OR: ~x x * XOR: 1 x * AND: x 0 */ -- cgit v1.2.3 From 445db8d09659eb27bcd5920cb91d91686f0197d0 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Sun, 5 Jan 2020 22:00:57 +0100 Subject: netfilter: flowtable: remove dying bit, use teardown bit instead The dying bit removes the conntrack entry if the netdev that owns this flow is going down. Instead, use the teardown mechanism to push back the flow to conntrack to let the classic software path decide what to do with it. Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include') diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 415b8f49d150..4ad924d5f983 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -85,7 +85,6 @@ struct flow_offload_tuple_rhash { #define FLOW_OFFLOAD_SNAT 0x1 #define FLOW_OFFLOAD_DNAT 0x2 -#define FLOW_OFFLOAD_DYING 0x4 #define FLOW_OFFLOAD_TEARDOWN 0x8 #define FLOW_OFFLOAD_HW 0x10 #define FLOW_OFFLOAD_HW_DYING 0x20 @@ -134,10 +133,6 @@ int nf_flow_table_init(struct nf_flowtable *flow_table); void nf_flow_table_free(struct nf_flowtable *flow_table); void flow_offload_teardown(struct flow_offload *flow); -static inline void flow_offload_dead(struct flow_offload *flow) -{ - flow->flags |= FLOW_OFFLOAD_DYING; -} int nf_flow_snat_port(const struct flow_offload *flow, struct sk_buff *skb, unsigned int thoff, -- cgit v1.2.3 From 355a8b13f87a8964ebe785b065f1388a1bd00c7e Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Sun, 5 Jan 2020 20:41:15 +0100 Subject: netfilter: flowtable: use atomic bitwise operations for flow flags Originally, all flow flag bits were set on only from the workqueue. With the introduction of the flow teardown state and hardware offload this is no longer true. Let's be safe and use atomic bitwise operation to operation with flow flags. Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading") Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 4ad924d5f983..5a10e28c3e40 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -83,12 +83,14 @@ struct flow_offload_tuple_rhash { struct flow_offload_tuple tuple; }; -#define FLOW_OFFLOAD_SNAT 0x1 -#define FLOW_OFFLOAD_DNAT 0x2 -#define FLOW_OFFLOAD_TEARDOWN 0x8 -#define FLOW_OFFLOAD_HW 0x10 -#define FLOW_OFFLOAD_HW_DYING 0x20 -#define FLOW_OFFLOAD_HW_DEAD 0x40 +enum nf_flow_flags { + NF_FLOW_SNAT, + NF_FLOW_DNAT, + NF_FLOW_TEARDOWN, + NF_FLOW_HW, + NF_FLOW_HW_DYING, + NF_FLOW_HW_DEAD, +}; enum flow_offload_type { NF_FLOW_OFFLOAD_UNSPEC = 0, @@ -98,7 +100,7 @@ enum flow_offload_type { struct flow_offload { struct flow_offload_tuple_rhash tuplehash[FLOW_OFFLOAD_DIR_MAX]; struct nf_conn *ct; - u16 flags; + unsigned long flags; u16 type; u32 timeout; struct rcu_head rcu_head; -- cgit v1.2.3 From a5449cdcaac5c78d62b8bea8f79158071f23da01 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 7 Jan 2020 09:56:27 +0100 Subject: netfilter: flowtable: add nf_flowtable_hw_offload() helper function This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that the flowtable hardware offload is enabled. Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 5a10e28c3e40..9ee1eaeaab04 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -47,6 +47,11 @@ struct nf_flowtable { possible_net_t net; }; +static inline bool nf_flowtable_hw_offload(struct nf_flowtable *flowtable) +{ + return flowtable->flags & NF_FLOWTABLE_HW_OFFLOAD; +} + enum flow_offload_tuple_dir { FLOW_OFFLOAD_DIR_ORIGINAL = IP_CT_DIR_ORIGINAL, FLOW_OFFLOAD_DIR_REPLY = IP_CT_DIR_REPLY, -- cgit v1.2.3 From f698fe40829b21088d323c8b0a7c626571528fc6 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Mon, 6 Jan 2020 12:56:47 +0100 Subject: netfilter: flowtable: refresh flow if hardware offload fails If nf_flow_offload_add() fails to add the flow to hardware, then the NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable software path. If flowtable hardware offload is enabled, this patch enqueues a new request to offload this flow to hardware. Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 9ee1eaeaab04..e0f709d9d547 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -95,6 +95,7 @@ enum nf_flow_flags { NF_FLOW_HW, NF_FLOW_HW_DYING, NF_FLOW_HW_DEAD, + NF_FLOW_HW_REFRESH, }; enum flow_offload_type { -- cgit v1.2.3 From 9d1f979986c2e29632b6a8f7a8ef8b3c7d24a48c Mon Sep 17 00:00:00 2001 From: Jeremy Sowden Date: Wed, 15 Jan 2020 20:05:51 +0000 Subject: netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute. Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a value of a new enum, nft_bitwise_ops. It describes the type of operation an expression contains. Currently, it only has one value: NFT_BITWISE_BOOL. More values will be added later to implement shifts. Signed-off-by: Jeremy Sowden Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index dd4611767933..0cddf357281f 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -484,6 +484,16 @@ enum nft_immediate_attributes { }; #define NFTA_IMMEDIATE_MAX (__NFTA_IMMEDIATE_MAX - 1) +/** + * enum nft_bitwise_ops - nf_tables bitwise operations + * + * @NFT_BITWISE_BOOL: mask-and-xor operation used to implement NOT, AND, OR and + * XOR boolean operations + */ +enum nft_bitwise_ops { + NFT_BITWISE_BOOL, +}; + /** * enum nft_bitwise_attributes - nf_tables bitwise expression netlink attributes * @@ -492,6 +502,7 @@ enum nft_immediate_attributes { * @NFTA_BITWISE_LEN: length of operands (NLA_U32) * @NFTA_BITWISE_MASK: mask value (NLA_NESTED: nft_data_attributes) * @NFTA_BITWISE_XOR: xor value (NLA_NESTED: nft_data_attributes) + * @NFTA_BITWISE_OP: type of operation (NLA_U32: nft_bitwise_ops) * * The bitwise expression performs the following operation: * @@ -512,6 +523,7 @@ enum nft_bitwise_attributes { NFTA_BITWISE_LEN, NFTA_BITWISE_MASK, NFTA_BITWISE_XOR, + NFTA_BITWISE_OP, __NFTA_BITWISE_MAX }; #define NFTA_BITWISE_MAX (__NFTA_BITWISE_MAX - 1) -- cgit v1.2.3 From 779f725e142cd987a69296c0eb7d416ee3ba57dd Mon Sep 17 00:00:00 2001 From: Jeremy Sowden Date: Wed, 15 Jan 2020 20:05:56 +0000 Subject: netfilter: bitwise: add NFTA_BITWISE_DATA attribute. Add a new bitwise netlink attribute that will be used by shift operations to store the size of the shift. It is not used by boolean operations. Signed-off-by: Jeremy Sowden Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 0cddf357281f..8bef0620bc4f 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -503,6 +503,8 @@ enum nft_bitwise_ops { * @NFTA_BITWISE_MASK: mask value (NLA_NESTED: nft_data_attributes) * @NFTA_BITWISE_XOR: xor value (NLA_NESTED: nft_data_attributes) * @NFTA_BITWISE_OP: type of operation (NLA_U32: nft_bitwise_ops) + * @NFTA_BITWISE_DATA: argument for non-boolean operations + * (NLA_NESTED: nft_data_attributes) * * The bitwise expression performs the following operation: * @@ -524,6 +526,7 @@ enum nft_bitwise_attributes { NFTA_BITWISE_MASK, NFTA_BITWISE_XOR, NFTA_BITWISE_OP, + NFTA_BITWISE_DATA, __NFTA_BITWISE_MAX }; #define NFTA_BITWISE_MAX (__NFTA_BITWISE_MAX - 1) -- cgit v1.2.3 From 567d746b55bc66d3800c9ae91d50f0c5deb2fd93 Mon Sep 17 00:00:00 2001 From: Jeremy Sowden Date: Wed, 15 Jan 2020 20:05:57 +0000 Subject: netfilter: bitwise: add support for shifts. Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR and XOR. Extend it to do shifts as well. Signed-off-by: Jeremy Sowden Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 8bef0620bc4f..261864736b26 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -489,9 +489,13 @@ enum nft_immediate_attributes { * * @NFT_BITWISE_BOOL: mask-and-xor operation used to implement NOT, AND, OR and * XOR boolean operations + * @NFT_BITWISE_LSHIFT: left-shift operation + * @NFT_BITWISE_RSHIFT: right-shift operation */ enum nft_bitwise_ops { NFT_BITWISE_BOOL, + NFT_BITWISE_LSHIFT, + NFT_BITWISE_RSHIFT, }; /** @@ -506,11 +510,12 @@ enum nft_bitwise_ops { * @NFTA_BITWISE_DATA: argument for non-boolean operations * (NLA_NESTED: nft_data_attributes) * - * The bitwise expression performs the following operation: + * The bitwise expression supports boolean and shift operations. It implements + * the boolean operations by performing the following operation: * * dreg = (sreg & mask) ^ xor * - * which allow to express all bitwise operations: + * with these mask and xor values: * * mask xor * NOT: 1 1 -- cgit v1.2.3 From f397464eb7c25bda903ec8b9cf5701e72a1f7b16 Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Mon, 7 Oct 2019 10:29:46 +0300 Subject: net/mlx5: Add structures layout for new MCAM access reg groups MCAM has 3 access_reg_groups (0-2). Defines data structures in order to read and parse access_reg_groups #1 and #2. Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index c6abaf4f1c55..43cdf9211747 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -8832,6 +8832,28 @@ struct mlx5_ifc_mcam_access_reg_bits { u8 regs_31_to_0[0x20]; }; +struct mlx5_ifc_mcam_access_reg_bits1 { + u8 regs_127_to_96[0x20]; + + u8 regs_95_to_64[0x20]; + + u8 regs_63_to_32[0x20]; + + u8 regs_31_to_0[0x20]; +}; + +struct mlx5_ifc_mcam_access_reg_bits2 { + u8 regs_127_to_99[0x1d]; + u8 mirc[0x1]; + u8 regs_97_to_96[0x2]; + + u8 regs_95_to_64[0x20]; + + u8 regs_63_to_32[0x20]; + + u8 regs_31_to_0[0x20]; +}; + struct mlx5_ifc_mcam_reg_bits { u8 reserved_at_0[0x8]; u8 feature_group[0x8]; @@ -8842,6 +8864,8 @@ struct mlx5_ifc_mcam_reg_bits { union { struct mlx5_ifc_mcam_access_reg_bits access_regs; + struct mlx5_ifc_mcam_access_reg_bits1 access_regs1; + struct mlx5_ifc_mcam_access_reg_bits2 access_regs2; u8 reserved_at_0[0x80]; } mng_access_reg_cap_mask; -- cgit v1.2.3 From 932ef155117cc5caf1108bd27664dab974ba6e89 Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Mon, 7 Oct 2019 10:31:42 +0300 Subject: net/mlx5: Read MCAM register groups 1 and 2 On load, Driver caches MCAM (Management Capabilities Mask Register) registers. in addition to the only MCAM register group (0) the driver already reads, here we add support for reading groups 1 and 2. Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 14 +++++++++++++- include/linux/mlx5/driver.h | 2 +- 2 files changed, 14 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 1a1c53f0262d..0e62c3db45e5 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1121,6 +1121,9 @@ enum mlx5_pcam_feature_groups { enum mlx5_mcam_reg_groups { MLX5_MCAM_REGS_FIRST_128 = 0x0, + MLX5_MCAM_REGS_0x9080_0x90FF = 0x1, + MLX5_MCAM_REGS_0x9100_0x917F = 0x2, + MLX5_MCAM_REGS_NUM = 0x3, }; enum mlx5_mcam_feature_groups { @@ -1269,7 +1272,16 @@ enum mlx5_qcam_feature_groups { MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg) #define MLX5_CAP_MCAM_REG(mdev, reg) \ - MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_access_reg_cap_mask.access_regs.reg) + MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \ + mng_access_reg_cap_mask.access_regs.reg) + +#define MLX5_CAP_MCAM_REG1(mdev, reg) \ + MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \ + mng_access_reg_cap_mask.access_regs1.reg) + +#define MLX5_CAP_MCAM_REG2(mdev, reg) \ + MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \ + mng_access_reg_cap_mask.access_regs2.reg) #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 27200dea0297..54431256af42 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -684,7 +684,7 @@ struct mlx5_core_dev { u32 hca_cur[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)]; u32 hca_max[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)]; u32 pcam[MLX5_ST_SZ_DW(pcam_reg)]; - u32 mcam[MLX5_ST_SZ_DW(mcam_reg)]; + u32 mcam[MLX5_MCAM_REGS_NUM][MLX5_ST_SZ_DW(mcam_reg)]; u32 fpga[MLX5_ST_SZ_DW(fpga_cap)]; u32 qcam[MLX5_ST_SZ_DW(qcam_reg)]; u8 embedded_cpu; -- cgit v1.2.3 From bab58ba10ecfa39c46d280d2acbca6054e1e863d Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Mon, 7 Oct 2019 10:30:32 +0300 Subject: net/mlx5: Add structures and defines for MIRC register Add needed structures, layouts and defines for MIRC (Management Image Re-activation Control) register. This structure will be used for the FSM reactivation flow in the downstream patches. Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + include/linux/mlx5/mlx5_ifc.h | 8 ++++++++ 2 files changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 54431256af42..7848b9858587 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -145,6 +145,7 @@ enum { MLX5_REG_MCC = 0x9062, MLX5_REG_MCDA = 0x9063, MLX5_REG_MCAM = 0x907f, + MLX5_REG_MIRC = 0x9162, }; enum mlx5_qpts_trust_state { diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 43cdf9211747..a133583c3e4f 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9471,6 +9471,13 @@ struct mlx5_ifc_mcda_reg_bits { u8 data[0][0x20]; }; +struct mlx5_ifc_mirc_reg_bits { + u8 reserved_at_0[0x18]; + u8 status_code[0x8]; + + u8 reserved_at_20[0x20]; +}; + union mlx5_ifc_ports_control_registers_document_bits { struct mlx5_ifc_bufferx_reg_bits bufferx_reg; struct mlx5_ifc_eth_2819_cntrs_grp_data_layout_bits eth_2819_cntrs_grp_data_layout; @@ -9526,6 +9533,7 @@ union mlx5_ifc_ports_control_registers_document_bits { struct mlx5_ifc_mcqi_reg_bits mcqi_reg; struct mlx5_ifc_mcc_reg_bits mcc_reg; struct mlx5_ifc_mcda_reg_bits mcda_reg; + struct mlx5_ifc_mirc_reg_bits mirc_reg; u8 reserved_at_0[0x60e0]; }; -- cgit v1.2.3 From 609b82727f719b41b50440c4028d48d0b2e04913 Mon Sep 17 00:00:00 2001 From: Aya Levin Date: Mon, 4 Nov 2019 14:51:55 +0200 Subject: net/mlx5: Expose resource dump register mapping Add new register enumeration for resource dump. Add layout mapping for resource dump: access command and response. Signed-off-by: Aya Levin Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + include/linux/mlx5/mlx5_ifc.h | 130 +++++++++++++++++++++++++++++++++++++++++- 2 files changed, 130 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 7848b9858587..c821fa4d7475 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -146,6 +146,7 @@ enum { MLX5_REG_MCDA = 0x9063, MLX5_REG_MCAM = 0x907f, MLX5_REG_MIRC = 0x9162, + MLX5_REG_RESOURCE_DUMP = 0xC000, }; enum mlx5_qpts_trust_state { diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index a133583c3e4f..6fe0431e11ec 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -823,7 +823,9 @@ struct mlx5_ifc_qos_cap_bits { struct mlx5_ifc_debug_cap_bits { u8 core_dump_general[0x1]; u8 core_dump_qp[0x1]; - u8 reserved_at_2[0x1e]; + u8 reserved_at_2[0x7]; + u8 resource_dump[0x1]; + u8 reserved_at_a[0x16]; u8 reserved_at_20[0x2]; u8 stall_detect[0x1]; @@ -1767,6 +1769,132 @@ struct mlx5_ifc_resize_field_select_bits { u8 resize_field_select[0x20]; }; +struct mlx5_ifc_resource_dump_bits { + u8 more_dump[0x1]; + u8 inline_dump[0x1]; + u8 reserved_at_2[0xa]; + u8 seq_num[0x4]; + u8 segment_type[0x10]; + + u8 reserved_at_20[0x10]; + u8 vhca_id[0x10]; + + u8 index1[0x20]; + + u8 index2[0x20]; + + u8 num_of_obj1[0x10]; + u8 num_of_obj2[0x10]; + + u8 reserved_at_a0[0x20]; + + u8 device_opaque[0x40]; + + u8 mkey[0x20]; + + u8 size[0x20]; + + u8 address[0x40]; + + u8 inline_data[52][0x20]; +}; + +struct mlx5_ifc_resource_dump_menu_record_bits { + u8 reserved_at_0[0x4]; + u8 num_of_obj2_supports_active[0x1]; + u8 num_of_obj2_supports_all[0x1]; + u8 must_have_num_of_obj2[0x1]; + u8 support_num_of_obj2[0x1]; + u8 num_of_obj1_supports_active[0x1]; + u8 num_of_obj1_supports_all[0x1]; + u8 must_have_num_of_obj1[0x1]; + u8 support_num_of_obj1[0x1]; + u8 must_have_index2[0x1]; + u8 support_index2[0x1]; + u8 must_have_index1[0x1]; + u8 support_index1[0x1]; + u8 segment_type[0x10]; + + u8 segment_name[4][0x20]; + + u8 index1_name[4][0x20]; + + u8 index2_name[4][0x20]; +}; + +struct mlx5_ifc_resource_dump_segment_header_bits { + u8 length_dw[0x10]; + u8 segment_type[0x10]; +}; + +struct mlx5_ifc_resource_dump_command_segment_bits { + struct mlx5_ifc_resource_dump_segment_header_bits segment_header; + + u8 segment_called[0x10]; + u8 vhca_id[0x10]; + + u8 index1[0x20]; + + u8 index2[0x20]; + + u8 num_of_obj1[0x10]; + u8 num_of_obj2[0x10]; +}; + +struct mlx5_ifc_resource_dump_error_segment_bits { + struct mlx5_ifc_resource_dump_segment_header_bits segment_header; + + u8 reserved_at_20[0x10]; + u8 syndrome_id[0x10]; + + u8 reserved_at_40[0x40]; + + u8 error[8][0x20]; +}; + +struct mlx5_ifc_resource_dump_info_segment_bits { + struct mlx5_ifc_resource_dump_segment_header_bits segment_header; + + u8 reserved_at_20[0x18]; + u8 dump_version[0x8]; + + u8 hw_version[0x20]; + + u8 fw_version[0x20]; +}; + +struct mlx5_ifc_resource_dump_menu_segment_bits { + struct mlx5_ifc_resource_dump_segment_header_bits segment_header; + + u8 reserved_at_20[0x10]; + u8 num_of_records[0x10]; + + struct mlx5_ifc_resource_dump_menu_record_bits record[0]; +}; + +struct mlx5_ifc_resource_dump_resource_segment_bits { + struct mlx5_ifc_resource_dump_segment_header_bits segment_header; + + u8 reserved_at_20[0x20]; + + u8 index1[0x20]; + + u8 index2[0x20]; + + u8 payload[0][0x20]; +}; + +struct mlx5_ifc_resource_dump_terminate_segment_bits { + struct mlx5_ifc_resource_dump_segment_header_bits segment_header; +}; + +struct mlx5_ifc_menu_resource_dump_response_bits { + struct mlx5_ifc_resource_dump_info_segment_bits info; + struct mlx5_ifc_resource_dump_command_segment_bits cmd; + struct mlx5_ifc_resource_dump_menu_segment_bits menu; + struct mlx5_ifc_resource_dump_terminate_segment_bits terminate; +}; + enum { MLX5_MODIFY_FIELD_SELECT_MODIFY_FIELD_SELECT_CQ_PERIOD = 0x1, MLX5_MODIFY_FIELD_SELECT_MODIFY_FIELD_SELECT_CQ_MAX_COUNT = 0x2, -- cgit v1.2.3 From 31d8bde1c8812c9b44065dcd98e554488c6a98d2 Mon Sep 17 00:00:00 2001 From: Hamdan Igbaria Date: Thu, 9 Jan 2020 13:26:53 +0200 Subject: net/mlx5: Add copy header action struct layout Add definition for copy header action, copy action is used to copy header fields from source to destination. Signed-off-by: Hamdan Igbaria Signed-off-by: Alex Vesker Reviewed-by: Alex Vesker Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 6fe0431e11ec..23613a6ea51c 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -5609,6 +5609,21 @@ struct mlx5_ifc_add_action_in_bits { u8 data[0x20]; }; +struct mlx5_ifc_copy_action_in_bits { + u8 action_type[0x4]; + u8 src_field[0xc]; + u8 reserved_at_10[0x3]; + u8 src_offset[0x5]; + u8 reserved_at_18[0x3]; + u8 length[0x5]; + + u8 reserved_at_20[0x4]; + u8 dst_field[0xc]; + u8 reserved_at_30[0x3]; + u8 dst_offset[0x5]; + u8 reserved_at_38[0x8]; +}; + union mlx5_ifc_set_action_in_add_action_in_auto_bits { struct mlx5_ifc_set_action_in_bits set_action_in; struct mlx5_ifc_add_action_in_bits add_action_in; @@ -5618,6 +5633,7 @@ union mlx5_ifc_set_action_in_add_action_in_auto_bits { enum { MLX5_ACTION_TYPE_SET = 0x1, MLX5_ACTION_TYPE_ADD = 0x2, + MLX5_ACTION_TYPE_COPY = 0x3, }; enum { -- cgit v1.2.3 From 822e114b50641d3b57d2eb30939e60d8b4758288 Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Mon, 1 Apr 2019 13:31:32 +0300 Subject: net/mlx5: Add mlx5_ifc definitions for connection tracking support Add the required hardware definitions to mlx5_ifc: ignore_flow_level, registers, copy_header, and fwd_and_modify cap. Signed-off-by: Paul Blakey Reviewed-by: Roi Dayan Reviewed-by: Oz Sholomo Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 23613a6ea51c..e9c165ffe3f9 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -375,8 +375,17 @@ struct mlx5_ifc_flow_table_fields_supported_bits { u8 outer_esp_spi[0x1]; u8 reserved_at_58[0x2]; u8 bth_dst_qp[0x1]; + u8 reserved_at_5b[0x5]; - u8 reserved_at_5b[0x25]; + u8 reserved_at_60[0x18]; + u8 metadata_reg_c_7[0x1]; + u8 metadata_reg_c_6[0x1]; + u8 metadata_reg_c_5[0x1]; + u8 metadata_reg_c_4[0x1]; + u8 metadata_reg_c_3[0x1]; + u8 metadata_reg_c_2[0x1]; + u8 metadata_reg_c_1[0x1]; + u8 metadata_reg_c_0[0x1]; }; struct mlx5_ifc_flow_table_prop_layout_bits { @@ -401,7 +410,8 @@ struct mlx5_ifc_flow_table_prop_layout_bits { u8 reformat_l3_tunnel_to_l2[0x1]; u8 reformat_l2_to_l3_tunnel[0x1]; u8 reformat_and_modify_action[0x1]; - u8 reserved_at_15[0x2]; + u8 ignore_flow_level[0x1]; + u8 reserved_at_16[0x1]; u8 table_miss_action_domain[0x1]; u8 termination_table[0x1]; u8 reserved_at_19[0x7]; @@ -722,7 +732,9 @@ enum { struct mlx5_ifc_flow_table_eswitch_cap_bits { u8 fdb_to_vport_reg_c_id[0x8]; - u8 reserved_at_8[0xf]; + u8 reserved_at_8[0xd]; + u8 fdb_modify_header_fwd_to_table[0x1]; + u8 reserved_at_16[0x1]; u8 flow_source[0x1]; u8 reserved_at_18[0x2]; u8 multi_fdb_encap[0x1]; @@ -4141,7 +4153,8 @@ struct mlx5_ifc_set_fte_in_bits { u8 reserved_at_a0[0x8]; u8 table_id[0x18]; - u8 reserved_at_c0[0x18]; + u8 ignore_flow_level[0x1]; + u8 reserved_at_c1[0x17]; u8 modify_enable_mask[0x8]; u8 reserved_at_e0[0x20]; @@ -5627,6 +5640,7 @@ struct mlx5_ifc_copy_action_in_bits { union mlx5_ifc_set_action_in_add_action_in_auto_bits { struct mlx5_ifc_set_action_in_bits set_action_in; struct mlx5_ifc_add_action_in_bits add_action_in; + struct mlx5_ifc_copy_action_in_bits copy_action_in; u8 reserved_at_0[0x40]; }; @@ -5669,6 +5683,8 @@ enum { MLX5_ACTION_IN_FIELD_METADATA_REG_C_3 = 0x54, MLX5_ACTION_IN_FIELD_METADATA_REG_C_4 = 0x55, MLX5_ACTION_IN_FIELD_METADATA_REG_C_5 = 0x56, + MLX5_ACTION_IN_FIELD_METADATA_REG_C_6 = 0x57, + MLX5_ACTION_IN_FIELD_METADATA_REG_C_7 = 0x58, MLX5_ACTION_IN_FIELD_OUT_TCP_SEQ_NUM = 0x59, MLX5_ACTION_IN_FIELD_OUT_TCP_ACK_NUM = 0x5B, }; -- cgit v1.2.3 From a58837f52d432f32995b1c00e803cc4db18762d3 Mon Sep 17 00:00:00 2001 From: Aya Levin Date: Mon, 30 Dec 2019 14:22:57 +0200 Subject: net/mlx5e: Expose FEC feilds and related capability bit Introduce 50G per lane FEC modes capability bit and newly supported fields in PPLM register which allow this configuration. Signed-off-by: Aya Levin Reviewed-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index e9c165ffe3f9..2ab4562b4851 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -8581,6 +8581,18 @@ struct mlx5_ifc_pplm_reg_bits { u8 fec_override_admin_50g[0x4]; u8 fec_override_admin_25g[0x4]; u8 fec_override_admin_10g_40g[0x4]; + + u8 fec_override_cap_400g_8x[0x10]; + u8 fec_override_cap_200g_4x[0x10]; + + u8 fec_override_cap_100g_2x[0x10]; + u8 fec_override_cap_50g_1x[0x10]; + + u8 fec_override_admin_400g_8x[0x10]; + u8 fec_override_admin_200g_4x[0x10]; + + u8 fec_override_admin_100g_2x[0x10]; + u8 fec_override_admin_50g_1x[0x10]; }; struct mlx5_ifc_ppcnt_reg_bits { @@ -8907,7 +8919,9 @@ struct mlx5_ifc_mpegc_reg_bits { }; struct mlx5_ifc_pcam_enhanced_features_bits { - u8 reserved_at_0[0x6d]; + u8 reserved_at_0[0x68]; + u8 fec_50G_per_lane_in_pplm[0x1]; + u8 reserved_at_69[0x4]; u8 rx_icrc_encapsulated_counter[0x1]; u8 reserved_at_6e[0x4]; u8 ptys_extended_ethernet[0x1]; -- cgit v1.2.3 From 827a8cb2dd2b72848652b2a425bba3262808ff44 Mon Sep 17 00:00:00 2001 From: Aharon Landau Date: Mon, 16 Dec 2019 12:50:13 +0200 Subject: net/mlx5e: Add discard counters per priority Add counters that count (per priority) the number of received packets that dropped due to lack of buffers on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network. Signed-off-by: Aharon Landau Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 2ab4562b4851..ee0a34d66c7c 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -2180,7 +2180,9 @@ struct mlx5_ifc_eth_per_prio_grp_data_layout_bits { u8 rx_pause_transition_low[0x20]; - u8 reserved_at_3c0[0x40]; + u8 rx_discards_high[0x20]; + + u8 rx_discards_low[0x20]; u8 device_stall_minor_watermark_cnt_high[0x20]; -- cgit v1.2.3 From 61dc7b0141c51f5fa4aed97e49f9cf102ec51479 Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Thu, 14 Nov 2019 16:59:58 +0200 Subject: net/mlx5: Refactor mlx5_create_auto_grouped_flow_table Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param which already carries the max_fte, prio and flags memebers, and is used the same in similar mlx5_create_flow_table() function. Signed-off-by: Paul Blakey Reviewed-by: Roi Dayan Reviewed-by: Oz Shlomo Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 4e5b84e66822..a3f8b63839de 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -145,25 +145,25 @@ mlx5_get_flow_vport_acl_namespace(struct mlx5_core_dev *dev, enum mlx5_flow_namespace_type type, int vport); -struct mlx5_flow_table * -mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns, - int prio, - int num_flow_table_entries, - int max_num_groups, - u32 level, - u32 flags); - struct mlx5_flow_table_attr { int prio; int max_fte; u32 level; u32 flags; + + struct { + int max_num_groups; + } autogroup; }; struct mlx5_flow_table * mlx5_create_flow_table(struct mlx5_flow_namespace *ns, struct mlx5_flow_table_attr *ft_attr); +struct mlx5_flow_table * +mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns, + struct mlx5_flow_table_attr *ft_attr); + struct mlx5_flow_table * mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns, int prio, -- cgit v1.2.3 From 5281a0c909194c477656e89401ac11dd7b29ad2d Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Tue, 23 Jul 2019 11:43:57 +0300 Subject: net/mlx5: fs_core: Introduce unmanaged flow tables Currently, Most of the steering tree is statically declared ahead of time, with steering prios instances allocated for each fdb chain to assign max number of levels for each of them. This allows fs_core to manage the connections and levels of the flow tables hierarcy to prevent loops, but restricts us with the number of supported chains and priorities. Introduce unmananged flow tables, allowing the user to manage the flow table connections. A unamanged table is detached from the fs_core flow table hierarcy, and is only connected back to the hierarchy by explicit FTEs forward actions. This will be used together with firmware that supports ignoring the flow table levels to increase the number of supported chains and prios. Signed-off-by: Paul Blakey Reviewed-by: Roi Dayan Reviewed-by: Oz Shlomo Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index a3f8b63839de..de2c838bae90 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -48,6 +48,7 @@ enum { MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT = BIT(0), MLX5_FLOW_TABLE_TUNNEL_EN_DECAP = BIT(1), MLX5_FLOW_TABLE_TERMINATION = BIT(2), + MLX5_FLOW_TABLE_UNMANAGED = BIT(3), }; #define LEFTOVERS_RULE_NUM 2 @@ -150,6 +151,7 @@ struct mlx5_flow_table_attr { int max_fte; u32 level; u32 flags; + struct mlx5_flow_table *next_ft; struct { int max_num_groups; -- cgit v1.2.3 From ff189b43568216c6211e9e7ddd9026cb8295e744 Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Sun, 5 Jan 2020 15:15:54 +0200 Subject: net/mlx5: Add ignore level support fwd to table rules If user sets ignore flow level flag on a rule, that rule can point to a flow table of any level, including those with levels equal or less than the level of the flow table it is added on. This with unamanged tables will be used to create a FDB chain/prio hierarchy much larger than currently supported level range. Signed-off-by: Paul Blakey Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index de2c838bae90..81f393fb7d96 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -196,6 +196,7 @@ struct mlx5_fs_vlan { enum { FLOW_ACT_NO_APPEND = BIT(0), + FLOW_ACT_IGNORE_FLOW_LEVEL = BIT(1), }; struct mlx5_flow_act { -- cgit v1.2.3 From 79cdb0aaea8b5478db34afa1d4d5ecc808689a67 Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Thu, 14 Nov 2019 17:02:59 +0200 Subject: net/mlx5: Allow creating autogroups with reserved entries Exclude the last n entries for an autogrouped flow table. Reserving entries at the end of the FT will ensure that this FG will be the last to be evaluated. This will be used in the next patch to create a miss group enabling custom actions on FT miss. Signed-off-by: Paul Blakey Reviewed-by: Roi Dayan Reviewed-by: Oz Shlomo Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 81f393fb7d96..4cae16016b2b 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -155,6 +155,7 @@ struct mlx5_flow_table_attr { struct { int max_num_groups; + int num_reserved_entries; } autogroup; }; -- cgit v1.2.3 From 75ccae62cb8d42a619323a85c577107b8b37d797 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Thu, 16 Jan 2020 16:14:44 +0100 Subject: xdp: Move devmap bulk queue into struct net_device MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 96360004b862 ("xdp: Make devmap flush_list common for all map instances"), changed devmap flushing to be a global operation instead of a per-map operation. However, the queue structure used for bulking was still allocated as part of the containing map. This patch moves the devmap bulk queue into struct net_device. The motivation for this is reusing it for the non-map variant of XDP_REDIRECT, which will be changed in a subsequent commit. To avoid other fields of struct net_device moving to different cache lines, we also move a couple of other members around. We defer the actual allocation of the bulk queue structure until the NETDEV_REGISTER notification devmap.c. This makes it possible to check for ndo_xdp_xmit support before allocating the structure, which is not possible at the time struct net_device is allocated. However, we keep the freeing in free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER. Because of this change, we lose the reference back to the map that originated the redirect, so change the tracepoint to always return 0 as the map ID and index. Otherwise no functional change is intended with this patch. After this patch, the relevant part of struct net_device looks like this, according to pahole: /* --- cacheline 14 boundary (896 bytes) --- */ struct netdev_queue * _tx __attribute__((__aligned__(64))); /* 896 8 */ unsigned int num_tx_queues; /* 904 4 */ unsigned int real_num_tx_queues; /* 908 4 */ struct Qdisc * qdisc; /* 912 8 */ unsigned int tx_queue_len; /* 920 4 */ spinlock_t tx_global_lock; /* 924 4 */ struct xdp_dev_bulk_queue * xdp_bulkq; /* 928 8 */ struct xps_dev_maps * xps_cpus_map; /* 936 8 */ struct xps_dev_maps * xps_rxqs_map; /* 944 8 */ struct mini_Qdisc * miniq_egress; /* 952 8 */ /* --- cacheline 15 boundary (960 bytes) --- */ struct hlist_head qdisc_hash[16]; /* 960 128 */ /* --- cacheline 17 boundary (1088 bytes) --- */ struct timer_list watchdog_timer; /* 1088 40 */ /* XXX last struct has 4 bytes of padding */ int watchdog_timeo; /* 1128 4 */ /* XXX 4 bytes hole, try to pack */ struct list_head todo_list; /* 1136 16 */ /* --- cacheline 18 boundary (1152 bytes) --- */ Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Alexei Starovoitov Acked-by: Björn Töpel Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk --- include/linux/netdevice.h | 13 ++++++++----- include/trace/events/xdp.h | 2 +- 2 files changed, 9 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2741aa35bec6..5ec3537fbdb1 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -876,6 +876,7 @@ enum bpf_netdev_command { struct bpf_prog_offload_ops; struct netlink_ext_ack; struct xdp_umem; +struct xdp_dev_bulk_queue; struct netdev_bpf { enum bpf_netdev_command command; @@ -1986,12 +1987,10 @@ struct net_device { unsigned int num_tx_queues; unsigned int real_num_tx_queues; struct Qdisc *qdisc; -#ifdef CONFIG_NET_SCHED - DECLARE_HASHTABLE (qdisc_hash, 4); -#endif unsigned int tx_queue_len; spinlock_t tx_global_lock; - int watchdog_timeo; + + struct xdp_dev_bulk_queue __percpu *xdp_bulkq; #ifdef CONFIG_XPS struct xps_dev_maps __rcu *xps_cpus_map; @@ -2001,11 +2000,15 @@ struct net_device { struct mini_Qdisc __rcu *miniq_egress; #endif +#ifdef CONFIG_NET_SCHED + DECLARE_HASHTABLE (qdisc_hash, 4); +#endif /* These may be needed for future network-power-down code. */ struct timer_list watchdog_timer; + int watchdog_timeo; - int __percpu *pcpu_refcnt; struct list_head todo_list; + int __percpu *pcpu_refcnt; struct list_head link_watch_list; diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index a7378bcd9928..72bad13d4a3c 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit, ), TP_fast_assign( - __entry->map_id = map->id; + __entry->map_id = map ? map->id : 0; __entry->act = XDP_REDIRECT; __entry->map_index = map_index; __entry->drops = drops; -- cgit v1.2.3 From 1d233886dd904edbf239eeffe435c3308ae97625 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Thu, 16 Jan 2020 16:14:45 +0100 Subject: xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp(). Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exit (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). Since this leaves less reason to have the non-map redirect code in a separate function, so we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them. With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase). Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch. Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Alexei Starovoitov Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk --- include/linux/bpf.h | 13 +++++- include/linux/filter.h | 10 ++++- include/trace/events/xdp.h | 101 ++++++++++++++++++++------------------------- 3 files changed, 63 insertions(+), 61 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3517e32149a4..8e3b8f4ad183 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1056,7 +1056,9 @@ struct sk_buff; struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key); struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key); -void __dev_map_flush(void); +void __dev_flush(void); +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, + struct net_device *dev_rx); int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, struct net_device *dev_rx); int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, @@ -1169,13 +1171,20 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map return NULL; } -static inline void __dev_map_flush(void) +static inline void __dev_flush(void) { } struct xdp_buff; struct bpf_dtab_netdev; +static inline +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, + struct net_device *dev_rx) +{ + return 0; +} + static inline int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, struct net_device *dev_rx) diff --git a/include/linux/filter.h b/include/linux/filter.h index a366a0b64a57..f349e2c0884c 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -918,7 +918,7 @@ static inline int xdp_ok_fwd_dev(const struct net_device *fwd, return 0; } -/* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the +/* The pair of xdp_do_redirect and xdp_do_flush MUST be called in the * same cpu context. Further for best results no more than a single map * for the do_redirect/do_flush pair should be used. This limitation is * because we only track one map and force a flush when the map changes. @@ -929,7 +929,13 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *prog); -void xdp_do_flush_map(void); +void xdp_do_flush(void); + +/* The xdp_do_flush_map() helper has been renamed to drop the _map suffix, as + * it is no longer only flushing maps. Keep this define for compatibility + * until all drivers are updated - do not use xdp_do_flush_map() in new code! + */ +#define xdp_do_flush_map xdp_do_flush void bpf_warn_invalid_xdp_action(u32 act); diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index 72bad13d4a3c..b680973687b4 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -79,14 +79,26 @@ TRACE_EVENT(xdp_bulk_tx, __entry->sent, __entry->drops, __entry->err) ); +#ifndef __DEVMAP_OBJ_TYPE +#define __DEVMAP_OBJ_TYPE +struct _bpf_dtab_netdev { + struct net_device *dev; +}; +#endif /* __DEVMAP_OBJ_TYPE */ + +#define devmap_ifindex(tgt, map) \ + (((map->map_type == BPF_MAP_TYPE_DEVMAP || \ + map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)) ? \ + ((struct _bpf_dtab_netdev *)tgt)->dev->ifindex : 0) + DECLARE_EVENT_CLASS(xdp_redirect_template, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, - int to_ifindex, int err, - const struct bpf_map *map, u32 map_index), + const void *tgt, int err, + const struct bpf_map *map, u32 index), - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index), + TP_ARGS(dev, xdp, tgt, err, map, index), TP_STRUCT__entry( __field(int, prog_id) @@ -103,90 +115,65 @@ DECLARE_EVENT_CLASS(xdp_redirect_template, __entry->act = XDP_REDIRECT; __entry->ifindex = dev->ifindex; __entry->err = err; - __entry->to_ifindex = to_ifindex; + __entry->to_ifindex = map ? devmap_ifindex(tgt, map) : + index; __entry->map_id = map ? map->id : 0; - __entry->map_index = map_index; + __entry->map_index = map ? index : 0; ), - TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d", + TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d" + " map_id=%d map_index=%d", __entry->prog_id, __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB), __entry->ifindex, __entry->to_ifindex, - __entry->err) + __entry->err, __entry->map_id, __entry->map_index) ); DEFINE_EVENT(xdp_redirect_template, xdp_redirect, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, - int to_ifindex, int err, - const struct bpf_map *map, u32 map_index), - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index) + const void *tgt, int err, + const struct bpf_map *map, u32 index), + TP_ARGS(dev, xdp, tgt, err, map, index) ); DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, - int to_ifindex, int err, - const struct bpf_map *map, u32 map_index), - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index) + const void *tgt, int err, + const struct bpf_map *map, u32 index), + TP_ARGS(dev, xdp, tgt, err, map, index) ); #define _trace_xdp_redirect(dev, xdp, to) \ - trace_xdp_redirect(dev, xdp, to, 0, NULL, 0); + trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to); #define _trace_xdp_redirect_err(dev, xdp, to, err) \ - trace_xdp_redirect_err(dev, xdp, to, err, NULL, 0); + trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to); + +#define _trace_xdp_redirect_map(dev, xdp, to, map, index) \ + trace_xdp_redirect(dev, xdp, to, 0, map, index); + +#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err) \ + trace_xdp_redirect_err(dev, xdp, to, err, map, index); -DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map, +/* not used anymore, but kept around so as not to break old programs */ +DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, - int to_ifindex, int err, - const struct bpf_map *map, u32 map_index), - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index), - TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d" - " map_id=%d map_index=%d", - __entry->prog_id, - __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB), - __entry->ifindex, __entry->to_ifindex, - __entry->err, - __entry->map_id, __entry->map_index) + const void *tgt, int err, + const struct bpf_map *map, u32 index), + TP_ARGS(dev, xdp, tgt, err, map, index) ); -DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err, +DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, - int to_ifindex, int err, - const struct bpf_map *map, u32 map_index), - TP_ARGS(dev, xdp, to_ifindex, err, map, map_index), - TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d" - " map_id=%d map_index=%d", - __entry->prog_id, - __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB), - __entry->ifindex, __entry->to_ifindex, - __entry->err, - __entry->map_id, __entry->map_index) + const void *tgt, int err, + const struct bpf_map *map, u32 index), + TP_ARGS(dev, xdp, tgt, err, map, index) ); -#ifndef __DEVMAP_OBJ_TYPE -#define __DEVMAP_OBJ_TYPE -struct _bpf_dtab_netdev { - struct net_device *dev; -}; -#endif /* __DEVMAP_OBJ_TYPE */ - -#define devmap_ifindex(fwd, map) \ - ((map->map_type == BPF_MAP_TYPE_DEVMAP || \ - map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ? \ - ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0) - -#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \ - trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \ - 0, map, idx) - -#define _trace_xdp_redirect_map_err(dev, xdp, fwd, map, idx, err) \ - trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map), \ - err, map, idx) - TRACE_EVENT(xdp_cpumap_kthread, TP_PROTO(int map_id, unsigned int processed, unsigned int drops, -- cgit v1.2.3 From 58aa94f922c1b44e0919d1814d2eab5b9e8bf945 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Thu, 16 Jan 2020 16:14:46 +0100 Subject: devmap: Adjust tracepoint for map-less queue flush MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that we don't have a reference to a devmap when flushing the device bulk queue, let's change the the devmap_xmit tracepoint to remote the map_id and map_index fields entirely. Rearrange the fields so 'drops' and 'sent' stay in the same position in the tracepoint struct, to make it possible for the xdp_monitor utility to read both the old and the new format. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk --- include/trace/events/xdp.h | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) (limited to 'include') diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index b680973687b4..b95d65e8c628 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -246,43 +246,38 @@ TRACE_EVENT(xdp_cpumap_enqueue, TRACE_EVENT(xdp_devmap_xmit, - TP_PROTO(const struct bpf_map *map, u32 map_index, - int sent, int drops, - const struct net_device *from_dev, - const struct net_device *to_dev, int err), + TP_PROTO(const struct net_device *from_dev, + const struct net_device *to_dev, + int sent, int drops, int err), - TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err), + TP_ARGS(from_dev, to_dev, sent, drops, err), TP_STRUCT__entry( - __field(int, map_id) + __field(int, from_ifindex) __field(u32, act) - __field(u32, map_index) + __field(int, to_ifindex) __field(int, drops) __field(int, sent) - __field(int, from_ifindex) - __field(int, to_ifindex) __field(int, err) ), TP_fast_assign( - __entry->map_id = map ? map->id : 0; + __entry->from_ifindex = from_dev->ifindex; __entry->act = XDP_REDIRECT; - __entry->map_index = map_index; + __entry->to_ifindex = to_dev->ifindex; __entry->drops = drops; __entry->sent = sent; - __entry->from_ifindex = from_dev->ifindex; - __entry->to_ifindex = to_dev->ifindex; __entry->err = err; ), TP_printk("ndo_xdp_xmit" - " map_id=%d map_index=%d action=%s" + " from_ifindex=%d to_ifindex=%d action=%s" " sent=%d drops=%d" - " from_ifindex=%d to_ifindex=%d err=%d", - __entry->map_id, __entry->map_index, + " err=%d", + __entry->from_ifindex, __entry->to_ifindex, __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB), __entry->sent, __entry->drops, - __entry->from_ifindex, __entry->to_ifindex, __entry->err) + __entry->err) ); /* Expect users already include , but not xdp_priv.h */ -- cgit v1.2.3 From 080bb352fad00d04995102f681b134e3754bfb6e Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Wed, 15 Jan 2020 20:48:50 -0800 Subject: net: phy: Maintain MDIO device and bus statistics We maintain global statistics for an entire MDIO bus, as well as broken down, per MDIO bus address statistics. Given that it is possible for MDIO devices such as switches to access MDIO bus addresses for which there is not a mdio_device instance created (therefore not a a corresponding device directory in sysfs either), we also maintain per-address statistics under the statistics folder. The layout looks like this: /sys/class/mdio_bus/../statistics/ transfers errrors writes reads transfers_ errors_ writes_ reads_ When a mdio_device instance is registered, a statistics/ folder is created with the tranfers, errors, writes and reads attributes which point to the appropriate MDIO bus statistics structure. Statistics are 64-bit unsigned quantities and maintained through the u64_stats_sync.h helper functions. Signed-off-by: Florian Fainelli Tested-by: Andrew Lunn Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 2929d0bc307f..99a87f02667f 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -22,6 +22,7 @@ #include #include #include +#include #include @@ -212,6 +213,15 @@ struct sfp_bus; struct sfp_upstream_ops; struct sk_buff; +struct mdio_bus_stats { + u64_stats_t transfers; + u64_stats_t errors; + u64_stats_t writes; + u64_stats_t reads; + /* Must be last, add new statistics above */ + struct u64_stats_sync syncp; +}; + /* * The Bus class for PHYs. Devices which provide access to * PHYs should register using this structure @@ -224,6 +234,7 @@ struct mii_bus { int (*read)(struct mii_bus *bus, int addr, int regnum); int (*write)(struct mii_bus *bus, int addr, int regnum, u16 val); int (*reset)(struct mii_bus *bus); + struct mdio_bus_stats stats[PHY_MAX_ADDR]; /* * A lock to ensure that only one thing can read/write -- cgit v1.2.3 From 56f200c78ce4d94680a27a1ce97a29ebeb4f23e1 Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Thu, 16 Jan 2020 21:16:46 +0100 Subject: netns: Constify exported functions Mark function parameters as 'const' where possible. Signed-off-by: Guillaume Nault Acked-by: Nicolas Dichtel Signed-off-by: David S. Miller --- include/net/net_namespace.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index b8ceaf0cd997..854d39ef1ca3 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -347,9 +347,9 @@ static inline struct net *read_pnet(const possible_net_t *pnet) #endif int peernet2id_alloc(struct net *net, struct net *peer, gfp_t gfp); -int peernet2id(struct net *net, struct net *peer); -bool peernet_has_id(struct net *net, struct net *peer); -struct net *get_net_ns_by_id(struct net *net, int id); +int peernet2id(const struct net *net, struct net *peer); +bool peernet_has_id(const struct net *net, struct net *peer); +struct net *get_net_ns_by_id(const struct net *net, int id); struct pernet_operations { struct list_head list; @@ -427,7 +427,7 @@ static inline void unregister_net_sysctl_table(struct ctl_table_header *header) } #endif -static inline int rt_genid_ipv4(struct net *net) +static inline int rt_genid_ipv4(const struct net *net) { return atomic_read(&net->ipv4.rt_genid); } @@ -459,7 +459,7 @@ static inline void rt_genid_bump_all(struct net *net) rt_genid_bump_ipv6(net); } -static inline int fnhe_genid(struct net *net) +static inline int fnhe_genid(const struct net *net) { return atomic_read(&net->fnhe_genid); } -- cgit v1.2.3 From 95f0ead8f04bec18e474594ef585f3734bd85b4c Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Sun, 19 Jan 2020 15:00:48 +0200 Subject: devlink: Add non-routable packet trap Add packet trap that can report packets that reached the router, but are non-routable. For example, IGMP queries can be flooded by the device in layer 2 and reach the router. Such packets should not be routed and instead dropped. Signed-off-by: Amit Cohen Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- include/net/devlink.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index a6856f1d5d1f..08b757753e1c 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -591,6 +591,7 @@ enum devlink_trap_generic_id { DEVLINK_TRAP_GENERIC_ID_REJECT_ROUTE, DEVLINK_TRAP_GENERIC_ID_IPV4_LPM_UNICAST_MISS, DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS, + DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE, /* Add new generic trap IDs above */ __DEVLINK_TRAP_GENERIC_ID_MAX, @@ -659,6 +660,8 @@ enum devlink_trap_group_generic_id { "ipv4_lpm_miss" #define DEVLINK_TRAP_GENERIC_NAME_IPV6_LPM_UNICAST_MISS \ "ipv6_lpm_miss" +#define DEVLINK_TRAP_GENERIC_NAME_NON_ROUTABLE \ + "non_routable_packet" #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \ "l2_drops" -- cgit v1.2.3 From 13c056ec7d006b11557cebd9f1803edd646d2876 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Sun, 19 Jan 2020 15:00:54 +0200 Subject: devlink: Add tunnel generic packet traps Add packet traps that can report packets that were dropped during tunnel decapsulation. Signed-off-by: Amit Cohen Acked-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- include/net/devlink.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 08b757753e1c..455282a4b714 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -592,6 +592,7 @@ enum devlink_trap_generic_id { DEVLINK_TRAP_GENERIC_ID_IPV4_LPM_UNICAST_MISS, DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS, DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE, + DEVLINK_TRAP_GENERIC_ID_DECAP_ERROR, /* Add new generic trap IDs above */ __DEVLINK_TRAP_GENERIC_ID_MAX, @@ -605,6 +606,7 @@ enum devlink_trap_group_generic_id { DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS, DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS, DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS, + DEVLINK_TRAP_GROUP_GENERIC_ID_TUNNEL_DROPS, /* Add new generic trap group IDs above */ __DEVLINK_TRAP_GROUP_GENERIC_ID_MAX, @@ -662,6 +664,8 @@ enum devlink_trap_group_generic_id { "ipv6_lpm_miss" #define DEVLINK_TRAP_GENERIC_NAME_NON_ROUTABLE \ "non_routable_packet" +#define DEVLINK_TRAP_GENERIC_NAME_DECAP_ERROR \ + "decap_error" #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \ "l2_drops" @@ -669,6 +673,8 @@ enum devlink_trap_group_generic_id { "l3_drops" #define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \ "buffer_drops" +#define DEVLINK_TRAP_GROUP_GENERIC_NAME_TUNNEL_DROPS \ + "tunnel_drops" #define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \ { \ -- cgit v1.2.3 From c3cae4916e57d2f0364d5e7769218547fb1b7c60 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Sun, 19 Jan 2020 15:00:58 +0200 Subject: devlink: Add overlay source MAC is multicast trap Add packet trap that can report NVE packets that the device decided to drop because their overlay source MAC is multicast. Signed-off-by: Amit Cohen Acked-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- include/net/devlink.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 455282a4b714..2813fd06ee89 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -593,6 +593,7 @@ enum devlink_trap_generic_id { DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS, DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE, DEVLINK_TRAP_GENERIC_ID_DECAP_ERROR, + DEVLINK_TRAP_GENERIC_ID_OVERLAY_SMAC_MC, /* Add new generic trap IDs above */ __DEVLINK_TRAP_GENERIC_ID_MAX, @@ -666,6 +667,8 @@ enum devlink_trap_group_generic_id { "non_routable_packet" #define DEVLINK_TRAP_GENERIC_NAME_DECAP_ERROR \ "decap_error" +#define DEVLINK_TRAP_GENERIC_NAME_OVERLAY_SMAC_MC \ + "overlay_smac_is_mc" #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \ "l2_drops" -- cgit v1.2.3 From 2ab1d925aa4c0c179dd1eb492e8c03536972707b Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sun, 19 Jan 2020 14:31:55 +0100 Subject: net: phy: add generic ndo_do_ioctl handler phy_do_ioctl A number of network drivers has the same glue code to use phy_mii_ioctl as ndo_do_ioctl handler. So let's add such a generic ndo_do_ioctl handler to phylib. Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 99a87f02667f..be6b3a1b03da 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1242,6 +1242,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev, int phy_ethtool_ksettings_set(struct phy_device *phydev, const struct ethtool_link_ksettings *cmd); int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd); +int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); void phy_request_interrupt(struct phy_device *phydev); void phy_free_interrupt(struct phy_device *phydev); void phy_print_status(struct phy_device *phydev); -- cgit v1.2.3 From 3231e5d2228a2078ce5982d63ea9a617e4972c00 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Mon, 20 Jan 2020 22:16:07 +0100 Subject: net: phy: rename phy_do_ioctl to phy_do_ioctl_running We just added phy_do_ioctl, but it turned out that we need another version of this function that doesn't check whether net_device is running. So rename phy_do_ioctl to phy_do_ioctl_running. Signed-off-by: Heiner Kallweit Reviewed-by: Florian Fainelli Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index be6b3a1b03da..f6e714da37d8 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1242,7 +1242,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev, int phy_ethtool_ksettings_set(struct phy_device *phydev, const struct ethtool_link_ksettings *cmd); int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd); -int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); +int phy_do_ioctl_running(struct net_device *dev, struct ifreq *ifr, int cmd); void phy_request_interrupt(struct phy_device *phydev); void phy_free_interrupt(struct phy_device *phydev); void phy_print_status(struct phy_device *phydev); -- cgit v1.2.3 From bbbf8430afe6906abbf879352fe10d24d380e588 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Mon, 20 Jan 2020 22:17:11 +0100 Subject: net: phy: add new version of phy_do_ioctl Add a new version of phy_do_ioctl that doesn't check whether net_device is running. It will typically be used if suitable drivers attach the PHY in probe already. Signed-off-by: Heiner Kallweit Reviewed-by: Florian Fainelli Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index f6e714da37d8..c570e162e05e 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1242,6 +1242,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev, int phy_ethtool_ksettings_set(struct phy_device *phydev, const struct ethtool_link_ksettings *cmd); int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd); +int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); int phy_do_ioctl_running(struct net_device *dev, struct ifreq *ifr, int cmd); void phy_request_interrupt(struct phy_device *phydev); void phy_free_interrupt(struct phy_device *phydev); -- cgit v1.2.3 From f362e5fe0f1f9774b28c5b4c9b07b0168575cf6e Mon Sep 17 00:00:00 2001 From: Martin Schiller Date: Tue, 21 Jan 2020 07:00:33 +0100 Subject: wan/hdlc_x25: make lapb params configurable This enables you to configure mode (DTE/DCE), Modulo, Window, T1, T2, N2 via sethdlc (which needs to be patched as well). Signed-off-by: Martin Schiller Signed-off-by: David S. Miller --- include/uapi/linux/hdlc/ioctl.h | 9 +++++++++ include/uapi/linux/if.h | 1 + 2 files changed, 10 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/hdlc/ioctl.h b/include/uapi/linux/hdlc/ioctl.h index 0fe4238e8246..b06341acab5e 100644 --- a/include/uapi/linux/hdlc/ioctl.h +++ b/include/uapi/linux/hdlc/ioctl.h @@ -79,6 +79,15 @@ typedef struct { unsigned int timeout; } cisco_proto; +typedef struct { + unsigned short dce; /* 1 for DCE (network side) operation */ + unsigned int modulo; /* modulo (8 = basic / 128 = extended) */ + unsigned int window; /* frame window size */ + unsigned int t1; /* timeout t1 */ + unsigned int t2; /* timeout t2 */ + unsigned int n2; /* frame retry counter */ +} x25_hdlc_proto; + /* PPP doesn't need any info now - supply length = 0 to ioctl */ #endif /* __ASSEMBLY__ */ diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h index 4bf33344aab1..be714cd8c826 100644 --- a/include/uapi/linux/if.h +++ b/include/uapi/linux/if.h @@ -213,6 +213,7 @@ struct if_settings { fr_proto __user *fr; fr_proto_pvc __user *fr_pvc; fr_proto_pvc_info __user *fr_pvc_info; + x25_hdlc_proto __user *x25; /* interface settings */ sync_serial_settings __user *sync; -- cgit v1.2.3 From 43a825afc91e2b06af1e8e7422198e759c2c5e20 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= Date: Mon, 20 Jan 2020 10:29:17 +0100 Subject: xsk, net: Make sock_def_readable() have external linkage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit XDP sockets use the default implementation of struct sock's sk_data_ready callback, which is sock_def_readable(). This function is called in the XDP socket fast-path, and involves a retpoline. By letting sock_def_readable() have external linkage, and being called directly, the retpoline can be avoided. Signed-off-by: Björn Töpel Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com --- include/net/sock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index 8dff68b4c316..0891c55f1e82 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2612,4 +2612,6 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) return false; } +void sock_def_readable(struct sock *sk); + #endif /* _SOCK_H */ -- cgit v1.2.3 From be8704ff07d2374bcc5c675526f95e70c6459683 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Mon, 20 Jan 2020 16:53:46 -0800 Subject: bpf: Introduce dynamic program extensions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce dynamic program extensions. The users can load additional BPF functions and replace global functions in previously loaded BPF programs while these programs are executing. Global functions are verified individually by the verifier based on their types only. Hence the global function in the new program which types match older function can safely replace that corresponding function. This new function/program is called 'an extension' of old program. At load time the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function to be replaced. The BPF program type is derived from the target program into extension program. Technically bpf_verifier_ops is copied from target program. The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops. The extension program can call the same bpf helper functions as target program. Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program types. The verifier allows only one level of replacement. Meaning that the extension program cannot recursively extend an extension. That also means that the maximum stack size is increasing from 512 to 1024 bytes and maximum function nesting level from 8 to 16. The programs don't always consume that much. The stack usage is determined by the number of on-stack variables used by the program. The verifier could have enforced 512 limit for combined original plus extension program, but it makes for difficult user experience. The main use case for extensions is to provide generic mechanism to plug external programs into policy program or function call chaining. BPF trampoline is used to track both fentry/fexit and program extensions because both are using the same nop slot at the beginning of every BPF function. Attaching fentry/fexit to a function that was replaced is not allowed. The opposite is true as well. Replacing a function that currently being analyzed with fentry/fexit is not allowed. The executable page allocated by BPF trampoline is not used by program extensions. This inefficiency will be optimized in future patches. Function by function verification of global function supports scalars and pointer to context only. Hence program extensions are supported for such class of global functions only. In the future the verifier will be extended with support to pointers to structures, arrays with sizes, etc. Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Acked-by: Andrii Nakryiko Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org --- include/linux/bpf.h | 10 +++++++++- include/linux/bpf_types.h | 2 ++ include/linux/btf.h | 5 +++++ include/uapi/linux/bpf.h | 1 + 4 files changed, 17 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 8e3b8f4ad183..05d16615054c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -465,7 +465,8 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start); enum bpf_tramp_prog_type { BPF_TRAMP_FENTRY, BPF_TRAMP_FEXIT, - BPF_TRAMP_MAX + BPF_TRAMP_MAX, + BPF_TRAMP_REPLACE, /* more than MAX */ }; struct bpf_trampoline { @@ -480,6 +481,11 @@ struct bpf_trampoline { void *addr; bool ftrace_managed; } func; + /* if !NULL this is BPF_PROG_TYPE_EXT program that extends another BPF + * program by replacing one of its functions. func.addr is the address + * of the function it replaced. + */ + struct bpf_prog *extension_prog; /* list of BPF programs using this trampoline */ struct hlist_head progs_hlist[BPF_TRAMP_MAX]; /* Number of attached programs. A counter per kind. */ @@ -1107,6 +1113,8 @@ int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *regs); int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *reg); +int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog, + struct btf *btf, const struct btf_type *t); struct bpf_prog *bpf_prog_by_id(u32 id); diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 9f326e6ef885..c81d4ece79a4 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -68,6 +68,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport, #if defined(CONFIG_BPF_JIT) BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops, void *, void *) +BPF_PROG_TYPE(BPF_PROG_TYPE_EXT, bpf_extension, + void *, void *) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops) diff --git a/include/linux/btf.h b/include/linux/btf.h index 881e9b76ef49..5c1ea99b480f 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -107,6 +107,11 @@ static inline u16 btf_type_vlen(const struct btf_type *t) return BTF_INFO_VLEN(t->info); } +static inline u16 btf_func_linkage(const struct btf_type *t) +{ + return BTF_INFO_VLEN(t->info); +} + static inline bool btf_type_kflag(const struct btf_type *t) { return BTF_INFO_KFLAG(t->info); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 033d90a2282d..e81628eb059c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -180,6 +180,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_PROG_TYPE_TRACING, BPF_PROG_TYPE_STRUCT_OPS, + BPF_PROG_TYPE_EXT, }; enum bpf_attach_type { -- cgit v1.2.3 From 5576b991e9c1a11d2cc21c4b94fc75ec27603896 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 22 Jan 2020 15:36:46 -0800 Subject: bpf: Add BPF_FUNC_jiffies64 This patch adds a helper to read the 64bit jiffies. It will be used in a later patch to implement the bpf_cubic.c. The helper is inlined for jit_requested and 64 BITS_PER_LONG as the map_gen_lookup(). Other cases could be considered together with map_gen_lookup() if needed. Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 9 ++++++++- 2 files changed, 9 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 05d16615054c..a9687861fd7e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1414,6 +1414,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; extern const struct bpf_func_proto bpf_strtol_proto; extern const struct bpf_func_proto bpf_strtoul_proto; extern const struct bpf_func_proto bpf_tcp_sock_proto; +extern const struct bpf_func_proto bpf_jiffies64_proto; /* Shared helpers among cBPF and eBPF. */ void bpf_user_rnd_init_once(void); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e81628eb059c..f1d74a2bd234 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2886,6 +2886,12 @@ union bpf_attr { * **-EPERM** if no permission to send the *sig*. * * **-EAGAIN** if bpf program can try again. + * + * u64 bpf_jiffies64(void) + * Description + * Obtain the 64bit jiffies + * Return + * The 64 bit jiffies */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3005,7 +3011,8 @@ union bpf_attr { FN(probe_read_user_str), \ FN(probe_read_kernel_str), \ FN(tcp_send_ack), \ - FN(send_signal_thread), + FN(send_signal_thread), \ + FN(jiffies64), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call -- cgit v1.2.3 From 84bf557fb02f5924c109a21a160ffc353d878487 Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:24 +0530 Subject: net: sched: pie: move common code to pie.h This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 include/net/pie.h (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h new file mode 100644 index 000000000000..440213ec83eb --- /dev/null +++ b/include/net/pie.h @@ -0,0 +1,96 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __NET_SCHED_PIE_H +#define __NET_SCHED_PIE_H + +#include +#include +#include +#include +#include + +#define QUEUE_THRESHOLD 16384 +#define DQCOUNT_INVALID -1 +#define DTIME_INVALID 0xffffffffffffffff +#define MAX_PROB 0xffffffffffffffff +#define PIE_SCALE 8 + +/* parameters used */ +struct pie_params { + psched_time_t target; /* user specified target delay in pschedtime */ + u32 tupdate; /* timer frequency (in jiffies) */ + u32 limit; /* number of packets that can be enqueued */ + u32 alpha; /* alpha and beta are between 0 and 32 */ + u32 beta; /* and are used for shift relative to 1 */ + bool ecn; /* true if ecn is enabled */ + bool bytemode; /* to scale drop early prob based on pkt size */ + u8 dq_rate_estimator; /* to calculate delay using Little's law */ +}; + +/* variables used */ +struct pie_vars { + u64 prob; /* probability but scaled by u64 limit. */ + psched_time_t burst_time; + psched_time_t qdelay; + psched_time_t qdelay_old; + u64 dq_count; /* measured in bytes */ + psched_time_t dq_tstamp; /* drain rate */ + u64 accu_prob; /* accumulated drop probability */ + u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */ + u32 qlen_old; /* in bytes */ + u8 accu_prob_overflows; /* overflows of accu_prob */ +}; + +/* statistics gathering */ +struct pie_stats { + u32 packets_in; /* total number of packets enqueued */ + u32 dropped; /* packets dropped due to pie_action */ + u32 overlimit; /* dropped due to lack of space in queue */ + u32 maxq; /* maximum queue size */ + u32 ecn_mark; /* packets marked with ECN */ +}; + +/* private skb vars */ +struct pie_skb_cb { + psched_time_t enqueue_time; +}; + +static inline void pie_params_init(struct pie_params *params) +{ + params->alpha = 2; + params->beta = 20; + params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC); /* 15 ms */ + params->limit = 1000; /* default of 1000 packets */ + params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC); /* 15 ms */ + params->ecn = false; + params->bytemode = false; + params->dq_rate_estimator = false; +} + +static inline void pie_vars_init(struct pie_vars *vars) +{ + vars->dq_count = DQCOUNT_INVALID; + vars->dq_tstamp = DTIME_INVALID; + vars->accu_prob = 0; + vars->avg_dq_rate = 0; + /* default of 150 ms in pschedtime */ + vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC); + vars->accu_prob_overflows = 0; +} + +static inline struct pie_skb_cb *get_pie_cb(const struct sk_buff *skb) +{ + qdisc_cb_private_validate(skb, sizeof(struct pie_skb_cb)); + return (struct pie_skb_cb *)qdisc_skb_cb(skb)->data; +} + +static inline psched_time_t pie_get_enqueue_time(const struct sk_buff *skb) +{ + return get_pie_cb(skb)->enqueue_time; +} + +static inline void pie_set_enqueue_time(struct sk_buff *skb) +{ + get_pie_cb(skb)->enqueue_time = psched_get_time(); +} + +#endif -- cgit v1.2.3 From 805a5a23a4c4ee126e781bd14a1deb242395e817 Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:25 +0530 Subject: pie: use U64_MAX to denote (2^64 - 1) Use the U64_MAX macro to denote the constant (2^64 - 1). Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index 440213ec83eb..7ef375db5bab 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -10,8 +10,8 @@ #define QUEUE_THRESHOLD 16384 #define DQCOUNT_INVALID -1 -#define DTIME_INVALID 0xffffffffffffffff -#define MAX_PROB 0xffffffffffffffff +#define DTIME_INVALID U64_MAX +#define MAX_PROB U64_MAX #define PIE_SCALE 8 /* parameters used */ -- cgit v1.2.3 From cf4eeee5ff56180b525bfb6a204071216ca4000a Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:26 +0530 Subject: pie: rearrange macros in order of length Rearrange macros in order of length and align the values to improve readability. Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index 7ef375db5bab..397c7abf0879 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -8,11 +8,11 @@ #include #include -#define QUEUE_THRESHOLD 16384 -#define DQCOUNT_INVALID -1 -#define DTIME_INVALID U64_MAX -#define MAX_PROB U64_MAX -#define PIE_SCALE 8 +#define MAX_PROB U64_MAX +#define DTIME_INVALID U64_MAX +#define QUEUE_THRESHOLD 16384 +#define DQCOUNT_INVALID -1 +#define PIE_SCALE 8 /* parameters used */ struct pie_params { -- cgit v1.2.3 From 1dbfc5e071db3f5acc3c7c87a564bf57b838cf49 Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:27 +0530 Subject: pie: use u8 instead of bool in pie_vars Linux best practice recommends using u8 for true/false values in structures. Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index 397c7abf0879..f9c6a44bdb0c 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -21,8 +21,8 @@ struct pie_params { u32 limit; /* number of packets that can be enqueued */ u32 alpha; /* alpha and beta are between 0 and 32 */ u32 beta; /* and are used for shift relative to 1 */ - bool ecn; /* true if ecn is enabled */ - bool bytemode; /* to scale drop early prob based on pkt size */ + u8 ecn; /* true if ecn is enabled */ + u8 bytemode; /* to scale drop early prob based on pkt size */ u8 dq_rate_estimator; /* to calculate delay using Little's law */ }; -- cgit v1.2.3 From 2dfb1952a9a1fde0b515f58605c11902e69415bf Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:28 +0530 Subject: pie: rearrange structure members and their initializations Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index f9c6a44bdb0c..ec0fbe98ec2f 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -28,13 +28,13 @@ struct pie_params { /* variables used */ struct pie_vars { - u64 prob; /* probability but scaled by u64 limit. */ - psched_time_t burst_time; psched_time_t qdelay; psched_time_t qdelay_old; - u64 dq_count; /* measured in bytes */ + psched_time_t burst_time; psched_time_t dq_tstamp; /* drain rate */ + u64 prob; /* probability but scaled by u64 limit. */ u64 accu_prob; /* accumulated drop probability */ + u64 dq_count; /* measured in bytes */ u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */ u32 qlen_old; /* in bytes */ u8 accu_prob_overflows; /* overflows of accu_prob */ @@ -45,8 +45,8 @@ struct pie_stats { u32 packets_in; /* total number of packets enqueued */ u32 dropped; /* packets dropped due to pie_action */ u32 overlimit; /* dropped due to lack of space in queue */ - u32 maxq; /* maximum queue size */ u32 ecn_mark; /* packets marked with ECN */ + u32 maxq; /* maximum queue size */ }; /* private skb vars */ @@ -56,11 +56,11 @@ struct pie_skb_cb { static inline void pie_params_init(struct pie_params *params) { - params->alpha = 2; - params->beta = 20; + params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC); /* 15 ms */ params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC); /* 15 ms */ params->limit = 1000; /* default of 1000 packets */ - params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC); /* 15 ms */ + params->alpha = 2; + params->beta = 20; params->ecn = false; params->bytemode = false; params->dq_rate_estimator = false; @@ -68,12 +68,12 @@ static inline void pie_params_init(struct pie_params *params) static inline void pie_vars_init(struct pie_vars *vars) { - vars->dq_count = DQCOUNT_INVALID; + /* default of 150 ms in pschedtime */ + vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC); vars->dq_tstamp = DTIME_INVALID; vars->accu_prob = 0; + vars->dq_count = DQCOUNT_INVALID; vars->avg_dq_rate = 0; - /* default of 150 ms in pschedtime */ - vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC); vars->accu_prob_overflows = 0; } -- cgit v1.2.3 From b42a3d7c7cfff3555d7057c20dbbe57fe75d77c0 Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:29 +0530 Subject: pie: improve comments and commenting style Improve the comments along with the commenting style used to describe the members of the structures and their initial values in the init functions. Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 85 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 58 insertions(+), 27 deletions(-) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index ec0fbe98ec2f..51a1984c2dce 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -14,42 +14,74 @@ #define DQCOUNT_INVALID -1 #define PIE_SCALE 8 -/* parameters used */ +/** + * struct pie_params - contains pie parameters + * @target: target delay in pschedtime + * @tudpate: interval at which drop probability is calculated + * @limit: total number of packets that can be in the queue + * @alpha: parameter to control drop probability + * @beta: parameter to control drop probability + * @ecn: is ECN marking of packets enabled + * @bytemode: is drop probability scaled based on pkt size + * @dq_rate_estimator: is Little's law used for qdelay calculation + */ struct pie_params { - psched_time_t target; /* user specified target delay in pschedtime */ - u32 tupdate; /* timer frequency (in jiffies) */ - u32 limit; /* number of packets that can be enqueued */ - u32 alpha; /* alpha and beta are between 0 and 32 */ - u32 beta; /* and are used for shift relative to 1 */ - u8 ecn; /* true if ecn is enabled */ - u8 bytemode; /* to scale drop early prob based on pkt size */ - u8 dq_rate_estimator; /* to calculate delay using Little's law */ + psched_time_t target; + u32 tupdate; + u32 limit; + u32 alpha; + u32 beta; + u8 ecn; + u8 bytemode; + u8 dq_rate_estimator; }; -/* variables used */ +/** + * struct pie_vars - contains pie variables + * @qdelay: current queue delay + * @qdelay_old: queue delay in previous qdelay calculation + * @burst_time: burst time allowance + * @dq_tstamp: timestamp at which dq rate was last calculated + * @prob: drop probability + * @accu_prob: accumulated drop probability + * @dq_count: number of bytes dequeued in a measurement cycle + * @avg_dq_rate: calculated average dq rate + * @qlen_old: queue length during previous qdelay calculation + * @accu_prob_overflows: number of times accu_prob overflows + */ struct pie_vars { psched_time_t qdelay; psched_time_t qdelay_old; psched_time_t burst_time; - psched_time_t dq_tstamp; /* drain rate */ - u64 prob; /* probability but scaled by u64 limit. */ - u64 accu_prob; /* accumulated drop probability */ - u64 dq_count; /* measured in bytes */ - u32 avg_dq_rate; /* bytes per pschedtime tick,scaled */ - u32 qlen_old; /* in bytes */ - u8 accu_prob_overflows; /* overflows of accu_prob */ + psched_time_t dq_tstamp; + u64 prob; + u64 accu_prob; + u64 dq_count; + u32 avg_dq_rate; + u32 qlen_old; + u8 accu_prob_overflows; }; -/* statistics gathering */ +/** + * struct pie_stats - contains pie stats + * @packets_in: total number of packets enqueued + * @dropped: packets dropped due to pie action + * @overlimit: packets dropped due to lack of space in queue + * @ecn_mark: packets marked with ECN + * @maxq: maximum queue size + */ struct pie_stats { - u32 packets_in; /* total number of packets enqueued */ - u32 dropped; /* packets dropped due to pie_action */ - u32 overlimit; /* dropped due to lack of space in queue */ - u32 ecn_mark; /* packets marked with ECN */ - u32 maxq; /* maximum queue size */ + u32 packets_in; + u32 dropped; + u32 overlimit; + u32 ecn_mark; + u32 maxq; }; -/* private skb vars */ +/** + * struct pie_skb_cb - contains private skb vars + * @enqueue_time: timestamp when the packet is enqueued + */ struct pie_skb_cb { psched_time_t enqueue_time; }; @@ -58,7 +90,7 @@ static inline void pie_params_init(struct pie_params *params) { params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC); /* 15 ms */ params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC); /* 15 ms */ - params->limit = 1000; /* default of 1000 packets */ + params->limit = 1000; params->alpha = 2; params->beta = 20; params->ecn = false; @@ -68,8 +100,7 @@ static inline void pie_params_init(struct pie_params *params) static inline void pie_vars_init(struct pie_vars *vars) { - /* default of 150 ms in pschedtime */ - vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC); + vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC); /* 150 ms */ vars->dq_tstamp = DTIME_INVALID; vars->accu_prob = 0; vars->dq_count = DQCOUNT_INVALID; -- cgit v1.2.3 From 5205ea00cda1ac23cebfb97dfccca84722d58dfe Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:32 +0530 Subject: net: sched: pie: export symbols to be reused by FQ-PIE This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index 51a1984c2dce..90f5db3d29e7 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -124,4 +124,13 @@ static inline void pie_set_enqueue_time(struct sk_buff *skb) get_pie_cb(skb)->enqueue_time = psched_get_time(); } +bool pie_drop_early(struct Qdisc *sch, struct pie_params *params, + struct pie_vars *vars, u32 qlen, u32 packet_size); + +void pie_process_dequeue(struct sk_buff *skb, struct pie_params *params, + struct pie_vars *vars, u32 qlen); + +void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars, + u32 qlen); + #endif -- cgit v1.2.3 From ec97ecf1ebe485a17cd8395a5f35e6b80b57665a Mon Sep 17 00:00:00 2001 From: "Mohit P. Tahiliani" Date: Wed, 22 Jan 2020 23:52:33 +0530 Subject: net: sched: add Flow Queue PIE packet scheduler Principles: - Packets are classified on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed to the same slot) - Each flow has a PIE managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered. - Drops during enqueue only. - ECN capability is off by default. - ECN threshold (if ECN is enabled) is at 10% by default. - Uses timestamps to calculate queue delay by default. Usage: tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ] [ target TIME ] [ tupdate TIME ] [ alpha NUMBER ] [ beta NUMBER ] [ quantum BYTES ] [ memory_limit BYTES ] [ ecnprob PERCENTAGE ] [ [no]ecn ] [ [no]bytemode ] [ [no_]dq_rate_estimator ] defaults: limit: 10240 packets, flows: 1024 target: 15 ms, tupdate: 15 ms (in jiffies) alpha: 1/8, beta : 5/4 quantum: device MTU, memory_limit: 32 Mb ecnprob: 10%, ecn: off bytemode: off, dq_rate_estimator: off Signed-off-by: Mohit P. Tahiliani Signed-off-by: Sachin D. Patil Signed-off-by: V. Saicharan Signed-off-by: Mohit Bhasi Signed-off-by: Leslie Monis Signed-off-by: Gautam Ramakrishnan Signed-off-by: David S. Miller --- include/net/pie.h | 2 ++ include/uapi/linux/pkt_sched.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 33 insertions(+) (limited to 'include') diff --git a/include/net/pie.h b/include/net/pie.h index 90f5db3d29e7..fd5a37cb7993 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -81,9 +81,11 @@ struct pie_stats { /** * struct pie_skb_cb - contains private skb vars * @enqueue_time: timestamp when the packet is enqueued + * @mem_usage: size of the skb during enqueue */ struct pie_skb_cb { psched_time_t enqueue_time; + u32 mem_usage; }; static inline void pie_params_init(struct pie_params *params) diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index bf5a5b1dfb0b..bbe791b24168 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -971,6 +971,37 @@ struct tc_pie_xstats { __u32 ecn_mark; /* packets marked with ecn*/ }; +/* FQ PIE */ +enum { + TCA_FQ_PIE_UNSPEC, + TCA_FQ_PIE_LIMIT, + TCA_FQ_PIE_FLOWS, + TCA_FQ_PIE_TARGET, + TCA_FQ_PIE_TUPDATE, + TCA_FQ_PIE_ALPHA, + TCA_FQ_PIE_BETA, + TCA_FQ_PIE_QUANTUM, + TCA_FQ_PIE_MEMORY_LIMIT, + TCA_FQ_PIE_ECN_PROB, + TCA_FQ_PIE_ECN, + TCA_FQ_PIE_BYTEMODE, + TCA_FQ_PIE_DQ_RATE_ESTIMATOR, + __TCA_FQ_PIE_MAX +}; +#define TCA_FQ_PIE_MAX (__TCA_FQ_PIE_MAX - 1) + +struct tc_fq_pie_xstats { + __u32 packets_in; /* total number of packets enqueued */ + __u32 dropped; /* packets dropped due to fq_pie_action */ + __u32 overlimit; /* dropped due to lack of space in queue */ + __u32 overmemory; /* dropped due to lack of memory in queue */ + __u32 ecn_mark; /* packets marked with ecn */ + __u32 new_flow_count; /* count of new flows created by packets */ + __u32 new_flows_len; /* count of flows in new list */ + __u32 old_flows_len; /* count of flows in old list */ + __u32 memory_usage; /* total memory across all queues */ +}; + /* CBS */ struct tc_cbs_qopt { __u8 offload; -- cgit v1.2.3 From a5d29ae226812ae620f58a0726525d8af9b1d751 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Fri, 24 Jan 2020 13:40:21 +0200 Subject: net: bridge: vlan: add basic option setting support This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/uapi/linux/if_bridge.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index ac38f0b674b8..c8b87ad52c5b 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -130,6 +130,7 @@ enum { #define BRIDGE_VLAN_INFO_RANGE_BEGIN (1<<3) /* VLAN is start of vlan range */ #define BRIDGE_VLAN_INFO_RANGE_END (1<<4) /* VLAN is end of vlan range */ #define BRIDGE_VLAN_INFO_BRENTRY (1<<5) /* Global bridge VLAN entry */ +#define BRIDGE_VLAN_INFO_ONLY_OPTS (1<<6) /* Skip create/delete/flags */ struct bridge_vlan_info { __u16 flags; -- cgit v1.2.3 From a580c76d534c7360ba68042b19cb255e8420e987 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Fri, 24 Jan 2020 13:40:22 +0200 Subject: net: bridge: vlan: add per-vlan state The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/uapi/linux/if_bridge.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index c8b87ad52c5b..42f7ca38ad80 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -191,6 +191,7 @@ enum { BRIDGE_VLANDB_ENTRY_UNSPEC, BRIDGE_VLANDB_ENTRY_INFO, BRIDGE_VLANDB_ENTRY_RANGE, + BRIDGE_VLANDB_ENTRY_STATE, __BRIDGE_VLANDB_ENTRY_MAX, }; #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1) -- cgit v1.2.3 From f870fa0b5768842cb4690c1c11f19f28b731ae6d Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Tue, 21 Jan 2020 16:56:15 -0800 Subject: mptcp: Add MPTCP socket stubs Implements the infrastructure for MPTCP sockets. MPTCP sockets open one in-kernel TCP socket per subflow. These subflow sockets are only managed by the MPTCP socket that owns them and are not visible from userspace. This commit allows a userspace program to open an MPTCP socket with: sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); The resulting socket is simply a wrapper around a single regular TCP socket, without any of the MPTCP protocol implemented over the wire. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/net/mptcp.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include') diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 0573ae75c3db..98ba22379117 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -28,6 +28,8 @@ struct mptcp_ext { #ifdef CONFIG_MPTCP +void mptcp_init(void); + /* move the skb extension owership, with the assumption that 'to' is * newly allocated */ @@ -70,6 +72,10 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, #else +static inline void mptcp_init(void) +{ +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { @@ -82,4 +88,14 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, } #endif /* CONFIG_MPTCP */ + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +int mptcpv6_init(void); +#elif IS_ENABLED(CONFIG_IPV6) +static inline int mptcpv6_init(void) +{ + return 0; +} +#endif + #endif /* __NET_MPTCP_H */ -- cgit v1.2.3 From eda7acddf8080bb2d022a8d4b8b2345eb80c63ec Mon Sep 17 00:00:00 2001 From: Peter Krystad Date: Tue, 21 Jan 2020 16:56:16 -0800 Subject: mptcp: Handle MPTCP TCP options Add hooks to parse and format the MP_CAPABLE option. This option is handled according to MPTCP version 0 (RFC6824). MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in coordination with related code changes. Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/linux/tcp.h | 18 ++++++++++++++++++ include/net/mptcp.h | 18 ++++++++++++++++++ 2 files changed, 36 insertions(+) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index ca6f01531e64..52798ab00394 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -78,6 +78,16 @@ struct tcp_sack_block { #define TCP_SACK_SEEN (1 << 0) /*1 = peer is SACK capable, */ #define TCP_DSACK_SEEN (1 << 2) /*1 = DSACK was received from peer*/ +#if IS_ENABLED(CONFIG_MPTCP) +struct mptcp_options_received { + u64 sndr_key; + u64 rcvr_key; + u8 mp_capable : 1, + mp_join : 1, + dss : 1; +}; +#endif + struct tcp_options_received { /* PAWS/RTTM data */ int ts_recent_stamp;/* Time we stored ts_recent (for aging) */ @@ -95,6 +105,9 @@ struct tcp_options_received { u8 num_sacks; /* Number of SACK blocks */ u16 user_mss; /* mss requested by user in ioctl */ u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ +#if IS_ENABLED(CONFIG_MPTCP) + struct mptcp_options_received mptcp; +#endif }; static inline void tcp_clear_options(struct tcp_options_received *rx_opt) @@ -104,6 +117,11 @@ static inline void tcp_clear_options(struct tcp_options_received *rx_opt) #if IS_ENABLED(CONFIG_SMC) rx_opt->smc_ok = 0; #endif +#if IS_ENABLED(CONFIG_MPTCP) + rx_opt->mptcp.mp_capable = 0; + rx_opt->mptcp.mp_join = 0; + rx_opt->mptcp.dss = 0; +#endif } /* This is the max number of SACKS that we'll generate and process. It's safe diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 98ba22379117..3daec2ceb3ff 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -9,6 +9,7 @@ #define __NET_MPTCP_H #include +#include #include /* MPTCP sk_buff extension data */ @@ -26,10 +27,22 @@ struct mptcp_ext { /* one byte hole */ }; +struct mptcp_out_options { +#if IS_ENABLED(CONFIG_MPTCP) + u16 suboptions; + u64 sndr_key; + u64 rcvr_key; +#endif +}; + #ifdef CONFIG_MPTCP void mptcp_init(void); +void mptcp_parse_option(const unsigned char *ptr, int opsize, + struct tcp_options_received *opt_rx); +void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts); + /* move the skb extension owership, with the assumption that 'to' is * newly allocated */ @@ -76,6 +89,11 @@ static inline void mptcp_init(void) { } +static inline void mptcp_parse_option(const unsigned char *ptr, int opsize, + struct tcp_options_received *opt_rx) +{ +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { -- cgit v1.2.3 From 2303f994b3e187091fd08148066688b08f837efc Mon Sep 17 00:00:00 2001 From: Peter Krystad Date: Tue, 21 Jan 2020 16:56:17 -0800 Subject: mptcp: Associate MPTCP context with TCP socket Use ULP to associate a subflow_context structure with each TCP subflow socket. Creating these sockets requires new bind and connect functions to make sure ULP is set up immediately when the subflow sockets are created. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/linux/tcp.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 52798ab00394..877947475814 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -397,6 +397,9 @@ struct tcp_sock { u32 mtu_info; /* We received an ICMP_FRAG_NEEDED / ICMPV6_PKT_TOOBIG * while socket was owned by user. */ +#if IS_ENABLED(CONFIG_MPTCP) + bool is_mptcp; +#endif #ifdef CONFIG_TCP_MD5SIG /* TCP AF-Specific parts; only used by MD5 Signature support so far */ -- cgit v1.2.3 From cec37a6e41aae7bf3df9a3da783380a4d9325fd8 Mon Sep 17 00:00:00 2001 From: Peter Krystad Date: Tue, 21 Jan 2020 16:56:18 -0800 Subject: mptcp: Handle MP_CAPABLE options for outgoing connections Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request, to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to the final ACK of the three-way handshake. Use the .sk_rx_dst_set() handler in the subflow proto to capture when the responding SYN-ACK is received and notify the MPTCP connection layer. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/linux/tcp.h | 3 +++ include/net/mptcp.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 877947475814..e9ee06d887fa 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -137,6 +137,9 @@ struct tcp_request_sock { const struct tcp_request_sock_ops *af_specific; u64 snt_synack; /* first SYNACK sent time */ bool tfo_listener; +#if IS_ENABLED(CONFIG_MPTCP) + bool is_mptcp; +#endif u32 txhash; u32 rcv_isn; u32 snt_isn; diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 3daec2ceb3ff..eabc57c3fde4 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -39,8 +39,27 @@ struct mptcp_out_options { void mptcp_init(void); +static inline bool sk_is_mptcp(const struct sock *sk) +{ + return tcp_sk(sk)->is_mptcp; +} + +static inline bool rsk_is_mptcp(const struct request_sock *req) +{ + return tcp_rsk(req)->is_mptcp; +} + void mptcp_parse_option(const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx); +bool mptcp_syn_options(struct sock *sk, unsigned int *size, + struct mptcp_out_options *opts); +void mptcp_rcv_synsent(struct sock *sk); +bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, + struct mptcp_out_options *opts); +bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, + unsigned int *size, unsigned int remaining, + struct mptcp_out_options *opts); + void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts); /* move the skb extension owership, with the assumption that 'to' is @@ -89,11 +108,47 @@ static inline void mptcp_init(void) { } +static inline bool sk_is_mptcp(const struct sock *sk) +{ + return false; +} + +static inline bool rsk_is_mptcp(const struct request_sock *req) +{ + return false; +} + static inline void mptcp_parse_option(const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx) { } +static inline bool mptcp_syn_options(struct sock *sk, unsigned int *size, + struct mptcp_out_options *opts) +{ + return false; +} + +static inline void mptcp_rcv_synsent(struct sock *sk) +{ +} + +static inline bool mptcp_synack_options(const struct request_sock *req, + unsigned int *size, + struct mptcp_out_options *opts) +{ + return false; +} + +static inline bool mptcp_established_options(struct sock *sk, + struct sk_buff *skb, + unsigned int *size, + unsigned int remaining, + struct mptcp_out_options *opts) +{ + return false; +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { @@ -107,6 +162,8 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, #endif /* CONFIG_MPTCP */ +void mptcp_handle_ipv6_mapped(struct sock *sk, bool mapped); + #if IS_ENABLED(CONFIG_MPTCP_IPV6) int mptcpv6_init(void); #elif IS_ENABLED(CONFIG_IPV6) -- cgit v1.2.3 From 6d0060f600adfddaa43fefb96b6b12643331961e Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Tue, 21 Jan 2020 16:56:23 -0800 Subject: mptcp: Write MPTCP DSS headers to outgoing data packets Per-packet metadata required to write the MPTCP DSS option is written to the skb_ext area. One write to the socket may contain more than one packet of data, which is copied to page fragments and mapped in to MPTCP DSS segments with size determined by the available page fragments and the maximum mapping length allowed by the MPTCP specification. If do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok - the later skbs can either have duplicated DSS mapping information or none at all, and the receiver can handle that. The current implementation uses the subflow frag cache and tcp sendpages to avoid excessive code duplication. More work is required to ensure that it works correctly under memory pressure and to support MPTCP-level retransmissions. The MPTCP DSS checksum is not yet implemented. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/net/mptcp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/mptcp.h b/include/net/mptcp.h index eabc57c3fde4..06dcc665135e 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -32,6 +32,7 @@ struct mptcp_out_options { u16 suboptions; u64 sndr_key; u64 rcvr_key; + struct mptcp_ext ext_copy; #endif }; -- cgit v1.2.3 From 648ef4b88673dadb8463bf0d4b10fbf33d55def8 Mon Sep 17 00:00:00 2001 From: Mat Martineau Date: Tue, 21 Jan 2020 16:56:24 -0800 Subject: mptcp: Implement MPTCP receive path Parses incoming DSS options and populates outgoing MPTCP ACK fields. MPTCP fields are parsed from the TCP option header and placed in an skb extension, allowing the upper MPTCP layer to access MPTCP options after the skb has gone through the TCP stack. The subflow implements its own data_ready() ops, which ensures that the pending data is in sequence - according to MPTCP seq number - dropping out-of-seq skbs. The DATA_READY bit flag is set if this is the case. This allows the MPTCP socket layer to determine if more data is available without having to consult the individual subflows. It additionally validates the current mapping and propagates EoF events to the connection socket. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/linux/tcp.h | 10 ++++++++++ include/net/mptcp.h | 8 ++++++++ 2 files changed, 18 insertions(+) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index e9ee06d887fa..0d00dad4b85d 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -82,9 +82,19 @@ struct tcp_sack_block { struct mptcp_options_received { u64 sndr_key; u64 rcvr_key; + u64 data_ack; + u64 data_seq; + u32 subflow_seq; + u16 data_len; u8 mp_capable : 1, mp_join : 1, dss : 1; + u8 use_map:1, + dsn64:1, + data_fin:1, + use_ack:1, + ack64:1, + __unused:3; }; #endif diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 06dcc665135e..8619c1fca741 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -60,6 +60,8 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, unsigned int *size, unsigned int remaining, struct mptcp_out_options *opts); +void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, + struct tcp_options_received *opt_rx); void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts); @@ -150,6 +152,12 @@ static inline bool mptcp_established_options(struct sock *sk, return false; } +static inline void mptcp_incoming_options(struct sock *sk, + struct sk_buff *skb, + struct tcp_options_received *opt_rx) +{ +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { -- cgit v1.2.3 From cc7972ea1932335e0a0ee00ac8a24b3e8304630d Mon Sep 17 00:00:00 2001 From: Christoph Paasch Date: Tue, 21 Jan 2020 16:56:31 -0800 Subject: mptcp: parse and emit MP_CAPABLE option according to v1 spec This implements MP_CAPABLE options parsing and writing according to RFC 6824 bis / RFC 8684: MPTCP v1. Local key is sent on syn/ack, and both keys are sent on 3rd ack. MP_CAPABLE messages len are updated accordingly. We need the skbuff to correctly emit the above, so we push the skbuff struct as an argument all the way from tcp code to the relevant mptcp callbacks. When processing incoming MP_CAPABLE + data, build a full blown DSS-like map info, to simplify later processing. On child socket creation, we need to record the remote key, if available. Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller --- include/linux/tcp.h | 3 ++- include/net/mptcp.h | 17 ++++++++++------- 2 files changed, 12 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 0d00dad4b85d..4e2124607d32 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -94,7 +94,8 @@ struct mptcp_options_received { data_fin:1, use_ack:1, ack64:1, - __unused:3; + mpc_map:1, + __unused:2; }; #endif diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 8619c1fca741..27627e2d1bc2 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -23,7 +23,8 @@ struct mptcp_ext { data_fin:1, use_ack:1, ack64:1, - __unused:3; + mpc_map:1, + __unused:2; /* one byte hole */ }; @@ -50,10 +51,10 @@ static inline bool rsk_is_mptcp(const struct request_sock *req) return tcp_rsk(req)->is_mptcp; } -void mptcp_parse_option(const unsigned char *ptr, int opsize, - struct tcp_options_received *opt_rx); -bool mptcp_syn_options(struct sock *sk, unsigned int *size, - struct mptcp_out_options *opts); +void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr, + int opsize, struct tcp_options_received *opt_rx); +bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, + unsigned int *size, struct mptcp_out_options *opts); void mptcp_rcv_synsent(struct sock *sk); bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, struct mptcp_out_options *opts); @@ -121,12 +122,14 @@ static inline bool rsk_is_mptcp(const struct request_sock *req) return false; } -static inline void mptcp_parse_option(const unsigned char *ptr, int opsize, +static inline void mptcp_parse_option(const struct sk_buff *skb, + const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx) { } -static inline bool mptcp_syn_options(struct sock *sk, unsigned int *size, +static inline bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, + unsigned int *size, struct mptcp_out_options *opts) { return false; -- cgit v1.2.3 From e42f1ac626e7f799717d006e0f8393b6d6f9fc8c Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 24 Jan 2020 16:04:02 -0800 Subject: mptcp: do not inherit inet proto ops We need to initialise the struct ourselves, else we expose tcp-specific callbacks such as tcp_splice_read which will then trigger splat because the socket is an mptcp one: BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478 CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3 Call Trace: tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791 do_splice+0x1259/0x1560 fs/splice.c:1205 To prevent build error with ipv6, add the recv/sendmsg function declaration to ipv6.h. The functions are already accessible "thanks" to retpoline related work, but they are currently only made visible by socket.c specific INDIRECT_CALLABLE macros. Reported-by: Christoph Paasch Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- include/net/ipv6.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 4e95f6df508c..cec1a54401f2 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1113,6 +1113,9 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); int inet6_hash_connect(struct inet_timewait_death_row *death_row, struct sock *sk); +int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size); +int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, + int flags); /* * reassembly.c -- cgit v1.2.3 From ef6aadcc76c97e25f62adc4e9d19684d3e5d0b87 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Fri, 24 Jan 2020 15:23:06 +0200 Subject: net: sched: Make TBF Qdisc offloadable Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying and dumping of TBF Qdisc. Signed-off-by: Petr Machata Acked-by: Jiri Pirko Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- include/linux/netdevice.h | 1 + include/net/pkt_cls.h | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5ec3537fbdb1..11bdf6cb30bd 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -850,6 +850,7 @@ enum tc_setup_type { TC_SETUP_QDISC_TAPRIO, TC_SETUP_FT, TC_SETUP_QDISC_ETS, + TC_SETUP_QDISC_TBF, }; /* These structures hold the attributes of bpf state that are being passed diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 47b115e2012a..ce036492986a 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -854,4 +854,26 @@ struct tc_ets_qopt_offload { }; }; +enum tc_tbf_command { + TC_TBF_REPLACE, + TC_TBF_DESTROY, + TC_TBF_STATS, +}; + +struct tc_tbf_qopt_offload_replace_params { + struct psched_ratecfg rate; + u32 max_size; + struct gnet_stats_queue *qstats; +}; + +struct tc_tbf_qopt_offload { + enum tc_tbf_command command; + u32 handle; + u32 parent; + union { + struct tc_tbf_qopt_offload_replace_params replace_params; + struct tc_qopt_offload_stats stats; + }; +}; + #endif -- cgit v1.2.3 From e9b4e606c2289d6610113253922bb8c9ac7f68b0 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Thu, 23 Jan 2020 17:15:07 +0100 Subject: bpf: Allow to resolve bpf trampoline and dispatcher in unwind When unwinding the stack we need to identify each address to successfully continue. Adding latch tree to keep trampolines for quick lookup during the unwind. The patch uses first 48 bytes for latch tree node, leaving 4048 bytes from the rest of the page for trampoline or dispatcher generated code. It's still enough not to affect trampoline and dispatcher progs maximum counts. Signed-off-by: Jiri Olsa Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20200123161508.915203-3-jolsa@kernel.org --- include/linux/bpf.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a9687861fd7e..8e9ad3943cd9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -525,7 +525,6 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key); int bpf_trampoline_link_prog(struct bpf_prog *prog); int bpf_trampoline_unlink_prog(struct bpf_prog *prog); void bpf_trampoline_put(struct bpf_trampoline *tr); -void *bpf_jit_alloc_exec_page(void); #define BPF_DISPATCHER_INIT(name) { \ .mutex = __MUTEX_INITIALIZER(name.mutex), \ .func = &name##func, \ @@ -557,6 +556,13 @@ void *bpf_jit_alloc_exec_page(void); #define BPF_DISPATCHER_PTR(name) (&name) void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to); +struct bpf_image { + struct latch_tree_node tnode; + unsigned char data[]; +}; +#define BPF_IMAGE_SIZE (PAGE_SIZE - sizeof(struct bpf_image)) +bool is_bpf_image_address(unsigned long address); +void *bpf_image_alloc(void); #else static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key) { @@ -578,6 +584,10 @@ static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {} static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to) {} +static inline bool is_bpf_image_address(unsigned long address) +{ + return false; +} #endif struct bpf_func_info_aux { -- cgit v1.2.3 From 32efcc06d2a15fa87585614d12d6c2308cc2d3f3 Mon Sep 17 00:00:00 2001 From: Abdul Kabbani Date: Fri, 24 Jan 2020 16:34:02 -0500 Subject: tcp: export count for rehash attempts Using IPv6 flow-label to swiftly route around avoid congested or disconnected network path can greatly improve TCP reliability. This patch adds SNMP counters and a OPT_STATS counter to track both host-level and connection-level statistics. Network administrators can use these counters to evaluate the impact of this new ability better. Export count for rehash attempts to 1) two SNMP counters: TcpTimeoutRehash (rehash due to timeouts), and TcpDuplicateDataRehash (rehash due to receiving duplicate packets) 2) Timestamping API SOF_TIMESTAMPING_OPT_STATS. Signed-off-by: Abdul Kabbani Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Signed-off-by: Kevin(Yudong) Yang Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/tcp.h | 2 ++ include/uapi/linux/snmp.h | 2 ++ include/uapi/linux/tcp.h | 1 + 3 files changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 4e2124607d32..1cf73e6f85ca 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -386,6 +386,8 @@ struct tcp_sock { #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0 #endif + u16 timeout_rehash; /* Timeout-triggered rehash attempts */ + u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */ /* Receiver side RTT estimation */ diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h index 7eee233e78d2..7d91f4debc48 100644 --- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -285,6 +285,8 @@ enum LINUX_MIB_TCPRCVQDROP, /* TCPRcvQDrop */ LINUX_MIB_TCPWQUEUETOOBIG, /* TCPWqueueTooBig */ LINUX_MIB_TCPFASTOPENPASSIVEALTKEY, /* TCPFastOpenPassiveAltKey */ + LINUX_MIB_TCPTIMEOUTREHASH, /* TCPTimeoutRehash */ + LINUX_MIB_TCPDUPLICATEDATAREHASH, /* TCPDuplicateDataRehash */ __LINUX_MIB_MAX }; diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index d87184e673ca..fd9eb8f6bcae 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -311,6 +311,7 @@ enum { TCP_NLA_DSACK_DUPS, /* DSACK blocks received */ TCP_NLA_REORD_SEEN, /* reordering events seen */ TCP_NLA_SRTT, /* smoothed RTT in usecs */ + TCP_NLA_TIMEOUT_REHASH, /* Timeout-triggered rehash attempts */ }; /* for TCP_MD5SIG socket option */ -- cgit v1.2.3 From 7b225d0b5c6dda5fefab578175f210c6fc7e389a Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 22 Jan 2020 00:17:52 +0100 Subject: netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the interval between kernel and userspace. This patch also adds the NFT_SET_EXT_KEY_END extension to store the closing element value in this interval. v4: No changes v3: New patch [sbrivio: refactor error paths and labels; add corresponding nft_set_ext_type for new key; rebase] Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables.h | 14 +++++++++++++- include/uapi/linux/netfilter/nf_tables.h | 2 ++ 2 files changed, 15 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index fe7c50acc681..504c0aa93805 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -231,6 +231,7 @@ struct nft_userdata { * struct nft_set_elem - generic representation of set elements * * @key: element key + * @key_end: closing element key * @priv: element private data and extensions */ struct nft_set_elem { @@ -238,6 +239,10 @@ struct nft_set_elem { u32 buf[NFT_DATA_VALUE_MAXLEN / sizeof(u32)]; struct nft_data val; } key; + union { + u32 buf[NFT_DATA_VALUE_MAXLEN / sizeof(u32)]; + struct nft_data val; + } key_end; void *priv; }; @@ -502,6 +507,7 @@ void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set); * enum nft_set_extensions - set extension type IDs * * @NFT_SET_EXT_KEY: element key + * @NFT_SET_EXT_KEY_END: upper bound element key, for ranges * @NFT_SET_EXT_DATA: mapping data * @NFT_SET_EXT_FLAGS: element flags * @NFT_SET_EXT_TIMEOUT: element timeout @@ -513,6 +519,7 @@ void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set); */ enum nft_set_extensions { NFT_SET_EXT_KEY, + NFT_SET_EXT_KEY_END, NFT_SET_EXT_DATA, NFT_SET_EXT_FLAGS, NFT_SET_EXT_TIMEOUT, @@ -606,6 +613,11 @@ static inline struct nft_data *nft_set_ext_key(const struct nft_set_ext *ext) return nft_set_ext(ext, NFT_SET_EXT_KEY); } +static inline struct nft_data *nft_set_ext_key_end(const struct nft_set_ext *ext) +{ + return nft_set_ext(ext, NFT_SET_EXT_KEY_END); +} + static inline struct nft_data *nft_set_ext_data(const struct nft_set_ext *ext) { return nft_set_ext(ext, NFT_SET_EXT_DATA); @@ -655,7 +667,7 @@ static inline struct nft_object **nft_set_ext_obj(const struct nft_set_ext *ext) void *nft_set_elem_init(const struct nft_set *set, const struct nft_set_ext_tmpl *tmpl, - const u32 *key, const u32 *data, + const u32 *key, const u32 *key_end, const u32 *data, u64 timeout, u64 expiration, gfp_t gfp); void nft_set_elem_destroy(const struct nft_set *set, void *elem, bool destroy_expr); diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 261864736b26..c13106496bd2 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -370,6 +370,7 @@ enum nft_set_elem_flags { * @NFTA_SET_ELEM_USERDATA: user data (NLA_BINARY) * @NFTA_SET_ELEM_EXPR: expression (NLA_NESTED: nft_expr_attributes) * @NFTA_SET_ELEM_OBJREF: stateful object reference (NLA_STRING) + * @NFTA_SET_ELEM_KEY_END: closing key value (NLA_NESTED: nft_data) */ enum nft_set_elem_attributes { NFTA_SET_ELEM_UNSPEC, @@ -382,6 +383,7 @@ enum nft_set_elem_attributes { NFTA_SET_ELEM_EXPR, NFTA_SET_ELEM_PAD, NFTA_SET_ELEM_OBJREF, + NFTA_SET_ELEM_KEY_END, __NFTA_SET_ELEM_MAX }; #define NFTA_SET_ELEM_MAX (__NFTA_SET_ELEM_MAX - 1) -- cgit v1.2.3 From f3a2181e16f1dcbf5446ed43f6b5d9f56c459f85 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 22 Jan 2020 00:17:53 +0100 Subject: netfilter: nf_tables: Support for sets with multiple ranged fields Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used to specify the length of each field in a set concatenation. This allows set implementations to support concatenation of multiple ranged items, as they can divide the input key into matching data for every single field. Such set implementations would be selected as they specify support for NFT_SET_INTERVAL and allow desc->field_count to be greater than one. Explicitly disallow this for nft_set_rbtree. In order to specify the interval for a set entry, userspace would include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass range endpoints as two separate keys, represented by attributes NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END. While at it, export the number of 32-bit registers available for packet matching, as nftables will need this to know the maximum number of field lengths that can be specified. For example, "packets with an IPv4 address between 192.0.2.0 and 192.0.2.42, with destination port between 22 and 25", can be expressed as two concatenated elements: NFTA_SET_ELEM_KEY: 192.0.2.0 . 22 NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25 and NFTA_SET_DESC_CONCAT attribute would contain: NFTA_LIST_ELEM NFTA_SET_FIELD_LEN: 4 NFTA_LIST_ELEM NFTA_SET_FIELD_LEN: 2 v4: No changes v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY v2: No changes Signed-off-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables.h | 8 ++++++++ include/uapi/linux/netfilter/nf_tables.h | 15 +++++++++++++++ 2 files changed, 23 insertions(+) (limited to 'include') diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 504c0aa93805..4170c033d461 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -264,11 +264,15 @@ struct nft_set_iter { * @klen: key length * @dlen: data length * @size: number of set elements + * @field_len: length of each field in concatenation, bytes + * @field_count: number of concatenated fields in element */ struct nft_set_desc { unsigned int klen; unsigned int dlen; unsigned int size; + u8 field_len[NFT_REG32_COUNT]; + u8 field_count; }; /** @@ -409,6 +413,8 @@ void nft_unregister_set(struct nft_set_type *type); * @dtype: data type (verdict or numeric type defined by userspace) * @objtype: object type (see NFT_OBJECT_* definitions) * @size: maximum set size + * @field_len: length of each field in concatenation, bytes + * @field_count: number of concatenated fields in element * @use: number of rules references to this set * @nelems: number of elements * @ndeact: number of deactivated elements queued for removal @@ -435,6 +441,8 @@ struct nft_set { u32 dtype; u32 objtype; u32 size; + u8 field_len[NFT_REG32_COUNT]; + u8 field_count; u32 use; atomic_t nelems; u32 ndeact; diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index c13106496bd2..065218a20bb7 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -48,6 +48,7 @@ enum nft_registers { #define NFT_REG_SIZE 16 #define NFT_REG32_SIZE 4 +#define NFT_REG32_COUNT (NFT_REG32_15 - NFT_REG32_00 + 1) /** * enum nft_verdicts - nf_tables internal verdicts @@ -301,14 +302,28 @@ enum nft_set_policies { * enum nft_set_desc_attributes - set element description * * @NFTA_SET_DESC_SIZE: number of elements in set (NLA_U32) + * @NFTA_SET_DESC_CONCAT: description of field concatenation (NLA_NESTED) */ enum nft_set_desc_attributes { NFTA_SET_DESC_UNSPEC, NFTA_SET_DESC_SIZE, + NFTA_SET_DESC_CONCAT, __NFTA_SET_DESC_MAX }; #define NFTA_SET_DESC_MAX (__NFTA_SET_DESC_MAX - 1) +/** + * enum nft_set_field_attributes - attributes of concatenated fields + * + * @NFTA_SET_FIELD_LEN: length of single field, in bits (NLA_U32) + */ +enum nft_set_field_attributes { + NFTA_SET_FIELD_UNSPEC, + NFTA_SET_FIELD_LEN, + __NFTA_SET_FIELD_MAX +}; +#define NFTA_SET_FIELD_MAX (__NFTA_SET_FIELD_MAX - 1) + /** * enum nft_set_attributes - nf_tables set netlink attributes * -- cgit v1.2.3 From 2092767168f0681aa03727448b801600a364c013 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 22 Jan 2020 00:17:54 +0100 Subject: bitmap: Introduce bitmap_cut(): cut bits and shift remaining The new bitmap function bitmap_cut() copies bits from source to destination by removing the region specified by parameters first and cut, and remapping the bits above the cut region by right shifting them. Signed-off-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- include/linux/bitmap.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index ff335b22f23c..f0f3a9fffa6a 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -53,6 +53,7 @@ * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above * bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n + * bitmap_cut(dst, src, first, n, nbits) Cut n bits from first, copy rest * bitmap_replace(dst, old, new, mask, nbits) *dst = (*old & ~(*mask)) | (*new & *mask) * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) * bitmap_bitremap(oldbit, old, new, nbits) newbit = map(old, new)(oldbit) @@ -133,6 +134,9 @@ extern void __bitmap_shift_right(unsigned long *dst, const unsigned long *src, unsigned int shift, unsigned int nbits); extern void __bitmap_shift_left(unsigned long *dst, const unsigned long *src, unsigned int shift, unsigned int nbits); +extern void bitmap_cut(unsigned long *dst, const unsigned long *src, + unsigned int first, unsigned int cut, + unsigned int nbits); extern int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); extern void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1, -- cgit v1.2.3 From 3c4287f62044a90e73a561aa05fc46e62da173da Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 22 Jan 2020 00:17:55 +0100 Subject: nf_tables: Add set type for arbitrary concatenation of ranges This new set type allows for intervals in concatenated fields, which are expressed in the usual way, that is, simple byte concatenation with padding to 32 bits for single fields, and given as ranges by specifying start and end elements containing, each, the full concatenation of start and end values for the single fields. Ranges are expanded to composing netmasks, for each field: these are inserted as rules in per-field lookup tables. Bits to be classified are divided in 4-bit groups, and for each group, the lookup table contains 4^2 buckets, representing all the possible values of a bit group. This approach was inspired by the Grouper algorithm: http://www.cse.usf.edu/~ligatti/projects/grouper/ Matching is performed by a sequence of AND operations between bucket values, with buckets selected according to the value of packet bits, for each group. The result of this sequence tells us which rules matched for a given field. In order to concatenate several ranged fields, per-field rules are mapped using mapping arrays, one per field, that specify which rules should be considered while matching the next field. The mapping array for the last field contains a reference to the element originally inserted. The notes in nft_set_pipapo.c cover the algorithm in deeper detail. A pure hash-based approach is of no use here, as ranges need to be classified. An implementation based on "proxying" the existing red-black tree set type, creating a tree for each field, was considered, but deemed impractical due to the fact that elements would need to be shared between trees, at least as long as we want to keep UAPI changes to a minimum. A stand-alone implementation of this algorithm is available at: https://pipapo.lameexcu.se together with notes about possible future optimisations (in pipapo.c). This algorithm was designed with data locality in mind, and can be highly optimised for SIMD instruction sets, as the bulk of the matching work is done with repetitive, simple bitwise operations. At this point, without further optimisations, nft_concat_range.sh reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$): TEST: performance net,port [ OK ] baseline (drop from netdev hook): 10190076pps baseline hash (non-ranged entries): 6179564pps baseline rbtree (match on first field only): 2950341pps set with 1000 full, ranged entries: 2304165pps port,net [ OK ] baseline (drop from netdev hook): 10143615pps baseline hash (non-ranged entries): 6135776pps baseline rbtree (match on first field only): 4311934pps set with 100 full, ranged entries: 4131471pps net6,port [ OK ] baseline (drop from netdev hook): 9730404pps baseline hash (non-ranged entries): 4809557pps baseline rbtree (match on first field only): 1501699pps set with 1000 full, ranged entries: 1092557pps port,proto [ OK ] baseline (drop from netdev hook): 10812426pps baseline hash (non-ranged entries): 6929353pps baseline rbtree (match on first field only): 3027105pps set with 30000 full, ranged entries: 284147pps net6,port,mac [ OK ] baseline (drop from netdev hook): 9660114pps baseline hash (non-ranged entries): 3778877pps baseline rbtree (match on first field only): 3179379pps set with 10 full, ranged entries: 2082880pps net6,port,mac,proto [ OK ] baseline (drop from netdev hook): 9718324pps baseline hash (non-ranged entries): 3799021pps baseline rbtree (match on first field only): 1506689pps set with 1000 full, ranged entries: 783810pps net,mac [ OK ] baseline (drop from netdev hook): 10190029pps baseline hash (non-ranged entries): 5172218pps baseline rbtree (match on first field only): 2946863pps set with 1000 full, ranged entries: 1279122pps v4: - fix build for 32-bit architectures: 64-bit division needs div_u64() (kbuild test robot ) v3: - rework interface for field length specification, NFT_SET_SUBKEY disappears and information is stored in description - remove scratch area to store closing element of ranges, as elements now come with an actual attribute to specify the upper range limit (Pablo Neira Ayuso) - also remove pointer to 'start' element from mapping table, closing key is now accessible via extension data - use bytes right away instead of bits for field lengths, this way we can also double the inner loop of the lookup function to take care of upper and lower bits in a single iteration (minor performance improvement) - make it clearer that set operations are actually atomic API-wise, but we can't e.g. implement flush() as one-shot action - fix type for 'dup' in nft_pipapo_insert(), check for duplicates only in the next generation, and in general take care of differentiating generation mask cases depending on the operation (Pablo Neira Ayuso) - report C implementation matching rate in commit message, so that AVX2 implementation can be compared (Pablo Neira Ayuso) v2: - protect access to scratch maps in nft_pipapo_lookup() with local_bh_disable/enable() (Florian Westphal) - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's already implied (Florian Westphal) - explain why partial allocation failures don't need handling in pipapo_realloc_scratch(), rename 'm' to clone and update related kerneldoc to make it clear we're not operating on the live copy (Florian Westphal) - add expicit check for priv->start_elem in nft_pipapo_insert() to avoid ending up in nft_pipapo_walk() with a NULL start element, and also zero it out in every operation that might make it invalid, so that insertion doesn't proceed with an invalid element (Florian Westphal) Signed-off-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index 2656155b4069..29e7e1021267 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -74,6 +74,7 @@ extern struct nft_set_type nft_set_hash_type; extern struct nft_set_type nft_set_hash_fast_type; extern struct nft_set_type nft_set_rbtree_type; extern struct nft_set_type nft_set_bitmap_type; +extern struct nft_set_type nft_set_pipapo_type; struct nft_expr; struct nft_regs; -- cgit v1.2.3 From 2e24cd755552350b94a7617617c6877b8cbcb701 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Thu, 23 Jan 2020 16:26:18 -0800 Subject: net_sched: fix ops->bind_class() implementations The current implementations of ops->bind_class() are merely searching for classid and updating class in the struct tcf_result, without invoking either of cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks the design of them as qdisc's like cbq use them to count filters too. This is why syzbot triggered the warning in cbq_destroy_class(). In order to fix this, we have to call cl_ops->bind_tcf() and cl_ops->unbind_tcf() like the filter binding path. This patch does so by refactoring out two helper functions __tcf_bind_filter() and __tcf_unbind_filter(), which are lockless and accept a Qdisc pointer, then teaching each implementation to call them correctly. Note, we merely pass the Qdisc pointer as an opaque pointer to each filter, they only need to pass it down to the helper functions without understanding it at all. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com Cc: Jamal Hadi Salim Cc: Jiri Pirko Signed-off-by: Cong Wang Signed-off-by: David S. Miller --- include/net/pkt_cls.h | 33 +++++++++++++++++++-------------- include/net/sch_generic.h | 3 ++- 2 files changed, 21 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index ce036492986a..a972244ab193 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -141,31 +141,38 @@ __cls_set_class(unsigned long *clp, unsigned long cl) return xchg(clp, cl); } -static inline unsigned long -cls_set_class(struct Qdisc *q, unsigned long *clp, unsigned long cl) +static inline void +__tcf_bind_filter(struct Qdisc *q, struct tcf_result *r, unsigned long base) { - unsigned long old_cl; + unsigned long cl; - sch_tree_lock(q); - old_cl = __cls_set_class(clp, cl); - sch_tree_unlock(q); - return old_cl; + cl = q->ops->cl_ops->bind_tcf(q, base, r->classid); + cl = __cls_set_class(&r->class, cl); + if (cl) + q->ops->cl_ops->unbind_tcf(q, cl); } static inline void tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base) { struct Qdisc *q = tp->chain->block->q; - unsigned long cl; /* Check q as it is not set for shared blocks. In that case, * setting class is not supported. */ if (!q) return; - cl = q->ops->cl_ops->bind_tcf(q, base, r->classid); - cl = cls_set_class(q, &r->class, cl); - if (cl) + sch_tree_lock(q); + __tcf_bind_filter(q, r, base); + sch_tree_unlock(q); +} + +static inline void +__tcf_unbind_filter(struct Qdisc *q, struct tcf_result *r) +{ + unsigned long cl; + + if ((cl = __cls_set_class(&r->class, 0)) != 0) q->ops->cl_ops->unbind_tcf(q, cl); } @@ -173,12 +180,10 @@ static inline void tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r) { struct Qdisc *q = tp->chain->block->q; - unsigned long cl; if (!q) return; - if ((cl = __cls_set_class(&r->class, 0)) != 0) - q->ops->cl_ops->unbind_tcf(q, cl); + __tcf_unbind_filter(q, r); } struct tcf_exts { diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index fceddf89592a..151208704ed2 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -318,7 +318,8 @@ struct tcf_proto_ops { void *type_data); void (*hw_del)(struct tcf_proto *tp, void *type_data); - void (*bind_class)(void *, u32, unsigned long); + void (*bind_class)(void *, u32, unsigned long, + void *, unsigned long); void * (*tmplt_create)(struct net *net, struct tcf_chain *chain, struct nlattr **tca, -- cgit v1.2.3 From 3b33583265ed3b0ae76eddbabf9d038b4076d1a9 Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Sat, 25 Jan 2020 11:26:42 +0100 Subject: net: Add fraglist GRO/GSO feature flags This adds new Fraglist GRO/GSO feature flags. They will be used to configure fraglist GRO/GSO what will be implemented with some followup paches. Signed-off-by: Steffen Klassert Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/linux/netdev_features.h | 6 +++++- include/linux/netdevice.h | 1 + include/linux/skbuff.h | 2 ++ 3 files changed, 8 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 4b19c544c59a..b239507da2a0 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -53,8 +53,9 @@ enum { NETIF_F_GSO_ESP_BIT, /* ... ESP with TSO */ NETIF_F_GSO_UDP_BIT, /* ... UFO, deprecated except tuntap */ NETIF_F_GSO_UDP_L4_BIT, /* ... UDP payload GSO (not UFO) */ + NETIF_F_GSO_FRAGLIST_BIT, /* ... Fraglist GSO */ /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ - NETIF_F_GSO_UDP_L4_BIT, + NETIF_F_GSO_FRAGLIST_BIT, NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */ NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */ @@ -80,6 +81,7 @@ enum { NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */ NETIF_F_HW_TLS_RECORD_BIT, /* Offload TLS record */ + NETIF_F_GRO_FRAGLIST_BIT, /* Fraglist GRO */ /* * Add your fresh new feature above and remember to update @@ -150,6 +152,8 @@ enum { #define NETIF_F_GSO_UDP_L4 __NETIF_F(GSO_UDP_L4) #define NETIF_F_HW_TLS_TX __NETIF_F(HW_TLS_TX) #define NETIF_F_HW_TLS_RX __NETIF_F(HW_TLS_RX) +#define NETIF_F_GRO_FRAGLIST __NETIF_F(GRO_FRAGLIST) +#define NETIF_F_GSO_FRAGLIST __NETIF_F(GSO_FRAGLIST) /* Finds the next feature with the highest number of the range of start till 0. */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 78e9c6c1b131..fcc76b890f50 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4570,6 +4570,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type) BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT)); + BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT)); return (features & feature) == feature; } diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 26beae7db264..23aaaf08e1e9 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -592,6 +592,8 @@ enum { SKB_GSO_UDP = 1 << 16, SKB_GSO_UDP_L4 = 1 << 17, + + SKB_GSO_FRAGLIST = 1 << 18, }; #if BITS_PER_LONG > 32 -- cgit v1.2.3 From 1a3c998f3a27ab6ecf56bdbb17e27e55fd6d47cd Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Sat, 25 Jan 2020 11:26:43 +0100 Subject: net: Add a netdev software feature set that defaults to off. The previous patch added the NETIF_F_GRO_FRAGLIST feature. This is a software feature that should default to off. Current software features default to on, so add a new feature set that defaults to off. Signed-off-by: Steffen Klassert Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/linux/netdev_features.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index b239507da2a0..34d050bb1ae6 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -230,6 +230,9 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) /* changeable features with no special hardware requirements */ #define NETIF_F_SOFT_FEATURES (NETIF_F_GSO | NETIF_F_GRO) +/* Changeable features with no special hardware requirements that defaults to off. */ +#define NETIF_F_SOFT_FEATURES_OFF NETIF_F_GRO_FRAGLIST + #define NETIF_F_VLAN_FEATURES (NETIF_F_HW_VLAN_CTAG_FILTER | \ NETIF_F_HW_VLAN_CTAG_RX | \ NETIF_F_HW_VLAN_CTAG_TX | \ -- cgit v1.2.3 From 3a1296a38d0cf62bffb9a03c585cbd5dbf15d596 Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Sat, 25 Jan 2020 11:26:44 +0100 Subject: net: Support GRO/GSO fraglist chaining. This patch adds the core functions to chain/unchain GSO skbs at the frag_list pointer. This also adds a new GSO type SKB_GSO_FRAGLIST and a is_flist flag to napi_gro_cb which indicates that this flow will be GROed by fraglist chaining. Signed-off-by: Steffen Klassert Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/linux/netdevice.h | 4 +++- include/linux/skbuff.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index fcc76b890f50..20445f94eb1c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2326,7 +2326,8 @@ struct napi_gro_cb { /* Number of gro_receive callbacks this packet already went through */ u8 recursion_counter:4; - /* 1 bit hole */ + /* GRO is done by frag_list pointer chaining. */ + u8 is_flist:1; /* used to support CHECKSUM_COMPLETE for tunneling protocols */ __wsum csum; @@ -2694,6 +2695,7 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id); int netdev_get_name(struct net *net, char *name, int ifindex); int dev_restart(struct net_device *dev); int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb); +int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb); static inline unsigned int skb_gro_offset(const struct sk_buff *skb) { diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 23aaaf08e1e9..3d13a4b717e9 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3535,6 +3535,8 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet); bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu); bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len); struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features); +struct sk_buff *skb_segment_list(struct sk_buff *skb, netdev_features_t features, + unsigned int offset); struct sk_buff *skb_vlan_untag(struct sk_buff *skb); int skb_ensure_writable(struct sk_buff *skb, int write_len); int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci); -- cgit v1.2.3 From 9fd1ff5d2ac7181844735806b0a703c942365291 Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Sat, 25 Jan 2020 11:26:45 +0100 Subject: udp: Support UDP fraglist GRO/GSO. This patch extends UDP GRO to support fraglist GRO/GSO by using the previously introduced infrastructure. If the feature is enabled, all UDP packets are going to fraglist GRO (local input and forward). After validating the csum, we mark ip_summed as CHECKSUM_UNNECESSARY for fraglist GRO packets to make sure that the csum is not touched. Signed-off-by: Steffen Klassert Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/net/udp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/udp.h b/include/net/udp.h index bad74f780831..44e0e52b585c 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -167,7 +167,7 @@ typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport, __be16 dport); struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb, - struct udphdr *uh, udp_lookup_t lookup); + struct udphdr *uh, struct sock *sk); int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup); struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, -- cgit v1.2.3 From 93642e14bd50e59b11cf6389ce3fc243e932777a Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Sat, 25 Jan 2020 12:17:08 +0100 Subject: net: introduce dev_net notifier register/unregister variants Introduce dev_net variants of netdev notifier register/unregister functions and allow per-net notifier to follow the netdevice into the namespace it is moved to. Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller --- include/linux/netdevice.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 20445f94eb1c..4626188a754b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -939,6 +939,11 @@ struct netdev_name_node { int netdev_name_node_alt_create(struct net_device *dev, const char *name); int netdev_name_node_alt_destroy(struct net_device *dev, const char *name); +struct netdev_net_notifier { + struct list_head list; + struct notifier_block *nb; +}; + /* * This structure defines the management hooks for network devices. * The following hooks can be defined; unless noted otherwise, they are @@ -1793,6 +1798,10 @@ enum netdev_priv_flags { * * @wol_enabled: Wake-on-LAN is enabled * + * @net_notifier_list: List of per-net netdev notifier block + * that follow this device when it is moved + * to another network namespace. + * * FIXME: cleanup struct net_device such that network protocol info * moves out. */ @@ -2085,6 +2094,8 @@ struct net_device { struct lock_class_key addr_list_lock_key; bool proto_down; unsigned wol_enabled:1; + + struct list_head net_notifier_list; }; #define to_net_dev(d) container_of(d, struct net_device, dev) @@ -2529,6 +2540,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb); int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb); int unregister_netdevice_notifier_net(struct net *net, struct notifier_block *nb); +int register_netdevice_notifier_dev_net(struct net_device *dev, + struct notifier_block *nb, + struct netdev_net_notifier *nn); +int unregister_netdevice_notifier_dev_net(struct net_device *dev, + struct notifier_block *nb, + struct netdev_net_notifier *nn); struct netdev_notifier_info { struct net_device *dev; -- cgit v1.2.3 From a85dd3a5170c8812cd835ea968ccadf0ebf1648e Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sat, 25 Jan 2020 13:42:14 +0100 Subject: net: remove eth_change_mtu All usage of this function was removed three years ago, and the function was marked as deprecated: a52ad514fdf3 ("net: deprecate eth_change_mtu, remove usage") So I think we can remove it now. Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller --- include/linux/etherdevice.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index f6564b572d77..8801f1f986e5 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -43,7 +43,6 @@ __be16 eth_header_parse_protocol(const struct sk_buff *skb); int eth_prepare_mac_addr_change(struct net_device *dev, void *p); void eth_commit_mac_addr_change(struct net_device *dev, void *p); int eth_mac_addr(struct net_device *dev, void *p); -int eth_change_mtu(struct net_device *dev, int new_mtu); int eth_validate_addr(struct net_device *dev); struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs, -- cgit v1.2.3 From 6a94b8ccf6b77f005ab1b36a878e1d81df0c033e Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Sun, 26 Jan 2020 23:11:04 +0100 Subject: ethtool: provide message mask with DEBUG_GET request Implement DEBUG_GET request to get debugging settings for a device. At the moment, only message mask corresponding to message level as reported by ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in ioctl interface but almost all drivers interpret it as a bit mask.) As part of the implementation, provide symbolic names for message mask bits as ETH_SS_MSG_CLASSES string set. Signed-off-by: Michal Kubecek Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/netdevice.h | 56 ++++++++++++++++++++++++++---------- include/uapi/linux/ethtool.h | 2 ++ include/uapi/linux/ethtool_netlink.h | 14 +++++++++ 3 files changed, 57 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 4626188a754b..a9c6b5c61d27 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3913,22 +3913,48 @@ void netif_device_attach(struct net_device *dev); */ enum { - NETIF_MSG_DRV = 0x0001, - NETIF_MSG_PROBE = 0x0002, - NETIF_MSG_LINK = 0x0004, - NETIF_MSG_TIMER = 0x0008, - NETIF_MSG_IFDOWN = 0x0010, - NETIF_MSG_IFUP = 0x0020, - NETIF_MSG_RX_ERR = 0x0040, - NETIF_MSG_TX_ERR = 0x0080, - NETIF_MSG_TX_QUEUED = 0x0100, - NETIF_MSG_INTR = 0x0200, - NETIF_MSG_TX_DONE = 0x0400, - NETIF_MSG_RX_STATUS = 0x0800, - NETIF_MSG_PKTDATA = 0x1000, - NETIF_MSG_HW = 0x2000, - NETIF_MSG_WOL = 0x4000, + NETIF_MSG_DRV_BIT, + NETIF_MSG_PROBE_BIT, + NETIF_MSG_LINK_BIT, + NETIF_MSG_TIMER_BIT, + NETIF_MSG_IFDOWN_BIT, + NETIF_MSG_IFUP_BIT, + NETIF_MSG_RX_ERR_BIT, + NETIF_MSG_TX_ERR_BIT, + NETIF_MSG_TX_QUEUED_BIT, + NETIF_MSG_INTR_BIT, + NETIF_MSG_TX_DONE_BIT, + NETIF_MSG_RX_STATUS_BIT, + NETIF_MSG_PKTDATA_BIT, + NETIF_MSG_HW_BIT, + NETIF_MSG_WOL_BIT, + + /* When you add a new bit above, update netif_msg_class_names array + * in net/ethtool/common.c + */ + NETIF_MSG_CLASS_COUNT, }; +/* Both ethtool_ops interface and internal driver implementation use u32 */ +static_assert(NETIF_MSG_CLASS_COUNT <= 32); + +#define __NETIF_MSG_BIT(bit) ((u32)1 << (bit)) +#define __NETIF_MSG(name) __NETIF_MSG_BIT(NETIF_MSG_ ## name ## _BIT) + +#define NETIF_MSG_DRV __NETIF_MSG(DRV) +#define NETIF_MSG_PROBE __NETIF_MSG(PROBE) +#define NETIF_MSG_LINK __NETIF_MSG(LINK) +#define NETIF_MSG_TIMER __NETIF_MSG(TIMER) +#define NETIF_MSG_IFDOWN __NETIF_MSG(IFDOWN) +#define NETIF_MSG_IFUP __NETIF_MSG(IFUP) +#define NETIF_MSG_RX_ERR __NETIF_MSG(RX_ERR) +#define NETIF_MSG_TX_ERR __NETIF_MSG(TX_ERR) +#define NETIF_MSG_TX_QUEUED __NETIF_MSG(TX_QUEUED) +#define NETIF_MSG_INTR __NETIF_MSG(INTR) +#define NETIF_MSG_TX_DONE __NETIF_MSG(TX_DONE) +#define NETIF_MSG_RX_STATUS __NETIF_MSG(RX_STATUS) +#define NETIF_MSG_PKTDATA __NETIF_MSG(PKTDATA) +#define NETIF_MSG_HW __NETIF_MSG(HW) +#define NETIF_MSG_WOL __NETIF_MSG(WOL) #define netif_msg_drv(p) ((p)->msg_enable & NETIF_MSG_DRV) #define netif_msg_probe(p) ((p)->msg_enable & NETIF_MSG_PROBE) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 116bcbf09c74..456fb2aa0fad 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -594,6 +594,7 @@ struct ethtool_pauseparam { * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS * @ETH_SS_PHY_TUNABLES: PHY tunable names * @ETH_SS_LINK_MODES: link mode names + * @ETH_SS_MSG_CLASSES: debug message class names */ enum ethtool_stringset { ETH_SS_TEST = 0, @@ -606,6 +607,7 @@ enum ethtool_stringset { ETH_SS_PHY_STATS, ETH_SS_PHY_TUNABLES, ETH_SS_LINK_MODES, + ETH_SS_MSG_CLASSES, /* add new constants above here */ ETH_SS_COUNT diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 02f82f42a889..0d39e04567cb 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -20,6 +20,7 @@ enum { ETHTOOL_MSG_LINKMODES_GET, ETHTOOL_MSG_LINKMODES_SET, ETHTOOL_MSG_LINKSTATE_GET, + ETHTOOL_MSG_DEBUG_GET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -35,6 +36,7 @@ enum { ETHTOOL_MSG_LINKMODES_GET_REPLY, ETHTOOL_MSG_LINKMODES_NTF, ETHTOOL_MSG_LINKSTATE_GET_REPLY, + ETHTOOL_MSG_DEBUG_GET_REPLY, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -195,6 +197,18 @@ enum { ETHTOOL_A_LINKSTATE_MAX = __ETHTOOL_A_LINKSTATE_CNT - 1 }; +/* DEBUG */ + +enum { + ETHTOOL_A_DEBUG_UNSPEC, + ETHTOOL_A_DEBUG_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_DEBUG_MSGMASK, /* bitset */ + + /* add new constants above here */ + __ETHTOOL_A_DEBUG_CNT, + ETHTOOL_A_DEBUG_MAX = __ETHTOOL_A_DEBUG_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From e54d04e3afead22d8e7d6edaaac487a1205bac39 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Sun, 26 Jan 2020 23:11:07 +0100 Subject: ethtool: set message mask with DEBUG_SET request Implement DEBUG_SET netlink request to set debugging settings for a device. At the moment, only message mask corresponding to message level as set by ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in ioctl interface but almost all drivers interpret it as a bit mask.) Signed-off-by: Michal Kubecek Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 0d39e04567cb..8f0b6fd277d5 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -21,6 +21,7 @@ enum { ETHTOOL_MSG_LINKMODES_SET, ETHTOOL_MSG_LINKSTATE_GET, ETHTOOL_MSG_DEBUG_GET, + ETHTOOL_MSG_DEBUG_SET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, -- cgit v1.2.3 From 0bda7af39d2ba6ef83bd18e26bcb027f1630004e Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Sun, 26 Jan 2020 23:11:10 +0100 Subject: ethtool: add DEBUG_NTF notification Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message or ETHTOOL_SMSGLVL ioctl request. The notification message has the same format as reply to DEBUG_GET request. As with other ethtool notifications, netlink requests only trigger the notification if the mask is actually changed while ioctl request trigger it whenever the request results in calling the ethtool_ops handler. Signed-off-by: Michal Kubecek Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 8f0b6fd277d5..67a06b94bf28 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -38,6 +38,7 @@ enum { ETHTOOL_MSG_LINKMODES_NTF, ETHTOOL_MSG_LINKSTATE_GET_REPLY, ETHTOOL_MSG_DEBUG_GET_REPLY, + ETHTOOL_MSG_DEBUG_NTF, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, -- cgit v1.2.3 From 51ea22b04ea0c210f9ce87b8a600965dbe476bc2 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Sun, 26 Jan 2020 23:11:13 +0100 Subject: ethtool: provide WoL settings with WOL_GET request Implement WOL_GET request to get wake-on-lan settings for a device, traditionally available via ETHTOOL_GWOL ioctl request. As part of the implementation, provide symbolic names for wake-on-line modes as ETH_SS_WOL_MODES string set. Signed-off-by: Michal Kubecek Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/uapi/linux/ethtool.h | 4 ++++ include/uapi/linux/ethtool_netlink.h | 15 +++++++++++++++ 2 files changed, 19 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 456fb2aa0fad..4295ebfa2f91 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -595,6 +595,7 @@ struct ethtool_pauseparam { * @ETH_SS_PHY_TUNABLES: PHY tunable names * @ETH_SS_LINK_MODES: link mode names * @ETH_SS_MSG_CLASSES: debug message class names + * @ETH_SS_WOL_MODES: wake-on-lan modes */ enum ethtool_stringset { ETH_SS_TEST = 0, @@ -608,6 +609,7 @@ enum ethtool_stringset { ETH_SS_PHY_TUNABLES, ETH_SS_LINK_MODES, ETH_SS_MSG_CLASSES, + ETH_SS_WOL_MODES, /* add new constants above here */ ETH_SS_COUNT @@ -1695,6 +1697,8 @@ static inline int ethtool_validate_duplex(__u8 duplex) #define WAKE_MAGICSECURE (1 << 6) /* only meaningful if WAKE_MAGIC */ #define WAKE_FILTER (1 << 7) +#define WOL_MODE_COUNT 8 + /* L2-L4 network traffic flow types */ #define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */ #define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */ diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 67a06b94bf28..dcc5c32dc018 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -22,6 +22,7 @@ enum { ETHTOOL_MSG_LINKSTATE_GET, ETHTOOL_MSG_DEBUG_GET, ETHTOOL_MSG_DEBUG_SET, + ETHTOOL_MSG_WOL_GET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -39,6 +40,7 @@ enum { ETHTOOL_MSG_LINKSTATE_GET_REPLY, ETHTOOL_MSG_DEBUG_GET_REPLY, ETHTOOL_MSG_DEBUG_NTF, + ETHTOOL_MSG_WOL_GET_REPLY, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -211,6 +213,19 @@ enum { ETHTOOL_A_DEBUG_MAX = __ETHTOOL_A_DEBUG_CNT - 1 }; +/* WOL */ + +enum { + ETHTOOL_A_WOL_UNSPEC, + ETHTOOL_A_WOL_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_WOL_MODES, /* bitset */ + ETHTOOL_A_WOL_SOPASS, /* binary */ + + /* add new constants above here */ + __ETHTOOL_A_WOL_CNT, + ETHTOOL_A_WOL_MAX = __ETHTOOL_A_WOL_CNT - 1 +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 -- cgit v1.2.3 From 8d425b19b305a77cb1ba67b84e7c1ca547de032f Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Sun, 26 Jan 2020 23:11:16 +0100 Subject: ethtool: set wake-on-lan settings with WOL_SET request Implement WOL_SET netlink request to set wake-on-lan settings. This is equivalent to ETHTOOL_SWOL ioctl request. Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index dcc5c32dc018..59de35695521 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -23,6 +23,7 @@ enum { ETHTOOL_MSG_DEBUG_GET, ETHTOOL_MSG_DEBUG_SET, ETHTOOL_MSG_WOL_GET, + ETHTOOL_MSG_WOL_SET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, -- cgit v1.2.3 From 67bffa79231f15ba372007972563cc8aa5e48bfa Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Sun, 26 Jan 2020 23:11:19 +0100 Subject: ethtool: add WOL_NTF notification Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of a device are modified using ETHTOOL_MSG_WOL_SET netlink message or ETHTOOL_SWOL ioctl request. As notifications can be received by anyone, do not include SecureOn(tm) password in notification messages. Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller --- include/uapi/linux/ethtool_netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 59de35695521..7e0b460f872c 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -42,6 +42,7 @@ enum { ETHTOOL_MSG_DEBUG_GET_REPLY, ETHTOOL_MSG_DEBUG_NTF, ETHTOOL_MSG_WOL_GET_REPLY, + ETHTOOL_MSG_WOL_NTF, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, -- cgit v1.2.3 From 41c0d917d11edf9a34461f56e14f067aadb36101 Mon Sep 17 00:00:00 2001 From: Vasundhara Volam Date: Mon, 27 Jan 2020 04:56:25 -0500 Subject: devlink: add macro for "fw.roce" Add definition and documentation for the new generic info "fw.roce". v2: Remove board.nvm_cfg since fw.psid is similar. Cc: Jiri Pirko Cc: Jakub Kicinski Signed-off-by: Vasundhara Volam Signed-off-by: Michael Chan Signed-off-by: David S. Miller --- include/net/devlink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 5e46c24bb6e6..ce5cea428fdc 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -487,6 +487,8 @@ enum devlink_param_generic_id { #define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI "fw.ncsi" /* FW parameter set id */ #define DEVLINK_INFO_VERSION_GENERIC_FW_PSID "fw.psid" +/* RoCE FW version */ +#define DEVLINK_INFO_VERSION_GENERIC_FW_ROCE "fw.roce" struct devlink_region; struct devlink_info_req; -- cgit v1.2.3 From 2924e0699963b839f88f8c4e855929ea49185870 Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:07 +0200 Subject: qed: FW 8.42.2.0 Internal ram offsets modifications IRO stands for internal RAM offsets. Updating the FW binary produces different iro offsets. This file contains the different values, and a new representation of the values. Update the FW version Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/common_hsi.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h index 03f59a28fefd..3f437e826a4c 100644 --- a/include/linux/qed/common_hsi.h +++ b/include/linux/qed/common_hsi.h @@ -109,8 +109,8 @@ #define MAX_NUM_LL2_TX_STATS_COUNTERS 48 #define FW_MAJOR_VERSION 8 -#define FW_MINOR_VERSION 37 -#define FW_REVISION_VERSION 7 +#define FW_MINOR_VERSION 42 +#define FW_REVISION_VERSION 2 #define FW_ENGINEERING_VERSION 0 /***********************/ -- cgit v1.2.3 From 997af5df230e3288ec1f5b332955f9be643e450b Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:12 +0200 Subject: qed: FW 8.42.2.0 Additional ll2 type LL2 queues were a limited resource due to FW constraints. This FW introduced a new resource which is a context based ll2 queue (memory on host). The additional ll2 queues are required for RDMA SRIOV. The code refers to the previous ll2 queues as ram-based or legacy, and the new queues as ctx-based. This change decreased the "legacy" ram-based queues therefore the first ll2 queue used for iWARP was converted to the ctx-based ll2 queue. This feature also exposed a bug in the DIRECT_REG_WR64 macro implementation which didn't have an effect in other use cases. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/common_hsi.h | 15 +++++++++++++-- include/linux/qed/qed_if.h | 4 +++- include/linux/qed/qed_ll2_if.h | 7 +++++++ 3 files changed, 23 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h index 3f437e826a4c..a2b7826b36f0 100644 --- a/include/linux/qed/common_hsi.h +++ b/include/linux/qed/common_hsi.h @@ -105,8 +105,15 @@ #define CORE_SPQE_PAGE_SIZE_BYTES 4096 -#define MAX_NUM_LL2_RX_QUEUES 48 -#define MAX_NUM_LL2_TX_STATS_COUNTERS 48 +/* Number of LL2 RAM based queues */ +#define MAX_NUM_LL2_RX_RAM_QUEUES 32 + +/* Number of LL2 context based queues */ +#define MAX_NUM_LL2_RX_CTX_QUEUES 208 +#define MAX_NUM_LL2_RX_QUEUES \ + (MAX_NUM_LL2_RX_RAM_QUEUES + MAX_NUM_LL2_RX_CTX_QUEUES) + +#define MAX_NUM_LL2_TX_STATS_COUNTERS 48 #define FW_MAJOR_VERSION 8 #define FW_MINOR_VERSION 42 @@ -340,6 +347,10 @@ #define DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD (DQ_PWM_OFFSET_TCM16_BASE + 1) #define DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD (DQ_PWM_OFFSET_TCM16_BASE + 3) +/* DQ_DEMS_AGG_VAL_BASE */ +#define DQ_PWM_OFFSET_TCM_LL2_PROD_UPDATE \ + (DQ_PWM_OFFSET_TCM32_BASE + DQ_TCM_AGG_VAL_SEL_REG9 - 4) + #define DQ_REGION_SHIFT (12) /* DPM */ diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index b5db1ee96d78..9bcb2f419004 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -463,7 +463,7 @@ enum qed_db_rec_space { #define DIRECT_REG_RD(reg_addr) readl((void __iomem *)(reg_addr)) -#define DIRECT_REG_WR64(reg_addr, val) writeq((u32)val, \ +#define DIRECT_REG_WR64(reg_addr, val) writeq((u64)val, \ (void __iomem *)(reg_addr)) #define QED_COALESCE_MAX 0x1FF @@ -1177,6 +1177,8 @@ struct qed_common_ops { #define GET_FIELD(value, name) \ (((value) >> (name ## _SHIFT)) & name ## _MASK) +#define DB_ADDR_SHIFT(addr) ((addr) << DB_PWM_ADDR_OFFSET_SHIFT) + /* Debug print definitions */ #define DP_ERR(cdev, fmt, ...) \ do { \ diff --git a/include/linux/qed/qed_ll2_if.h b/include/linux/qed/qed_ll2_if.h index 5eb022953aca..1313c34d9a68 100644 --- a/include/linux/qed/qed_ll2_if.h +++ b/include/linux/qed/qed_ll2_if.h @@ -52,6 +52,12 @@ enum qed_ll2_conn_type { QED_LL2_TYPE_ROCE, QED_LL2_TYPE_IWARP, QED_LL2_TYPE_RESERVED3, + MAX_QED_LL2_CONN_TYPE +}; + +enum qed_ll2_rx_conn_type { + QED_LL2_RX_TYPE_LEGACY, + QED_LL2_RX_TYPE_CTX, MAX_QED_LL2_RX_CONN_TYPE }; @@ -165,6 +171,7 @@ struct qed_ll2_cbs { }; struct qed_ll2_acquire_data_inputs { + enum qed_ll2_rx_conn_type rx_conn_type; enum qed_ll2_conn_type conn_type; u16 mtu; u16 rx_num_desc; -- cgit v1.2.3 From 1392d19ff1d6ddd370cefa73b552a0262f9c35ea Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:13 +0200 Subject: qed: Add abstraction for different hsi values per chip The number of BTB blocks was modified to be different between the two chip flavors supported (BB/K2) as a result, this lead to a re-write of selecting the default hsi value based on the chip. This patch creates a lookup table for hsi values per chip rather than ask again and again for every value. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/common_hsi.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h index a2b7826b36f0..718ce72e5965 100644 --- a/include/linux/qed/common_hsi.h +++ b/include/linux/qed/common_hsi.h @@ -663,8 +663,8 @@ #define PBF_MAX_CMD_LINES 3328 /* Number of BTB blocks. Each block is 256B. */ -#define BTB_MAX_BLOCKS 1440 - +#define BTB_MAX_BLOCKS_BB 1440 +#define BTB_MAX_BLOCKS_K2 1840 /*****************/ /* PRS CONSTANTS */ /*****************/ -- cgit v1.2.3 From 6459d93619b5bc21f775e7eb12bc4d051743d7aa Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:14 +0200 Subject: qed: FW 8.42.2.0 iscsi/fcoe changes - Remove struct iscsi_slow_path_hdr and field fw_cid from several structs - Remove struct iscsi_spe_func_dstry - Remove fields pbe_page_size_log and pbl_page_size_log from struct iscsi_conn_offload_param Signed-off-by: Manish Rangankar Signed-off-by: Saurav Kashyap Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/iscsi_common.h | 64 ++++++++++++++------------------------ include/linux/qed/storage_common.h | 3 +- 2 files changed, 26 insertions(+), 41 deletions(-) (limited to 'include') diff --git a/include/linux/qed/iscsi_common.h b/include/linux/qed/iscsi_common.h index 66aba505ec56..2f0a771a9176 100644 --- a/include/linux/qed/iscsi_common.h +++ b/include/linux/qed/iscsi_common.h @@ -999,7 +999,6 @@ struct iscsi_conn_offload_params { struct regpair r2tq_pbl_addr; struct regpair xhq_pbl_addr; struct regpair uhq_pbl_addr; - __le32 initial_ack; __le16 physical_q0; __le16 physical_q1; u8 flags; @@ -1011,10 +1010,10 @@ struct iscsi_conn_offload_params { #define ISCSI_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT 2 #define ISCSI_CONN_OFFLOAD_PARAMS_RESERVED1_MASK 0x1F #define ISCSI_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT 3 - u8 pbl_page_size_log; - u8 pbe_page_size_log; u8 default_cq; + __le16 reserved0; __le32 stat_sn; + __le32 initial_ack; }; /* iSCSI connection statistics */ @@ -1029,25 +1028,14 @@ struct iscsi_conn_stats_params { __le32 reserved; }; -/* spe message header */ -struct iscsi_slow_path_hdr { - u8 op_code; - u8 flags; -#define ISCSI_SLOW_PATH_HDR_RESERVED0_MASK 0xF -#define ISCSI_SLOW_PATH_HDR_RESERVED0_SHIFT 0 -#define ISCSI_SLOW_PATH_HDR_LAYER_CODE_MASK 0x7 -#define ISCSI_SLOW_PATH_HDR_LAYER_CODE_SHIFT 4 -#define ISCSI_SLOW_PATH_HDR_RESERVED1_MASK 0x1 -#define ISCSI_SLOW_PATH_HDR_RESERVED1_SHIFT 7 -}; /* iSCSI connection update params passed by driver to FW in ISCSI update *ramrod. */ struct iscsi_conn_update_ramrod_params { - struct iscsi_slow_path_hdr hdr; + __le16 reserved0; __le16 conn_id; - __le32 fw_cid; + __le32 reserved1; u8 flags; #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK 0x1 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT 0 @@ -1065,7 +1053,7 @@ struct iscsi_conn_update_ramrod_params { #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_DIF_ON_IMM_EN_SHIFT 6 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_LUN_MAPPER_EN_MASK 0x1 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_LUN_MAPPER_EN_SHIFT 7 - u8 reserved0[3]; + u8 reserved3[3]; __le32 max_seq_size; __le32 max_send_pdu_length; __le32 max_recv_pdu_length; @@ -1251,22 +1239,22 @@ enum iscsi_ramrod_cmd_id { /* iSCSI connection termination request */ struct iscsi_spe_conn_mac_update { - struct iscsi_slow_path_hdr hdr; + __le16 reserved0; __le16 conn_id; - __le32 fw_cid; + __le32 reserved1; __le16 remote_mac_addr_lo; __le16 remote_mac_addr_mid; __le16 remote_mac_addr_hi; - u8 reserved0[2]; + u8 reserved2[2]; }; /* iSCSI and TCP connection (Option 1) offload params passed by driver to FW in * iSCSI offload ramrod. */ struct iscsi_spe_conn_offload { - struct iscsi_slow_path_hdr hdr; + __le16 reserved0; __le16 conn_id; - __le32 fw_cid; + __le32 reserved1; struct iscsi_conn_offload_params iscsi; struct tcp_offload_params tcp; }; @@ -1275,44 +1263,36 @@ struct iscsi_spe_conn_offload { * iSCSI offload ramrod. */ struct iscsi_spe_conn_offload_option2 { - struct iscsi_slow_path_hdr hdr; + __le16 reserved0; __le16 conn_id; - __le32 fw_cid; + __le32 reserved1; struct iscsi_conn_offload_params iscsi; struct tcp_offload_params_opt2 tcp; }; /* iSCSI collect connection statistics request */ struct iscsi_spe_conn_statistics { - struct iscsi_slow_path_hdr hdr; + __le16 reserved0; __le16 conn_id; - __le32 fw_cid; + __le32 reserved1; u8 reset_stats; - u8 reserved0[7]; + u8 reserved2[7]; struct regpair stats_cnts_addr; }; /* iSCSI connection termination request */ struct iscsi_spe_conn_termination { - struct iscsi_slow_path_hdr hdr; + __le16 reserved0; __le16 conn_id; - __le32 fw_cid; + __le32 reserved1; u8 abortive; - u8 reserved0[7]; + u8 reserved2[7]; struct regpair queue_cnts_addr; struct regpair query_params_addr; }; -/* iSCSI firmware function destroy parameters */ -struct iscsi_spe_func_dstry { - struct iscsi_slow_path_hdr hdr; - __le16 reserved0; - __le32 reserved1; -}; - /* iSCSI firmware function init parameters */ struct iscsi_spe_func_init { - struct iscsi_slow_path_hdr hdr; __le16 half_way_close_timeout; u8 num_sq_pages_in_ring; u8 num_r2tq_pages_in_ring; @@ -1324,8 +1304,12 @@ struct iscsi_spe_func_init { #define ISCSI_SPE_FUNC_INIT_RESERVED0_MASK 0x7F #define ISCSI_SPE_FUNC_INIT_RESERVED0_SHIFT 1 struct iscsi_debug_modes debug_mode; - __le16 reserved1; - __le32 reserved2; + u8 params; +#define ISCSI_SPE_FUNC_INIT_MAX_SYN_RT_MASK 0xF +#define ISCSI_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT 0 +#define ISCSI_SPE_FUNC_INIT_RESERVED1_MASK 0xF +#define ISCSI_SPE_FUNC_INIT_RESERVED1_SHIFT 4 + u8 reserved2[7]; struct scsi_init_func_params func_params; struct scsi_init_func_queues q_params; }; diff --git a/include/linux/qed/storage_common.h b/include/linux/qed/storage_common.h index 505c0b48a761..9a973ffbbff5 100644 --- a/include/linux/qed/storage_common.h +++ b/include/linux/qed/storage_common.h @@ -107,8 +107,9 @@ struct scsi_drv_cmdq { struct scsi_init_func_params { __le16 num_tasks; u8 log_page_size; + u8 log_page_size_conn; u8 debug_mode; - u8 reserved2[12]; + u8 reserved2[11]; }; /* SCSI RQ/CQ/CMDQ firmware function init parameters */ -- cgit v1.2.3 From 0500a70d6e071040ffdaadebb966986afa83c5e9 Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:15 +0200 Subject: qed: FW 8.42.2.0 HSI changes This patch contains several HSI changes. The changes are part of features like RDMA VF and OVS, the patch also contains a fix to how the init code determines if the dmae is ready to be used. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/common_hsi.h | 21 +++++++----- include/linux/qed/eth_common.h | 78 ++++++++++++++++++++++++++++++------------ 2 files changed, 68 insertions(+), 31 deletions(-) (limited to 'include') diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h index 718ce72e5965..2c4737e6694a 100644 --- a/include/linux/qed/common_hsi.h +++ b/include/linux/qed/common_hsi.h @@ -76,7 +76,6 @@ #define FW_ASSERT_GENERAL_ATTN_IDX 32 -#define MAX_PINNED_CCFC 32 /* Queue Zone sizes in bytes */ #define TSTORM_QZONE_SIZE 8 @@ -139,10 +138,10 @@ #define MAX_NUM_VFS (MAX_NUM_VFS_K2) #define MAX_NUM_FUNCTIONS_BB (MAX_NUM_PFS_BB + MAX_NUM_VFS_BB) -#define MAX_NUM_FUNCTIONS (MAX_NUM_PFS + MAX_NUM_VFS) #define MAX_FUNCTION_NUMBER_BB (MAX_NUM_PFS + MAX_NUM_VFS_BB) -#define MAX_FUNCTION_NUMBER (MAX_NUM_PFS + MAX_NUM_VFS) +#define MAX_FUNCTION_NUMBER_K2 (MAX_NUM_PFS + MAX_NUM_VFS_K2) +#define MAX_NUM_FUNCTIONS (MAX_FUNCTION_NUMBER_K2) #define MAX_NUM_VPORTS_K2 (208) #define MAX_NUM_VPORTS_BB (160) @@ -229,6 +228,7 @@ #define DQ_XCM_TOE_TX_BD_PROD_CMD DQ_XCM_AGG_VAL_SEL_WORD4 #define DQ_XCM_TOE_MORE_TO_SEND_SEQ_CMD DQ_XCM_AGG_VAL_SEL_REG3 #define DQ_XCM_TOE_LOCAL_ADV_WND_SEQ_CMD DQ_XCM_AGG_VAL_SEL_REG4 +#define DQ_XCM_ROCE_ACK_EDPM_DORQ_SEQ_CMD DQ_XCM_AGG_VAL_SEL_WORD5 /* UCM agg val selection (HW) */ #define DQ_UCM_AGG_VAL_SEL_WORD0 0 @@ -406,6 +406,7 @@ /* Number of Protocol Indices per Status Block */ #define PIS_PER_SB_E4 12 +#define MAX_PIS_PER_SB PIS_PER_SB #define CAU_HC_STOPPED_STATE 3 #define CAU_HC_DISABLE_STATE 4 @@ -436,8 +437,6 @@ #define IGU_MEM_PBA_MSIX_RESERVED_UPPER 0x03ff #define IGU_CMD_INT_ACK_BASE 0x0400 -#define IGU_CMD_INT_ACK_UPPER (IGU_CMD_INT_ACK_BASE + \ - MAX_TOT_SB_PER_PATH - 1) #define IGU_CMD_INT_ACK_RESERVED_UPPER 0x05ff #define IGU_CMD_ATTN_BIT_UPD_UPPER 0x05f0 @@ -450,8 +449,6 @@ #define IGU_REG_SISR_MDPC_WOMASK_UPPER 0x05f6 #define IGU_CMD_PROD_UPD_BASE 0x0600 -#define IGU_CMD_PROD_UPD_UPPER (IGU_CMD_PROD_UPD_BASE +\ - MAX_TOT_SB_PER_PATH - 1) #define IGU_CMD_PROD_UPD_RESERVED_UPPER 0x07ff /*****************/ @@ -741,6 +738,8 @@ enum protocol_type { PROTOCOLID_PREROCE, PROTOCOLID_COMMON, PROTOCOLID_RESERVED1, + PROTOCOLID_RDMA, + PROTOCOLID_SCSI, MAX_PROTOCOL_TYPE }; @@ -761,6 +760,10 @@ union rdma_eqe_data { struct rdma_eqe_destroy_qp rdma_destroy_qp_data; }; +struct tstorm_queue_zone { + __le32 reserved[2]; +}; + /* Ustorm Queue Zone */ struct ustorm_eth_queue_zone { struct coalescing_timeset int_coalescing_timeset; @@ -883,8 +886,8 @@ struct db_l2_dpm_data { #define DB_L2_DPM_DATA_RESERVED0_SHIFT 27 #define DB_L2_DPM_DATA_SGE_NUM_MASK 0x7 #define DB_L2_DPM_DATA_SGE_NUM_SHIFT 28 -#define DB_L2_DPM_DATA_GFS_SRC_EN_MASK 0x1 -#define DB_L2_DPM_DATA_GFS_SRC_EN_SHIFT 31 +#define DB_L2_DPM_DATA_TGFS_SRC_EN_MASK 0x1 +#define DB_L2_DPM_DATA_TGFS_SRC_EN_SHIFT 31 }; /* Structure for SGE in a DPM doorbell of type DPM_L2_BD */ diff --git a/include/linux/qed/eth_common.h b/include/linux/qed/eth_common.h index d9416ad5ef59..95f5fd615852 100644 --- a/include/linux/qed/eth_common.h +++ b/include/linux/qed/eth_common.h @@ -38,9 +38,11 @@ /********************/ #define ETH_HSI_VER_MAJOR 3 -#define ETH_HSI_VER_MINOR 10 +#define ETH_HSI_VER_MINOR 11 -#define ETH_HSI_VER_NO_PKT_LEN_TUNN 5 +#define ETH_HSI_VER_NO_PKT_LEN_TUNN 5 +/* Maximum number of pinned L2 connections (CIDs) */ +#define ETH_PINNED_CONN_MAX_NUM 32 #define ETH_CACHE_LINE_SIZE 64 #define ETH_RX_CQE_GAP 32 @@ -61,6 +63,7 @@ #define ETH_TX_MIN_BDS_PER_TUNN_IPV6_WITH_EXT_PKT 3 #define ETH_TX_MIN_BDS_PER_IPV6_WITH_EXT_PKT 2 #define ETH_TX_MIN_BDS_PER_PKT_W_LOOPBACK_MODE 2 +#define ETH_TX_MIN_BDS_PER_PKT_W_VPORT_FORWARDING 4 #define ETH_TX_MAX_NON_LSO_PKT_LEN (9700 - (4 + 4 + 12 + 8)) #define ETH_TX_MAX_LSO_HDR_BYTES 510 #define ETH_TX_LSO_WINDOW_BDS_NUM (18 - 1) @@ -75,9 +78,8 @@ #define ETH_NUM_STATISTIC_COUNTERS_QUAD_VF_ZONE \ (ETH_NUM_STATISTIC_COUNTERS - 3 * MAX_NUM_VFS / 4) -/* Maximum number of buffers, used for RX packet placement */ #define ETH_RX_MAX_BUFF_PER_PKT 5 -#define ETH_RX_BD_THRESHOLD 12 +#define ETH_RX_BD_THRESHOLD 16 /* Num of MAC/VLAN filters */ #define ETH_NUM_MAC_FILTERS 512 @@ -96,24 +98,24 @@ #define ETH_RSS_ENGINE_NUM_BB 127 /* TPA constants */ -#define ETH_TPA_MAX_AGGS_NUM 64 -#define ETH_TPA_CQE_START_LEN_LIST_SIZE ETH_RX_MAX_BUFF_PER_PKT -#define ETH_TPA_CQE_CONT_LEN_LIST_SIZE 6 -#define ETH_TPA_CQE_END_LEN_LIST_SIZE 4 +#define ETH_TPA_MAX_AGGS_NUM 64 +#define ETH_TPA_CQE_START_BW_LEN_LIST_SIZE 2 +#define ETH_TPA_CQE_CONT_LEN_LIST_SIZE 6 +#define ETH_TPA_CQE_END_LEN_LIST_SIZE 4 /* Control frame check constants */ -#define ETH_CTL_FRAME_ETH_TYPE_NUM 4 +#define ETH_CTL_FRAME_ETH_TYPE_NUM 4 /* GFS constants */ #define ETH_GFT_TRASHCAN_VPORT 0x1FF /* GFT drop flow vport number */ /* Destination port mode */ -enum dest_port_mode { - DEST_PORT_PHY, - DEST_PORT_LOOPBACK, - DEST_PORT_PHY_LOOPBACK, - DEST_PORT_DROP, - MAX_DEST_PORT_MODE +enum dst_port_mode { + DST_PORT_PHY, + DST_PORT_LOOPBACK, + DST_PORT_PHY_LOOPBACK, + DST_PORT_DROP, + MAX_DST_PORT_MODE }; /* Ethernet address type */ @@ -167,8 +169,8 @@ struct eth_tx_data_2nd_bd { #define ETH_TX_DATA_2ND_BD_TUNN_INNER_L2_HDR_SIZE_W_SHIFT 0 #define ETH_TX_DATA_2ND_BD_TUNN_INNER_ETH_TYPE_MASK 0x3 #define ETH_TX_DATA_2ND_BD_TUNN_INNER_ETH_TYPE_SHIFT 4 -#define ETH_TX_DATA_2ND_BD_DEST_PORT_MODE_MASK 0x3 -#define ETH_TX_DATA_2ND_BD_DEST_PORT_MODE_SHIFT 6 +#define ETH_TX_DATA_2ND_BD_DST_PORT_MODE_MASK 0x3 +#define ETH_TX_DATA_2ND_BD_DST_PORT_MODE_SHIFT 6 #define ETH_TX_DATA_2ND_BD_START_BD_MASK 0x1 #define ETH_TX_DATA_2ND_BD_START_BD_SHIFT 8 #define ETH_TX_DATA_2ND_BD_TUNN_TYPE_MASK 0x3 @@ -244,8 +246,9 @@ struct eth_fast_path_rx_reg_cqe { struct eth_tunnel_parsing_flags tunnel_pars_flags; u8 bd_num; u8 reserved; - __le16 flow_id; - u8 reserved1[11]; + __le16 reserved2; + __le32 flow_id_or_resource_id; + u8 reserved1[7]; struct eth_pmd_flow_flags pmd_flags; }; @@ -296,9 +299,10 @@ struct eth_fast_path_rx_tpa_start_cqe { struct eth_tunnel_parsing_flags tunnel_pars_flags; u8 tpa_agg_index; u8 header_len; - __le16 ext_bd_len_list[ETH_TPA_CQE_START_LEN_LIST_SIZE]; - __le16 flow_id; - u8 reserved; + __le16 bw_ext_bd_len_list[ETH_TPA_CQE_START_BW_LEN_LIST_SIZE]; + __le16 reserved2; + __le32 flow_id_or_resource_id; + u8 reserved[3]; struct eth_pmd_flow_flags pmd_flags; }; @@ -407,6 +411,29 @@ struct eth_tx_3rd_bd { struct eth_tx_data_3rd_bd data; }; +/* The parsing information data for the forth tx bd of a given packet. */ +struct eth_tx_data_4th_bd { + u8 dst_vport_id; + u8 reserved4; + __le16 bitfields; +#define ETH_TX_DATA_4TH_BD_DST_VPORT_ID_VALID_MASK 0x1 +#define ETH_TX_DATA_4TH_BD_DST_VPORT_ID_VALID_SHIFT 0 +#define ETH_TX_DATA_4TH_BD_RESERVED1_MASK 0x7F +#define ETH_TX_DATA_4TH_BD_RESERVED1_SHIFT 1 +#define ETH_TX_DATA_4TH_BD_START_BD_MASK 0x1 +#define ETH_TX_DATA_4TH_BD_START_BD_SHIFT 8 +#define ETH_TX_DATA_4TH_BD_RESERVED2_MASK 0x7F +#define ETH_TX_DATA_4TH_BD_RESERVED2_SHIFT 9 + __le16 reserved3; +}; + +/* The forth tx bd of a given packet */ +struct eth_tx_4th_bd { + struct regpair addr; /* Single continuous buffer */ + __le16 nbytes; /* Number of bytes in this BD */ + struct eth_tx_data_4th_bd data; /* Parsing information data */ +}; + /* Complementary information for the regular tx bd of a given packet */ struct eth_tx_data_bd { __le16 reserved0; @@ -431,6 +458,7 @@ union eth_tx_bd_types { struct eth_tx_1st_bd first_bd; struct eth_tx_2nd_bd second_bd; struct eth_tx_3rd_bd third_bd; + struct eth_tx_4th_bd fourth_bd; struct eth_tx_bd reg_bd; }; @@ -443,6 +471,12 @@ enum eth_tx_tunn_type { MAX_ETH_TX_TUNN_TYPE }; +/* Mstorm Queue Zone */ +struct mstorm_eth_queue_zone { + struct eth_rx_prod_data rx_producers; + __le32 reserved[3]; +}; + /* Ystorm Queue Zone */ struct xstorm_eth_queue_zone { struct coalescing_timeset int_coalescing_timeset; -- cgit v1.2.3 From 8a52bbab39c9791480cbae86c69ad0d47f62972e Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:17 +0200 Subject: qed: Debug feature: ilt and mdump Part of the FW drop includes new debug capabilities implemented in the qed_debug file. This patch dumps additional information during ethtool -d for better debugging. The data dumped is the ilt (internal logical table) and information gathered by the management firmware incase there was a crash and driver was not able to extract the information (mdump). Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/qed_if.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 9bcb2f419004..1b27c22d39af 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -159,6 +159,7 @@ struct qed_dcbx_get { enum qed_nvm_images { QED_NVM_IMAGE_ISCSI_CFG, QED_NVM_IMAGE_FCOE_CFG, + QED_NVM_IMAGE_MDUMP, QED_NVM_IMAGE_NVM_CFG1, QED_NVM_IMAGE_DEFAULT_CFG, QED_NVM_IMAGE_NVM_META, -- cgit v1.2.3 From 2d22bc8354b15abe413dff76cfe0f7aeb88ef9aa Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Mon, 27 Jan 2020 15:26:19 +0200 Subject: qed: FW 8.42.2.0 debug features Add to debug dump more information on the platform it was collected from (pci func, path id). Provide human readable reg fifo erros. Removed static debug arrays from HSI Functions, and move them to the hwfn. Some structures were slightly changed (removing reserved chip id for example) which lead to many long initializations being modified with one parameter less during initialization. This leads to some long diffs that don't really change anything. Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller --- include/linux/qed/qed_if.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h index 1b27c22d39af..8f29e0d8a7b3 100644 --- a/include/linux/qed/qed_if.h +++ b/include/linux/qed/qed_if.h @@ -1178,6 +1178,15 @@ struct qed_common_ops { #define GET_FIELD(value, name) \ (((value) >> (name ## _SHIFT)) & name ## _MASK) +#define GET_MFW_FIELD(name, field) \ + (((name) & (field ## _MASK)) >> (field ## _OFFSET)) + +#define SET_MFW_FIELD(name, field, value) \ + do { \ + (name) &= ~(field ## _MASK); \ + (name) |= (((value) << (field ## _OFFSET)) & (field ## _MASK));\ + } while (0) + #define DB_ADDR_SHIFT(addr) ((addr) << DB_PWM_ADDR_OFFSET_SHIFT) /* Debug print definitions */ -- cgit v1.2.3 From 6cd021a58c18a1731f7e47f83e172c0c302d65e5 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Mon, 27 Jan 2020 15:40:31 -0500 Subject: udp: segment looped gso packets correctly Multicast and broadcast packets can be looped from egress to ingress pre segmentation with dev_loopback_xmit. That function unconditionally sets ip_summed to CHECKSUM_UNNECESSARY. udp_rcv_segment segments gso packets in the udp rx path. Segmentation usually executes on egress, and does not expect packets of this type. __udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But the offsets are not correct for gso_make_checksum. UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set to the correct pseudo header checksum. Reset ip_summed to this type. (CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h) Reported-by: syzbot Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection") Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/net/udp.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/udp.h b/include/net/udp.h index 44e0e52b585c..4a180f2a13e3 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -476,6 +476,9 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk, if (!inet_get_convert_csum(sk)) features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM; + if (skb->pkt_type == PACKET_LOOPBACK) + skb->ip_summed = CHECKSUM_PARTIAL; + /* the GSO CB lays after the UDP one, no need to save and restore any * CB fragment */ -- cgit v1.2.3