From e7096c131e5161fa3b8e52a650d7719d2857adfd Mon Sep 17 00:00:00 2001
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Mon, 9 Dec 2019 00:27:34 +0100
Subject: net: WireGuard secure network tunnel

WireGuard is a layer 3 secure networking tunnel made specifically for
the kernel, that aims to be much simpler and easier to audit than IPsec.
Extensive documentation and description of the protocol and
considerations, along with formal proofs of the cryptography, are
available at:

  * https://www.wireguard.com/
  * https://www.wireguard.com/papers/wireguard.pdf

This commit implements WireGuard as a simple network device driver,
accessible in the usual RTNL way used by virtual network drivers. It
makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of
networking subsystem APIs. It has a somewhat novel multicore queueing
system designed for maximum throughput and minimal latency of encryption
operations, but it is implemented modestly using workqueues and NAPI.
Configuration is done via generic Netlink, and following a review from
the Netlink maintainer a year ago, several high profile userspace tools
have already implemented the API.

This commit also comes with several different tests, both in-kernel
tests and out-of-kernel tests based on network namespaces, taking profit
of the fact that sockets used by WireGuard intentionally stay in the
namespace the WireGuard interface was originally created, exactly like
the semantics of userspace tun devices. See wireguard.com/netns/ for
pictures and examples.

The source code is fairly short, but rather than combining everything
into a single file, WireGuard is developed as cleanly separable files,
making auditing and comprehension easier. Things are laid out as
follows:

  * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the
    cryptographic aspects of the protocol, and are mostly data-only in
    nature, taking in buffers of bytes and spitting out buffers of
    bytes. They also handle reference counting for their various shared
    pieces of data, like keys and key lists.

  * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for
    ratelimiting certain types of cryptographic operations in accordance
    with particular WireGuard semantics.

  * allowedips.[ch], peerlookup.[ch]: The main lookup structures of
    WireGuard, the former being trie-like with particular semantics, an
    integral part of the design of the protocol, and the latter just
    being nice helper functions around the various hashtables we use.

  * device.[ch]: Implementation of functions for the netdevice and for
    rtnl, responsible for maintaining the life of a given interface and
    wiring it up to the rest of WireGuard.

  * peer.[ch]: Each interface has a list of peers, with helper functions
    available here for creation, destruction, and reference counting.

  * socket.[ch]: Implementation of functions related to udp_socket and
    the general set of kernel socket APIs, for sending and receiving
    ciphertext UDP packets, and taking care of WireGuard-specific sticky
    socket routing semantics for the automatic roaming.

  * netlink.[ch]: Userspace API entry point for configuring WireGuard
    peers and devices. The API has been implemented by several userspace
    tools and network management utility, and the WireGuard project
    distributes the basic wg(8) tool.

  * queueing.[ch]: Shared function on the rx and tx path for handling
    the various queues used in the multicore algorithms.

  * send.c: Handles encrypting outgoing packets in parallel on
    multiple cores, before sending them in order on a single core, via
    workqueues and ring buffers. Also handles sending handshake and cookie
    messages as part of the protocol, in parallel.

  * receive.c: Handles decrypting incoming packets in parallel on
    multiple cores, before passing them off in order to be ingested via
    the rest of the networking subsystem with GRO via the typical NAPI
    poll function. Also handles receiving handshake and cookie messages
    as part of the protocol, in parallel.

  * timers.[ch]: Uses the timer wheel to implement protocol particular
    event timeouts, and gives a set of very simple event-driven entry
    point functions for callers.

  * main.c, version.h: Initialization and deinitialization of the module.

  * selftest/*.h: Runtime unit tests for some of the most security
    sensitive functions.

  * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing
    script using network namespaces.

This commit aims to be as self-contained as possible, implementing
WireGuard as a standalone module not needing much special handling or
coordination from the network subsystem. I expect for future
optimizations to the network stack to positively improve WireGuard, and
vice-versa, but for the time being, this exists as intentionally
standalone.

We introduce a menu option for CONFIG_WIREGUARD, as well as providing a
verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/wireguard.h | 196 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 196 insertions(+)
 create mode 100644 include/uapi/linux/wireguard.h

(limited to 'include')

diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h
new file mode 100644
index 000000000000..dd8a47c4ad11
--- /dev/null
+++ b/include/uapi/linux/wireguard.h
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
+/*
+ * Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Documentation
+ * =============
+ *
+ * The below enums and macros are for interfacing with WireGuard, using generic
+ * netlink, with family WG_GENL_NAME and version WG_GENL_VERSION. It defines two
+ * methods: get and set. Note that while they share many common attributes,
+ * these two functions actually accept a slightly different set of inputs and
+ * outputs.
+ *
+ * WG_CMD_GET_DEVICE
+ * -----------------
+ *
+ * May only be called via NLM_F_REQUEST | NLM_F_DUMP. The command should contain
+ * one but not both of:
+ *
+ *    WGDEVICE_A_IFINDEX: NLA_U32
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *
+ * The kernel will then return several messages (NLM_F_MULTI) containing the
+ * following tree of nested items:
+ *
+ *    WGDEVICE_A_IFINDEX: NLA_U32
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
+ *    WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
+ *    WGDEVICE_A_LISTEN_PORT: NLA_U16
+ *    WGDEVICE_A_FWMARK: NLA_U32
+ *    WGDEVICE_A_PEERS: NLA_NESTED
+ *        0: NLA_NESTED
+ *            WGPEER_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
+ *            WGPEER_A_PRESHARED_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
+ *            WGPEER_A_ENDPOINT: NLA_MIN_LEN(struct sockaddr), struct sockaddr_in or struct sockaddr_in6
+ *            WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16
+ *            WGPEER_A_LAST_HANDSHAKE_TIME: NLA_EXACT_LEN, struct __kernel_timespec
+ *            WGPEER_A_RX_BYTES: NLA_U64
+ *            WGPEER_A_TX_BYTES: NLA_U64
+ *            WGPEER_A_ALLOWEDIPS: NLA_NESTED
+ *                0: NLA_NESTED
+ *                    WGALLOWEDIP_A_FAMILY: NLA_U16
+ *                    WGALLOWEDIP_A_IPADDR: NLA_MIN_LEN(struct in_addr), struct in_addr or struct in6_addr
+ *                    WGALLOWEDIP_A_CIDR_MASK: NLA_U8
+ *                0: NLA_NESTED
+ *                    ...
+ *                0: NLA_NESTED
+ *                    ...
+ *                ...
+ *            WGPEER_A_PROTOCOL_VERSION: NLA_U32
+ *        0: NLA_NESTED
+ *            ...
+ *        ...
+ *
+ * It is possible that all of the allowed IPs of a single peer will not
+ * fit within a single netlink message. In that case, the same peer will
+ * be written in the following message, except it will only contain
+ * WGPEER_A_PUBLIC_KEY and WGPEER_A_ALLOWEDIPS. This may occur several
+ * times in a row for the same peer. It is then up to the receiver to
+ * coalesce adjacent peers. Likewise, it is possible that all peers will
+ * not fit within a single message. So, subsequent peers will be sent
+ * in following messages, except those will only contain WGDEVICE_A_IFNAME
+ * and WGDEVICE_A_PEERS. It is then up to the receiver to coalesce these
+ * messages to form the complete list of peers.
+ *
+ * Since this is an NLA_F_DUMP command, the final message will always be
+ * NLMSG_DONE, even if an error occurs. However, this NLMSG_DONE message
+ * contains an integer error code. It is either zero or a negative error
+ * code corresponding to the errno.
+ *
+ * WG_CMD_SET_DEVICE
+ * -----------------
+ *
+ * May only be called via NLM_F_REQUEST. The command should contain the
+ * following tree of nested items, containing one but not both of
+ * WGDEVICE_A_IFINDEX and WGDEVICE_A_IFNAME:
+ *
+ *    WGDEVICE_A_IFINDEX: NLA_U32
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_FLAGS: NLA_U32, 0 or WGDEVICE_F_REPLACE_PEERS if all current
+ *                      peers should be removed prior to adding the list below.
+ *    WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove
+ *    WGDEVICE_A_LISTEN_PORT: NLA_U16, 0 to choose randomly
+ *    WGDEVICE_A_FWMARK: NLA_U32, 0 to disable
+ *    WGDEVICE_A_PEERS: NLA_NESTED
+ *        0: NLA_NESTED
+ *            WGPEER_A_PUBLIC_KEY: len WG_KEY_LEN
+ *            WGPEER_A_FLAGS: NLA_U32, 0 and/or WGPEER_F_REMOVE_ME if the
+ *                            specified peer should not exist at the end of the
+ *                            operation, rather than added/updated and/or
+ *                            WGPEER_F_REPLACE_ALLOWEDIPS if all current allowed
+ *                            IPs of this peer should be removed prior to adding
+ *                            the list below and/or WGPEER_F_UPDATE_ONLY if the
+ *                            peer should only be set if it already exists.
+ *            WGPEER_A_PRESHARED_KEY: len WG_KEY_LEN, all zeros to remove
+ *            WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6
+ *            WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16, 0 to disable
+ *            WGPEER_A_ALLOWEDIPS: NLA_NESTED
+ *                0: NLA_NESTED
+ *                    WGALLOWEDIP_A_FAMILY: NLA_U16
+ *                    WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr
+ *                    WGALLOWEDIP_A_CIDR_MASK: NLA_U8
+ *                0: NLA_NESTED
+ *                    ...
+ *                0: NLA_NESTED
+ *                    ...
+ *                ...
+ *            WGPEER_A_PROTOCOL_VERSION: NLA_U32, should not be set or used at
+ *                                       all by most users of this API, as the
+ *                                       most recent protocol will be used when
+ *                                       this is unset. Otherwise, must be set
+ *                                       to 1.
+ *        0: NLA_NESTED
+ *            ...
+ *        ...
+ *
+ * It is possible that the amount of configuration data exceeds that of
+ * the maximum message length accepted by the kernel. In that case, several
+ * messages should be sent one after another, with each successive one
+ * filling in information not contained in the prior. Note that if
+ * WGDEVICE_F_REPLACE_PEERS is specified in the first message, it probably
+ * should not be specified in fragments that come after, so that the list
+ * of peers is only cleared the first time but appened after. Likewise for
+ * peers, if WGPEER_F_REPLACE_ALLOWEDIPS is specified in the first message
+ * of a peer, it likely should not be specified in subsequent fragments.
+ *
+ * If an error occurs, NLMSG_ERROR will reply containing an errno.
+ */
+
+#ifndef _WG_UAPI_WIREGUARD_H
+#define _WG_UAPI_WIREGUARD_H
+
+#define WG_GENL_NAME "wireguard"
+#define WG_GENL_VERSION 1
+
+#define WG_KEY_LEN 32
+
+enum wg_cmd {
+	WG_CMD_GET_DEVICE,
+	WG_CMD_SET_DEVICE,
+	__WG_CMD_MAX
+};
+#define WG_CMD_MAX (__WG_CMD_MAX - 1)
+
+enum wgdevice_flag {
+	WGDEVICE_F_REPLACE_PEERS = 1U << 0,
+	__WGDEVICE_F_ALL = WGDEVICE_F_REPLACE_PEERS
+};
+enum wgdevice_attribute {
+	WGDEVICE_A_UNSPEC,
+	WGDEVICE_A_IFINDEX,
+	WGDEVICE_A_IFNAME,
+	WGDEVICE_A_PRIVATE_KEY,
+	WGDEVICE_A_PUBLIC_KEY,
+	WGDEVICE_A_FLAGS,
+	WGDEVICE_A_LISTEN_PORT,
+	WGDEVICE_A_FWMARK,
+	WGDEVICE_A_PEERS,
+	__WGDEVICE_A_LAST
+};
+#define WGDEVICE_A_MAX (__WGDEVICE_A_LAST - 1)
+
+enum wgpeer_flag {
+	WGPEER_F_REMOVE_ME = 1U << 0,
+	WGPEER_F_REPLACE_ALLOWEDIPS = 1U << 1,
+	WGPEER_F_UPDATE_ONLY = 1U << 2,
+	__WGPEER_F_ALL = WGPEER_F_REMOVE_ME | WGPEER_F_REPLACE_ALLOWEDIPS |
+			 WGPEER_F_UPDATE_ONLY
+};
+enum wgpeer_attribute {
+	WGPEER_A_UNSPEC,
+	WGPEER_A_PUBLIC_KEY,
+	WGPEER_A_PRESHARED_KEY,
+	WGPEER_A_FLAGS,
+	WGPEER_A_ENDPOINT,
+	WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL,
+	WGPEER_A_LAST_HANDSHAKE_TIME,
+	WGPEER_A_RX_BYTES,
+	WGPEER_A_TX_BYTES,
+	WGPEER_A_ALLOWEDIPS,
+	WGPEER_A_PROTOCOL_VERSION,
+	__WGPEER_A_LAST
+};
+#define WGPEER_A_MAX (__WGPEER_A_LAST - 1)
+
+enum wgallowedip_attribute {
+	WGALLOWEDIP_A_UNSPEC,
+	WGALLOWEDIP_A_FAMILY,
+	WGALLOWEDIP_A_IPADDR,
+	WGALLOWEDIP_A_CIDR_MASK,
+	__WGALLOWEDIP_A_LAST
+};
+#define WGALLOWEDIP_A_MAX (__WGALLOWEDIP_A_LAST - 1)
+
+#endif /* _WG_UAPI_WIREGUARD_H */
-- 
cgit v1.2.3


From b50b0580d27bc45a0637aefc8bac4d31aa85771a Mon Sep 17 00:00:00 2001
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon, 25 Nov 2019 14:48:57 +0100
Subject: net: add queue argument to __skb_wait_for_more_packets and
 __skb_{,try_}recv_datagram

This will be used by ESP over TCP to handle the queue of IKE messages.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/skbuff.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e9133bcf0544..49a10f9cc538 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3459,7 +3459,8 @@ static inline void skb_frag_list_init(struct sk_buff *skb)
 	for (iter = skb_shinfo(skb)->frag_list; iter; iter = iter->next)
 
 
-int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
+int __skb_wait_for_more_packets(struct sock *sk, struct sk_buff_head *queue,
+				int *err, long *timeo_p,
 				const struct sk_buff *skb);
 struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
 					  struct sk_buff_head *queue,
@@ -3468,12 +3469,16 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
 							   struct sk_buff *skb),
 					  int *off, int *err,
 					  struct sk_buff **last);
-struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned flags,
+struct sk_buff *__skb_try_recv_datagram(struct sock *sk,
+					struct sk_buff_head *queue,
+					unsigned int flags,
 					void (*destructor)(struct sock *sk,
 							   struct sk_buff *skb),
 					int *off, int *err,
 					struct sk_buff **last);
-struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
+struct sk_buff *__skb_recv_datagram(struct sock *sk,
+				    struct sk_buff_head *sk_queue,
+				    unsigned int flags,
 				    void (*destructor)(struct sock *sk,
 						       struct sk_buff *skb),
 				    int *off, int *err);
-- 
cgit v1.2.3


From 7b3801927e52f8621de311277f7fc727635019e7 Mon Sep 17 00:00:00 2001
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon, 25 Nov 2019 14:48:58 +0100
Subject: xfrm: introduce xfrm_trans_queue_net

This will be used by TCP encapsulation to write packets to the encap
socket without holding the user socket's lock. Without this reinjection,
we're already holding the lock of the user socket, and then try to lock
the encap socket as well when we enqueue the encrypted packet.

While at it, add a BUILD_BUG_ON like we usually do for skb->cb, since
it's missing for struct xfrm_trans_cb.

Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index dda3c025452e..56ff86621bb4 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1547,6 +1547,9 @@ int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload);
 int xfrm_init_state(struct xfrm_state *x);
 int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
 int xfrm_input_resume(struct sk_buff *skb, int nexthdr);
+int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
+			 int (*finish)(struct net *, struct sock *,
+				       struct sk_buff *));
 int xfrm_trans_queue(struct sk_buff *skb,
 		     int (*finish)(struct net *, struct sock *,
 				   struct sk_buff *));
-- 
cgit v1.2.3


From e27cca96cd68fa2c6814c90f9a1cfd36bb68c593 Mon Sep 17 00:00:00 2001
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon, 25 Nov 2019 14:49:02 +0100
Subject: xfrm: add espintcp (RFC 8229)

TCP encapsulation of IKE and IPsec messages (RFC 8229) is implemented
as a TCP ULP, overriding in particular the sendmsg and recvmsg
operations. A Stream Parser is used to extract messages out of the TCP
stream using the first 2 bytes as length marker. Received IKE messages
are put on "ike_queue", waiting to be dequeued by the custom recvmsg
implementation. Received ESP messages are sent to XFRM, like with UDP
encapsulation.

Some of this code is taken from the original submission by Herbert
Xu. Currently, only IPv4 is supported, like for UDP encapsulation.

Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/espintcp.h   | 39 +++++++++++++++++++++++++++++++++++++++
 include/net/xfrm.h       |  1 +
 include/uapi/linux/udp.h |  1 +
 3 files changed, 41 insertions(+)
 create mode 100644 include/net/espintcp.h

(limited to 'include')

diff --git a/include/net/espintcp.h b/include/net/espintcp.h
new file mode 100644
index 000000000000..dd7026a00066
--- /dev/null
+++ b/include/net/espintcp.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _NET_ESPINTCP_H
+#define _NET_ESPINTCP_H
+
+#include <net/strparser.h>
+#include <linux/skmsg.h>
+
+void __init espintcp_init(void);
+
+int espintcp_push_skb(struct sock *sk, struct sk_buff *skb);
+int espintcp_queue_out(struct sock *sk, struct sk_buff *skb);
+bool tcp_is_ulp_esp(struct sock *sk);
+
+struct espintcp_msg {
+	struct sk_buff *skb;
+	struct sk_msg skmsg;
+	int offset;
+	int len;
+};
+
+struct espintcp_ctx {
+	struct strparser strp;
+	struct sk_buff_head ike_queue;
+	struct sk_buff_head out_queue;
+	struct espintcp_msg partial;
+	void (*saved_data_ready)(struct sock *sk);
+	void (*saved_write_space)(struct sock *sk);
+	struct work_struct work;
+	bool tx_running;
+};
+
+static inline struct espintcp_ctx *espintcp_getctx(const struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	/* RCU is only needed for diag */
+	return (__force void *)icsk->icsk_ulp_data;
+}
+#endif
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 56ff86621bb4..8f71c111e65a 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -193,6 +193,7 @@ struct xfrm_state {
 
 	/* Data for encapsulator */
 	struct xfrm_encap_tmpl	*encap;
+	struct sock __rcu	*encap_sk;
 
 	/* Data for care-of address */
 	xfrm_address_t	*coaddr;
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 30baccb6c9c4..4828794efcf8 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -42,5 +42,6 @@ struct udphdr {
 #define UDP_ENCAP_GTP0		4 /* GSM TS 09.60 */
 #define UDP_ENCAP_GTP1U		5 /* 3GPP TS 29.060 */
 #define UDP_ENCAP_RXRPC		6
+#define TCP_ENCAP_ESPINTCP	7 /* Yikes, this is really xfrm encap types. */
 
 #endif /* _UAPI_LINUX_UDP_H */
-- 
cgit v1.2.3


From 65e6d90168f3593df0ae598502bcbf20d78ff0fb Mon Sep 17 00:00:00 2001
From: "Kevin(Yudong) Yang" <yyd@google.com>
Date: Mon, 9 Dec 2019 14:19:59 -0500
Subject: net-tcp: Disable TCP ssthresh metrics cache by default

This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save"
that disables TCP ssthresh metrics cache by default. Other parts of TCP
metrics cache, e.g. rtt, cwnd, remain unchanged.

As modern networks becoming more and more dynamic, TCP metrics cache
today often causes more harm than benefits. For example, the same IP
address is often shared by different subscribers behind NAT in residential
networks. Even if the IP address is not shared by different users,
caching the slow-start threshold of a previous short flow using loss-based
congestion control (e.g. cubic) often causes the future longer flows of
the same network path to exit slow-start prematurely with abysmal
throughput.

Caching ssthresh is very risky and can lead to terrible performance.
Therefore it makes sense to make disabling ssthresh caching by
default and opt-in for specific networks by the administrators.
This practice also has worked well for several years of deployment with
CUBIC congestion control at Google.

Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/netns/ipv4.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c0c0791b1912..08b98414d94e 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -154,6 +154,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_adv_win_scale;
 	int sysctl_tcp_frto;
 	int sysctl_tcp_nometrics_save;
+	int sysctl_tcp_no_ssthresh_metrics_save;
 	int sysctl_tcp_moderate_rcvbuf;
 	int sysctl_tcp_tso_win_divisor;
 	int sysctl_tcp_workaround_signed_windows;
-- 
cgit v1.2.3


From bae141f54be83b06652c1d47e50e4e75ed4e9c7e Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Fri, 6 Dec 2019 22:49:34 +0100
Subject: bpf: Emit audit messages upon successful prog load and unload

Allow for audit messages to be emitted upon BPF program load and
unload for having a timeline of events. The load itself is in
syscall context, so additional info about the process initiating
the BPF prog creation can be logged and later directly correlated
to the unload event.

The only info really needed from BPF side is the globally unique
prog ID where then audit user space tooling can query / dump all
info needed about the specific BPF program right upon load event
and enrich the record, thus these changes needed here can be kept
small and non-intrusive to the core.

Raw example output:

  # auditctl -D
  # auditctl -a always,exit -F arch=x86_64 -S bpf
  # ausearch --start recent -m 1334
  ...
  ----
  time->Wed Nov 27 16:04:13 2019
  type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf"
  type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321   \
    success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477    \
    pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001    \
    egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf"                \
    exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf"                  \
    subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
  type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
  ----
  time->Wed Nov 27 16:04:13 2019
  type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
  ...

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org
---
 include/uapi/linux/audit.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 3ad935527177..a534d71e689a 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -116,6 +116,7 @@
 #define AUDIT_FANOTIFY		1331	/* Fanotify access decision */
 #define AUDIT_TIME_INJOFFSET	1332	/* Timekeeping offset injected */
 #define AUDIT_TIME_ADJNTPVAL	1333	/* NTP value adjustment */
+#define AUDIT_BPF		1334	/* BPF subsystem */
 
 #define AUDIT_AVC		1400	/* SE Linux avc denial or grant */
 #define AUDIT_SELINUX_ERR	1401	/* Internal SE Linux Errors */
-- 
cgit v1.2.3


From a4516c7053b96fed98a0439a9226983b5275474b Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 11 Dec 2019 10:55:59 +0000
Subject: net: sfp: derive interface mode from ethtool link modes

We don't need the EEPROM ID to derive the phy interface mode as we can
derive it merely from the ethtool link modes.  Remove the EEPROM ID
argument to sfp_select_interface().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/sfp.h | 2 --
 1 file changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index 487fd9412d10..8d7b98c214d7 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -504,7 +504,6 @@ int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
 void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
 		       unsigned long *support);
 phy_interface_t sfp_select_interface(struct sfp_bus *bus,
-				     const struct sfp_eeprom_id *id,
 				     unsigned long *link_modes);
 
 int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo);
@@ -532,7 +531,6 @@ static inline void sfp_parse_support(struct sfp_bus *bus,
 }
 
 static inline phy_interface_t sfp_select_interface(struct sfp_bus *bus,
-						   const struct sfp_eeprom_id *id,
 						   unsigned long *link_modes)
 {
 	return PHY_INTERFACE_MODE_NA;
-- 
cgit v1.2.3


From 0fbd26a9fb6875b98fcfff523831fec47bc5e9a2 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 11 Dec 2019 10:56:04 +0000
Subject: net: sfp: add more extended compliance codes

SFF-8024 is used to define various constants re-used in several SFF
SFP-related specifications.  Split these constants from the enum, and
rename them to indicate that they're defined by SFF-8024.

Add and use updated SFF-8024 extended compliance code definitions for
10GBASE-T, 5GBASE-T and 2.5GBASE-T modules.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/sfp.h | 82 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 27 deletions(-)

(limited to 'include')

diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index 8d7b98c214d7..373d8b67ea86 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -275,6 +275,61 @@ struct sfp_diag {
 	__be16 cal_v_offset;
 } __packed;
 
+/* SFF8024 defined constants */
+enum {
+	SFF8024_ID_UNK			= 0x00,
+	SFF8024_ID_SFF_8472		= 0x02,
+	SFF8024_ID_SFP			= 0x03,
+	SFF8024_ID_DWDM_SFP		= 0x0b,
+	SFF8024_ID_QSFP_8438		= 0x0c,
+	SFF8024_ID_QSFP_8436_8636	= 0x0d,
+	SFF8024_ID_QSFP28_8636		= 0x11,
+
+	SFF8024_ENCODING_UNSPEC		= 0x00,
+	SFF8024_ENCODING_8B10B		= 0x01,
+	SFF8024_ENCODING_4B5B		= 0x02,
+	SFF8024_ENCODING_NRZ		= 0x03,
+	SFF8024_ENCODING_8472_MANCHESTER= 0x04,
+	SFF8024_ENCODING_8472_SONET	= 0x05,
+	SFF8024_ENCODING_8472_64B66B	= 0x06,
+	SFF8024_ENCODING_8436_MANCHESTER= 0x06,
+	SFF8024_ENCODING_8436_SONET	= 0x04,
+	SFF8024_ENCODING_8436_64B66B	= 0x05,
+	SFF8024_ENCODING_256B257B	= 0x07,
+	SFF8024_ENCODING_PAM4		= 0x08,
+
+	SFF8024_CONNECTOR_UNSPEC	= 0x00,
+	/* codes 01-05 not supportable on SFP, but some modules have single SC */
+	SFF8024_CONNECTOR_SC		= 0x01,
+	SFF8024_CONNECTOR_FIBERJACK	= 0x06,
+	SFF8024_CONNECTOR_LC		= 0x07,
+	SFF8024_CONNECTOR_MT_RJ		= 0x08,
+	SFF8024_CONNECTOR_MU		= 0x09,
+	SFF8024_CONNECTOR_SG		= 0x0a,
+	SFF8024_CONNECTOR_OPTICAL_PIGTAIL= 0x0b,
+	SFF8024_CONNECTOR_MPO_1X12	= 0x0c,
+	SFF8024_CONNECTOR_MPO_2X16	= 0x0d,
+	SFF8024_CONNECTOR_HSSDC_II	= 0x20,
+	SFF8024_CONNECTOR_COPPER_PIGTAIL= 0x21,
+	SFF8024_CONNECTOR_RJ45		= 0x22,
+	SFF8024_CONNECTOR_NOSEPARATE	= 0x23,
+	SFF8024_CONNECTOR_MXC_2X16	= 0x24,
+
+	SFF8024_ECC_UNSPEC		= 0x00,
+	SFF8024_ECC_100G_25GAUI_C2M_AOC	= 0x01,
+	SFF8024_ECC_100GBASE_SR4_25GBASE_SR = 0x02,
+	SFF8024_ECC_100GBASE_LR4_25GBASE_LR = 0x03,
+	SFF8024_ECC_100GBASE_ER4_25GBASE_ER = 0x04,
+	SFF8024_ECC_100GBASE_SR10	= 0x05,
+	SFF8024_ECC_100GBASE_CR4	= 0x0b,
+	SFF8024_ECC_25GBASE_CR_S	= 0x0c,
+	SFF8024_ECC_25GBASE_CR_N	= 0x0d,
+	SFF8024_ECC_10GBASE_T_SFI	= 0x16,
+	SFF8024_ECC_10GBASE_T_SR	= 0x1c,
+	SFF8024_ECC_5GBASE_T		= 0x1d,
+	SFF8024_ECC_2_5GBASE_T		= 0x1e,
+};
+
 /* SFP EEPROM registers */
 enum {
 	SFP_PHYS_ID			= 0x00,
@@ -309,34 +364,7 @@ enum {
 	SFP_SFF8472_COMPLIANCE		= 0x5e,
 	SFP_CC_EXT			= 0x5f,
 
-	SFP_PHYS_ID_SFF			= 0x02,
-	SFP_PHYS_ID_SFP			= 0x03,
 	SFP_PHYS_EXT_ID_SFP		= 0x04,
-	SFP_CONNECTOR_UNSPEC		= 0x00,
-	/* codes 01-05 not supportable on SFP, but some modules have single SC */
-	SFP_CONNECTOR_SC		= 0x01,
-	SFP_CONNECTOR_FIBERJACK		= 0x06,
-	SFP_CONNECTOR_LC		= 0x07,
-	SFP_CONNECTOR_MT_RJ		= 0x08,
-	SFP_CONNECTOR_MU		= 0x09,
-	SFP_CONNECTOR_SG		= 0x0a,
-	SFP_CONNECTOR_OPTICAL_PIGTAIL	= 0x0b,
-	SFP_CONNECTOR_MPO_1X12		= 0x0c,
-	SFP_CONNECTOR_MPO_2X16		= 0x0d,
-	SFP_CONNECTOR_HSSDC_II		= 0x20,
-	SFP_CONNECTOR_COPPER_PIGTAIL	= 0x21,
-	SFP_CONNECTOR_RJ45		= 0x22,
-	SFP_CONNECTOR_NOSEPARATE	= 0x23,
-	SFP_CONNECTOR_MXC_2X16		= 0x24,
-	SFP_ENCODING_UNSPEC		= 0x00,
-	SFP_ENCODING_8B10B		= 0x01,
-	SFP_ENCODING_4B5B		= 0x02,
-	SFP_ENCODING_NRZ		= 0x03,
-	SFP_ENCODING_8472_MANCHESTER	= 0x04,
-	SFP_ENCODING_8472_SONET		= 0x05,
-	SFP_ENCODING_8472_64B66B	= 0x06,
-	SFP_ENCODING_256B257B		= 0x07,
-	SFP_ENCODING_PAM4		= 0x08,
 	SFP_OPTIONS_HIGH_POWER_LEVEL	= BIT(13),
 	SFP_OPTIONS_PAGING_A2		= BIT(12),
 	SFP_OPTIONS_RETIMER		= BIT(11),
-- 
cgit v1.2.3


From 74c551ca5a0edcc9cf66a3b73fd95b9a8615bfd0 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 11 Dec 2019 10:56:09 +0000
Subject: net: sfp: add module start/stop upstream notifications

When dealing with some copper modules, we can't positively know the
module capabilities are until we have probed the PHY. Without the full
capabilities, we may end up failing a module that we could otherwise
drive with a restricted set of capabilities.

An example of this would be a module with a NBASE-T PHY plugged into
a host that supports phy interface modes 2500BASE-X and SGMII. The
PHY supports 10GBASE-R, 5000BASE-X, 2500BASE-X, SGMII interface modes,
which means a subset of the capabilities are compatible with the host.

However, reading the module EEPROM leads us to believe that the module
only supports ethtool link mode 10GBASE-T, which is incompatible with
the host - and thus results in the module being rejected.

This patch adds an extra notification which are triggered after the
SFP module's PHY probe, and a corresponding notification just before
the PHY is removed.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/sfp.h | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index 373d8b67ea86..66a56396e8e3 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -507,6 +507,8 @@ struct sfp_bus;
  * @module_insert: called after a module has been detected to determine
  *   whether the module is supported for the upstream device.
  * @module_remove: called after the module has been removed.
+ * @module_start: called after the PHY probe step
+ * @module_stop: called before the PHY is removed
  * @link_down: called when the link is non-operational for whatever
  *   reason.
  * @link_up: called when the link is operational.
@@ -520,6 +522,8 @@ struct sfp_upstream_ops {
 	void (*detach)(void *priv, struct sfp_bus *bus);
 	int (*module_insert)(void *priv, const struct sfp_eeprom_id *id);
 	void (*module_remove)(void *priv);
+	int (*module_start)(void *priv);
+	void (*module_stop)(void *priv);
 	void (*link_down)(void *priv);
 	void (*link_up)(void *priv);
 	int (*connect_phy)(void *priv, struct phy_device *);
-- 
cgit v1.2.3


From 52c956003a9d5bcae1f445f9dfd42b624adb6e87 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 11 Dec 2019 10:56:45 +0000
Subject: net: phylink: delay MAC configuration for copper SFP modules

Knowing whether we need to delay the MAC configuration because a module
may have a PHY is useful to phylink to allow NBASE-T modules to work on
systems supporting no more than 2.5G speeds.

This commit allows us to delay such configuration until after the PHY
has been probed by recording the parsed capabilities, and if the module
may have a PHY, doing no more until the module_start() notification is
called.  At that point, we either have a PHY, or we don't.

We move the PHY-based setup a little later, and use the PHYs support
capabilities rather than the EEPROM parsed capabilities to determine
whether we can support the PHY.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/sfp.h | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'include')

diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index 66a56396e8e3..38893e4dd0f0 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -533,6 +533,7 @@ struct sfp_upstream_ops {
 #if IS_ENABLED(CONFIG_SFP)
 int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
 		   unsigned long *support);
+bool sfp_may_have_phy(struct sfp_bus *bus, const struct sfp_eeprom_id *id);
 void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
 		       unsigned long *support);
 phy_interface_t sfp_select_interface(struct sfp_bus *bus,
@@ -556,6 +557,12 @@ static inline int sfp_parse_port(struct sfp_bus *bus,
 	return PORT_OTHER;
 }
 
+static inline bool sfp_may_have_phy(struct sfp_bus *bus,
+				    const struct sfp_eeprom_id *id)
+{
+	return false;
+}
+
 static inline void sfp_parse_support(struct sfp_bus *bus,
 				     const struct sfp_eeprom_id *id,
 				     unsigned long *support)
-- 
cgit v1.2.3


From ef343b35d46667668a099655fca4a5b2e43a5dfe Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Tue, 10 Dec 2019 11:43:03 +0100
Subject: vsock: add VMADDR_CID_LOCAL definition

The VMADDR_CID_RESERVED (1) was used by VMCI, but now it is not
used anymore, so we can reuse it for local communication
(loopback) adding the new well-know CID: VMADDR_CID_LOCAL.

Cc: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/vm_sockets.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index 68d57c5e99bc..fd0ed7221645 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -99,11 +99,13 @@
 
 #define VMADDR_CID_HYPERVISOR 0
 
-/* This CID is specific to VMCI and can be considered reserved (even VMCI
- * doesn't use it anymore, it's a legacy value from an older release).
+/* Use this as the destination CID in an address when referring to the
+ * local communication (loopback).
+ * (This was VMADDR_CID_RESERVED, but even VMCI doesn't use it anymore,
+ * it was a legacy value from an older release).
  */
 
-#define VMADDR_CID_RESERVED 1
+#define VMADDR_CID_LOCAL 1
 
 /* Use this as the destination CID in an address when referring to the host
  * (any process other than the hypervisor).  VMCI relies on it being 2, but
-- 
cgit v1.2.3


From 0e12190578d081d4fe54d41635ec6e5a6eb0d01a Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Tue, 10 Dec 2019 11:43:04 +0100
Subject: vsock: add local transport support in the vsock core

This patch allows to register a transport able to handle
local communication (loopback).

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/af_vsock.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 4206dc6d813f..b1c717286993 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -98,6 +98,8 @@ struct vsock_transport_send_notify_data {
 #define VSOCK_TRANSPORT_F_G2H		0x00000002
 /* Transport provides DGRAM communication */
 #define VSOCK_TRANSPORT_F_DGRAM		0x00000004
+/* Transport provides local (loopback) communication */
+#define VSOCK_TRANSPORT_F_LOCAL		0x00000008
 
 struct vsock_transport {
 	struct module *module;
-- 
cgit v1.2.3


From b4653342b1514cb11f25b727c689451aff02996d Mon Sep 17 00:00:00 2001
From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Mon, 9 Dec 2019 13:03:40 +0300
Subject: net: Allow to show socket-specific information in
 /proc/[pid]/fdinfo/[fd]

This adds .show_fdinfo to socket_file_ops, so protocols will be able
to print their specific data in fdinfo.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/net.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/net.h b/include/linux/net.h
index 9cafb5f353a9..6451425e828f 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -171,6 +171,7 @@ struct proto_ops {
 	int		(*compat_getsockopt)(struct socket *sock, int level,
 				      int optname, char __user *optval, int __user *optlen);
 #endif
+	void		(*show_fdinfo)(struct seq_file *m, struct socket *sock);
 	int		(*sendmsg)   (struct socket *sock, struct msghdr *m,
 				      size_t total_len);
 	/* Notes for implementing recvmsg:
-- 
cgit v1.2.3


From 3c32da19a858fb1ae8a76bf899160be49f338506 Mon Sep 17 00:00:00 2001
From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Mon, 9 Dec 2019 13:03:46 +0300
Subject: unix: Show number of pending scm files of receive queue in fdinfo

Unix sockets like a block box. You never know what is stored there:
there may be a file descriptor holding a mount or a block device,
or there may be whole universes with namespaces, sockets with receive
queues full of sockets etc.

The patch adds a little debug and accounts number of files (not recursive),
which is in receive queue of a unix socket. Sometimes this is useful
to determine, that socket should be investigated or which task should
be killed to put reference counter on a resourse.

v2: Pass correct argument to lockdep

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/af_unix.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 3426d6dacc45..17e10fba2152 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -41,6 +41,10 @@ struct unix_skb_parms {
 	u32			consumed;
 } __randomize_layout;
 
+struct scm_stat {
+	u32 nr_fds;
+};
+
 #define UNIXCB(skb)	(*(struct unix_skb_parms *)&((skb)->cb))
 
 #define unix_state_lock(s)	spin_lock(&unix_sk(s)->lock)
@@ -65,6 +69,7 @@ struct unix_sock {
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
 	wait_queue_entry_t	peer_wake;
+	struct scm_stat		scm_stat;
 };
 
 static inline struct unix_sock *unix_sk(const struct sock *sk)
-- 
cgit v1.2.3


From f74877a5457d34d604dba6dbbb13c4c05bac8b93 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Wed, 11 Dec 2019 10:58:14 +0100
Subject: rtnetlink: provide permanent hardware address in RTM_NEWLINK

Permanent hardware address of a network device was traditionally provided
via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
ethtool netlink interface, rtnetlink is much more suitable for it so let's
add it to the RTM_NEWLINK message.

Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
permanent address is all zeros (i.e. device driver did not fill it). As
permanent address is not modifiable, reject userspace requests containing
IFLA_PERM_ADDRESS attribute.

Note: we already provide permanent hardware address for bond slaves;
unfortunately we cannot drop that attribute for backward compatibility
reasons.

v5 -> v6: only add the attribute if permanent address is not zero

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_link.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8aec8769d944..1d69f637c5d6 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -169,6 +169,7 @@ enum {
 	IFLA_MAX_MTU,
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
+	IFLA_PERM_ADDRESS,
 	__IFLA_MAX
 };
 
-- 
cgit v1.2.3


From 32d5109a9d864aea3981f0b5ea736eee4e11b42a Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Wed, 11 Dec 2019 10:58:19 +0100
Subject: netlink: rename nl80211_validate_nested() to nla_validate_nested()

Function nl80211_validate_nested() is not specific to nl80211, it's
a counterpart to nla_validate_nested_deprecated() with strict validation.
For consistency with other validation and parse functions, rename it to
nla_validate_nested().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/netlink.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'include')

diff --git a/include/net/netlink.h b/include/net/netlink.h
index b140c8f1be22..56c365dc6dc7 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1735,7 +1735,7 @@ static inline void nla_nest_cancel(struct sk_buff *skb, struct nlattr *start)
 }
 
 /**
- * nla_validate_nested - Validate a stream of nested attributes
+ * __nla_validate_nested - Validate a stream of nested attributes
  * @start: container attribute
  * @maxtype: maximum attribute type to be expected
  * @policy: validation policy
@@ -1758,9 +1758,9 @@ static inline int __nla_validate_nested(const struct nlattr *start, int maxtype,
 }
 
 static inline int
-nl80211_validate_nested(const struct nlattr *start, int maxtype,
-			const struct nla_policy *policy,
-			struct netlink_ext_ack *extack)
+nla_validate_nested(const struct nlattr *start, int maxtype,
+		    const struct nla_policy *policy,
+		    struct netlink_ext_ack *extack)
 {
 	return __nla_validate_nested(start, maxtype, policy,
 				     NL_VALIDATE_STRICT, extack);
-- 
cgit v1.2.3


From 428c122f5f6b54f40bd51c47495104b534b5a57c Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Wed, 11 Dec 2019 10:58:34 +0100
Subject: ethtool: provide link mode names as a string set

Unlike e.g. netdev features, the ethtool ioctl interface requires link mode
table to be in sync between kernel and userspace for userspace to be able
to display and set all link modes supported by kernel. The way arbitrary
length bitsets are implemented in netlink interface, this will be no longer
needed.

To allow userspace to access all link modes running kernel supports, add
table of ethernet link mode names and make it available as a string set to
userspace GET_STRSET requests. Add build time check to make sure names
are defined for all modes declared in enum ethtool_link_mode_bit_indices.

Once the string set is available, make it also accessible via ioctl.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index d4591792f0b4..f44155840b07 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -593,6 +593,7 @@ struct ethtool_pauseparam {
  * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
  * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
  * @ETH_SS_PHY_TUNABLES: PHY tunable names
+ * @ETH_SS_LINK_MODES: link mode names
  */
 enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
@@ -604,6 +605,7 @@ enum ethtool_stringset {
 	ETH_SS_TUNABLES,
 	ETH_SS_PHY_STATS,
 	ETH_SS_PHY_TUNABLES,
+	ETH_SS_LINK_MODES,
 };
 
 /**
-- 
cgit v1.2.3


From 0290bd291cc0e0488e35e66bf39efcd7d9d9122b Mon Sep 17 00:00:00 2001
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 10 Dec 2019 09:23:51 -0500
Subject: netdev: pass the stuck queue to the timeout handler

This allows incrementing the correct timeout statistic without any mess.
Down the road, devices can learn to reset just the specific queue.

The patch was generated with the following script:

use strict;
use warnings;

our $^I = '.bak';

my @work = (
["arch/m68k/emu/nfeth.c", "nfeth_tx_timeout"],
["arch/um/drivers/net_kern.c", "uml_net_tx_timeout"],
["arch/um/drivers/vector_kern.c", "vector_net_tx_timeout"],
["arch/xtensa/platforms/iss/network.c", "iss_net_tx_timeout"],
["drivers/char/pcmcia/synclink_cs.c", "hdlcdev_tx_timeout"],
["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"],
["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"],
["drivers/message/fusion/mptlan.c", "mpt_lan_tx_timeout"],
["drivers/misc/sgi-xp/xpnet.c", "xpnet_dev_tx_timeout"],
["drivers/net/appletalk/cops.c", "cops_timeout"],
["drivers/net/arcnet/arcdevice.h", "arcnet_timeout"],
["drivers/net/arcnet/arcnet.c", "arcnet_timeout"],
["drivers/net/arcnet/com20020.c", "arcnet_timeout"],
["drivers/net/ethernet/3com/3c509.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c515.c", "corkscrew_timeout"],
["drivers/net/ethernet/3com/3c574_cs.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c589_cs.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"],
["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"],
["drivers/net/ethernet/3com/typhoon.c", "typhoon_tx_timeout"],
["drivers/net/ethernet/8390/8390.h", "ei_tx_timeout"],
["drivers/net/ethernet/8390/8390.h", "eip_tx_timeout"],
["drivers/net/ethernet/8390/8390.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/8390p.c", "eip_tx_timeout"],
["drivers/net/ethernet/8390/ax88796.c", "ax_ei_tx_timeout"],
["drivers/net/ethernet/8390/axnet_cs.c", "axnet_tx_timeout"],
["drivers/net/ethernet/8390/etherh.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/hydra.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/mac8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/mcf8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/lib8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/ne2k-pci.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/pcnet_cs.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/smc-ultra.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/wd.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/zorro8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/adaptec/starfire.c", "tx_timeout"],
["drivers/net/ethernet/agere/et131x.c", "et131x_tx_timeout"],
["drivers/net/ethernet/allwinner/sun4i-emac.c", "emac_timeout"],
["drivers/net/ethernet/alteon/acenic.c", "ace_watchdog"],
["drivers/net/ethernet/amazon/ena/ena_netdev.c", "ena_tx_timeout"],
["drivers/net/ethernet/amd/7990.h", "lance_tx_timeout"],
["drivers/net/ethernet/amd/7990.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/a2065.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/am79c961a.c", "am79c961_timeout"],
["drivers/net/ethernet/amd/amd8111e.c", "amd8111e_tx_timeout"],
["drivers/net/ethernet/amd/ariadne.c", "ariadne_tx_timeout"],
["drivers/net/ethernet/amd/atarilance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/au1000_eth.c", "au1000_tx_timeout"],
["drivers/net/ethernet/amd/declance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/lance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/mvme147.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/ni65.c", "ni65_timeout"],
["drivers/net/ethernet/amd/nmclan_cs.c", "mace_tx_timeout"],
["drivers/net/ethernet/amd/pcnet32.c", "pcnet32_tx_timeout"],
["drivers/net/ethernet/amd/sunlance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/xgbe/xgbe-drv.c", "xgbe_tx_timeout"],
["drivers/net/ethernet/apm/xgene-v2/main.c", "xge_timeout"],
["drivers/net/ethernet/apm/xgene/xgene_enet_main.c", "xgene_enet_timeout"],
["drivers/net/ethernet/apple/macmace.c", "mace_tx_timeout"],
["drivers/net/ethernet/atheros/ag71xx.c", "ag71xx_tx_timeout"],
["drivers/net/ethernet/atheros/alx/main.c", "alx_tx_timeout"],
["drivers/net/ethernet/atheros/atl1c/atl1c_main.c", "atl1c_tx_timeout"],
["drivers/net/ethernet/atheros/atl1e/atl1e_main.c", "atl1e_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl.c", "atlx_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl1.c", "atlx_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl2.c", "atl2_tx_timeout"],
["drivers/net/ethernet/broadcom/b44.c", "b44_tx_timeout"],
["drivers/net/ethernet/broadcom/bcmsysport.c", "bcm_sysport_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2.c", "bnx2_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnxt/bnxt.c", "bnxt_tx_timeout"],
["drivers/net/ethernet/broadcom/genet/bcmgenet.c", "bcmgenet_timeout"],
["drivers/net/ethernet/broadcom/sb1250-mac.c", "sbmac_tx_timeout"],
["drivers/net/ethernet/broadcom/tg3.c", "tg3_tx_timeout"],
["drivers/net/ethernet/calxeda/xgmac.c", "xgmac_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_main.c", "liquidio_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_vf_main.c", "liquidio_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c", "lio_vf_rep_tx_timeout"],
["drivers/net/ethernet/cavium/thunder/nicvf_main.c", "nicvf_tx_timeout"],
["drivers/net/ethernet/cirrus/cs89x0.c", "net_timeout"],
["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"],
["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"],
["drivers/net/ethernet/cortina/gemini.c", "gmac_tx_timeout"],
["drivers/net/ethernet/davicom/dm9000.c", "dm9000_timeout"],
["drivers/net/ethernet/dec/tulip/de2104x.c", "de_tx_timeout"],
["drivers/net/ethernet/dec/tulip/tulip_core.c", "tulip_tx_timeout"],
["drivers/net/ethernet/dec/tulip/winbond-840.c", "tx_timeout"],
["drivers/net/ethernet/dlink/dl2k.c", "rio_tx_timeout"],
["drivers/net/ethernet/dlink/sundance.c", "tx_timeout"],
["drivers/net/ethernet/emulex/benet/be_main.c", "be_tx_timeout"],
["drivers/net/ethernet/ethoc.c", "ethoc_tx_timeout"],
["drivers/net/ethernet/faraday/ftgmac100.c", "ftgmac100_tx_timeout"],
["drivers/net/ethernet/fealnx.c", "fealnx_tx_timeout"],
["drivers/net/ethernet/freescale/dpaa/dpaa_eth.c", "dpaa_tx_timeout"],
["drivers/net/ethernet/freescale/fec_main.c", "fec_timeout"],
["drivers/net/ethernet/freescale/fec_mpc52xx.c", "mpc52xx_fec_tx_timeout"],
["drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c", "fs_timeout"],
["drivers/net/ethernet/freescale/gianfar.c", "gfar_timeout"],
["drivers/net/ethernet/freescale/ucc_geth.c", "ucc_geth_timeout"],
["drivers/net/ethernet/fujitsu/fmvj18x_cs.c", "fjn_tx_timeout"],
["drivers/net/ethernet/google/gve/gve_main.c", "gve_tx_timeout"],
["drivers/net/ethernet/hisilicon/hip04_eth.c", "hip04_timeout"],
["drivers/net/ethernet/hisilicon/hix5hd2_gmac.c", "hix5hd2_net_timeout"],
["drivers/net/ethernet/hisilicon/hns/hns_enet.c", "hns_nic_net_timeout"],
["drivers/net/ethernet/hisilicon/hns3/hns3_enet.c", "hns3_nic_net_timeout"],
["drivers/net/ethernet/huawei/hinic/hinic_main.c", "hinic_tx_timeout"],
["drivers/net/ethernet/i825xx/82596.c", "i596_tx_timeout"],
["drivers/net/ethernet/i825xx/ether1.c", "ether1_timeout"],
["drivers/net/ethernet/i825xx/lib82596.c", "i596_tx_timeout"],
["drivers/net/ethernet/i825xx/sun3_82586.c", "sun3_82586_timeout"],
["drivers/net/ethernet/ibm/ehea/ehea_main.c", "ehea_tx_watchdog"],
["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"],
["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"],
["drivers/net/ethernet/ibm/ibmvnic.c", "ibmvnic_tx_timeout"],
["drivers/net/ethernet/intel/e100.c", "e100_tx_timeout"],
["drivers/net/ethernet/intel/e1000/e1000_main.c", "e1000_tx_timeout"],
["drivers/net/ethernet/intel/e1000e/netdev.c", "e1000_tx_timeout"],
["drivers/net/ethernet/intel/fm10k/fm10k_netdev.c", "fm10k_tx_timeout"],
["drivers/net/ethernet/intel/i40e/i40e_main.c", "i40e_tx_timeout"],
["drivers/net/ethernet/intel/iavf/iavf_main.c", "iavf_tx_timeout"],
["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"],
["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"],
["drivers/net/ethernet/intel/igb/igb_main.c", "igb_tx_timeout"],
["drivers/net/ethernet/intel/igbvf/netdev.c", "igbvf_tx_timeout"],
["drivers/net/ethernet/intel/ixgb/ixgb_main.c", "ixgb_tx_timeout"],
["drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c", "adapter->netdev->netdev_ops->ndo_tx_timeout(adapter->netdev);"],
["drivers/net/ethernet/intel/ixgbe/ixgbe_main.c", "ixgbe_tx_timeout"],
["drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c", "ixgbevf_tx_timeout"],
["drivers/net/ethernet/jme.c", "jme_tx_timeout"],
["drivers/net/ethernet/korina.c", "korina_tx_timeout"],
["drivers/net/ethernet/lantiq_etop.c", "ltq_etop_tx_timeout"],
["drivers/net/ethernet/marvell/mv643xx_eth.c", "mv643xx_eth_tx_timeout"],
["drivers/net/ethernet/marvell/pxa168_eth.c", "pxa168_eth_tx_timeout"],
["drivers/net/ethernet/marvell/skge.c", "skge_tx_timeout"],
["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"],
["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"],
["drivers/net/ethernet/mediatek/mtk_eth_soc.c", "mtk_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx5/core/en_main.c", "mlx5e_tx_timeout"],
["drivers/net/ethernet/micrel/ks8842.c", "ks8842_tx_timeout"],
["drivers/net/ethernet/micrel/ksz884x.c", "netdev_tx_timeout"],
["drivers/net/ethernet/microchip/enc28j60.c", "enc28j60_tx_timeout"],
["drivers/net/ethernet/microchip/encx24j600.c", "encx24j600_tx_timeout"],
["drivers/net/ethernet/natsemi/sonic.h", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/sonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/jazzsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/macsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/natsemi.c", "ns_tx_timeout"],
["drivers/net/ethernet/natsemi/ns83820.c", "ns83820_tx_timeout"],
["drivers/net/ethernet/natsemi/xtsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/neterion/s2io.h", "s2io_tx_watchdog"],
["drivers/net/ethernet/neterion/s2io.c", "s2io_tx_watchdog"],
["drivers/net/ethernet/neterion/vxge/vxge-main.c", "vxge_tx_watchdog"],
["drivers/net/ethernet/netronome/nfp/nfp_net_common.c", "nfp_net_tx_timeout"],
["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"],
["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"],
["drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c", "pch_gbe_tx_timeout"],
["drivers/net/ethernet/packetengines/hamachi.c", "hamachi_tx_timeout"],
["drivers/net/ethernet/packetengines/yellowfin.c", "yellowfin_tx_timeout"],
["drivers/net/ethernet/pensando/ionic/ionic_lif.c", "ionic_tx_timeout"],
["drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c", "netxen_tx_timeout"],
["drivers/net/ethernet/qlogic/qla3xxx.c", "ql3xxx_tx_timeout"],
["drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c", "qlcnic_tx_timeout"],
["drivers/net/ethernet/qualcomm/emac/emac.c", "emac_tx_timeout"],
["drivers/net/ethernet/qualcomm/qca_spi.c", "qcaspi_netdev_tx_timeout"],
["drivers/net/ethernet/qualcomm/qca_uart.c", "qcauart_netdev_tx_timeout"],
["drivers/net/ethernet/rdc/r6040.c", "r6040_tx_timeout"],
["drivers/net/ethernet/realtek/8139cp.c", "cp_tx_timeout"],
["drivers/net/ethernet/realtek/8139too.c", "rtl8139_tx_timeout"],
["drivers/net/ethernet/realtek/atp.c", "tx_timeout"],
["drivers/net/ethernet/realtek/r8169_main.c", "rtl8169_tx_timeout"],
["drivers/net/ethernet/renesas/ravb_main.c", "ravb_tx_timeout"],
["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"],
["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"],
["drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c", "sxgbe_tx_timeout"],
["drivers/net/ethernet/seeq/ether3.c", "ether3_timeout"],
["drivers/net/ethernet/seeq/sgiseeq.c", "timeout"],
["drivers/net/ethernet/sfc/efx.c", "efx_watchdog"],
["drivers/net/ethernet/sfc/falcon/efx.c", "ef4_watchdog"],
["drivers/net/ethernet/sgi/ioc3-eth.c", "ioc3_timeout"],
["drivers/net/ethernet/sgi/meth.c", "meth_tx_timeout"],
["drivers/net/ethernet/silan/sc92031.c", "sc92031_tx_timeout"],
["drivers/net/ethernet/sis/sis190.c", "sis190_tx_timeout"],
["drivers/net/ethernet/sis/sis900.c", "sis900_tx_timeout"],
["drivers/net/ethernet/smsc/epic100.c", "epic_tx_timeout"],
["drivers/net/ethernet/smsc/smc911x.c", "smc911x_timeout"],
["drivers/net/ethernet/smsc/smc9194.c", "smc_timeout"],
["drivers/net/ethernet/smsc/smc91c92_cs.c", "smc_tx_timeout"],
["drivers/net/ethernet/smsc/smc91x.c", "smc_timeout"],
["drivers/net/ethernet/stmicro/stmmac/stmmac_main.c", "stmmac_tx_timeout"],
["drivers/net/ethernet/sun/cassini.c", "cas_tx_timeout"],
["drivers/net/ethernet/sun/ldmvsw.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/niu.c", "niu_tx_timeout"],
["drivers/net/ethernet/sun/sunbmac.c", "bigmac_tx_timeout"],
["drivers/net/ethernet/sun/sungem.c", "gem_tx_timeout"],
["drivers/net/ethernet/sun/sunhme.c", "happy_meal_tx_timeout"],
["drivers/net/ethernet/sun/sunqe.c", "qe_tx_timeout"],
["drivers/net/ethernet/sun/sunvnet.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/sunvnet_common.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/sunvnet_common.h", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/synopsys/dwc-xlgmac-net.c", "xlgmac_tx_timeout"],
["drivers/net/ethernet/ti/cpmac.c", "cpmac_tx_timeout"],
["drivers/net/ethernet/ti/cpsw.c", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/cpsw_priv.c", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/cpsw_priv.h", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/davinci_emac.c", "emac_dev_tx_timeout"],
["drivers/net/ethernet/ti/netcp_core.c", "netcp_ndo_tx_timeout"],
["drivers/net/ethernet/ti/tlan.c", "tlan_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_net.h", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_net.c", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_wireless.c", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/spider_net.c", "spider_net_tx_timeout"],
["drivers/net/ethernet/toshiba/tc35815.c", "tc35815_tx_timeout"],
["drivers/net/ethernet/via/via-rhine.c", "rhine_tx_timeout"],
["drivers/net/ethernet/wiznet/w5100.c", "w5100_tx_timeout"],
["drivers/net/ethernet/wiznet/w5300.c", "w5300_tx_timeout"],
["drivers/net/ethernet/xilinx/xilinx_emaclite.c", "xemaclite_tx_timeout"],
["drivers/net/ethernet/xircom/xirc2ps_cs.c", "xirc_tx_timeout"],
["drivers/net/fjes/fjes_main.c", "fjes_tx_retry"],
["drivers/net/slip/slip.c", "sl_tx_timeout"],
["include/linux/usb/usbnet.h", "usbnet_tx_timeout"],
["drivers/net/usb/aqc111.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/ax88172a.c", "usbnet_tx_timeout"],
["drivers/net/usb/ax88179_178a.c", "usbnet_tx_timeout"],
["drivers/net/usb/catc.c", "catc_tx_timeout"],
["drivers/net/usb/cdc_mbim.c", "usbnet_tx_timeout"],
["drivers/net/usb/cdc_ncm.c", "usbnet_tx_timeout"],
["drivers/net/usb/dm9601.c", "usbnet_tx_timeout"],
["drivers/net/usb/hso.c", "hso_net_tx_timeout"],
["drivers/net/usb/int51x1.c", "usbnet_tx_timeout"],
["drivers/net/usb/ipheth.c", "ipheth_tx_timeout"],
["drivers/net/usb/kaweth.c", "kaweth_tx_timeout"],
["drivers/net/usb/lan78xx.c", "lan78xx_tx_timeout"],
["drivers/net/usb/mcs7830.c", "usbnet_tx_timeout"],
["drivers/net/usb/pegasus.c", "pegasus_tx_timeout"],
["drivers/net/usb/qmi_wwan.c", "usbnet_tx_timeout"],
["drivers/net/usb/r8152.c", "rtl8152_tx_timeout"],
["drivers/net/usb/rndis_host.c", "usbnet_tx_timeout"],
["drivers/net/usb/rtl8150.c", "rtl8150_tx_timeout"],
["drivers/net/usb/sierra_net.c", "usbnet_tx_timeout"],
["drivers/net/usb/smsc75xx.c", "usbnet_tx_timeout"],
["drivers/net/usb/smsc95xx.c", "usbnet_tx_timeout"],
["drivers/net/usb/sr9700.c", "usbnet_tx_timeout"],
["drivers/net/usb/sr9800.c", "usbnet_tx_timeout"],
["drivers/net/usb/usbnet.c", "usbnet_tx_timeout"],
["drivers/net/vmxnet3/vmxnet3_drv.c", "vmxnet3_tx_timeout"],
["drivers/net/wan/cosa.c", "cosa_net_timeout"],
["drivers/net/wan/farsync.c", "fst_tx_timeout"],
["drivers/net/wan/fsl_ucc_hdlc.c", "uhdlc_tx_timeout"],
["drivers/net/wan/lmc/lmc_main.c", "lmc_driver_timeout"],
["drivers/net/wan/x25_asy.c", "x25_asy_timeout"],
["drivers/net/wimax/i2400m/netdev.c", "i2400m_tx_timeout"],
["drivers/net/wireless/intel/ipw2x00/ipw2100.c", "ipw2100_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/main.c", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/orinoco_usb.c", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/orinoco.h", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_dev.c", "islpci_eth_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_eth.c", "islpci_eth_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_eth.h", "islpci_eth_tx_timeout"],
["drivers/net/wireless/marvell/mwifiex/main.c", "mwifiex_tx_timeout"],
["drivers/net/wireless/quantenna/qtnfmac/core.c", "qtnf_netdev_tx_timeout"],
["drivers/net/wireless/quantenna/qtnfmac/core.h", "qtnf_netdev_tx_timeout"],
["drivers/net/wireless/rndis_wlan.c", "usbnet_tx_timeout"],
["drivers/net/wireless/wl3501_cs.c", "wl3501_tx_timeout"],
["drivers/net/wireless/zydas/zd1201.c", "zd1201_tx_timeout"],
["drivers/s390/net/qeth_core.h", "qeth_tx_timeout"],
["drivers/s390/net/qeth_core_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"],
["drivers/staging/ks7010/ks_wlan_net.c", "ks_wlan_tx_timeout"],
["drivers/staging/qlge/qlge_main.c", "qlge_tx_timeout"],
["drivers/staging/rtl8192e/rtl8192e/rtl_core.c", "_rtl92e_tx_timeout"],
["drivers/staging/rtl8192u/r8192U_core.c", "tx_timeout"],
["drivers/staging/unisys/visornic/visornic_main.c", "visornic_xmit_timeout"],
["drivers/staging/wlan-ng/p80211netdev.c", "p80211knetdev_tx_timeout"],
["drivers/tty/n_gsm.c", "gsm_mux_net_tx_timeout"],
["drivers/tty/synclink.c", "hdlcdev_tx_timeout"],
["drivers/tty/synclink_gt.c", "hdlcdev_tx_timeout"],
["drivers/tty/synclinkmp.c", "hdlcdev_tx_timeout"],
["net/atm/lec.c", "lec_tx_timeout"],
["net/bluetooth/bnep/netdev.c", "bnep_net_timeout"]
);

for my $p (@work) {
	my @pair = @$p;
	my $file = $pair[0];
	my $func = $pair[1];
	print STDERR $file , ": ", $func,"\n";
	our @ARGV = ($file);
	while (<ARGV>) {
		if (m/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/) {
			print STDERR "found $1+$2 in $file\n";
		}
		if (s/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/$1, unsigned int txqueue$2/) {
			print STDERR "$func found in $file\n";
		}
		print;
	}
}

where the list of files and functions is simply from:

git grep ndo_tx_timeout, with manual addition of headers
in the rare cases where the function is from a header,
then manually changing the few places which actually
call ndo_tx_timeout.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Martin Habets <mhabets@solarflare.com>

changes from v9:
	fixup a forward declaration
changes from v9:
	more leftovers from v3 change
changes from v8:
        fix up a missing direct call to timeout
        rebased on net-next
changes from v7:
	fixup leftovers from v3 change
changes from v6:
	fix typo in rtl driver
changes from v5:
	add missing files (allow any net device argument name)
changes from v4:
	add a missing driver header
changes from v3:
        change queue # to unsigned
Changes from v2:
        added headers
Changes from v1:
        Fix errors found by kbuild:
        generalize the pattern a bit, to pick up
        a couple of instances missed by the previous
        version.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h  | 5 +++--
 include/linux/usb/usbnet.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9ef20389622d..30745068fb39 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1014,7 +1014,7 @@ int netdev_name_node_alt_destroy(struct net_device *dev, const char *name);
  *	Called when a user wants to change the Maximum Transfer Unit
  *	of a device.
  *
- * void (*ndo_tx_timeout)(struct net_device *dev);
+ * void (*ndo_tx_timeout)(struct net_device *dev, unsigned int txqueue);
  *	Callback used when the transmitter has not made any progress
  *	for dev->watchdog ticks.
  *
@@ -1281,7 +1281,8 @@ struct net_device_ops {
 						  int new_mtu);
 	int			(*ndo_neigh_setup)(struct net_device *dev,
 						   struct neigh_parms *);
-	void			(*ndo_tx_timeout) (struct net_device *dev);
+	void			(*ndo_tx_timeout) (struct net_device *dev,
+						   unsigned int txqueue);
 
 	void			(*ndo_get_stats64)(struct net_device *dev,
 						   struct rtnl_link_stats64 *storage);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index d8860f2d0976..b0bff3083278 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -253,7 +253,7 @@ extern int usbnet_open(struct net_device *net);
 extern int usbnet_stop(struct net_device *net);
 extern netdev_tx_t usbnet_start_xmit(struct sk_buff *skb,
 				     struct net_device *net);
-extern void usbnet_tx_timeout(struct net_device *net);
+extern void usbnet_tx_timeout(struct net_device *net, unsigned int txqueue);
 extern int usbnet_change_mtu(struct net_device *net, int new_mtu);
 
 extern int usbnet_get_endpoints(struct usbnet *, struct usb_interface *);
-- 
cgit v1.2.3


From 98e8627efcada18ac043a77b9101b4b4c768090b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Fri, 13 Dec 2019 18:51:07 +0100
Subject: bpf: Move trampoline JIT image allocation to a function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Refactor the image allocation in the BPF trampoline code into a
separate function, so it can be shared with the BPF dispatcher in
upcoming commits.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191213175112.30208-2-bjorn.topel@gmail.com
---
 include/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 35903f148be5..5d744828b399 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -475,6 +475,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
+void *bpf_jit_alloc_exec_page(void);
 #else
 static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
-- 
cgit v1.2.3


From 75ccbef6369e94ecac696a152a998a978d41376b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Fri, 13 Dec 2019 18:51:08 +0100
Subject: bpf: Introduce BPF dispatcher
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The BPF dispatcher is a multi-way branch code generator, mainly
targeted for XDP programs. When an XDP program is executed via the
bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect
call has a substantial performance impact, when retpolines are
enabled. The dispatcher transform indirect calls to direct calls, and
therefore avoids the retpoline. The dispatcher is generated using the
BPF JIT, and relies on text poking provided by bpf_arch_text_poke().

The dispatcher hijacks a trampoline function it via the __fentry__ nop
of the trampoline. One dispatcher instance currently supports up to 64
dispatch points. A user creates a dispatcher with its corresponding
trampoline with the DEFINE_BPF_DISPATCHER macro.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com
---
 include/linux/bpf.h | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5d744828b399..53ae4a50abe4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -470,12 +470,61 @@ struct bpf_trampoline {
 	void *image;
 	u64 selector;
 };
+
+#define BPF_DISPATCHER_MAX 64 /* Fits in 2048B */
+
+struct bpf_dispatcher_prog {
+	struct bpf_prog *prog;
+	refcount_t users;
+};
+
+struct bpf_dispatcher {
+	/* dispatcher mutex */
+	struct mutex mutex;
+	void *func;
+	struct bpf_dispatcher_prog progs[BPF_DISPATCHER_MAX];
+	int num_progs;
+	void *image;
+	u32 image_off;
+};
+
 #ifdef CONFIG_BPF_JIT
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
 void *bpf_jit_alloc_exec_page(void);
+#define BPF_DISPATCHER_INIT(name) {			\
+	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
+	.func = &name##func,				\
+	.progs = {},					\
+	.num_progs = 0,					\
+	.image = NULL,					\
+	.image_off = 0					\
+}
+
+#define DEFINE_BPF_DISPATCHER(name)					\
+	noinline unsigned int name##func(				\
+		const void *ctx,					\
+		const struct bpf_insn *insnsi,				\
+		unsigned int (*bpf_func)(const void *,			\
+					 const struct bpf_insn *))	\
+	{								\
+		return bpf_func(ctx, insnsi);				\
+	}								\
+	EXPORT_SYMBOL(name##func);			\
+	struct bpf_dispatcher name = BPF_DISPATCHER_INIT(name);
+#define DECLARE_BPF_DISPATCHER(name)					\
+	unsigned int name##func(					\
+		const void *ctx,					\
+		const struct bpf_insn *insnsi,				\
+		unsigned int (*bpf_func)(const void *,			\
+					 const struct bpf_insn *));	\
+	extern struct bpf_dispatcher name;
+#define BPF_DISPATCHER_FUNC(name) name##func
+#define BPF_DISPATCHER_PTR(name) (&name)
+void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
+				struct bpf_prog *to);
 #else
 static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
@@ -490,6 +539,13 @@ static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
 	return -ENOTSUPP;
 }
 static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
+#define DEFINE_BPF_DISPATCHER(name)
+#define DECLARE_BPF_DISPATCHER(name)
+#define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_nopfunc
+#define BPF_DISPATCHER_PTR(name) NULL
+static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
+					      struct bpf_prog *from,
+					      struct bpf_prog *to) {}
 #endif
 
 struct bpf_func_info_aux {
-- 
cgit v1.2.3


From 7e6897f95935973c3253fd756135b5ea58043dc8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Fri, 13 Dec 2019 18:51:09 +0100
Subject: bpf, xdp: Start using the BPF dispatcher for XDP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds a BPF dispatcher for XDP. The dispatcher is updated
from the XDP control-path, dev_xdp_install(), and used when an XDP
program is run via bpf_prog_run_xdp().

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com
---
 include/linux/bpf.h    | 15 +++++++++++++++
 include/linux/filter.h | 40 ++++++++++++++++++++++++----------------
 2 files changed, 39 insertions(+), 16 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 53ae4a50abe4..5970989b99d1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -488,6 +488,14 @@ struct bpf_dispatcher {
 	u32 image_off;
 };
 
+static __always_inline unsigned int bpf_dispatcher_nopfunc(
+	const void *ctx,
+	const struct bpf_insn *insnsi,
+	unsigned int (*bpf_func)(const void *,
+				 const struct bpf_insn *))
+{
+	return bpf_func(ctx, insnsi);
+}
 #ifdef CONFIG_BPF_JIT
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
@@ -997,6 +1005,8 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
 
 int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog);
 
+struct bpf_prog *bpf_prog_by_id(u32 id);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
@@ -1128,6 +1138,11 @@ static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 static inline void bpf_map_put(struct bpf_map *map)
 {
 }
+
+static inline struct bpf_prog *bpf_prog_by_id(u32 id)
+{
+	return ERR_PTR(-ENOTSUPP);
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a141cb07e76a..37ac7025031d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -559,23 +559,26 @@ struct sk_filter {
 
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
-#define BPF_PROG_RUN(prog, ctx)	({				\
-	u32 ret;						\
-	cant_sleep();						\
-	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
-		struct bpf_prog_stats *stats;			\
-		u64 start = sched_clock();			\
-		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
-		stats = this_cpu_ptr(prog->aux->stats);		\
-		u64_stats_update_begin(&stats->syncp);		\
-		stats->cnt++;					\
-		stats->nsecs += sched_clock() - start;		\
-		u64_stats_update_end(&stats->syncp);		\
-	} else {						\
-		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
-	}							\
+#define __BPF_PROG_RUN(prog, ctx, dfunc)	({			\
+	u32 ret;							\
+	cant_sleep();							\
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
+		struct bpf_prog_stats *stats;				\
+		u64 start = sched_clock();				\
+		ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
+		stats = this_cpu_ptr(prog->aux->stats);			\
+		u64_stats_update_begin(&stats->syncp);			\
+		stats->cnt++;						\
+		stats->nsecs += sched_clock() - start;			\
+		u64_stats_update_end(&stats->syncp);			\
+	} else {							\
+		ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
+	}								\
 	ret; })
 
+#define BPF_PROG_RUN(prog, ctx) __BPF_PROG_RUN(prog, ctx,		\
+					       bpf_dispatcher_nopfunc)
+
 #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
 
 struct bpf_skb_data_end {
@@ -699,6 +702,8 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 	return res;
 }
 
+DECLARE_BPF_DISPATCHER(bpf_dispatcher_xdp)
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -708,9 +713,12 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return BPF_PROG_RUN(prog, xdp);
+	return __BPF_PROG_RUN(prog, xdp,
+			      BPF_DISPATCHER_FUNC(bpf_dispatcher_xdp));
 }
 
+void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
+
 static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
 {
 	return prog->len * sizeof(struct bpf_insn);
-- 
cgit v1.2.3


From 116eb788f57c9c35c40b29cfaa2607020de99a84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Fri, 13 Dec 2019 18:51:12 +0100
Subject: bpf, x86: Align dispatcher branch targets to 16B
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

>From Intel 64 and IA-32 Architectures Optimization Reference Manual,
3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch
targets should be 16-byte aligned.

This commits aligns branch targets according to the Intel manual.

The nops used to align branch targets make the dispatcher larger, and
therefore the number of supported dispatch points/programs are
descreased from 64 to 48.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com
---
 include/linux/bpf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5970989b99d1..d467983e61bb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -471,7 +471,7 @@ struct bpf_trampoline {
 	u64 selector;
 };
 
-#define BPF_DISPATCHER_MAX 64 /* Fits in 2048B */
+#define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
 
 struct bpf_dispatcher_prog {
 	struct bpf_prog *prog;
-- 
cgit v1.2.3


From 826f66b30c2e3d4df533e4224cb8c734afa0a17c Mon Sep 17 00:00:00 2001
From: Andy Roulin <aroulin@cumulusnetworks.com>
Date: Wed, 11 Dec 2019 14:30:58 -0800
Subject: bonding: move 802.3ad port state flags to uapi

The bond slave actor/partner operating state is exported as
bitfield to userspace, which lacks a way to interpret it, e.g.,
iproute2 only prints the state as a number:

ad_actor_oper_port_state 15

For userspace to interpret the bitfield, the bitfield definitions
should be part of the uapi. The bitfield itself is defined in the
802.3ad standard.

This commit moves the 802.3ad bitfield definitions to uapi.

Related iproute2 patches, soon to be posted upstream, use the new uapi
headers to pretty-print bond slave state, e.g., with ip -d link show

ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/uapi/linux/if_bonding.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 790585f0e61b..6829213a54c5 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -95,6 +95,16 @@
 #define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
 #define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 
+/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
+#define AD_STATE_LACP_ACTIVITY   0x1
+#define AD_STATE_LACP_TIMEOUT    0x2
+#define AD_STATE_AGGREGATION     0x4
+#define AD_STATE_SYNCHRONIZATION 0x8
+#define AD_STATE_COLLECTING      0x10
+#define AD_STATE_DISTRIBUTING    0x20
+#define AD_STATE_DEFAULTED       0x40
+#define AD_STATE_EXPIRED         0x80
+
 typedef struct ifbond {
 	__s32 bond_mode;
 	__s32 num_slaves;
-- 
cgit v1.2.3


From de1799667b002331609e8ee6b924ccfd51968d99 Mon Sep 17 00:00:00 2001
From: Vivien Didelot <vivien.didelot@gmail.com>
Date: Wed, 11 Dec 2019 20:07:10 -0500
Subject: net: bridge: add STP xstats

This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk,
transition_fwd xstats counters to the bridge ports copied over via
netlink, providing useful information for STP.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/uapi/linux/if_bridge.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 1b3c2b643a02..4a58e3d7de46 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -156,6 +156,15 @@ struct bridge_vlan_xstats {
 	__u32 pad2;
 };
 
+struct bridge_stp_xstats {
+	__u64 transition_blk;
+	__u64 transition_fwd;
+	__u64 rx_bpdu;
+	__u64 tx_bpdu;
+	__u64 rx_tcn;
+	__u64 tx_tcn;
+};
+
 /* Bridge multicast database attributes
  * [MDBA_MDB] = {
  *     [MDBA_MDB_ENTRY] = {
@@ -262,6 +271,7 @@ enum {
 	BRIDGE_XSTATS_VLAN,
 	BRIDGE_XSTATS_MCAST,
 	BRIDGE_XSTATS_PAD,
+	BRIDGE_XSTATS_STP,
 	__BRIDGE_XSTATS_MAX
 };
 #define BRIDGE_XSTATS_MAX (__BRIDGE_XSTATS_MAX - 1)
-- 
cgit v1.2.3


From 166750bc1dd256b2184b22588fb9fe6d3fbb93ae Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko <andriin@fb.com>
Date: Fri, 13 Dec 2019 17:47:08 -0800
Subject: libbpf: Support libbpf-provided extern variables

Add support for extern variables, provided to BPF program by libbpf. Currently
the following extern variables are supported:
  - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is
    executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte
    long;
  - CONFIG_xxx values; a set of values of actual kernel config. Tristate,
    boolean, strings, and integer values are supported.

Set of possible values is determined by declared type of extern variable.
Supported types of variables are:
- Tristate values. Are represented as `enum libbpf_tristate`. Accepted values
  are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO,
  or TRI_MODULE, respectively.
- Boolean values. Are represented as bool (_Bool) types. Accepted values are
  'y' and 'n' only, turning into true/false values, respectively.
- Single-character values. Can be used both as a substritute for
  bool/tristate, or as a small-range integer:
  - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm';
  - integers in a range [-128, 127] or [0, 255] (depending on signedness of
    char in target architecture) are recognized and represented with
    respective values of char type.
- Strings. String values are declared as fixed-length char arrays. String of
  up to that length will be accepted and put in first N bytes of char array,
  with the rest of bytes zeroed out. If config string value is longer than
  space alloted, it will be truncated and warning message emitted. Char array
  is always zero terminated. String literals in config have to be enclosed in
  double quotes, just like C-style string literals.
- Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and
  unsigned variants. Libbpf enforces parsed config value to be in the
  supported range of corresponding integer type. Integers values in config can
  be:
  - decimal integers, with optional + and - signs;
  - hexadecimal integers, prefixed with 0x or 0X;
  - octal integers, starting with 0.

Config file itself is searched in /boot/config-$(uname -r) location with
fallback to /proc/config.gz, unless config path is specified explicitly
through bpf_object_open_opts' kernel_config_path option. Both gzipped and
plain text formats are supported. Libbpf adds explicit dependency on zlib
because of this, but this shouldn't be a problem, given libelf already depends
on zlib.

All detected extern variables, are put into a separate .extern internal map.
It, similarly to .rodata map, is marked as read-only from BPF program side, as
well as is frozen on load. This allows BPF verifier to track extern values as
constants and perform enhanced branch prediction and dead code elimination.
This can be relied upon for doing kernel version/feature detection and using
potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF
program, while still having a single version of BPF program running on old and
new kernels. Selftests are validating this explicitly for unexisting BPF
helper.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com
---
 include/uapi/linux/btf.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index c02dec97e1ce..1a2898c482ee 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -142,7 +142,8 @@ struct btf_param {
 
 enum {
 	BTF_VAR_STATIC = 0,
-	BTF_VAR_GLOBAL_ALLOCATED,
+	BTF_VAR_GLOBAL_ALLOCATED = 1,
+	BTF_VAR_GLOBAL_EXTERN = 2,
 };
 
 /* BTF_KIND_VAR is followed by a single "struct btf_var" to describe
-- 
cgit v1.2.3


From 9429439f59cd3b82a3e2732ead5363578de97a84 Mon Sep 17 00:00:00 2001
From: Yangbo Lu <yangbo.lu@nxp.com>
Date: Thu, 12 Dec 2019 18:08:05 +0800
Subject: ptp_qoriq: export extts_clean_up() function

Export extts_clean_up() function so that dpaa2-ptp
driver is able to reuse it.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/fsl/ptp_qoriq.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/fsl/ptp_qoriq.h b/include/linux/fsl/ptp_qoriq.h
index 992bf9fa1729..b0b743563f43 100644
--- a/include/linux/fsl/ptp_qoriq.h
+++ b/include/linux/fsl/ptp_qoriq.h
@@ -192,6 +192,7 @@ int ptp_qoriq_settime(struct ptp_clock_info *ptp,
 		      const struct timespec64 *ts);
 int ptp_qoriq_enable(struct ptp_clock_info *ptp,
 		     struct ptp_clock_request *rq, int on);
+int extts_clean_up(struct ptp_qoriq *ptp_qoriq, int index, bool update_event);
 #ifdef CONFIG_DEBUG_FS
 void ptp_qoriq_create_debugfs(struct ptp_qoriq *ptp_qoriq);
 void ptp_qoriq_remove_debugfs(struct ptp_qoriq *ptp_qoriq);
-- 
cgit v1.2.3


From 1f1c1d7c89ee538f3e36b43098e95973f8fa37db Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven@narfation.org>
Date: Fri, 13 Dec 2019 21:24:27 +0100
Subject: ipv6: Annotate bitwise IPv6 dsfield pointer cast

The sparse commit 6002ded74587 ("add a flag to warn on casts to/from
bitwise pointers") introduced a check for non-direct casts from/to
restricted datatypes (when -Wbitwise-pointer is enabled).

This triggered a warning in ipv6_get_dsfield() because sparse doesn't know
that the buffer already points to some data in the correct bitwise integer
format. This was already fixed in ipv6_change_dsfield() by the __force
attribute and can be fixed here the same way.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dsfield.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/dsfield.h b/include/net/dsfield.h
index 1a245ee10c95..a59a57ffc546 100644
--- a/include/net/dsfield.h
+++ b/include/net/dsfield.h
@@ -21,7 +21,7 @@ static inline __u8 ipv4_get_dsfield(const struct iphdr *iph)
 
 static inline __u8 ipv6_get_dsfield(const struct ipv6hdr *ipv6h)
 {
-	return ntohs(*(const __be16 *)ipv6h) >> 4;
+	return ntohs(*(__force const __be16 *)ipv6h) >> 4;
 }
 
 
-- 
cgit v1.2.3


From 54e1f08bddbe63a3c0ae44f65df2c8b895003ef4 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven@narfation.org>
Date: Fri, 13 Dec 2019 21:24:28 +0100
Subject: ipv6: Annotate ipv6_addr_is_* bitwise pointer casts

The sparse commit 6002ded74587 ("add a flag to warn on casts to/from
bitwise pointers") introduced a check for non-direct casts from/to
restricted datatypes (when -Wbitwise-pointer is enabled).

This triggered a warning in the 64 bit optimized ipv6_addr_is_*() functions
because sparse doesn't know that the buffer already points to some data in
the correct bitwise integer format. But these were correct and can
therefore be marked with __force to signalize sparse an intended cast to a
specific bitwise type.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/addrconf.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'include')

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 1bab88184d3c..a088349dd94f 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -437,7 +437,7 @@ static inline void addrconf_addr_solict_mult(const struct in6_addr *addr,
 static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr)
 {
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
-	__be64 *p = (__be64 *)addr;
+	__be64 *p = (__force __be64 *)addr;
 	return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) | (p[1] ^ cpu_to_be64(1))) == 0UL;
 #else
 	return ((addr->s6_addr32[0] ^ htonl(0xff020000)) |
@@ -449,7 +449,7 @@ static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr)
 static inline bool ipv6_addr_is_ll_all_routers(const struct in6_addr *addr)
 {
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
-	__be64 *p = (__be64 *)addr;
+	__be64 *p = (__force __be64 *)addr;
 	return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) | (p[1] ^ cpu_to_be64(2))) == 0UL;
 #else
 	return ((addr->s6_addr32[0] ^ htonl(0xff020000)) |
@@ -466,7 +466,7 @@ static inline bool ipv6_addr_is_isatap(const struct in6_addr *addr)
 static inline bool ipv6_addr_is_solict_mult(const struct in6_addr *addr)
 {
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
-	__be64 *p = (__be64 *)addr;
+	__be64 *p = (__force __be64 *)addr;
 	return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) |
 		((p[1] ^ cpu_to_be64(0x00000001ff000000UL)) &
 		 cpu_to_be64(0xffffffffff000000UL))) == 0UL;
@@ -481,7 +481,7 @@ static inline bool ipv6_addr_is_solict_mult(const struct in6_addr *addr)
 static inline bool ipv6_addr_is_all_snoopers(const struct in6_addr *addr)
 {
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
-	__be64 *p = (__be64 *)addr;
+	__be64 *p = (__force __be64 *)addr;
 
 	return ((p[0] ^ cpu_to_be64(0xff02000000000000UL)) |
 		(p[1] ^ cpu_to_be64(0x6a))) == 0UL;
-- 
cgit v1.2.3


From a2ec8b5706944d228181c8b91d815f41d6dd8e7b Mon Sep 17 00:00:00 2001
From: Josh Soref <jsoref@gmail.com>
Date: Sun, 15 Dec 2019 22:08:02 +0100
Subject: wireguard: global: fix spelling mistakes in comments

This fixes two spelling errors in source code comments.

Signed-off-by: Josh Soref <jsoref@gmail.com>
[Jason: rewrote commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/wireguard.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h
index dd8a47c4ad11..ae88be14c947 100644
--- a/include/uapi/linux/wireguard.h
+++ b/include/uapi/linux/wireguard.h
@@ -18,13 +18,13 @@
  * one but not both of:
  *
  *    WGDEVICE_A_IFINDEX: NLA_U32
- *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1
  *
  * The kernel will then return several messages (NLM_F_MULTI) containing the
  * following tree of nested items:
  *
  *    WGDEVICE_A_IFINDEX: NLA_U32
- *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1
  *    WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
  *    WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
  *    WGDEVICE_A_LISTEN_PORT: NLA_U16
@@ -77,7 +77,7 @@
  * WGDEVICE_A_IFINDEX and WGDEVICE_A_IFNAME:
  *
  *    WGDEVICE_A_IFINDEX: NLA_U32
- *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1
  *    WGDEVICE_A_FLAGS: NLA_U32, 0 or WGDEVICE_F_REPLACE_PEERS if all current
  *                      peers should be removed prior to adding the list below.
  *    WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove
@@ -121,7 +121,7 @@
  * filling in information not contained in the prior. Note that if
  * WGDEVICE_F_REPLACE_PEERS is specified in the first message, it probably
  * should not be specified in fragments that come after, so that the list
- * of peers is only cleared the first time but appened after. Likewise for
+ * of peers is only cleared the first time but appended after. Likewise for
  * peers, if WGPEER_F_REPLACE_ALLOWEDIPS is specified in the first message
  * of a peer, it likely should not be specified in subsequent fragments.
  *
-- 
cgit v1.2.3


From 2f5e70c8ce47396bfa8f5c437574b569c02597bb Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas@wunner.de>
Date: Wed, 20 Nov 2019 12:33:59 +0100
Subject: netfilter: Document ingress hook

Amend kerneldoc of struct net_device to fix a "make htmldocs" warning:

include/linux/netdevice.h:2045: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netdevice.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 30745068fb39..0b097bbd3663 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1708,6 +1708,7 @@ enum netdev_priv_flags {
  *	@miniq_ingress:		ingress/clsact qdisc specific data for
  *				ingress processing
  *	@ingress_queue:		XXX: need comments on this one
+ *	@nf_hooks_ingress:	netfilter hooks executed for ingress packets
  *	@broadcast:		hw bcast address
  *
  *	@rx_cpu_rmap:	CPU reverse-mapping for RX completion interrupts,
-- 
cgit v1.2.3


From d4aef159394d5940bd7158ab789969dab82f7c76 Mon Sep 17 00:00:00 2001
From: Soeren Moch <smoch@web.de>
Date: Thu, 12 Dec 2019 00:52:49 +0100
Subject: brcmfmac: add support for BCM4359 SDIO chipset

BCM4359 is a 2x2 802.11 abgn+ac Dual-Band HT80 combo chip and it
supports Real Simultaneous Dual Band feature.

Based on a similar patch by: Wright Feng <wright.feng@cypress.com>

Signed-off-by: Soeren Moch <smoch@web.de>
Acked-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
---
 include/linux/mmc/sdio_ids.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/linux/mmc/sdio_ids.h b/include/linux/mmc/sdio_ids.h
index 08b25c02b5a1..2e9a6e4634eb 100644
--- a/include/linux/mmc/sdio_ids.h
+++ b/include/linux/mmc/sdio_ids.h
@@ -41,8 +41,10 @@
 #define SDIO_DEVICE_ID_BROADCOM_43455		0xa9bf
 #define SDIO_DEVICE_ID_BROADCOM_4354		0x4354
 #define SDIO_DEVICE_ID_BROADCOM_4356		0x4356
+#define SDIO_DEVICE_ID_BROADCOM_4359		0x4359
 #define SDIO_DEVICE_ID_CYPRESS_4373		0x4373
 #define SDIO_DEVICE_ID_CYPRESS_43012		43012
+#define SDIO_DEVICE_ID_CYPRESS_89359		0x4355
 
 #define SDIO_VENDOR_ID_INTEL			0x0089
 #define SDIO_DEVICE_ID_INTEL_IWMC3200WIMAX	0x1402
-- 
cgit v1.2.3


From 504723af0d85434be5fb6f2dde0b62644a7f1ead Mon Sep 17 00:00:00 2001
From: Jose Abreu <joabreu@synopsys.com>
Date: Wed, 18 Dec 2019 11:33:05 +0100
Subject: net: stmmac: Add basic EST support for GMAC5+

Adds the support for EST in GMAC5+ cores. This feature allows to offload
scheduling of queues opening time to the IP.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/stmmac.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index d4bcd9387136..0531afa9b21e 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -109,6 +109,18 @@ struct stmmac_axi {
 	bool axi_rb;
 };
 
+#define EST_GCL		1024
+struct stmmac_est {
+	int enable;
+	u32 btr_offset[2];
+	u32 btr[2];
+	u32 ctr[2];
+	u32 ter;
+	u32 gcl_unaligned[EST_GCL];
+	u32 gcl[EST_GCL];
+	u32 gcl_size;
+};
+
 struct stmmac_rxq_cfg {
 	u8 mode_to_use;
 	u32 chan;
@@ -139,6 +151,7 @@ struct plat_stmmacenet_data {
 	struct device_node *phylink_node;
 	struct device_node *mdio_node;
 	struct stmmac_dma_cfg *dma_cfg;
+	struct stmmac_est *est;
 	int clk_csr;
 	int has_gmac;
 	int enh_desc;
-- 
cgit v1.2.3


From 9586a992fb752b14ac18fc86fe086b0e3372fff4 Mon Sep 17 00:00:00 2001
From: Petr Machata <petrm@mellanox.com>
Date: Wed, 18 Dec 2019 14:55:08 +0000
Subject: net: pkt_cls: Clarify a comment

The bit about negating HW backlog left me scratching my head. Clarify the
comment.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pkt_cls.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e553fc80eb23..a7c5d492bc04 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -791,9 +791,8 @@ enum tc_prio_command {
 struct tc_prio_qopt_offload_params {
 	int bands;
 	u8 priomap[TC_PRIO_MAX + 1];
-	/* In case that a prio qdisc is offloaded and now is changed to a
-	 * non-offloadedable config, it needs to update the backlog & qlen
-	 * values to negate the HW backlog & qlen values (and only them).
+	/* At the point of un-offloading the Qdisc, the reported backlog and
+	 * qlen need to be reduced by the portion that is in HW.
 	 */
 	struct gnet_stats_queue *qstats;
 };
-- 
cgit v1.2.3


From dcc68b4d8084e1ac9af0d4022d6b1aff6a139a33 Mon Sep 17 00:00:00 2001
From: Petr Machata <petrm@mellanox.com>
Date: Wed, 18 Dec 2019 14:55:13 +0000
Subject: net: sch_ets: Add a new Qdisc

Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is
PRIO-like in how it is configured, meaning one needs to specify how many
bands there are, how many are strict and how many are dwrr, quanta for the
latter, and priomap.

The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/pkt_sched.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 9f1a72876212..bf5a5b1dfb0b 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1187,4 +1187,21 @@ enum {
 
 #define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
 
+/* ETS */
+
+#define TCQ_ETS_MAX_BANDS 16
+
+enum {
+	TCA_ETS_UNSPEC,
+	TCA_ETS_NBANDS,		/* u8 */
+	TCA_ETS_NSTRICT,	/* u8 */
+	TCA_ETS_QUANTA,		/* nested TCA_ETS_QUANTA_BAND */
+	TCA_ETS_QUANTA_BAND,	/* u32 */
+	TCA_ETS_PRIOMAP,	/* nested TCA_ETS_PRIOMAP_BAND */
+	TCA_ETS_PRIOMAP_BAND,	/* u8 */
+	__TCA_ETS_MAX,
+};
+
+#define TCA_ETS_MAX (__TCA_ETS_MAX - 1)
+
 #endif
-- 
cgit v1.2.3


From d35eb52bd2ac7557b62bda52668f2e64dde2cf90 Mon Sep 17 00:00:00 2001
From: Petr Machata <petrm@mellanox.com>
Date: Wed, 18 Dec 2019 14:55:15 +0000
Subject: net: sch_ets: Make the ETS qdisc offloadable

Add hooks at appropriate points to make it possible to offload the ETS
Qdisc.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h     | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 30745068fb39..7a8ed11f5d45 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -849,6 +849,7 @@ enum tc_setup_type {
 	TC_SETUP_QDISC_GRED,
 	TC_SETUP_QDISC_TAPRIO,
 	TC_SETUP_FT,
+	TC_SETUP_QDISC_ETS,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a7c5d492bc04..47b115e2012a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -823,4 +823,35 @@ struct tc_root_qopt_offload {
 	bool ingress;
 };
 
+enum tc_ets_command {
+	TC_ETS_REPLACE,
+	TC_ETS_DESTROY,
+	TC_ETS_STATS,
+	TC_ETS_GRAFT,
+};
+
+struct tc_ets_qopt_offload_replace_params {
+	unsigned int bands;
+	u8 priomap[TC_PRIO_MAX + 1];
+	unsigned int quanta[TCQ_ETS_MAX_BANDS];	/* 0 for strict bands. */
+	unsigned int weights[TCQ_ETS_MAX_BANDS];
+	struct gnet_stats_queue *qstats;
+};
+
+struct tc_ets_qopt_offload_graft_params {
+	u8 band;
+	u32 child_handle;
+};
+
+struct tc_ets_qopt_offload {
+	enum tc_ets_command command;
+	u32 handle;
+	u32 parent;
+	union {
+		struct tc_ets_qopt_offload_replace_params replace_params;
+		struct tc_qopt_offload_stats stats;
+		struct tc_ets_qopt_offload_graft_params graft_params;
+	};
+};
+
 #endif
-- 
cgit v1.2.3


From 2a10ab043ac5a658225ee77852db7942de9ac4c5 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Tue, 17 Dec 2019 13:39:11 +0000
Subject: net: phy: add genphy_check_and_restart_aneg()

Add a helper for restarting autonegotiation(), similar to the clause 45
variant.  Use it in __genphy_config_aneg()

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5032d453ac66..1c4f97d2631d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1094,6 +1094,7 @@ void phy_attached_info(struct phy_device *phydev);
 int genphy_read_abilities(struct phy_device *phydev);
 int genphy_setup_forced(struct phy_device *phydev);
 int genphy_restart_aneg(struct phy_device *phydev);
+int genphy_check_and_restart_aneg(struct phy_device *phydev, bool restart);
 int genphy_config_eee_advert(struct phy_device *phydev);
 int __genphy_config_aneg(struct phy_device *phydev, bool changed);
 int genphy_aneg_done(struct phy_device *phydev);
-- 
cgit v1.2.3


From 0efc286a923874f0c243e5766cce54e9429ed949 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Tue, 17 Dec 2019 13:39:16 +0000
Subject: net: phy: provide and use genphy_read_status_fixed()

There are two drivers and generic code which contain exactly the same
code to read the status of a PHY operating without autonegotiation
enabled. Rather than duplicate this code, provide a helper to read
this information.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 1c4f97d2631d..b2105e0d72d3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1100,6 +1100,7 @@ int __genphy_config_aneg(struct phy_device *phydev, bool changed);
 int genphy_aneg_done(struct phy_device *phydev);
 int genphy_update_link(struct phy_device *phydev);
 int genphy_read_lpa(struct phy_device *phydev);
+int genphy_read_status_fixed(struct phy_device *phydev);
 int genphy_read_status(struct phy_device *phydev);
 int genphy_suspend(struct phy_device *phydev);
 int genphy_resume(struct phy_device *phydev);
-- 
cgit v1.2.3


From 8d5a49e9e31ba1ddd34a54b2351d068a90c78707 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 17 Dec 2019 14:12:01 -0800
Subject: net/tls: add helper for testing if socket is RX offloaded

There is currently no way for driver to reliably check that
the socket it has looked up is in fact RX offloaded. Add
a helper. This allows drivers to catch misbehaving firmware.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/tls.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/net/tls.h b/include/net/tls.h
index df630f5fc723..bf9eb4823933 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -641,6 +641,7 @@ int tls_sw_fallback_init(struct sock *sk,
 #ifdef CONFIG_TLS_DEVICE
 void tls_device_init(void);
 void tls_device_cleanup(void);
+void tls_device_sk_destruct(struct sock *sk);
 int tls_set_device_offload(struct sock *sk, struct tls_context *ctx);
 void tls_device_free_resources_tx(struct sock *sk);
 int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx);
@@ -649,6 +650,14 @@ void tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq);
 void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq);
 int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx,
 			 struct sk_buff *skb, struct strp_msg *rxm);
+
+static inline bool tls_is_sk_rx_device_offloaded(struct sock *sk)
+{
+	if (!sk_fullsock(sk) ||
+	    smp_load_acquire(&sk->sk_destruct) != tls_device_sk_destruct)
+		return false;
+	return tls_get_ctx(sk)->rx_conf == TLS_HW;
+}
 #else
 static inline void tls_device_init(void) {}
 static inline void tls_device_cleanup(void) {}
-- 
cgit v1.2.3


From e312b9e706ed6d94f6cc9088fcd9fbd81de4525c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Thu, 19 Dec 2019 07:10:02 +0100
Subject: xsk: Make xskmap flush_list common for all map instances
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The xskmap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all xskmaps, which simplifies __xsk_map_flush()
and xsk_map_alloc().

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-5-bjorn.topel@gmail.com
---
 include/net/xdp_sock.h | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

(limited to 'include')

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e3780e4b74e1..48594740d67c 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -72,7 +72,6 @@ struct xdp_umem {
 
 struct xsk_map {
 	struct bpf_map map;
-	struct list_head __percpu *flush_list;
 	spinlock_t lock; /* Synchronize map updates */
 	struct xdp_sock *xsk_map[];
 };
@@ -139,9 +138,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 			     struct xdp_sock **map_entry);
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
-int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
-		       struct xdp_sock *xs);
-void __xsk_map_flush(struct bpf_map *map);
+int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp);
+void __xsk_map_flush(void);
 
 static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map,
 						     u32 key)
@@ -369,13 +367,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
 	return 0;
 }
 
-static inline int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
-				     struct xdp_sock *xs)
+static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	return -EOPNOTSUPP;
 }
 
-static inline void __xsk_map_flush(struct bpf_map *map)
+static inline void __xsk_map_flush(void)
 {
 }
 
-- 
cgit v1.2.3


From 96360004b8628541f5d05a845ea213267db0b1a2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Thu, 19 Dec 2019 07:10:03 +0100
Subject: xdp: Make devmap flush_list common for all map instances
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The devmap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all devmaps, which simplifies __dev_map_flush()
and dev_map_init_map().

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-6-bjorn.topel@gmail.com
---
 include/linux/bpf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d467983e61bb..31191804ca09 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -959,7 +959,7 @@ struct sk_buff;
 
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(struct bpf_map *map);
+void __dev_map_flush(void);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1068,7 +1068,7 @@ static inline struct net_device  *__dev_map_hash_lookup_elem(struct bpf_map *map
 	return NULL;
 }
 
-static inline void __dev_map_flush(struct bpf_map *map)
+static inline void __dev_map_flush(void)
 {
 }
 
-- 
cgit v1.2.3


From cdfafe98cabefeedbbc65af5c191c59745c03298 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Thu, 19 Dec 2019 07:10:04 +0100
Subject: xdp: Make cpumap flush_list common for all map instances
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The cpumap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all devmaps, which simplifies __cpu_map_flush()
and cpu_map_alloc().

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-7-bjorn.topel@gmail.com
---
 include/linux/bpf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 31191804ca09..8f3e00c84f39 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -966,7 +966,7 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
-void __cpu_map_flush(struct bpf_map *map);
+void __cpu_map_flush(void);
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 
@@ -1097,7 +1097,7 @@ struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 	return NULL;
 }
 
-static inline void __cpu_map_flush(struct bpf_map *map)
+static inline void __cpu_map_flush(void)
 {
 }
 
-- 
cgit v1.2.3


From 332f22a60e4c3492d4953cd6f7aaa4e8bd0bba97 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Thu, 19 Dec 2019 07:10:05 +0100
Subject: xdp: Remove map_to_flush and map swap detection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that all XDP maps that can be used with bpf_redirect_map() tracks
entries to be flushed in a global fashion, there is not need to track
that the map has changed and flush from xdp_do_generic_map()
anymore. All entries will be flushed in xdp_do_flush_map().

This means that the map_to_flush can be removed, and the corresponding
checks. Moving the flush logic to one place, xdp_do_flush_map(), give
a bulking behavior and performance boost.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-8-bjorn.topel@gmail.com
---
 include/linux/filter.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 37ac7025031d..69d6706fc889 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -592,7 +592,6 @@ struct bpf_redirect_info {
 	u32 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
-	struct bpf_map *map_to_flush;
 	u32 kern_flags;
 };
 
-- 
cgit v1.2.3


From 7dd68b3279f1792103d12e69933db3128c6d416e Mon Sep 17 00:00:00 2001
From: Andrey Ignatov <rdna@fb.com>
Date: Wed, 18 Dec 2019 23:44:35 -0800
Subject: bpf: Support replacing cgroup-bpf program in MULTI mode

The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.

Order of programs usually matters, for example imagine two egress
programs: the first one drops packets and the second one counts packets.
If they're swapped the result of counting program will be different.

It brings operational challenges with updating cgroup-bpf program(s)
attached with BPF_F_ALLOW_MULTI since there is no way to replace a
program:

* One way to update is to detach all programs first and then attach the
  new version(s) again in the right order. This introduces an
  interruption in the work a program is doing and may not be acceptable
  (e.g. if it's egress firewall);

* Another way is attach the new version of a program first and only then
  detach the old version. This introduces the time interval when two
  versions of same program are working, what may not be acceptable if a
  program is not idempotent. It also imposes additional burden on
  program developers to make sure that two versions of their program can
  co-exist.

Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
replace

That way user can replace any program among those attached with
BPF_F_ALLOW_MULTI flag without the problems described above.

Details of the new API:

* If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
  descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
  error (EINVAL or EBADF).

* If replace_bpf_fd has valid descriptor of BPF program but such a
  program is not attached to specified cgroup, BPF_PROG_ATTACH will
  return ENOENT.

BPF_F_REPLACE is introduced to make the user intent clear, since
replace_bpf_fd alone can't be used for this (its default value, 0, is a
valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Narkyiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com
---
 include/linux/bpf-cgroup.h |  4 +++-
 include/uapi/linux/bpf.h   | 10 ++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 169fd25f6bc2..18f6a6da7c3c 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -85,6 +85,7 @@ int cgroup_bpf_inherit(struct cgroup *cgrp);
 void cgroup_bpf_offline(struct cgroup *cgrp);
 
 int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
+			struct bpf_prog *replace_prog,
 			enum bpf_attach_type type, u32 flags);
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			enum bpf_attach_type type);
@@ -93,7 +94,8 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 
 /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */
 int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, u32 flags);
+		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
+		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type, u32 flags);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dbbcf0b02970..7df436da542d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -231,6 +231,11 @@ enum bpf_attach_type {
  * When children program makes decision (like picking TCP CA or sock bind)
  * parent program has a chance to override it.
  *
+ * With BPF_F_ALLOW_MULTI a new program is added to the end of the list of
+ * programs for a cgroup. Though it's possible to replace an old program at
+ * any position by also specifying BPF_F_REPLACE flag and position itself in
+ * replace_bpf_fd attribute. Old program at this position will be released.
+ *
  * A cgroup with MULTI or OVERRIDE flag allows any attach flags in sub-cgroups.
  * A cgroup with NONE doesn't allow any programs in sub-cgroups.
  * Ex1:
@@ -249,6 +254,7 @@ enum bpf_attach_type {
  */
 #define BPF_F_ALLOW_OVERRIDE	(1U << 0)
 #define BPF_F_ALLOW_MULTI	(1U << 1)
+#define BPF_F_REPLACE		(1U << 2)
 
 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
  * verifier will perform strict alignment checking as if the kernel
@@ -442,6 +448,10 @@ union bpf_attr {
 		__u32		attach_bpf_fd;	/* eBPF program to attach */
 		__u32		attach_type;
 		__u32		attach_flags;
+		__u32		replace_bpf_fd;	/* previously attached eBPF
+						 * program to replace if
+						 * BPF_F_REPLACE is used
+						 */
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
-- 
cgit v1.2.3


From 03896ef1f0cb23d2742ddf486c531c700a2da7d6 Mon Sep 17 00:00:00 2001
From: Magnus Karlsson <magnus.karlsson@intel.com>
Date: Thu, 19 Dec 2019 13:39:27 +0100
Subject: xsk: Change names of validation functions

Change the names of the validation functions to better reflect what
they are doing. The uppermost ones are reading entries from the rings
and only the bottom ones validate entries. So xskq_cons_read_ is a
better prefix name.

Also change the xskq_cons_read_ functions to return a bool
as the the descriptor or address is already returned by reference
in the parameters. Everyone is using the return value as a bool
anyway.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-9-git-send-email-magnus.karlsson@intel.com
---
 include/net/xdp_sock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 48594740d67c..63f005830866 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -118,7 +118,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
 /* Used from netdev driver */
 bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt);
-u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
+bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
 void xsk_umem_discard_addr(struct xdp_umem *umem);
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
@@ -197,7 +197,7 @@ static inline bool xsk_umem_has_addrs_rq(struct xdp_umem *umem, u32 cnt)
 	return xsk_umem_has_addrs(umem, cnt - rq->length);
 }
 
-static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+static inline bool xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
 {
 	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
 
-- 
cgit v1.2.3


From f8509aa078de0842ec1817e8026e58620cd05d3b Mon Sep 17 00:00:00 2001
From: Magnus Karlsson <magnus.karlsson@intel.com>
Date: Thu, 19 Dec 2019 13:39:28 +0100
Subject: xsk: ixgbe: i40e: ice: mlx5: Xsk_umem_discard_addr to
 xsk_umem_release_addr

Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to
better reflect the new naming of the AF_XDP queue manipulation
functions. As this functions is used by drivers implementing support
for AF_XDP zero-copy, it requires a name change to these drivers. The
function xsk_umem_release_addr_rq has also changed name in the same
fashion.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-10-git-send-email-magnus.karlsson@intel.com
---
 include/net/xdp_sock.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

(limited to 'include')

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 63f005830866..e86ec48ef627 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -119,7 +119,7 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
 /* Used from netdev driver */
 bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt);
 bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
-void xsk_umem_discard_addr(struct xdp_umem *umem);
+void xsk_umem_release_addr(struct xdp_umem *umem);
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
 void xsk_umem_consume_tx_done(struct xdp_umem *umem);
@@ -208,12 +208,12 @@ static inline bool xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
 	return addr;
 }
 
-static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+static inline void xsk_umem_release_addr_rq(struct xdp_umem *umem)
 {
 	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
 
 	if (!rq->length)
-		xsk_umem_discard_addr(umem);
+		xsk_umem_release_addr(umem);
 	else
 		rq->length--;
 }
@@ -258,7 +258,7 @@ static inline u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 	return NULL;
 }
 
-static inline void xsk_umem_discard_addr(struct xdp_umem *umem)
+static inline void xsk_umem_release_addr(struct xdp_umem *umem)
 {
 }
 
@@ -332,7 +332,7 @@ static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
 	return NULL;
 }
 
-static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+static inline void xsk_umem_release_addr_rq(struct xdp_umem *umem)
 {
 }
 
-- 
cgit v1.2.3


From 48fda74f0a9377ce2145ac5f06b6db0274880826 Mon Sep 17 00:00:00 2001
From: Oleksij Rempel <o.rempel@pengutronix.de>
Date: Wed, 18 Dec 2019 09:02:14 +0100
Subject: net: dsa: add support for Atheros AR9331 TAG format

Add support for tag format used in Atheros AR9331 built-in switch.

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dsa.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6767dc3f66c0..da5578db228e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -43,6 +43,7 @@ struct phylink_link_state;
 #define DSA_TAG_PROTO_SJA1105_VALUE		13
 #define DSA_TAG_PROTO_KSZ8795_VALUE		14
 #define DSA_TAG_PROTO_OCELOT_VALUE		15
+#define DSA_TAG_PROTO_AR9331_VALUE		16
 
 enum dsa_tag_protocol {
 	DSA_TAG_PROTO_NONE		= DSA_TAG_PROTO_NONE_VALUE,
@@ -61,6 +62,7 @@ enum dsa_tag_protocol {
 	DSA_TAG_PROTO_SJA1105		= DSA_TAG_PROTO_SJA1105_VALUE,
 	DSA_TAG_PROTO_KSZ8795		= DSA_TAG_PROTO_KSZ8795_VALUE,
 	DSA_TAG_PROTO_OCELOT		= DSA_TAG_PROTO_OCELOT_VALUE,
+	DSA_TAG_PROTO_AR9331		= DSA_TAG_PROTO_AR9331_VALUE,
 };
 
 struct packet_type;
-- 
cgit v1.2.3


From e1b5e598e5a51b453328879682b178b4acc15105 Mon Sep 17 00:00:00 2001
From: John Rutherford <john.rutherford@dektech.com.au>
Date: Thu, 19 Dec 2019 16:03:57 +1100
Subject: tipc: make legacy address flag readable over netlink

To enable iproute2/tipc to generate backwards compatible
printouts and validate command parameters for nodes using a
<z.c.n> node address, it needs to be able to read the legacy
address flag from the kernel.  The legacy address flag records
the way in which the node identity was originally specified.

The legacy address flag is requested by the netlink message
TIPC_NL_ADDR_LEGACY_GET.  If the flag is set the attribute
TIPC_NLA_NET_ADDR_LEGACY is set in the return message.

Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/tipc_netlink.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
index 6c2194ab745b..dc0d23a50e69 100644
--- a/include/uapi/linux/tipc_netlink.h
+++ b/include/uapi/linux/tipc_netlink.h
@@ -65,6 +65,7 @@ enum {
 	TIPC_NL_UDP_GET_REMOTEIP,
 	TIPC_NL_KEY_SET,
 	TIPC_NL_KEY_FLUSH,
+	TIPC_NL_ADDR_LEGACY_GET,
 
 	__TIPC_NL_CMD_MAX,
 	TIPC_NL_CMD_MAX = __TIPC_NL_CMD_MAX - 1
@@ -176,6 +177,7 @@ enum {
 	TIPC_NLA_NET_ADDR,		/* u32 */
 	TIPC_NLA_NET_NODEID,		/* u64 */
 	TIPC_NLA_NET_NODEID_W1,		/* u64 */
+	TIPC_NLA_NET_ADDR_LEGACY,	/* flag */
 
 	__TIPC_NLA_NET_MAX,
 	TIPC_NLA_NET_MAX = __TIPC_NLA_NET_MAX - 1
-- 
cgit v1.2.3


From f66b53fdbb22ced1a323b22b9de84a61aacd8d18 Mon Sep 17 00:00:00 2001
From: Martin Varghese <martin.varghese@nokia.com>
Date: Sat, 21 Dec 2019 08:50:46 +0530
Subject: openvswitch: New MPLS actions for layer 2 tunnelling

The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.

The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.

POP_MPLS action is extended to support ethertype 0x6558.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/openvswitch.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index a87b44cd5590..ae2bff14e7e1 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -672,6 +672,32 @@ struct ovs_action_push_mpls {
 	__be16 mpls_ethertype; /* Either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC */
 };
 
+/**
+ * struct ovs_action_add_mpls - %OVS_ACTION_ATTR_ADD_MPLS action
+ * argument.
+ * @mpls_lse: MPLS label stack entry to push.
+ * @mpls_ethertype: Ethertype to set in the encapsulating ethernet frame.
+ * @tun_flags: MPLS tunnel attributes.
+ *
+ * The only values @mpls_ethertype should ever be given are %ETH_P_MPLS_UC and
+ * %ETH_P_MPLS_MC, indicating MPLS unicast or multicast. Other are rejected.
+ */
+struct ovs_action_add_mpls {
+	__be32 mpls_lse;
+	__be16 mpls_ethertype; /* Either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC */
+	__u16 tun_flags;
+};
+
+#define OVS_MPLS_L3_TUNNEL_FLAG_MASK  (1 << 0) /* Flag to specify the place of
+						* insertion of MPLS header.
+						* When false, the MPLS header
+						* will be inserted at the start
+						* of the packet.
+						* When true, the MPLS header
+						* will be inserted at the start
+						* of the l3 header.
+						*/
+
 /**
  * struct ovs_action_push_vlan - %OVS_ACTION_ATTR_PUSH_VLAN action argument.
  * @vlan_tpid: Tag protocol identifier (TPID) to push.
@@ -892,6 +918,10 @@ struct check_pkt_len_arg {
  * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the packet length and execute a set
  * of actions if greater than the specified packet length, else execute
  * another set of actions.
+ * @OVS_ACTION_ATTR_ADD_MPLS: Push a new MPLS label stack entry at the
+ * start of the packet or at the start of the l3 header depending on the value
+ * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS
+ * argument.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -927,6 +957,7 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_METER,        /* u32 meter ID. */
 	OVS_ACTION_ATTR_CLONE,        /* Nested OVS_CLONE_ATTR_*.  */
 	OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
+	OVS_ACTION_ATTR_ADD_MPLS,     /* struct ovs_action_add_mpls. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
-- 
cgit v1.2.3


From 6b722237b656d045c0b9bab9966a5e46604258ba Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 23 Dec 2019 15:28:12 +0200
Subject: net: fib_notifier: Add temporary events to the FIB notification chain

Subsequent patches are going to simplify the IPv6 route offload API,
which will only use three events - replace, delete and append.

Introduce a temporary version of replace and delete in order to make the
conversion easier to review. Note that append does not need a temporary
version, as it is currently not used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/fib_notifier.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
index 6d59221ff05a..b3c54325caec 100644
--- a/include/net/fib_notifier.h
+++ b/include/net/fib_notifier.h
@@ -23,6 +23,8 @@ enum fib_event_type {
 	FIB_EVENT_NH_DEL,
 	FIB_EVENT_VIF_ADD,
 	FIB_EVENT_VIF_DEL,
+	FIB_EVENT_ENTRY_REPLACE_TMP,
+	FIB_EVENT_ENTRY_DEL_TMP,
 };
 
 struct fib_notifier_ops {
-- 
cgit v1.2.3


From d2f0c9b11410f9c6a07c126f8a215b0b81cdcf6c Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 23 Dec 2019 15:28:17 +0200
Subject: ipv6: Handle route deletion notification

For the purpose of route offload, when a single route is deleted, it is
only of interest if it is the first route in the node or if it is
sibling to such a route.

In the first case, distinguish between several possibilities:

1. Route is the last route in the node. Emit a delete notification

2. Route is followed by a non-multipath route. Emit a replace
notification for the non-multipath route.

3. Route is followed by a multipath route. Emit a replace notification
for the multipath route.

In the second case, only emit a delete notification to ensure the route
is no longer used as a valid nexthop.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_fib.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index f1535f172935..b579faea41e9 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -487,6 +487,7 @@ int call_fib6_multipath_entry_notifiers(struct net *net,
 					struct fib6_info *rt,
 					unsigned int nsiblings,
 					struct netlink_ext_ack *extack);
+int call_fib6_entry_notifiers_replace(struct net *net, struct fib6_info *rt);
 void fib6_rt_update(struct net *net, struct fib6_info *rt,
 		    struct nl_info *info);
 void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
-- 
cgit v1.2.3


From caafb2509fac1432849650826953dd88b7cbe374 Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 23 Dec 2019 15:28:20 +0200
Subject: ipv6: Remove old route notifications and convert listeners

Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/fib_notifier.h | 2 --
 1 file changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
index b3c54325caec..6d59221ff05a 100644
--- a/include/net/fib_notifier.h
+++ b/include/net/fib_notifier.h
@@ -23,8 +23,6 @@ enum fib_event_type {
 	FIB_EVENT_NH_DEL,
 	FIB_EVENT_VIF_ADD,
 	FIB_EVENT_VIF_DEL,
-	FIB_EVENT_ENTRY_REPLACE_TMP,
-	FIB_EVENT_ENTRY_DEL_TMP,
 };
 
 struct fib_notifier_ops {
-- 
cgit v1.2.3


From 0e5dafc8a6e540c0145b61545c557c43be70af10 Mon Sep 17 00:00:00 2001
From: Richard Cochran <richardcochran@gmail.com>
Date: Wed, 25 Dec 2019 18:16:09 -0800
Subject: net: phy: Introduce helper functions for time stamping support.

Some parts of the networking stack and at least one driver test fields
within the 'struct phy_device' in order to query time stamping
capabilities and to invoke time stamping methods.  This patch adds a
functional interface around the time stamping fields.  This will allow
insulating the callers from future changes to the details of the time
stamping implemenation.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7d530b3f8855..0248f5e9939d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -936,6 +936,66 @@ static inline bool phy_polling_mode(struct phy_device *phydev)
 	return phydev->irq == PHY_POLL;
 }
 
+/**
+ * phy_has_hwtstamp - Tests whether a PHY time stamp configuration.
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_has_hwtstamp(struct phy_device *phydev)
+{
+	return phydev && phydev->drv && phydev->drv->hwtstamp;
+}
+
+/**
+ * phy_has_rxtstamp - Tests whether a PHY supports receive time stamping.
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_has_rxtstamp(struct phy_device *phydev)
+{
+	return phydev && phydev->drv && phydev->drv->rxtstamp;
+}
+
+/**
+ * phy_has_tsinfo - Tests whether a PHY reports time stamping and/or
+ * PTP hardware clock capabilities.
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_has_tsinfo(struct phy_device *phydev)
+{
+	return phydev && phydev->drv && phydev->drv->ts_info;
+}
+
+/**
+ * phy_has_txtstamp - Tests whether a PHY supports transmit time stamping.
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_has_txtstamp(struct phy_device *phydev)
+{
+	return phydev && phydev->drv && phydev->drv->txtstamp;
+}
+
+static inline int phy_hwtstamp(struct phy_device *phydev, struct ifreq *ifr)
+{
+	return phydev->drv->hwtstamp(phydev, ifr);
+}
+
+static inline bool phy_rxtstamp(struct phy_device *phydev, struct sk_buff *skb,
+				int type)
+{
+	return phydev->drv->rxtstamp(phydev, skb, type);
+}
+
+static inline int phy_ts_info(struct phy_device *phydev,
+			      struct ethtool_ts_info *tsinfo)
+{
+	return phydev->drv->ts_info(phydev, tsinfo);
+}
+
+static inline void phy_txtstamp(struct phy_device *phydev, struct sk_buff *skb,
+				int type)
+{
+	phydev->drv->txtstamp(phydev, skb, type);
+}
+
 /**
  * phy_is_internal - Convenience function for testing if a PHY is internal
  * @phydev: the phy_device struct
-- 
cgit v1.2.3


From 4715f65ffa0520af0680dbfbedbe349f175adaf4 Mon Sep 17 00:00:00 2001
From: Richard Cochran <richardcochran@gmail.com>
Date: Wed, 25 Dec 2019 18:16:15 -0800
Subject: net: Introduce a new MII time stamping interface.

Currently the stack supports time stamping in PHY devices.  However,
there are newer, non-PHY devices that can snoop an MII bus and provide
time stamps.  In order to support such devices, this patch introduces
a new interface to be used by both PHY and non-PHY devices.

In addition, the one and only user of the old PHY time stamping API is
converted to the new interface.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/mii_timestamper.h | 58 +++++++++++++++++++++++++++++++++++++++++
 include/linux/phy.h             | 41 +++++++----------------------
 2 files changed, 68 insertions(+), 31 deletions(-)
 create mode 100644 include/linux/mii_timestamper.h

(limited to 'include')

diff --git a/include/linux/mii_timestamper.h b/include/linux/mii_timestamper.h
new file mode 100644
index 000000000000..36002386029c
--- /dev/null
+++ b/include/linux/mii_timestamper.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Support for generic time stamping devices on MII buses.
+ * Copyright (C) 2018 Richard Cochran <richardcochran@gmail.com>
+ */
+#ifndef _LINUX_MII_TIMESTAMPER_H
+#define _LINUX_MII_TIMESTAMPER_H
+
+#include <linux/device.h>
+#include <linux/ethtool.h>
+#include <linux/skbuff.h>
+
+struct phy_device;
+
+/**
+ * struct mii_timestamper - Callback interface to MII time stamping devices.
+ *
+ * @rxtstamp:	Requests a Rx timestamp for 'skb'.  If the skb is accepted,
+ *		the MII time stamping device promises to deliver it using
+ *		netif_rx() as soon as a timestamp becomes available. One of
+ *		the PTP_CLASS_ values is passed in 'type'.  The function
+ *		must return true if the skb is accepted for delivery.
+ *
+ * @txtstamp:	Requests a Tx timestamp for 'skb'.  The MII time stamping
+ *		device promises to deliver it using skb_complete_tx_timestamp()
+ *		as soon as a timestamp becomes available. One of the PTP_CLASS_
+ *		values is passed in 'type'.
+ *
+ * @hwtstamp:	Handles SIOCSHWTSTAMP ioctl for hardware time stamping.
+ *
+ * @link_state: Allows the device to respond to changes in the link
+ *		state.  The caller invokes this function while holding
+ *		the phy_device mutex.
+ *
+ * @ts_info:	Handles ethtool queries for hardware time stamping.
+ *
+ * Drivers for PHY time stamping devices should embed their
+ * mii_timestamper within a private structure, obtaining a reference
+ * to it using container_of().
+ */
+struct mii_timestamper {
+	bool (*rxtstamp)(struct mii_timestamper *mii_ts,
+			 struct sk_buff *skb, int type);
+
+	void (*txtstamp)(struct mii_timestamper *mii_ts,
+			 struct sk_buff *skb, int type);
+
+	int  (*hwtstamp)(struct mii_timestamper *mii_ts,
+			 struct ifreq *ifreq);
+
+	void (*link_state)(struct mii_timestamper *mii_ts,
+			   struct phy_device *phydev);
+
+	int  (*ts_info)(struct mii_timestamper *mii_ts,
+			struct ethtool_ts_info *ts_info);
+};
+
+#endif
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 0248f5e9939d..30e599c454db 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -17,6 +17,7 @@
 #include <linux/linkmode.h>
 #include <linux/mdio.h>
 #include <linux/mii.h>
+#include <linux/mii_timestamper.h>
 #include <linux/module.h>
 #include <linux/timer.h>
 #include <linux/workqueue.h>
@@ -441,6 +442,7 @@ struct phy_device {
 	struct sfp_bus *sfp_bus;
 	struct phylink *phylink;
 	struct net_device *attached_dev;
+	struct mii_timestamper *mii_ts;
 
 	u8 mdix;
 	u8 mdix_ctrl;
@@ -546,29 +548,6 @@ struct phy_driver {
 	 */
 	int (*match_phy_device)(struct phy_device *phydev);
 
-	/* Handles ethtool queries for hardware time stamping. */
-	int (*ts_info)(struct phy_device *phydev, struct ethtool_ts_info *ti);
-
-	/* Handles SIOCSHWTSTAMP ioctl for hardware time stamping. */
-	int  (*hwtstamp)(struct phy_device *phydev, struct ifreq *ifr);
-
-	/*
-	 * Requests a Rx timestamp for 'skb'. If the skb is accepted,
-	 * the phy driver promises to deliver it using netif_rx() as
-	 * soon as a timestamp becomes available. One of the
-	 * PTP_CLASS_ values is passed in 'type'. The function must
-	 * return true if the skb is accepted for delivery.
-	 */
-	bool (*rxtstamp)(struct phy_device *dev, struct sk_buff *skb, int type);
-
-	/*
-	 * Requests a Tx timestamp for 'skb'. The phy driver promises
-	 * to deliver it using skb_complete_tx_timestamp() as soon as a
-	 * timestamp becomes available. One of the PTP_CLASS_ values
-	 * is passed in 'type'.
-	 */
-	void (*txtstamp)(struct phy_device *dev, struct sk_buff *skb, int type);
-
 	/* Some devices (e.g. qnap TS-119P II) require PHY register changes to
 	 * enable Wake on LAN, so set_wol is provided to be called in the
 	 * ethernet driver's set_wol function. */
@@ -942,7 +921,7 @@ static inline bool phy_polling_mode(struct phy_device *phydev)
  */
 static inline bool phy_has_hwtstamp(struct phy_device *phydev)
 {
-	return phydev && phydev->drv && phydev->drv->hwtstamp;
+	return phydev && phydev->mii_ts && phydev->mii_ts->hwtstamp;
 }
 
 /**
@@ -951,7 +930,7 @@ static inline bool phy_has_hwtstamp(struct phy_device *phydev)
  */
 static inline bool phy_has_rxtstamp(struct phy_device *phydev)
 {
-	return phydev && phydev->drv && phydev->drv->rxtstamp;
+	return phydev && phydev->mii_ts && phydev->mii_ts->rxtstamp;
 }
 
 /**
@@ -961,7 +940,7 @@ static inline bool phy_has_rxtstamp(struct phy_device *phydev)
  */
 static inline bool phy_has_tsinfo(struct phy_device *phydev)
 {
-	return phydev && phydev->drv && phydev->drv->ts_info;
+	return phydev && phydev->mii_ts && phydev->mii_ts->ts_info;
 }
 
 /**
@@ -970,30 +949,30 @@ static inline bool phy_has_tsinfo(struct phy_device *phydev)
  */
 static inline bool phy_has_txtstamp(struct phy_device *phydev)
 {
-	return phydev && phydev->drv && phydev->drv->txtstamp;
+	return phydev && phydev->mii_ts && phydev->mii_ts->txtstamp;
 }
 
 static inline int phy_hwtstamp(struct phy_device *phydev, struct ifreq *ifr)
 {
-	return phydev->drv->hwtstamp(phydev, ifr);
+	return phydev->mii_ts->hwtstamp(phydev->mii_ts, ifr);
 }
 
 static inline bool phy_rxtstamp(struct phy_device *phydev, struct sk_buff *skb,
 				int type)
 {
-	return phydev->drv->rxtstamp(phydev, skb, type);
+	return phydev->mii_ts->rxtstamp(phydev->mii_ts, skb, type);
 }
 
 static inline int phy_ts_info(struct phy_device *phydev,
 			      struct ethtool_ts_info *tsinfo)
 {
-	return phydev->drv->ts_info(phydev, tsinfo);
+	return phydev->mii_ts->ts_info(phydev->mii_ts, tsinfo);
 }
 
 static inline void phy_txtstamp(struct phy_device *phydev, struct sk_buff *skb,
 				int type)
 {
-	phydev->drv->txtstamp(phydev, skb, type);
+	phydev->mii_ts->txtstamp(phydev->mii_ts, skb, type);
 }
 
 /**
-- 
cgit v1.2.3


From 767ff483731502a0fc34f34a3a0851aca175eb71 Mon Sep 17 00:00:00 2001
From: Richard Cochran <richardcochran@gmail.com>
Date: Wed, 25 Dec 2019 18:16:16 -0800
Subject: net: Add a layer for non-PHY MII time stamping drivers.

While PHY time stamping drivers can simply attach their interface
directly to the PHY instance, stand alone drivers require support in
order to manage their services.  Non-PHY MII time stamping drivers
have a control interface over another bus like I2C, SPI, UART, or via
a memory mapped peripheral.  The controller device will be associated
with one or more time stamping channels, each of which sits snoops in
on a MII bus.

This patch provides a glue layer that will enable time stamping
channels to find their controlling device.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/mii_timestamper.h | 63 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

(limited to 'include')

diff --git a/include/linux/mii_timestamper.h b/include/linux/mii_timestamper.h
index 36002386029c..fa940bbaf8ae 100644
--- a/include/linux/mii_timestamper.h
+++ b/include/linux/mii_timestamper.h
@@ -33,10 +33,15 @@ struct phy_device;
  *		the phy_device mutex.
  *
  * @ts_info:	Handles ethtool queries for hardware time stamping.
+ * @device:	Remembers the device to which the instance belongs.
  *
  * Drivers for PHY time stamping devices should embed their
  * mii_timestamper within a private structure, obtaining a reference
  * to it using container_of().
+ *
+ * Drivers for non-PHY time stamping devices should return a pointer
+ * to a mii_timestamper from the probe_channel() callback of their
+ * mii_timestamping_ctrl interface.
  */
 struct mii_timestamper {
 	bool (*rxtstamp)(struct mii_timestamper *mii_ts,
@@ -53,6 +58,64 @@ struct mii_timestamper {
 
 	int  (*ts_info)(struct mii_timestamper *mii_ts,
 			struct ethtool_ts_info *ts_info);
+
+	struct device *device;
+};
+
+/**
+ * struct mii_timestamping_ctrl - MII time stamping controller interface.
+ *
+ * @probe_channel:	Callback into the controller driver announcing the
+ *			presence of the 'port' channel.  The 'device' field
+ *			had been passed to register_mii_tstamp_controller().
+ *			The driver must return either a pointer to a valid
+ *			MII timestamper instance or PTR_ERR.
+ *
+ * @release_channel:	Releases an instance obtained via .probe_channel.
+ */
+struct mii_timestamping_ctrl {
+	struct mii_timestamper *(*probe_channel)(struct device *device,
+						 unsigned int port);
+	void (*release_channel)(struct device *device,
+				struct mii_timestamper *mii_ts);
 };
 
+#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING
+
+int register_mii_tstamp_controller(struct device *device,
+				   struct mii_timestamping_ctrl *ctrl);
+
+void unregister_mii_tstamp_controller(struct device *device);
+
+struct mii_timestamper *register_mii_timestamper(struct device_node *node,
+						 unsigned int port);
+
+void unregister_mii_timestamper(struct mii_timestamper *mii_ts);
+
+#else
+
+static inline
+int register_mii_tstamp_controller(struct device *device,
+				   struct mii_timestamping_ctrl *ctrl)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void unregister_mii_tstamp_controller(struct device *device)
+{
+}
+
+static inline
+struct mii_timestamper *register_mii_timestamper(struct device_node *node,
+						 unsigned int port)
+{
+	return NULL;
+}
+
+static inline void unregister_mii_timestamper(struct mii_timestamper *mii_ts)
+{
+}
+
+#endif
+
 #endif
-- 
cgit v1.2.3


From b6fd7b96366769651ab23988607ce9c5c9042cdb Mon Sep 17 00:00:00 2001
From: Richard Cochran <richardcochran@gmail.com>
Date: Wed, 25 Dec 2019 18:16:19 -0800
Subject: net: Introduce peer to peer one step PTP time stamping.

The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages.  Up until now, hardware with P2P one step has
been rare, and kernel support was lacking.  This patch adds support of
the mode in anticipation of new hardware developments.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/net_tstamp.h | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index e5b39721c6e4..f96e650d0af9 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -90,6 +90,14 @@ enum hwtstamp_tx_types {
 	 * queue.
 	 */
 	HWTSTAMP_TX_ONESTEP_SYNC,
+
+	/*
+	 * Same as HWTSTAMP_TX_ONESTEP_SYNC, but also enables time
+	 * stamp insertion directly into PDelay_Resp packets. In this
+	 * case, neither transmitted Sync nor PDelay_Resp packets will
+	 * receive a time stamp via the socket error queue.
+	 */
+	HWTSTAMP_TX_ONESTEP_P2P,
 };
 
 /* possible values for hwtstamp_config->rx_filter */
-- 
cgit v1.2.3


From c14ceb0ec727187f71a487a592ffa91767fed66e Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw@strlen.de>
Date: Wed, 18 Dec 2019 12:05:21 +0100
Subject: netfilter: nft_meta: add support for slave device ifindex matching

Allow to match on vrf slave ifindex or name.

In case there was no slave interface involved, store 0 in the
destination register just like existing iif/oif matching.

sdif(name) is restricted to the ipv4/ipv6 input and forward hooks,
as it depends on ip(6) stack parsing/storing info in skb->cb[].

Cc: Martin Willi <martin@strongswan.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index bb9b049310df..e237ecbdcd8a 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -805,6 +805,8 @@ enum nft_exthdr_attributes {
  * @NFT_META_TIME_NS: time since epoch (in nanoseconds)
  * @NFT_META_TIME_DAY: day of week (from 0 = Sunday to 6 = Saturday)
  * @NFT_META_TIME_HOUR: hour of day (in seconds)
+ * @NFT_META_SDIF: slave device interface index
+ * @NFT_META_SDIFNAME: slave device interface name
  */
 enum nft_meta_keys {
 	NFT_META_LEN,
@@ -840,6 +842,8 @@ enum nft_meta_keys {
 	NFT_META_TIME_NS,
 	NFT_META_TIME_DAY,
 	NFT_META_TIME_HOUR,
+	NFT_META_SDIF,
+	NFT_META_SDIFNAME,
 };
 
 /**
-- 
cgit v1.2.3


From f643ee295c1c63bc117fb052d4da681354d6f732 Mon Sep 17 00:00:00 2001
From: Kevin Kou <qdkevin.kou@gmail.com>
Date: Thu, 26 Dec 2019 12:29:17 +0000
Subject: sctp: move trace_sctp_probe_path into sctp_outq_sack

The original patch bringed in the "SCTP ACK tracking trace event"
feature was committed at Dec.20, 2017, it replaced jprobe usage
with trace events, and bringed in two trace events, one is
TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path).
The original patch intended to trigger the trace_sctp_probe_path in
TRACE_EVENT(sctp_probe) as below code,

+TRACE_EVENT(sctp_probe,
+
+	TP_PROTO(const struct sctp_endpoint *ep,
+		 const struct sctp_association *asoc,
+		 struct sctp_chunk *chunk),
+
+	TP_ARGS(ep, asoc, chunk),
+
+	TP_STRUCT__entry(
+		__field(__u64, asoc)
+		__field(__u32, mark)
+		__field(__u16, bind_port)
+		__field(__u16, peer_port)
+		__field(__u32, pathmtu)
+		__field(__u32, rwnd)
+		__field(__u16, unack_data)
+	),
+
+	TP_fast_assign(
+		struct sk_buff *skb = chunk->skb;
+
+		__entry->asoc = (unsigned long)asoc;
+		__entry->mark = skb->mark;
+		__entry->bind_port = ep->base.bind_addr.port;
+		__entry->peer_port = asoc->peer.port;
+		__entry->pathmtu = asoc->pathmtu;
+		__entry->rwnd = asoc->peer.rwnd;
+		__entry->unack_data = asoc->unack_data;
+
+		if (trace_sctp_probe_path_enabled()) {
+			struct sctp_transport *sp;
+
+			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
+					    transports) {
+				trace_sctp_probe_path(sp, asoc);
+			}
+		}
+	),

But I found it did not work when I did testing, and trace_sctp_probe_path
had no output, I finally found that there is trace buffer lock
operation(trace_event_buffer_reserve) in include/trace/trace_events.h:

static notrace void							\
trace_event_raw_event_##call(void *__data, proto)			\
{									\
	struct trace_event_file *trace_file = __data;			\
	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
	struct trace_event_buffer fbuffer;				\
	struct trace_event_raw_##call *entry;				\
	int __data_size;						\
									\
	if (trace_trigger_soft_disabled(trace_file))			\
		return;							\
									\
	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
									\
	entry = trace_event_buffer_reserve(&fbuffer, trace_file,	\
				 sizeof(*entry) + __data_size);		\
									\
	if (!entry)							\
		return;							\
									\
	tstruct								\
									\
	{ assign; }							\
									\
	trace_event_buffer_commit(&fbuffer);				\
}

The reason caused no output of trace_sctp_probe_path is that
trace_sctp_probe_path written in TP_fast_assign part of
TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the
trace_event_buffer_reserve() when compiler expands Macro,

        entry = trace_event_buffer_reserve(&fbuffer, trace_file,        \
                                 sizeof(*entry) + __data_size);         \
                                                                        \
        if (!entry)                                                     \
                return;                                                 \
                                                                        \
        tstruct                                                         \
                                                                        \
        { assign; }                                                     \

so trace_sctp_probe_path finally can not acquire trace_event_buffer
and return no output, that is to say the nest of tracepoint entry function
is not allowed. The function call flow is:

trace_sctp_probe()
-> trace_event_raw_event_sctp_probe()
 -> lock buffer
 -> trace_sctp_probe_path()
   -> trace_event_raw_event_sctp_probe_path()  --nested
   -> buffer has been locked and return no output.

This patch is to remove trace_sctp_probe_path from the TP_fast_assign
part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function,
and trigger sctp_probe_path_trace in sctp_outq_sack.

After this patch, you can enable both events individually,
  # cd /sys/kernel/debug/tracing
  # echo 1 > events/sctp/sctp_probe/enable
  # echo 1 > events/sctp/sctp_probe_path/enable

Or, you can enable all the events under sctp.

  # echo 1 > events/sctp/enable

Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/trace/events/sctp.h | 9 ---------
 1 file changed, 9 deletions(-)

(limited to 'include')

diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h
index 7475c7be165a..d4aac3436595 100644
--- a/include/trace/events/sctp.h
+++ b/include/trace/events/sctp.h
@@ -75,15 +75,6 @@ TRACE_EVENT(sctp_probe,
 		__entry->pathmtu = asoc->pathmtu;
 		__entry->rwnd = asoc->peer.rwnd;
 		__entry->unack_data = asoc->unack_data;
-
-		if (trace_sctp_probe_path_enabled()) {
-			struct sctp_transport *sp;
-
-			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
-					    transports) {
-				trace_sctp_probe_path(sp, asoc);
-			}
-		}
 	),
 
 	TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
-- 
cgit v1.2.3


From c1e469902640ed0a82d0b8df5951199c17a49e3c Mon Sep 17 00:00:00 2001
From: Andy Roulin <aroulin@cumulusnetworks.com>
Date: Thu, 26 Dec 2019 05:41:57 -0800
Subject: bonding: rename AD_STATE_* to LACP_STATE_*

As the LACP actor/partner state is now part of the uapi, rename the
3ad state defines with LACP prefix. The LACP prefix is preferred over
BOND_3AD as the LACP standard moved to 802.1AX.

Fixes: 826f66b30c2e3 ("bonding: move 802.3ad port state flags to uapi")
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_bonding.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 6829213a54c5..45f3750aa861 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -96,14 +96,14 @@
 #define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 
 /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
-#define AD_STATE_LACP_ACTIVITY   0x1
-#define AD_STATE_LACP_TIMEOUT    0x2
-#define AD_STATE_AGGREGATION     0x4
-#define AD_STATE_SYNCHRONIZATION 0x8
-#define AD_STATE_COLLECTING      0x10
-#define AD_STATE_DISTRIBUTING    0x20
-#define AD_STATE_DEFAULTED       0x40
-#define AD_STATE_EXPIRED         0x80
+#define LACP_STATE_LACP_ACTIVITY   0x1
+#define LACP_STATE_LACP_TIMEOUT    0x2
+#define LACP_STATE_AGGREGATION     0x4
+#define LACP_STATE_SYNCHRONIZATION 0x8
+#define LACP_STATE_COLLECTING      0x10
+#define LACP_STATE_DISTRIBUTING    0x20
+#define LACP_STATE_DEFAULTED       0x40
+#define LACP_STATE_EXPIRED         0x80
 
 typedef struct ifbond {
 	__s32 bond_mode;
-- 
cgit v1.2.3


From 2b4a8990b7df55875745a80a609a1ceaaf51f322 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:18 +0100
Subject: ethtool: introduce ethtool netlink interface

Basic genetlink and init infrastructure for the netlink interface, register
genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to
make the build optional. Add initial overall interface description into
Documentation/networking/ethtool-netlink.rst, further patches will add more
detailed information.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/ethtool_netlink.h      |  9 +++++++++
 include/uapi/linux/ethtool_netlink.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 include/linux/ethtool_netlink.h
 create mode 100644 include/uapi/linux/ethtool_netlink.h

(limited to 'include')

diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
new file mode 100644
index 000000000000..f27e92b5f344
--- /dev/null
+++ b/include/linux/ethtool_netlink.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _LINUX_ETHTOOL_NETLINK_H_
+#define _LINUX_ETHTOOL_NETLINK_H_
+
+#include <uapi/linux/ethtool_netlink.h>
+#include <linux/ethtool.h>
+
+#endif /* _LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
new file mode 100644
index 000000000000..3c93276ba066
--- /dev/null
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * include/uapi/linux/ethtool_netlink.h - netlink interface for ethtool
+ *
+ * See Documentation/networking/ethtool-netlink.txt in kernel source tree for
+ * doucumentation of the interface.
+ */
+
+#ifndef _UAPI_LINUX_ETHTOOL_NETLINK_H_
+#define _UAPI_LINUX_ETHTOOL_NETLINK_H_
+
+#include <linux/ethtool.h>
+
+/* message types - userspace to kernel */
+enum {
+	ETHTOOL_MSG_USER_NONE,
+
+	/* add new constants above here */
+	__ETHTOOL_MSG_USER_CNT,
+	ETHTOOL_MSG_USER_MAX = __ETHTOOL_MSG_USER_CNT - 1
+};
+
+/* message types - kernel to userspace */
+enum {
+	ETHTOOL_MSG_KERNEL_NONE,
+
+	/* add new constants above here */
+	__ETHTOOL_MSG_KERNEL_CNT,
+	ETHTOOL_MSG_KERNEL_MAX = __ETHTOOL_MSG_KERNEL_CNT - 1
+};
+
+/* generic netlink info */
+#define ETHTOOL_GENL_NAME "ethtool"
+#define ETHTOOL_GENL_VERSION 1
+
+#endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */
-- 
cgit v1.2.3


From 041b1c5d4a53e97fc9e029ae32469552ca12cb9b Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:23 +0100
Subject: ethtool: helper functions for netlink interface

Add common request/reply header definition and helpers to parse request
header and fill reply header. Provide ethnl_update_* helpers to update
structure members from request attributes (to be used for *_SET requests).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 3c93276ba066..82fc3b5f41c9 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -29,6 +29,27 @@ enum {
 	ETHTOOL_MSG_KERNEL_MAX = __ETHTOOL_MSG_KERNEL_CNT - 1
 };
 
+/* request header */
+
+/* use compact bitsets in reply */
+#define ETHTOOL_FLAG_COMPACT_BITSETS	(1 << 0)
+/* provide optional reply for SET or ACT requests */
+#define ETHTOOL_FLAG_OMIT_REPLY	(1 << 1)
+
+#define ETHTOOL_FLAG_ALL (ETHTOOL_FLAG_COMPACT_BITSETS | \
+			  ETHTOOL_FLAG_OMIT_REPLY)
+
+enum {
+	ETHTOOL_A_HEADER_UNSPEC,
+	ETHTOOL_A_HEADER_DEV_INDEX,		/* u32 */
+	ETHTOOL_A_HEADER_DEV_NAME,		/* string */
+	ETHTOOL_A_HEADER_FLAGS,			/* u32 - ETHTOOL_FLAG_* */
+
+	/* add new constants above here */
+	__ETHTOOL_A_HEADER_CNT,
+	ETHTOOL_A_HEADER_MAX = __ETHTOOL_A_HEADER_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From 10b518d4e6dd5390e40f7d8de0f08753c1195a7e Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:28 +0100
Subject: ethtool: netlink bitset handling

The ethtool netlink code uses common framework for passing arbitrary
length bit sets to allow future extensions. A bitset can be a list (only
one bitmap) or can consist of value and mask pair (used e.g. when client
want to modify only some bits). A bitset can use one of two formats:
verbose (bit by bit) or compact.

Verbose format consists of bitset size (number of bits), list flag and
an array of bit nests, telling which bits are part of the list or which
bits are in the mask and which of them are to be set. In requests, bits
can be identified by index (position) or by name. In replies, kernel
provides both index and name. Verbose format is suitable for "one shot"
applications like standard ethtool command as it avoids the need to
either keep bit names (e.g. link modes) in sync with kernel or having to
add an extra roundtrip for string set request (e.g. for private flags).

Compact format uses one (list) or two (value/mask) arrays of 32-bit
words to store the bitmap(s). It is more suitable for long running
applications (ethtool in monitor mode or network management daemons)
which can retrieve the names once and then pass only compact bitmaps to
save space.

Userspace requests can use either format; ETHTOOL_FLAG_COMPACT_BITSETS
flag in request header tells kernel which format to use in reply.
Notifications always use compact format.

As some code uses arrays of unsigned long for internal representation and
some arrays of u32 (or even a single u32), two sets of parse/compose
helpers are introduced. To avoid code duplication, helpers for unsigned
long arrays are implemented as wrappers around helpers for u32 arrays.
There are two reasons for this choice: (1) u32 arrays are more frequent in
ethtool code and (2) unsigned long array can be always interpreted as an
u32 array on little endian 64-bit and all 32-bit architectures while we
would need special handling for odd number of u32 words in the opposite
direction.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 82fc3b5f41c9..951203049615 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -50,6 +50,41 @@ enum {
 	ETHTOOL_A_HEADER_MAX = __ETHTOOL_A_HEADER_CNT - 1
 };
 
+/* bit sets */
+
+enum {
+	ETHTOOL_A_BITSET_BIT_UNSPEC,
+	ETHTOOL_A_BITSET_BIT_INDEX,		/* u32 */
+	ETHTOOL_A_BITSET_BIT_NAME,		/* string */
+	ETHTOOL_A_BITSET_BIT_VALUE,		/* flag */
+
+	/* add new constants above here */
+	__ETHTOOL_A_BITSET_BIT_CNT,
+	ETHTOOL_A_BITSET_BIT_MAX = __ETHTOOL_A_BITSET_BIT_CNT - 1
+};
+
+enum {
+	ETHTOOL_A_BITSET_BITS_UNSPEC,
+	ETHTOOL_A_BITSET_BITS_BIT,		/* nest - _A_BITSET_BIT_* */
+
+	/* add new constants above here */
+	__ETHTOOL_A_BITSET_BITS_CNT,
+	ETHTOOL_A_BITSET_BITS_MAX = __ETHTOOL_A_BITSET_BITS_CNT - 1
+};
+
+enum {
+	ETHTOOL_A_BITSET_UNSPEC,
+	ETHTOOL_A_BITSET_NOMASK,		/* flag */
+	ETHTOOL_A_BITSET_SIZE,			/* u32 */
+	ETHTOOL_A_BITSET_BITS,			/* nest - _A_BITSET_BITS_* */
+	ETHTOOL_A_BITSET_VALUE,			/* binary */
+	ETHTOOL_A_BITSET_MASK,			/* binary */
+
+	/* add new constants above here */
+	__ETHTOOL_A_BITSET_CNT,
+	ETHTOOL_A_BITSET_MAX = __ETHTOOL_A_BITSET_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From 6b08d6c146f4c5ed451c45339c10feb06d619db2 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:33 +0100
Subject: ethtool: support for netlink notifications

Add infrastructure for ethtool netlink notifications. There is only one
multicast group "monitor" which is used to notify userspace about changes
and actions performed. Notification messages (types using suffix _NTF)
share the format with replies to GET requests.

Notifications are supposed to be broadcasted on every configuration change,
whether it is done using the netlink interface or ioctl one. Netlink SET
requests only trigger a notification if some data is actually changed.

To trigger an ethtool notification, both ethtool netlink and external code
use ethtool_notify() helper. This helper requires RTNL to be held and may
sleep. Handlers sending messages for specific notification message types
are registered in ethnl_notify_handlers array. As notifications can be
triggered from other code, ethnl_ok flag is used to prevent an attempt to
send notification before genetlink family is registered.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/ethtool_netlink.h      | 5 +++++
 include/linux/netdevice.h            | 9 +++++++++
 include/uapi/linux/ethtool_netlink.h | 2 ++
 3 files changed, 16 insertions(+)

(limited to 'include')

diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
index f27e92b5f344..c98f6852c8eb 100644
--- a/include/linux/ethtool_netlink.h
+++ b/include/linux/ethtool_netlink.h
@@ -5,5 +5,10 @@
 
 #include <uapi/linux/ethtool_netlink.h>
 #include <linux/ethtool.h>
+#include <linux/netdevice.h>
+
+enum ethtool_multicast_groups {
+	ETHNL_MCGRP_MONITOR,
+};
 
 #endif /* _LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 469a297b58c0..f007155ae8f4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4393,6 +4393,15 @@ struct netdev_notifier_bonding_info {
 void netdev_bonding_info_change(struct net_device *dev,
 				struct netdev_bonding_info *bonding_info);
 
+#if IS_ENABLED(CONFIG_ETHTOOL_NETLINK)
+void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data);
+#else
+static inline void ethtool_notify(struct net_device *dev, unsigned int cmd,
+				  const void *data)
+{
+}
+#endif
+
 static inline
 struct sk_buff *skb_gso_segment(struct sk_buff *skb, netdev_features_t features)
 {
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 951203049615..d530ccb6f7e6 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -89,4 +89,6 @@ enum {
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
 
+#define ETHTOOL_MCGRP_MONITOR_NAME "monitor"
+
 #endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */
-- 
cgit v1.2.3


From 71921690f9745fef60a2bad425f30adf8cdc9da0 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:43 +0100
Subject: ethtool: provide string sets with STRSET_GET request

Requests a contents of one or more string sets, i.e. indexed arrays of
strings; this information is provided by ETHTOOL_GSSET_INFO and
ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all
information can be retrieved with one request and mulitple string sets can
be requested at once.

There are three types of requests:

  - no NLM_F_DUMP, no device: get "global" stringsets
  - no NLM_F_DUMP, with device: get string sets related to the device
  - NLM_F_DUMP, no device: get device related string sets for all devices

Client can request either all string sets of given type (global or device
related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only
set sizes (numbers of strings) are returned.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool.h         |  3 ++
 include/uapi/linux/ethtool_netlink.h | 56 ++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index f44155840b07..116bcbf09c74 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -606,6 +606,9 @@ enum ethtool_stringset {
 	ETH_SS_PHY_STATS,
 	ETH_SS_PHY_TUNABLES,
 	ETH_SS_LINK_MODES,
+
+	/* add new constants above here */
+	ETH_SS_COUNT
 };
 
 /**
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d530ccb6f7e6..cabef1fec42a 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -14,6 +14,7 @@
 /* message types - userspace to kernel */
 enum {
 	ETHTOOL_MSG_USER_NONE,
+	ETHTOOL_MSG_STRSET_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -23,6 +24,7 @@ enum {
 /* message types - kernel to userspace */
 enum {
 	ETHTOOL_MSG_KERNEL_NONE,
+	ETHTOOL_MSG_STRSET_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -85,6 +87,60 @@ enum {
 	ETHTOOL_A_BITSET_MAX = __ETHTOOL_A_BITSET_CNT - 1
 };
 
+/* string sets */
+
+enum {
+	ETHTOOL_A_STRING_UNSPEC,
+	ETHTOOL_A_STRING_INDEX,			/* u32 */
+	ETHTOOL_A_STRING_VALUE,			/* string */
+
+	/* add new constants above here */
+	__ETHTOOL_A_STRING_CNT,
+	ETHTOOL_A_STRING_MAX = __ETHTOOL_A_STRING_CNT - 1
+};
+
+enum {
+	ETHTOOL_A_STRINGS_UNSPEC,
+	ETHTOOL_A_STRINGS_STRING,		/* nest - _A_STRINGS_* */
+
+	/* add new constants above here */
+	__ETHTOOL_A_STRINGS_CNT,
+	ETHTOOL_A_STRINGS_MAX = __ETHTOOL_A_STRINGS_CNT - 1
+};
+
+enum {
+	ETHTOOL_A_STRINGSET_UNSPEC,
+	ETHTOOL_A_STRINGSET_ID,			/* u32 */
+	ETHTOOL_A_STRINGSET_COUNT,		/* u32 */
+	ETHTOOL_A_STRINGSET_STRINGS,		/* nest - _A_STRINGS_* */
+
+	/* add new constants above here */
+	__ETHTOOL_A_STRINGSET_CNT,
+	ETHTOOL_A_STRINGSET_MAX = __ETHTOOL_A_STRINGSET_CNT - 1
+};
+
+enum {
+	ETHTOOL_A_STRINGSETS_UNSPEC,
+	ETHTOOL_A_STRINGSETS_STRINGSET,		/* nest - _A_STRINGSET_* */
+
+	/* add new constants above here */
+	__ETHTOOL_A_STRINGSETS_CNT,
+	ETHTOOL_A_STRINGSETS_MAX = __ETHTOOL_A_STRINGSETS_CNT - 1
+};
+
+/* STRSET */
+
+enum {
+	ETHTOOL_A_STRSET_UNSPEC,
+	ETHTOOL_A_STRSET_HEADER,		/* nest - _A_HEADER_* */
+	ETHTOOL_A_STRSET_STRINGSETS,		/* nest - _A_STRINGSETS_* */
+	ETHTOOL_A_STRSET_COUNTS_ONLY,		/* flag */
+
+	/* add new constants above here */
+	__ETHTOOL_A_STRSET_CNT,
+	ETHTOOL_A_STRSET_MAX = __ETHTOOL_A_STRSET_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From 459e0b81b37043545d90629fdfb243444151e77d Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:48 +0100
Subject: ethtool: provide link settings with LINKINFO_GET request

Implement LINKINFO_GET netlink request to get basic link settings provided
by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands.

This request provides settings not directly related to autonegotiation and
link mode selection: physical port, phy MDIO address, MDI(-X) status,
MDI(-X) control and transceiver.

LINKINFO_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index cabef1fec42a..1966532993e5 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -15,6 +15,7 @@
 enum {
 	ETHTOOL_MSG_USER_NONE,
 	ETHTOOL_MSG_STRSET_GET,
+	ETHTOOL_MSG_LINKINFO_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -25,6 +26,7 @@ enum {
 enum {
 	ETHTOOL_MSG_KERNEL_NONE,
 	ETHTOOL_MSG_STRSET_GET_REPLY,
+	ETHTOOL_MSG_LINKINFO_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -141,6 +143,22 @@ enum {
 	ETHTOOL_A_STRSET_MAX = __ETHTOOL_A_STRSET_CNT - 1
 };
 
+/* LINKINFO */
+
+enum {
+	ETHTOOL_A_LINKINFO_UNSPEC,
+	ETHTOOL_A_LINKINFO_HEADER,		/* nest - _A_HEADER_* */
+	ETHTOOL_A_LINKINFO_PORT,		/* u8 */
+	ETHTOOL_A_LINKINFO_PHYADDR,		/* u8 */
+	ETHTOOL_A_LINKINFO_TP_MDIX,		/* u8 */
+	ETHTOOL_A_LINKINFO_TP_MDIX_CTRL,	/* u8 */
+	ETHTOOL_A_LINKINFO_TRANSCEIVER,		/* u8 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_LINKINFO_CNT,
+	ETHTOOL_A_LINKINFO_MAX = __ETHTOOL_A_LINKINFO_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From a53f3d41e4d3df46aba254ba51e7fbe45470fece Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:55:53 +0100
Subject: ethtool: set link settings with LINKINFO_SET request

Implement LINKINFO_SET netlink request to set link settings queried by
LINKINFO_GET message.

Only physical port, phy MDIO address and MDI(-X) control can be set,
attempt to modify MDI(-X) status and transceiver is rejected.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 1966532993e5..5b7806a5bef8 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -16,6 +16,7 @@ enum {
 	ETHTOOL_MSG_USER_NONE,
 	ETHTOOL_MSG_STRSET_GET,
 	ETHTOOL_MSG_LINKINFO_GET,
+	ETHTOOL_MSG_LINKINFO_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
-- 
cgit v1.2.3


From 73286734c1b0d009fd317c2a74e173cb22233dad Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:56:03 +0100
Subject: ethtool: add LINKINFO_NTF notification

Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link
settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or
ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.

The notification message has the same format as reply to LINKINFO_GET
request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.

As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKINFO_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 5b7806a5bef8..d530fa30de36 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -28,6 +28,7 @@ enum {
 	ETHTOOL_MSG_KERNEL_NONE,
 	ETHTOOL_MSG_STRSET_GET_REPLY,
 	ETHTOOL_MSG_LINKINFO_GET_REPLY,
+	ETHTOOL_MSG_LINKINFO_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
-- 
cgit v1.2.3


From f625aa9be8c10f2e4dc677837e240730a25feda7 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:56:08 +0100
Subject: ethtool: provide link mode information with LINKMODES_GET request

Implement LINKMODES_GET netlink request to get link modes related
information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl
commands.

This request provides supported, advertised and peer advertised link modes,
autonegotiation flag, speed and duplex.

LINKMODES_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/ethtool_netlink.h      |  3 +++
 include/uapi/linux/ethtool_netlink.h | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

(limited to 'include')

diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
index c98f6852c8eb..d01b77887f82 100644
--- a/include/linux/ethtool_netlink.h
+++ b/include/linux/ethtool_netlink.h
@@ -7,6 +7,9 @@
 #include <linux/ethtool.h>
 #include <linux/netdevice.h>
 
+#define __ETHTOOL_LINK_MODE_MASK_NWORDS \
+	DIV_ROUND_UP(__ETHTOOL_LINK_MODE_MASK_NBITS, 32)
+
 enum ethtool_multicast_groups {
 	ETHNL_MCGRP_MONITOR,
 };
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d530fa30de36..dc1cae052eee 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -17,6 +17,7 @@ enum {
 	ETHTOOL_MSG_STRSET_GET,
 	ETHTOOL_MSG_LINKINFO_GET,
 	ETHTOOL_MSG_LINKINFO_SET,
+	ETHTOOL_MSG_LINKMODES_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -29,6 +30,7 @@ enum {
 	ETHTOOL_MSG_STRSET_GET_REPLY,
 	ETHTOOL_MSG_LINKINFO_GET_REPLY,
 	ETHTOOL_MSG_LINKINFO_NTF,
+	ETHTOOL_MSG_LINKMODES_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -161,6 +163,22 @@ enum {
 	ETHTOOL_A_LINKINFO_MAX = __ETHTOOL_A_LINKINFO_CNT - 1
 };
 
+/* LINKMODES */
+
+enum {
+	ETHTOOL_A_LINKMODES_UNSPEC,
+	ETHTOOL_A_LINKMODES_HEADER,		/* nest - _A_HEADER_* */
+	ETHTOOL_A_LINKMODES_AUTONEG,		/* u8 */
+	ETHTOOL_A_LINKMODES_OURS,		/* bitset */
+	ETHTOOL_A_LINKMODES_PEER,		/* bitset */
+	ETHTOOL_A_LINKMODES_SPEED,		/* u32 */
+	ETHTOOL_A_LINKMODES_DUPLEX,		/* u8 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_LINKMODES_CNT,
+	ETHTOOL_A_LINKMODES_MAX = __ETHTOOL_A_LINKMODES_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From bfbcfe2032e70bd8598d680d39ac177d507e39ac Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:56:13 +0100
Subject: ethtool: set link modes related data with LINKMODES_SET request

Implement LINKMODES_SET netlink request to set advertised linkmodes and
related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do.

The request allows setting autonegotiation flag, speed, duplex and
advertised link modes.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index dc1cae052eee..cddf978b98df 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -18,6 +18,7 @@ enum {
 	ETHTOOL_MSG_LINKINFO_GET,
 	ETHTOOL_MSG_LINKINFO_SET,
 	ETHTOOL_MSG_LINKMODES_GET,
+	ETHTOOL_MSG_LINKMODES_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
-- 
cgit v1.2.3


From 1b1b1847c8505df2e1dd8804838526bed22a8bd4 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:56:18 +0100
Subject: ethtool: add LINKMODES_NTF notification

Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link
settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET
netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.

The notification message has the same format as reply to LINKMODES_GET
request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.

As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKMODES_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index cddf978b98df..35948df6d6e3 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -32,6 +32,7 @@ enum {
 	ETHTOOL_MSG_LINKINFO_GET_REPLY,
 	ETHTOOL_MSG_LINKINFO_NTF,
 	ETHTOOL_MSG_LINKMODES_GET_REPLY,
+	ETHTOOL_MSG_LINKMODES_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
-- 
cgit v1.2.3


From 3d2b847fb99cf2b28aa046e486636e555bc6ed1c Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Fri, 27 Dec 2019 15:56:23 +0100
Subject: ethtool: provide link state with LINKSTATE_GET request

Implement LINKSTATE_GET netlink request to get link state information.

At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command
is returned.

LINKSTATE_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 35948df6d6e3..02f82f42a889 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -19,6 +19,7 @@ enum {
 	ETHTOOL_MSG_LINKINFO_SET,
 	ETHTOOL_MSG_LINKMODES_GET,
 	ETHTOOL_MSG_LINKMODES_SET,
+	ETHTOOL_MSG_LINKSTATE_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -33,6 +34,7 @@ enum {
 	ETHTOOL_MSG_LINKINFO_NTF,
 	ETHTOOL_MSG_LINKMODES_GET_REPLY,
 	ETHTOOL_MSG_LINKMODES_NTF,
+	ETHTOOL_MSG_LINKSTATE_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -181,6 +183,18 @@ enum {
 	ETHTOOL_A_LINKMODES_MAX = __ETHTOOL_A_LINKMODES_CNT - 1
 };
 
+/* LINKSTATE */
+
+enum {
+	ETHTOOL_A_LINKSTATE_UNSPEC,
+	ETHTOOL_A_LINKSTATE_HEADER,		/* nest - _A_HEADER_* */
+	ETHTOOL_A_LINKSTATE_LINK,		/* u8 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_LINKSTATE_CNT,
+	ETHTOOL_A_LINKSTATE_MAX = __ETHTOOL_A_LINKSTATE_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From 544fed47af4d2174ac0b550e9c8da15c2dfdb117 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Fri, 27 Dec 2019 15:02:28 +0200
Subject: ptp: introduce ptp_cancel_worker_sync

In order to effectively use the PTP kernel thread for tasks such as
timestamping packets, allow the user control over stopping it, which is
needed e.g. when the timestamping queues must be drained.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/ptp_clock_kernel.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index 93cc4f1d444a..c64a1ef87240 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -243,6 +243,13 @@ int ptp_find_pin(struct ptp_clock *ptp,
 
 int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay);
 
+/**
+ * ptp_cancel_worker_sync() - cancel ptp auxiliary clock
+ *
+ * @ptp:     The clock obtained from ptp_clock_register().
+ */
+void ptp_cancel_worker_sync(struct ptp_clock *ptp);
+
 #else
 static inline struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 						   struct device *parent)
@@ -260,6 +267,8 @@ static inline int ptp_find_pin(struct ptp_clock *ptp,
 static inline int ptp_schedule_worker(struct ptp_clock *ptp,
 				      unsigned long delay)
 { return -EOPNOTSUPP; }
+static inline void ptp_cancel_worker_sync(struct ptp_clock *ptp)
+{ }
 
 #endif
 
-- 
cgit v1.2.3


From 1e762bd278d2a70bc74b9cbee7f1e93bd4704fe2 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Fri, 27 Dec 2019 15:02:29 +0200
Subject: net: dsa: sja1105: Use PTP core's dedicated kernel thread for RX
 timestamping

And move the queue of skb's waiting for RX timestamps into the ptp_data
structure, since it isn't needed if PTP is not compiled.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/dsa/sja1105.h | 2 --
 1 file changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h
index 897e799dbcb9..c0b6a603ea8c 100644
--- a/include/linux/dsa/sja1105.h
+++ b/include/linux/dsa/sja1105.h
@@ -37,8 +37,6 @@
  * the structure defined in struct sja1105_private.
  */
 struct sja1105_tagger_data {
-	struct sk_buff_head skb_rxtstamp_queue;
-	struct work_struct rxtstamp_work;
 	struct sk_buff *stampable_skb;
 	/* Protects concurrent access to the meta state machine
 	 * from taggers running on multiple ports on SMP systems
-- 
cgit v1.2.3


From 68e039f966cb577c91649a02591646ac3919f8c9 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven@narfation.org>
Date: Wed, 1 Jan 2020 00:00:01 +0100
Subject: batman-adv: Update copyright years for 2020

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
---
 include/uapi/linux/batadv_packet.h | 2 +-
 include/uapi/linux/batman_adv.h    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/batadv_packet.h b/include/uapi/linux/batadv_packet.h
index 2a15f01c2243..0ae34c85ef9e 100644
--- a/include/uapi/linux/batadv_packet.h
+++ b/include/uapi/linux/batadv_packet.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
-/* Copyright (C) 2007-2019  B.A.T.M.A.N. contributors:
+/* Copyright (C) 2007-2020  B.A.T.M.A.N. contributors:
  *
  * Marek Lindner, Simon Wunderlich
  */
diff --git a/include/uapi/linux/batman_adv.h b/include/uapi/linux/batman_adv.h
index 67f4636758af..617c180ff0c8 100644
--- a/include/uapi/linux/batman_adv.h
+++ b/include/uapi/linux/batman_adv.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: MIT */
-/* Copyright (C) 2016-2019  B.A.T.M.A.N. contributors:
+/* Copyright (C) 2016-2020  B.A.T.M.A.N. contributors:
  *
  * Matthias Schiffer
  */
-- 
cgit v1.2.3


From dea53bb80e07b9e1641b865493908c20cb8df2ac Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Mon, 30 Dec 2019 14:14:28 -0800
Subject: tcp: Add l3index to tcp_md5sig_key and md5 functions

Add l3index to tcp_md5sig_key to represent the L3 domain of a key, and
add l3index to tcp_md5_do_add and tcp_md5_do_del to fill in the key.

With the key now based on an l3index, add the new parameter to the
lookup functions and consider the l3index when looking for a match.

The l3index comes from the skb when processing ingress packets leveraging
the helpers created for socket lookups, tcp_v4_sdif and inet_iif (and the
v6 variants). When the sdif index is set it means the packet ingressed a
device that is part of an L3 domain and inet_iif points to the VRF device.
For egress, the L3 domain is determined from the socket binding and
sk_bound_dev_if.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/tcp.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

(limited to 'include')

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e460ea7f767b..7df37e2fddca 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1532,8 +1532,9 @@ struct tcp_md5sig_key {
 	struct hlist_node	node;
 	u8			keylen;
 	u8			family; /* AF_INET or AF_INET6 */
-	union tcp_md5_addr	addr;
 	u8			prefixlen;
+	union tcp_md5_addr	addr;
+	int			l3index; /* set if key added with L3 scope */
 	u8			key[TCP_MD5SIG_MAXKEYLEN];
 	struct rcu_head		rcu;
 };
@@ -1577,34 +1578,33 @@ struct tcp_md5sig_pool {
 int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key,
 			const struct sock *sk, const struct sk_buff *skb);
 int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
-		   int family, u8 prefixlen, const u8 *newkey, u8 newkeylen,
-		   gfp_t gfp);
+		   int family, u8 prefixlen, int l3index,
+		   const u8 *newkey, u8 newkeylen, gfp_t gfp);
 int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr,
-		   int family, u8 prefixlen);
+		   int family, u8 prefixlen, int l3index);
 struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk,
 					 const struct sock *addr_sk);
 
 #ifdef CONFIG_TCP_MD5SIG
 #include <linux/jump_label.h>
 extern struct static_key_false tcp_md5_needed;
-struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk,
+struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk, int l3index,
 					   const union tcp_md5_addr *addr,
 					   int family);
 static inline struct tcp_md5sig_key *
-tcp_md5_do_lookup(const struct sock *sk,
-		  const union tcp_md5_addr *addr,
-		  int family)
+tcp_md5_do_lookup(const struct sock *sk, int l3index,
+		  const union tcp_md5_addr *addr, int family)
 {
 	if (!static_branch_unlikely(&tcp_md5_needed))
 		return NULL;
-	return __tcp_md5_do_lookup(sk, addr, family);
+	return __tcp_md5_do_lookup(sk, l3index, addr, family);
 }
 
 #define tcp_twsk_md5_key(twsk)	((twsk)->tw_md5_key)
 #else
-static inline struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
-					 const union tcp_md5_addr *addr,
-					 int family)
+static inline struct tcp_md5sig_key *
+tcp_md5_do_lookup(const struct sock *sk, int l3index,
+		  const union tcp_md5_addr *addr, int family)
 {
 	return NULL;
 }
-- 
cgit v1.2.3


From 6b102db50cdde3ba2f78631ed21222edf3a5fb51 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Mon, 30 Dec 2019 14:14:29 -0800
Subject: net: Add device index to tcp_md5sig

Add support for userspace to specify a device index to limit the scope
of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad
is renamed to tcpm_ifindex and the new field is only checked if the new
TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index
must point to an L3 master device (e.g., VRF). The API and error
handling are setup to allow the constraint to be relaxed in the future
to any device index.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/tcp.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 74af1f759cee..d87184e673ca 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -317,14 +317,15 @@ enum {
 #define TCP_MD5SIG_MAXKEYLEN	80
 
 /* tcp_md5sig extension flags for TCP_MD5SIG_EXT */
-#define TCP_MD5SIG_FLAG_PREFIX		1	/* address prefix length */
+#define TCP_MD5SIG_FLAG_PREFIX		0x1	/* address prefix length */
+#define TCP_MD5SIG_FLAG_IFINDEX		0x2	/* ifindex set */
 
 struct tcp_md5sig {
 	struct __kernel_sockaddr_storage tcpm_addr;	/* address associated */
 	__u8	tcpm_flags;				/* extension flags */
 	__u8	tcpm_prefixlen;				/* address prefix */
 	__u16	tcpm_keylen;				/* key length */
-	__u32	__tcpm_pad;				/* zero */
+	int	tcpm_ifindex;				/* device index for scope */
 	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];		/* key (binary) */
 };
 
-- 
cgit v1.2.3


From b39c78b2aa09cae05f3a48c11f67b3add0d604de Mon Sep 17 00:00:00 2001
From: Li RongQing <lirongqing@baidu.com>
Date: Fri, 3 Jan 2020 11:51:00 +0800
Subject: net: remove the check argument from __skb_gro_checksum_convert

The argument is always ignored, so remove it.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2fd19fb8826d..2741aa35bec6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2826,16 +2826,16 @@ static inline bool __skb_gro_checksum_convert_check(struct sk_buff *skb)
 }
 
 static inline void __skb_gro_checksum_convert(struct sk_buff *skb,
-					      __sum16 check, __wsum pseudo)
+					      __wsum pseudo)
 {
 	NAPI_GRO_CB(skb)->csum = ~pseudo;
 	NAPI_GRO_CB(skb)->csum_valid = 1;
 }
 
-#define skb_gro_checksum_try_convert(skb, proto, check, compute_pseudo)	\
+#define skb_gro_checksum_try_convert(skb, proto, compute_pseudo)	\
 do {									\
 	if (__skb_gro_checksum_convert_check(skb))			\
-		__skb_gro_checksum_convert(skb, check,			\
+		__skb_gro_checksum_convert(skb, 			\
 					   compute_pseudo(skb, proto));	\
 } while (0)
 
-- 
cgit v1.2.3


From 36278a5d4d354e5d5610aa728831db9e03cc3d8d Mon Sep 17 00:00:00 2001
From: Alain Michaud <alainm@chromium.org>
Date: Wed, 11 Dec 2019 01:54:43 +0000
Subject: Bluetooth: Adding a bt_dev_warn_ratelimited macro.

The macro will be used to display rate limited warning messages in the
log.

Signed-off-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 include/net/bluetooth/bluetooth.h | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index fabee6db0abb..bd2675266859 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -129,6 +129,8 @@ void bt_warn(const char *fmt, ...);
 __printf(1, 2)
 void bt_err(const char *fmt, ...);
 __printf(1, 2)
+void bt_warn_ratelimited(const char *fmt, ...);
+__printf(1, 2)
 void bt_err_ratelimited(const char *fmt, ...);
 
 #define BT_INFO(fmt, ...)	bt_info(fmt "\n", ##__VA_ARGS__)
@@ -147,6 +149,8 @@ void bt_err_ratelimited(const char *fmt, ...);
 #define bt_dev_dbg(hdev, fmt, ...)				\
 	BT_DBG("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
 
+#define bt_dev_warn_ratelimited(hdev, fmt, ...)			\
+	bt_warn_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
 #define bt_dev_err_ratelimited(hdev, fmt, ...)			\
 	BT_ERR_RATELIMITED("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
 
-- 
cgit v1.2.3


From 657cc646475b721f5c5bab82e7fd43302c7c8358 Mon Sep 17 00:00:00 2001
From: Marcel Holtmann <marcel@holtmann.org>
Date: Wed, 11 Dec 2019 11:34:36 +0100
Subject: Bluetooth: Remove usage of BT_ERR_RATELIMITED macro

The macro is really not needed and can be replaced with either usage of
bt_err_ratelimited or bt_dev_err_ratelimited.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
---
 include/net/bluetooth/bluetooth.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index bd2675266859..e42bb8e03c09 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -138,8 +138,6 @@ void bt_err_ratelimited(const char *fmt, ...);
 #define BT_ERR(fmt, ...)	bt_err(fmt "\n", ##__VA_ARGS__)
 #define BT_DBG(fmt, ...)	pr_debug(fmt "\n", ##__VA_ARGS__)
 
-#define BT_ERR_RATELIMITED(fmt, ...) bt_err_ratelimited(fmt "\n", ##__VA_ARGS__)
-
 #define bt_dev_info(hdev, fmt, ...)				\
 	BT_INFO("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
 #define bt_dev_warn(hdev, fmt, ...)				\
@@ -152,7 +150,7 @@ void bt_err_ratelimited(const char *fmt, ...);
 #define bt_dev_warn_ratelimited(hdev, fmt, ...)			\
 	bt_warn_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
 #define bt_dev_err_ratelimited(hdev, fmt, ...)			\
-	BT_ERR_RATELIMITED("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
+	bt_err_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__)
 
 /* Connection and socket states */
 enum {
-- 
cgit v1.2.3


From 1efd927d660e6ab02a9cd32fbbe3c7dc47980132 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date: Thu, 2 Jan 2020 15:00:55 -0800
Subject: Bluetooth: Add support for LE PHY Update Complete event

This handles LE PHY Update Complete event and store both tx_phy and
rx_phy into hci_conn.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 include/net/bluetooth/hci.h      | 8 ++++++++
 include/net/bluetooth/hci_core.h | 2 ++
 2 files changed, 10 insertions(+)

(limited to 'include')

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 5bc1e30dedde..07b6ecedc6ce 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -2186,6 +2186,14 @@ struct hci_ev_le_direct_adv_info {
 	__s8	 rssi;
 } __packed;
 
+#define HCI_EV_LE_PHY_UPDATE_COMPLETE	0x0c
+struct hci_ev_le_phy_update_complete {
+	__u8  status;
+	__u16 handle;
+	__u8  tx_phy;
+	__u8  rx_phy;
+} __packed;
+
 #define HCI_EV_LE_EXT_ADV_REPORT    0x0d
 struct hci_ev_le_ext_adv_report {
 	__le16 	 evt_type;
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index b689aceb636b..faebe3859931 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -493,6 +493,8 @@ struct hci_conn {
 	__u16		le_supv_timeout;
 	__u8		le_adv_data[HCI_MAX_AD_LENGTH];
 	__u8		le_adv_data_len;
+	__u8		le_tx_phy;
+	__u8		le_rx_phy;
 	__s8		rssi;
 	__s8		tx_power;
 	__s8		max_tx_power;
-- 
cgit v1.2.3


From 14a65084f9310ba6a4017c365f9c9820b099dde5 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzk@kernel.org>
Date: Fri, 3 Jan 2020 18:11:30 +0100
Subject: net: ethernet: sxgbe: Rename Samsung to lowercase

Fix up inconsistent usage of upper and lowercase letters in "Samsung"
name.

"SAMSUNG" is not an abbreviation but a regular trademarked name.
Therefore it should be written with lowercase letters starting with
capital letter.

Although advertisement materials usually use uppercase "SAMSUNG", the
lowercase version is used in all legal aspects (e.g. on Wikipedia and in
privacy/legal statements on
https://www.samsung.com/semiconductor/privacy-global/).

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/sxgbe_platform.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/sxgbe_platform.h b/include/linux/sxgbe_platform.h
index 85ec745767bd..966146f7267a 100644
--- a/include/linux/sxgbe_platform.h
+++ b/include/linux/sxgbe_platform.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * 10G controller driver for Samsung EXYNOS SoCs
+ * 10G controller driver for Samsung Exynos SoCs
  *
  * Copyright (C) 2013 Samsung Electronics Co., Ltd.
  *		http://www.samsung.com
-- 
cgit v1.2.3


From c114574ebfdf42f826776f717c8056a00fa94881 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Fri, 3 Jan 2020 20:43:17 +0000
Subject: net: phy: add PHY_INTERFACE_MODE_10GBASER

Recent discussion has revealed that the use of PHY_INTERFACE_MODE_10GKR
is incorrect. Add a 10GBASE-R definition, document both the -R and -KR
versions, and the fact that 10GKR was used incorrectly.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 30e599c454db..5932bb8e9c35 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -100,9 +100,11 @@ typedef enum {
 	PHY_INTERFACE_MODE_2500BASEX,
 	PHY_INTERFACE_MODE_RXAUI,
 	PHY_INTERFACE_MODE_XAUI,
-	/* 10GBASE-KR, XFI, SFI - single lane 10G Serdes */
-	PHY_INTERFACE_MODE_10GKR,
+	/* 10GBASE-R, XFI, SFI - single lane 10G Serdes */
+	PHY_INTERFACE_MODE_10GBASER,
 	PHY_INTERFACE_MODE_USXGMII,
+	/* 10GBASE-KR - with Clause 73 AN */
+	PHY_INTERFACE_MODE_10GKR,
 	PHY_INTERFACE_MODE_MAX,
 } phy_interface_t;
 
@@ -176,10 +178,12 @@ static inline const char *phy_modes(phy_interface_t interface)
 		return "rxaui";
 	case PHY_INTERFACE_MODE_XAUI:
 		return "xaui";
-	case PHY_INTERFACE_MODE_10GKR:
-		return "10gbase-kr";
+	case PHY_INTERFACE_MODE_10GBASER:
+		return "10gbase-r";
 	case PHY_INTERFACE_MODE_USXGMII:
 		return "usxgmii";
+	case PHY_INTERFACE_MODE_10GKR:
+		return "10gbase-kr";
 	default:
 		return "unknown";
 	}
-- 
cgit v1.2.3


From 0a51826c6e05c5b6cc423b376b81c311e9e485b0 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Sat, 4 Jan 2020 02:37:09 +0200
Subject: net: dsa: sja1105: Always send through management routes in slot 0

I finally found out how the 4 management route slots are supposed to
be used, but.. it's not worth it.

The description from the comment I've just deleted in this commit is
still true: when more than 1 management slot is active at the same time,
the switch will match frames incoming [from the CPU port] on the lowest
numbered management slot that matches the frame's DMAC.

My issue was that one was not supposed to statically assign each port a
slot. Yes, there are 4 slots and also 4 non-CPU ports, but that is a
mere coincidence.

Instead, the switch can be used like this: every management frame gets a
slot at the right of the most recently assigned slot:

Send mgmt frame 1 through S0:    S0 x  x  x
Send mgmt frame 2 through S1:    S0 S1 x  x
Send mgmt frame 3 through S2:    S0 S1 S2 x
Send mgmt frame 4 through S3:    S0 S1 S2 S3

The difference compared to the old usage is that the transmission of
frames 1-4 doesn't need to wait until the completion of the management
route. It is safe to use a slot to the right of the most recently used
one, because by protocol nobody will program a slot to your left and
"steal" your route towards the correct egress port.

So there is a potential throughput benefit here.

But mgmt frame 5 has no more free slot to use, so it has to wait until
_all_ of S0, S1, S2, S3 are full, in order to use S0 again.

And that's actually exactly the problem: I was looking for something
that would bring more predictable transmission latency, but this is
exactly the opposite: 3 out of 4 frames would be transmitted quicker,
but the 4th would draw the short straw and have a worse worst-case
latency than before.

Useless.

Things are made even worse by PTP TX timestamping, which is something I
won't go deeply into here. Suffice to say that the fact there is a
driver-level lock on the SPI bus offsets any potential throughput gains
that parallelism might bring.

So there's no going back to the multi-slot scheme, remove the
"mgmt_slot" variable from sja1105_port and the dummy static assignment
made at probe time.

While passing by, also remove the assignment to casc_port altogether.
Don't pretend that we support cascaded setups.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/dsa/sja1105.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h
index c0b6a603ea8c..317e05b2584b 100644
--- a/include/linux/dsa/sja1105.h
+++ b/include/linux/dsa/sja1105.h
@@ -56,7 +56,6 @@ struct sja1105_port {
 	struct sja1105_tagger_data *data;
 	struct dsa_port *dp;
 	bool hwts_tx_en;
-	int mgmt_slot;
 };
 
 #endif /* _NET_DSA_SJA1105_H */
-- 
cgit v1.2.3


From a68578c20a9667463ee3000402b21644ea62d753 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Sat, 4 Jan 2020 02:37:10 +0200
Subject: net: dsa: Make deferred_xmit private to sja1105

There are 3 things that are wrong with the DSA deferred xmit mechanism:

1. Its introduction has made the DSA hotpath ever so slightly more
   inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs
   to be initialized to false for every transmitted frame, in order to
   figure out whether the driver requested deferral or not (a very rare
   occasion, rare even for the only driver that does use this mechanism:
   sja1105). That was necessary to avoid kfree_skb from freeing the skb.

2. Because L2 PTP is a link-local protocol like STP, it requires
   management routes and deferred xmit with this switch. But as opposed
   to STP, the deferred work mechanism needs to schedule the packet
   rather quickly for the TX timstamp to be collected in time and sent
   to user space. But there is no provision for controlling the
   scheduling priority of this deferred xmit workqueue. Too bad this is
   a rather specific requirement for a feature that nobody else uses
   (more below).

3. Perhaps most importantly, it makes the DSA core adhere a bit too
   much to the NXP company-wide policy "Innovate Where It Doesn't
   Matter". The sja1105 is probably the only DSA switch that requires
   some frames sent from the CPU to be routed to the slave port via an
   out-of-band configuration (register write) rather than in-band (DSA
   tag). And there are indeed very good reasons to not want to do that:
   if that out-of-band register is at the other end of a slow bus such
   as SPI, then you limit that Ethernet flow's throughput to effectively
   the throughput of the SPI bus. So hardware vendors should definitely
   not be encouraged to design this way. We do _not_ want more
   widespread use of this mechanism.

Luckily we have a solution for each of the 3 issues:

For 1, we can just remove that variable in the skb->cb and counteract
the effect of kfree_skb with skb_get, much to the same effect. The
advantage, of course, being that anybody who doesn't use deferred xmit
doesn't need to do any extra operation in the hotpath.

For 2, we can create a kernel thread for each port's deferred xmit work.
If the user switch ports are named swp0, swp1, swp2, the kernel threads
will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15
character length limit on kernel thread names). With this, the user can
change the scheduling priority with chrt $(pidof swp2_xmit).

For 3, we can actually move the entire implementation to the sja1105
driver.

So this patch deletes the generic implementation from the DSA core and
adds a new one, more adequate to the requirements of PTP TX
timestamping, in sja1105_main.c.

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/dsa/sja1105.h | 3 +++
 include/net/dsa.h           | 9 ---------
 2 files changed, 3 insertions(+), 9 deletions(-)

(limited to 'include')

diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h
index 317e05b2584b..fa5735c353cd 100644
--- a/include/linux/dsa/sja1105.h
+++ b/include/linux/dsa/sja1105.h
@@ -53,6 +53,9 @@ struct sja1105_skb_cb {
 	((struct sja1105_skb_cb *)DSA_SKB_CB_PRIV(skb))
 
 struct sja1105_port {
+	struct kthread_worker *xmit_worker;
+	struct kthread_work xmit_work;
+	struct sk_buff_head xmit_queue;
 	struct sja1105_tagger_data *data;
 	struct dsa_port *dp;
 	bool hwts_tx_en;
diff --git a/include/net/dsa.h b/include/net/dsa.h
index da5578db228e..23b1c58656d4 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -90,7 +90,6 @@ struct dsa_device_ops {
 
 struct dsa_skb_cb {
 	struct sk_buff *clone;
-	bool deferred_xmit;
 };
 
 struct __dsa_skb_cb {
@@ -192,9 +191,6 @@ struct dsa_port {
 	struct phylink		*pl;
 	struct phylink_config	pl_config;
 
-	struct work_struct	xmit_work;
-	struct sk_buff_head	xmit_queue;
-
 	struct list_head list;
 
 	/*
@@ -564,11 +560,6 @@ struct dsa_switch_ops {
 	bool	(*port_rxtstamp)(struct dsa_switch *ds, int port,
 				 struct sk_buff *skb, unsigned int type);
 
-	/*
-	 * Deferred frame Tx
-	 */
-	netdev_tx_t (*port_deferred_xmit)(struct dsa_switch *ds, int port,
-					  struct sk_buff *skb);
 	/* Devlink parameters */
 	int	(*devlink_param_get)(struct dsa_switch *ds, u32 id,
 				     struct devlink_param_gset_ctx *ctx);
-- 
cgit v1.2.3


From 6c930994503d9b5bd34b1329427dd7d3d6d37cd4 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 6 Jan 2020 03:34:09 +0200
Subject: mii: Add helpers for parsing SGMII auto-negotiation

Typically a MAC PCS auto-configures itself after it receives the
negotiated copper-side link settings from the PHY, but some MAC devices
are more special and need manual interpretation of the SGMII AN result.

In other cases, the PCS exposes the entire tx_config_reg base page as it
is transmitted on the wire during auto-negotiation, so it makes sense to
be able to decode the equivalent lp_advertised bit mask from the raw u16
(of course, "lp" considering the PCS to be the local PHY).

Therefore, add the bit definitions for the SGMII registers 4 and 5
(local device ability, link partner ability), as well as a link_mode
conversion helper that can be used to feed the AN results into
phy_resolve_aneg_linkmode.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/mii.h      | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/mii.h | 12 ++++++++++++
 2 files changed, 62 insertions(+)

(limited to 'include')

diff --git a/include/linux/mii.h b/include/linux/mii.h
index 4ce8901a1af6..18c6208f56fc 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -372,6 +372,56 @@ static inline u32 mii_lpa_to_ethtool_lpa_x(u32 lpa)
 	return result | mii_adv_to_ethtool_adv_x(lpa);
 }
 
+/**
+ * mii_lpa_mod_linkmode_adv_sgmii
+ * @lp_advertising: pointer to destination link mode.
+ * @lpa: value of the MII_LPA register
+ *
+ * A small helper function that translates MII_LPA bits to
+ * linkmode advertisement settings for SGMII.
+ * Leaves other bits unchanged.
+ */
+static inline void
+mii_lpa_mod_linkmode_lpa_sgmii(unsigned long *lp_advertising, u32 lpa)
+{
+	u32 speed_duplex = lpa & LPA_SGMII_DPX_SPD_MASK;
+
+	linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT, lp_advertising,
+			 speed_duplex == LPA_SGMII_1000HALF);
+
+	linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, lp_advertising,
+			 speed_duplex == LPA_SGMII_1000FULL);
+
+	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, lp_advertising,
+			 speed_duplex == LPA_SGMII_100HALF);
+
+	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, lp_advertising,
+			 speed_duplex == LPA_SGMII_100FULL);
+
+	linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, lp_advertising,
+			 speed_duplex == LPA_SGMII_10HALF);
+
+	linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, lp_advertising,
+			 speed_duplex == LPA_SGMII_10FULL);
+}
+
+/**
+ * mii_lpa_to_linkmode_adv_sgmii
+ * @advertising: pointer to destination link mode.
+ * @lpa: value of the MII_LPA register
+ *
+ * A small helper function that translates MII_ADVERTISE bits
+ * to linkmode advertisement settings when in SGMII mode.
+ * Clears the old value of advertising.
+ */
+static inline void mii_lpa_to_linkmode_lpa_sgmii(unsigned long *lp_advertising,
+						 u32 lpa)
+{
+	linkmode_zero(lp_advertising);
+
+	mii_lpa_mod_linkmode_lpa_sgmii(lp_advertising, lpa);
+}
+
 /**
  * mii_adv_mod_linkmode_adv_t
  * @advertising:pointer to destination link mode.
diff --git a/include/uapi/linux/mii.h b/include/uapi/linux/mii.h
index 51b48e4be1f2..0b9c3beda345 100644
--- a/include/uapi/linux/mii.h
+++ b/include/uapi/linux/mii.h
@@ -131,6 +131,18 @@
 #define NWAYTEST_LOOPBACK	0x0100	/* Enable loopback for N-way   */
 #define NWAYTEST_RESV2		0xfe00	/* Unused...                   */
 
+/* MAC and PHY tx_config_Reg[15:0] for SGMII in-band auto-negotiation.*/
+#define ADVERTISE_SGMII		0x0001	/* MAC can do SGMII            */
+#define LPA_SGMII		0x0001	/* PHY can do SGMII            */
+#define LPA_SGMII_DPX_SPD_MASK	0x1C00	/* SGMII duplex and speed bits */
+#define LPA_SGMII_10HALF	0x0000	/* Can do 10mbps half-duplex   */
+#define LPA_SGMII_10FULL	0x1000	/* Can do 10mbps full-duplex   */
+#define LPA_SGMII_100HALF	0x0400	/* Can do 100mbps half-duplex  */
+#define LPA_SGMII_100FULL	0x1400	/* Can do 100mbps full-duplex  */
+#define LPA_SGMII_1000HALF	0x0800	/* Can do 1000mbps half-duplex */
+#define LPA_SGMII_1000FULL	0x1800	/* Can do 1000mbps full-duplex */
+#define LPA_SGMII_LINK		0x8000	/* PHY link with copper-side partner */
+
 /* 1000BASE-T Control register */
 #define ADVERTISE_1000FULL	0x0200  /* Advertise 1000BASE-T full duplex */
 #define ADVERTISE_1000HALF	0x0100  /* Advertise 1000BASE-T half duplex */
-- 
cgit v1.2.3


From 1511ed0a0167f523a84b4e727372a5d2ce1b6c2f Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 6 Jan 2020 03:34:11 +0200
Subject: net: phylink: add support for polling MAC PCS

Some MAC PCS blocks are unable to provide interrupts when their status
changes. As we already have support in phylink for polling status, use
this to provide a hook for MACs to enable polling mode.

The patch idea was picked up from Russell King's suggestion on the macb
phylink patch thread here [0] but the implementation was changed.
Instead of introducing a new phylink_start_poll() function, which would
make the implementation cumbersome for common PHYLINK implementations
for multiple types of devices, like DSA, just add a boolean property to
the phylink_config structure, which is just as backwards-compatible.

https://lkml.org/lkml/2019/12/16/603

Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phylink.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index fed5488e3c75..523209e70947 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -63,10 +63,12 @@ enum phylink_op_type {
  * struct phylink_config - PHYLINK configuration structure
  * @dev: a pointer to a struct device associated with the MAC
  * @type: operation type of PHYLINK instance
+ * @pcs_poll: MAC PCS cannot provide link change interrupt
  */
 struct phylink_config {
 	struct device *dev;
 	enum phylink_op_type type;
+	bool pcs_poll;
 };
 
 /**
-- 
cgit v1.2.3


From 787cac3f5a650fd3184a41c5a27a2fe9ded833aa Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 6 Jan 2020 03:34:12 +0200
Subject: net: dsa: Pass pcs_poll flag from driver to PHYLINK

The DSA drivers that implement .phylink_mac_link_state should normally
register an interrupt for the PCS, from which they should call
phylink_mac_change(). However not all switches implement this, and those
who don't should set this flag in dsa_switch in the .setup callback, so
that PHYLINK will poll for a few ms until the in-band AN link timer
expires and the PCS state settles.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dsa.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 23b1c58656d4..0c39fed8cd99 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -279,6 +279,11 @@ struct dsa_switch {
 	 */
 	bool			vlan_filtering;
 
+	/* MAC PCS does not provide link state change interrupt, and requires
+	 * polling. Flag passed on to PHYLINK.
+	 */
+	bool			pcs_poll;
+
 	size_t num_ports;
 };
 
-- 
cgit v1.2.3


From 6517798dd3432a0002109809bf74e4fcf9bb0c7d Mon Sep 17 00:00:00 2001
From: Claudiu Manoil <claudiu.manoil@nxp.com>
Date: Mon, 6 Jan 2020 03:34:13 +0200
Subject: enetc: Make MDIO accessors more generic and export to
 include/linux/fsl

Within the LS1028A SoC, the register map for the ENETC MDIO controller
is instantiated a few times: for the central (external) MDIO controller,
for the internal bus of each standalone ENETC port, and for the internal
bus of the Felix switch.

Refactoring is needed to support multiple MDIO buses from multiple
drivers. The enetc_hw structure is made an opaque type and a smaller
enetc_mdio_priv is created.

'mdio_base' - MDIO registers base address - is being parameterized, to
be able to work with different MDIO register bases.

The ENETC MDIO bus operations are exported from the fsl-enetc-mdio
kernel object, the same that registers the central MDIO controller (the
dedicated PF). The ENETC main driver has been changed to select it, and
use its exported helpers to further register its private MDIO bus. The
DSA Felix driver will do the same.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/fsl/enetc_mdio.h | 55 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 include/linux/fsl/enetc_mdio.h

(limited to 'include')

diff --git a/include/linux/fsl/enetc_mdio.h b/include/linux/fsl/enetc_mdio.h
new file mode 100644
index 000000000000..4875dd38af7e
--- /dev/null
+++ b/include/linux/fsl/enetc_mdio.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) */
+/* Copyright 2019 NXP */
+
+#ifndef _FSL_ENETC_MDIO_H_
+#define _FSL_ENETC_MDIO_H_
+
+#include <linux/phy.h>
+
+/* PCS registers */
+#define ENETC_PCS_LINK_TIMER1			0x12
+#define ENETC_PCS_LINK_TIMER1_VAL		0x06a0
+#define ENETC_PCS_LINK_TIMER2			0x13
+#define ENETC_PCS_LINK_TIMER2_VAL		0x0003
+#define ENETC_PCS_IF_MODE			0x14
+#define ENETC_PCS_IF_MODE_SGMII_EN		BIT(0)
+#define ENETC_PCS_IF_MODE_USE_SGMII_AN		BIT(1)
+#define ENETC_PCS_IF_MODE_SGMII_SPEED(x)	(((x) << 2) & GENMASK(3, 2))
+
+/* Not a mistake, the SerDes PLL needs to be set at 3.125 GHz by Reset
+ * Configuration Word (RCW, outside Linux control) for 2.5G SGMII mode. The PCS
+ * still thinks it's at gigabit.
+ */
+enum enetc_pcs_speed {
+	ENETC_PCS_SPEED_10	= 0,
+	ENETC_PCS_SPEED_100	= 1,
+	ENETC_PCS_SPEED_1000	= 2,
+	ENETC_PCS_SPEED_2500	= 2,
+};
+
+struct enetc_hw;
+
+struct enetc_mdio_priv {
+	struct enetc_hw *hw;
+	int mdio_base;
+};
+
+#if IS_REACHABLE(CONFIG_FSL_ENETC_MDIO)
+
+int enetc_mdio_read(struct mii_bus *bus, int phy_id, int regnum);
+int enetc_mdio_write(struct mii_bus *bus, int phy_id, int regnum, u16 value);
+struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs);
+
+#else
+
+static inline int enetc_mdio_read(struct mii_bus *bus, int phy_id, int regnum)
+{ return -EINVAL; }
+static inline int enetc_mdio_write(struct mii_bus *bus, int phy_id, int regnum,
+				   u16 value)
+{ return -EINVAL; }
+struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
+{ return ERR_PTR(-EINVAL); }
+
+#endif
+
+#endif
-- 
cgit v1.2.3


From ee50d07c9fc8155b5a3c6c29eae1459a12cf2fb4 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 6 Jan 2020 03:34:15 +0200
Subject: net: mscc: ocelot: make phy_mode a member of the common struct
 ocelot_port

The Ocelot switchdev driver and the Felix DSA one need it for different
reasons. Felix (or at least the VSC9959 instantiation in NXP LS1028A) is
integrated with the traditional NXP Layerscape PCS design which does not
support runtime configuration of SerDes protocol. So it needs to
pre-validate the phy-mode from the device tree and prevent PHYLINK from
attempting to change it. For this, it needs to cache it in a private
variable.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/soc/mscc/ocelot.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 64cbbbe74a36..068f96b1a83e 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -420,6 +420,8 @@ struct ocelot_port {
 	u8				ptp_cmd;
 	struct sk_buff_head		tx_skbs;
 	u8				ts_id;
+
+	phy_interface_t			phy_mode;
 };
 
 struct ocelot {
-- 
cgit v1.2.3


From 964ee5c82b770c2d8a5ccefeee3384c1061ce3ae Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 6 Jan 2020 03:34:16 +0200
Subject: net: mscc: ocelot: export ANA, DEV and QSYS registers to
 include/soc/mscc

Since the Felix DSA driver is implementing its own PHYLINK instance due
to SoC differences, it needs access to the few registers that are
common, mainly for flow control.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/soc/mscc/ocelot_ana.h  | 625 +++++++++++++++++++++++++++++++++++++++++
 include/soc/mscc/ocelot_dev.h  | 275 ++++++++++++++++++
 include/soc/mscc/ocelot_qsys.h | 270 ++++++++++++++++++
 3 files changed, 1170 insertions(+)
 create mode 100644 include/soc/mscc/ocelot_ana.h
 create mode 100644 include/soc/mscc/ocelot_dev.h
 create mode 100644 include/soc/mscc/ocelot_qsys.h

(limited to 'include')

diff --git a/include/soc/mscc/ocelot_ana.h b/include/soc/mscc/ocelot_ana.h
new file mode 100644
index 000000000000..841c6ec22b64
--- /dev/null
+++ b/include/soc/mscc/ocelot_ana.h
@@ -0,0 +1,625 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_ANA_H_
+#define _MSCC_OCELOT_ANA_H_
+
+#define ANA_ANAGEFIL_B_DOM_EN                             BIT(22)
+#define ANA_ANAGEFIL_B_DOM_VAL                            BIT(21)
+#define ANA_ANAGEFIL_AGE_LOCKED                           BIT(20)
+#define ANA_ANAGEFIL_PID_EN                               BIT(19)
+#define ANA_ANAGEFIL_PID_VAL(x)                           (((x) << 14) & GENMASK(18, 14))
+#define ANA_ANAGEFIL_PID_VAL_M                            GENMASK(18, 14)
+#define ANA_ANAGEFIL_PID_VAL_X(x)                         (((x) & GENMASK(18, 14)) >> 14)
+#define ANA_ANAGEFIL_VID_EN                               BIT(13)
+#define ANA_ANAGEFIL_VID_VAL(x)                           ((x) & GENMASK(12, 0))
+#define ANA_ANAGEFIL_VID_VAL_M                            GENMASK(12, 0)
+
+#define ANA_STORMLIMIT_CFG_RSZ                            0x4
+
+#define ANA_STORMLIMIT_CFG_STORM_RATE(x)                  (((x) << 3) & GENMASK(6, 3))
+#define ANA_STORMLIMIT_CFG_STORM_RATE_M                   GENMASK(6, 3)
+#define ANA_STORMLIMIT_CFG_STORM_RATE_X(x)                (((x) & GENMASK(6, 3)) >> 3)
+#define ANA_STORMLIMIT_CFG_STORM_UNIT                     BIT(2)
+#define ANA_STORMLIMIT_CFG_STORM_MODE(x)                  ((x) & GENMASK(1, 0))
+#define ANA_STORMLIMIT_CFG_STORM_MODE_M                   GENMASK(1, 0)
+
+#define ANA_AUTOAGE_AGE_FAST                              BIT(21)
+#define ANA_AUTOAGE_AGE_PERIOD(x)                         (((x) << 1) & GENMASK(20, 1))
+#define ANA_AUTOAGE_AGE_PERIOD_M                          GENMASK(20, 1)
+#define ANA_AUTOAGE_AGE_PERIOD_X(x)                       (((x) & GENMASK(20, 1)) >> 1)
+#define ANA_AUTOAGE_AUTOAGE_LOCKED                        BIT(0)
+
+#define ANA_MACTOPTIONS_REDUCED_TABLE                     BIT(1)
+#define ANA_MACTOPTIONS_SHADOW                            BIT(0)
+
+#define ANA_AGENCTRL_FID_MASK(x)                          (((x) << 12) & GENMASK(23, 12))
+#define ANA_AGENCTRL_FID_MASK_M                           GENMASK(23, 12)
+#define ANA_AGENCTRL_FID_MASK_X(x)                        (((x) & GENMASK(23, 12)) >> 12)
+#define ANA_AGENCTRL_IGNORE_DMAC_FLAGS                    BIT(11)
+#define ANA_AGENCTRL_IGNORE_SMAC_FLAGS                    BIT(10)
+#define ANA_AGENCTRL_FLOOD_SPECIAL                        BIT(9)
+#define ANA_AGENCTRL_FLOOD_IGNORE_VLAN                    BIT(8)
+#define ANA_AGENCTRL_MIRROR_CPU                           BIT(7)
+#define ANA_AGENCTRL_LEARN_CPU_COPY                       BIT(6)
+#define ANA_AGENCTRL_LEARN_FWD_KILL                       BIT(5)
+#define ANA_AGENCTRL_LEARN_IGNORE_VLAN                    BIT(4)
+#define ANA_AGENCTRL_CPU_CPU_KILL_ENA                     BIT(3)
+#define ANA_AGENCTRL_GREEN_COUNT_MODE                     BIT(2)
+#define ANA_AGENCTRL_YELLOW_COUNT_MODE                    BIT(1)
+#define ANA_AGENCTRL_RED_COUNT_MODE                       BIT(0)
+
+#define ANA_FLOODING_RSZ                                  0x4
+
+#define ANA_FLOODING_FLD_UNICAST(x)                       (((x) << 12) & GENMASK(17, 12))
+#define ANA_FLOODING_FLD_UNICAST_M                        GENMASK(17, 12)
+#define ANA_FLOODING_FLD_UNICAST_X(x)                     (((x) & GENMASK(17, 12)) >> 12)
+#define ANA_FLOODING_FLD_BROADCAST(x)                     (((x) << 6) & GENMASK(11, 6))
+#define ANA_FLOODING_FLD_BROADCAST_M                      GENMASK(11, 6)
+#define ANA_FLOODING_FLD_BROADCAST_X(x)                   (((x) & GENMASK(11, 6)) >> 6)
+#define ANA_FLOODING_FLD_MULTICAST(x)                     ((x) & GENMASK(5, 0))
+#define ANA_FLOODING_FLD_MULTICAST_M                      GENMASK(5, 0)
+
+#define ANA_FLOODING_IPMC_FLD_MC4_CTRL(x)                 (((x) << 18) & GENMASK(23, 18))
+#define ANA_FLOODING_IPMC_FLD_MC4_CTRL_M                  GENMASK(23, 18)
+#define ANA_FLOODING_IPMC_FLD_MC4_CTRL_X(x)               (((x) & GENMASK(23, 18)) >> 18)
+#define ANA_FLOODING_IPMC_FLD_MC4_DATA(x)                 (((x) << 12) & GENMASK(17, 12))
+#define ANA_FLOODING_IPMC_FLD_MC4_DATA_M                  GENMASK(17, 12)
+#define ANA_FLOODING_IPMC_FLD_MC4_DATA_X(x)               (((x) & GENMASK(17, 12)) >> 12)
+#define ANA_FLOODING_IPMC_FLD_MC6_CTRL(x)                 (((x) << 6) & GENMASK(11, 6))
+#define ANA_FLOODING_IPMC_FLD_MC6_CTRL_M                  GENMASK(11, 6)
+#define ANA_FLOODING_IPMC_FLD_MC6_CTRL_X(x)               (((x) & GENMASK(11, 6)) >> 6)
+#define ANA_FLOODING_IPMC_FLD_MC6_DATA(x)                 ((x) & GENMASK(5, 0))
+#define ANA_FLOODING_IPMC_FLD_MC6_DATA_M                  GENMASK(5, 0)
+
+#define ANA_SFLOW_CFG_RSZ                                 0x4
+
+#define ANA_SFLOW_CFG_SF_RATE(x)                          (((x) << 2) & GENMASK(13, 2))
+#define ANA_SFLOW_CFG_SF_RATE_M                           GENMASK(13, 2)
+#define ANA_SFLOW_CFG_SF_RATE_X(x)                        (((x) & GENMASK(13, 2)) >> 2)
+#define ANA_SFLOW_CFG_SF_SAMPLE_RX                        BIT(1)
+#define ANA_SFLOW_CFG_SF_SAMPLE_TX                        BIT(0)
+
+#define ANA_PORT_MODE_RSZ                                 0x4
+
+#define ANA_PORT_MODE_REDTAG_PARSE_CFG                    BIT(3)
+#define ANA_PORT_MODE_VLAN_PARSE_CFG(x)                   (((x) << 1) & GENMASK(2, 1))
+#define ANA_PORT_MODE_VLAN_PARSE_CFG_M                    GENMASK(2, 1)
+#define ANA_PORT_MODE_VLAN_PARSE_CFG_X(x)                 (((x) & GENMASK(2, 1)) >> 1)
+#define ANA_PORT_MODE_L3_PARSE_CFG                        BIT(0)
+
+#define ANA_CUT_THRU_CFG_RSZ                              0x4
+
+#define ANA_PGID_PGID_RSZ                                 0x4
+
+#define ANA_PGID_PGID_PGID(x)                             ((x) & GENMASK(11, 0))
+#define ANA_PGID_PGID_PGID_M                              GENMASK(11, 0)
+#define ANA_PGID_PGID_CPUQ_DST_PGID(x)                    (((x) << 27) & GENMASK(29, 27))
+#define ANA_PGID_PGID_CPUQ_DST_PGID_M                     GENMASK(29, 27)
+#define ANA_PGID_PGID_CPUQ_DST_PGID_X(x)                  (((x) & GENMASK(29, 27)) >> 27)
+
+#define ANA_TABLES_MACHDATA_VID(x)                        (((x) << 16) & GENMASK(28, 16))
+#define ANA_TABLES_MACHDATA_VID_M                         GENMASK(28, 16)
+#define ANA_TABLES_MACHDATA_VID_X(x)                      (((x) & GENMASK(28, 16)) >> 16)
+#define ANA_TABLES_MACHDATA_MACHDATA(x)                   ((x) & GENMASK(15, 0))
+#define ANA_TABLES_MACHDATA_MACHDATA_M                    GENMASK(15, 0)
+
+#define ANA_TABLES_STREAMDATA_SSID_VALID                  BIT(16)
+#define ANA_TABLES_STREAMDATA_SSID(x)                     (((x) << 9) & GENMASK(15, 9))
+#define ANA_TABLES_STREAMDATA_SSID_M                      GENMASK(15, 9)
+#define ANA_TABLES_STREAMDATA_SSID_X(x)                   (((x) & GENMASK(15, 9)) >> 9)
+#define ANA_TABLES_STREAMDATA_SFID_VALID                  BIT(8)
+#define ANA_TABLES_STREAMDATA_SFID(x)                     ((x) & GENMASK(7, 0))
+#define ANA_TABLES_STREAMDATA_SFID_M                      GENMASK(7, 0)
+
+#define ANA_TABLES_MACACCESS_MAC_CPU_COPY                 BIT(15)
+#define ANA_TABLES_MACACCESS_SRC_KILL                     BIT(14)
+#define ANA_TABLES_MACACCESS_IGNORE_VLAN                  BIT(13)
+#define ANA_TABLES_MACACCESS_AGED_FLAG                    BIT(12)
+#define ANA_TABLES_MACACCESS_VALID                        BIT(11)
+#define ANA_TABLES_MACACCESS_ENTRYTYPE(x)                 (((x) << 9) & GENMASK(10, 9))
+#define ANA_TABLES_MACACCESS_ENTRYTYPE_M                  GENMASK(10, 9)
+#define ANA_TABLES_MACACCESS_ENTRYTYPE_X(x)               (((x) & GENMASK(10, 9)) >> 9)
+#define ANA_TABLES_MACACCESS_DEST_IDX(x)                  (((x) << 3) & GENMASK(8, 3))
+#define ANA_TABLES_MACACCESS_DEST_IDX_M                   GENMASK(8, 3)
+#define ANA_TABLES_MACACCESS_DEST_IDX_X(x)                (((x) & GENMASK(8, 3)) >> 3)
+#define ANA_TABLES_MACACCESS_MAC_TABLE_CMD(x)             ((x) & GENMASK(2, 0))
+#define ANA_TABLES_MACACCESS_MAC_TABLE_CMD_M              GENMASK(2, 0)
+#define MACACCESS_CMD_IDLE                     0
+#define MACACCESS_CMD_LEARN                    1
+#define MACACCESS_CMD_FORGET                   2
+#define MACACCESS_CMD_AGE                      3
+#define MACACCESS_CMD_GET_NEXT                 4
+#define MACACCESS_CMD_INIT                     5
+#define MACACCESS_CMD_READ                     6
+#define MACACCESS_CMD_WRITE                    7
+
+#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK(x)           (((x) << 2) & GENMASK(13, 2))
+#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK_M            GENMASK(13, 2)
+#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK_X(x)         (((x) & GENMASK(13, 2)) >> 2)
+#define ANA_TABLES_VLANACCESS_VLAN_TBL_CMD(x)             ((x) & GENMASK(1, 0))
+#define ANA_TABLES_VLANACCESS_VLAN_TBL_CMD_M              GENMASK(1, 0)
+#define ANA_TABLES_VLANACCESS_CMD_IDLE                    0x0
+#define ANA_TABLES_VLANACCESS_CMD_WRITE                   0x2
+#define ANA_TABLES_VLANACCESS_CMD_INIT                    0x3
+
+#define ANA_TABLES_VLANTIDX_VLAN_SEC_FWD_ENA              BIT(17)
+#define ANA_TABLES_VLANTIDX_VLAN_FLOOD_DIS                BIT(16)
+#define ANA_TABLES_VLANTIDX_VLAN_PRIV_VLAN                BIT(15)
+#define ANA_TABLES_VLANTIDX_VLAN_LEARN_DISABLED           BIT(14)
+#define ANA_TABLES_VLANTIDX_VLAN_MIRROR                   BIT(13)
+#define ANA_TABLES_VLANTIDX_VLAN_SRC_CHK                  BIT(12)
+#define ANA_TABLES_VLANTIDX_V_INDEX(x)                    ((x) & GENMASK(11, 0))
+#define ANA_TABLES_VLANTIDX_V_INDEX_M                     GENMASK(11, 0)
+
+#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK(x)           (((x) << 2) & GENMASK(8, 2))
+#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK_M            GENMASK(8, 2)
+#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK_X(x)         (((x) & GENMASK(8, 2)) >> 2)
+#define ANA_TABLES_ISDXACCESS_ISDX_TBL_CMD(x)             ((x) & GENMASK(1, 0))
+#define ANA_TABLES_ISDXACCESS_ISDX_TBL_CMD_M              GENMASK(1, 0)
+
+#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI(x)                 (((x) << 21) & GENMASK(28, 21))
+#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI_M                  GENMASK(28, 21)
+#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI_X(x)               (((x) & GENMASK(28, 21)) >> 21)
+#define ANA_TABLES_ISDXTIDX_ISDX_MSTI(x)                  (((x) << 15) & GENMASK(20, 15))
+#define ANA_TABLES_ISDXTIDX_ISDX_MSTI_M                   GENMASK(20, 15)
+#define ANA_TABLES_ISDXTIDX_ISDX_MSTI_X(x)                (((x) & GENMASK(20, 15)) >> 15)
+#define ANA_TABLES_ISDXTIDX_ISDX_ES0_KEY_ENA              BIT(14)
+#define ANA_TABLES_ISDXTIDX_ISDX_FORCE_ENA                BIT(10)
+#define ANA_TABLES_ISDXTIDX_ISDX_INDEX(x)                 ((x) & GENMASK(7, 0))
+#define ANA_TABLES_ISDXTIDX_ISDX_INDEX_M                  GENMASK(7, 0)
+
+#define ANA_TABLES_ENTRYLIM_RSZ                           0x4
+
+#define ANA_TABLES_ENTRYLIM_ENTRYLIM(x)                   (((x) << 14) & GENMASK(17, 14))
+#define ANA_TABLES_ENTRYLIM_ENTRYLIM_M                    GENMASK(17, 14)
+#define ANA_TABLES_ENTRYLIM_ENTRYLIM_X(x)                 (((x) & GENMASK(17, 14)) >> 14)
+#define ANA_TABLES_ENTRYLIM_ENTRYSTAT(x)                  ((x) & GENMASK(13, 0))
+#define ANA_TABLES_ENTRYLIM_ENTRYSTAT_M                   GENMASK(13, 0)
+
+#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM(x)        (((x) << 4) & GENMASK(31, 4))
+#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM_M         GENMASK(31, 4)
+#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM_X(x)      (((x) & GENMASK(31, 4)) >> 4)
+#define ANA_TABLES_STREAMACCESS_SEQ_GEN_REC_ENA           BIT(3)
+#define ANA_TABLES_STREAMACCESS_GEN_REC_TYPE              BIT(2)
+#define ANA_TABLES_STREAMACCESS_STREAM_TBL_CMD(x)         ((x) & GENMASK(1, 0))
+#define ANA_TABLES_STREAMACCESS_STREAM_TBL_CMD_M          GENMASK(1, 0)
+
+#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS(x)       (((x) << 30) & GENMASK(31, 30))
+#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS_M        GENMASK(31, 30)
+#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS_X(x)     (((x) & GENMASK(31, 30)) >> 30)
+#define ANA_TABLES_STREAMTIDX_S_INDEX(x)                  (((x) << 16) & GENMASK(22, 16))
+#define ANA_TABLES_STREAMTIDX_S_INDEX_M                   GENMASK(22, 16)
+#define ANA_TABLES_STREAMTIDX_S_INDEX_X(x)                (((x) & GENMASK(22, 16)) >> 16)
+#define ANA_TABLES_STREAMTIDX_FORCE_SF_BEHAVIOUR          BIT(14)
+#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN(x)          (((x) << 8) & GENMASK(13, 8))
+#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN_M           GENMASK(13, 8)
+#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN_X(x)        (((x) & GENMASK(13, 8)) >> 8)
+#define ANA_TABLES_STREAMTIDX_RESET_ON_ROGUE              BIT(7)
+#define ANA_TABLES_STREAMTIDX_REDTAG_POP                  BIT(6)
+#define ANA_TABLES_STREAMTIDX_STREAM_SPLIT                BIT(5)
+#define ANA_TABLES_STREAMTIDX_SEQ_SPACE_LOG2(x)           ((x) & GENMASK(4, 0))
+#define ANA_TABLES_STREAMTIDX_SEQ_SPACE_LOG2_M            GENMASK(4, 0)
+
+#define ANA_TABLES_SEQ_MASK_SPLIT_MASK(x)                 (((x) << 16) & GENMASK(22, 16))
+#define ANA_TABLES_SEQ_MASK_SPLIT_MASK_M                  GENMASK(22, 16)
+#define ANA_TABLES_SEQ_MASK_SPLIT_MASK_X(x)               (((x) & GENMASK(22, 16)) >> 16)
+#define ANA_TABLES_SEQ_MASK_INPUT_PORT_MASK(x)            ((x) & GENMASK(6, 0))
+#define ANA_TABLES_SEQ_MASK_INPUT_PORT_MASK_M             GENMASK(6, 0)
+
+#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK(x)             (((x) << 1) & GENMASK(7, 1))
+#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK_M              GENMASK(7, 1)
+#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK_X(x)           (((x) & GENMASK(7, 1)) >> 1)
+#define ANA_TABLES_SFID_MASK_IGR_SRCPORT_MATCH_ENA        BIT(0)
+
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO_MATCH_ENA          BIT(22)
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO(x)                 (((x) << 19) & GENMASK(21, 19))
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO_M                  GENMASK(21, 19)
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO_X(x)               (((x) & GENMASK(21, 19)) >> 19)
+#define ANA_TABLES_SFIDACCESS_FORCE_BLOCK                 BIT(18)
+#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN(x)              (((x) << 2) & GENMASK(17, 2))
+#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN_M               GENMASK(17, 2)
+#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN_X(x)            (((x) & GENMASK(17, 2)) >> 2)
+#define ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(x)             ((x) & GENMASK(1, 0))
+#define ANA_TABLES_SFIDACCESS_SFID_TBL_CMD_M              GENMASK(1, 0)
+
+#define ANA_TABLES_SFIDTIDX_SGID_VALID                    BIT(26)
+#define ANA_TABLES_SFIDTIDX_SGID(x)                       (((x) << 18) & GENMASK(25, 18))
+#define ANA_TABLES_SFIDTIDX_SGID_M                        GENMASK(25, 18)
+#define ANA_TABLES_SFIDTIDX_SGID_X(x)                     (((x) & GENMASK(25, 18)) >> 18)
+#define ANA_TABLES_SFIDTIDX_POL_ENA                       BIT(17)
+#define ANA_TABLES_SFIDTIDX_POL_IDX(x)                    (((x) << 8) & GENMASK(16, 8))
+#define ANA_TABLES_SFIDTIDX_POL_IDX_M                     GENMASK(16, 8)
+#define ANA_TABLES_SFIDTIDX_POL_IDX_X(x)                  (((x) & GENMASK(16, 8)) >> 8)
+#define ANA_TABLES_SFIDTIDX_SFID_INDEX(x)                 ((x) & GENMASK(7, 0))
+#define ANA_TABLES_SFIDTIDX_SFID_INDEX_M                  GENMASK(7, 0)
+
+#define ANA_MSTI_STATE_RSZ                                0x4
+
+#define ANA_OAM_UPM_LM_CNT_RSZ                            0x4
+
+#define ANA_SG_ACCESS_CTRL_SGID(x)                        ((x) & GENMASK(7, 0))
+#define ANA_SG_ACCESS_CTRL_SGID_M                         GENMASK(7, 0)
+#define ANA_SG_ACCESS_CTRL_CONFIG_CHANGE                  BIT(28)
+
+#define ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB(x)          ((x) & GENMASK(15, 0))
+#define ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB_M           GENMASK(15, 0)
+#define ANA_SG_CONFIG_REG_3_LIST_LENGTH(x)                (((x) << 16) & GENMASK(18, 16))
+#define ANA_SG_CONFIG_REG_3_LIST_LENGTH_M                 GENMASK(18, 16)
+#define ANA_SG_CONFIG_REG_3_LIST_LENGTH_X(x)              (((x) & GENMASK(18, 16)) >> 16)
+#define ANA_SG_CONFIG_REG_3_GATE_ENABLE                   BIT(20)
+#define ANA_SG_CONFIG_REG_3_INIT_IPS(x)                   (((x) << 24) & GENMASK(27, 24))
+#define ANA_SG_CONFIG_REG_3_INIT_IPS_M                    GENMASK(27, 24)
+#define ANA_SG_CONFIG_REG_3_INIT_IPS_X(x)                 (((x) & GENMASK(27, 24)) >> 24)
+#define ANA_SG_CONFIG_REG_3_INIT_GATE_STATE               BIT(28)
+
+#define ANA_SG_GCL_GS_CONFIG_RSZ                          0x4
+
+#define ANA_SG_GCL_GS_CONFIG_IPS(x)                       ((x) & GENMASK(3, 0))
+#define ANA_SG_GCL_GS_CONFIG_IPS_M                        GENMASK(3, 0)
+#define ANA_SG_GCL_GS_CONFIG_GATE_STATE                   BIT(4)
+
+#define ANA_SG_GCL_TI_CONFIG_RSZ                          0x4
+
+#define ANA_SG_STATUS_REG_3_CFG_CHG_TIME_SEC_MSB(x)       ((x) & GENMASK(15, 0))
+#define ANA_SG_STATUS_REG_3_CFG_CHG_TIME_SEC_MSB_M        GENMASK(15, 0)
+#define ANA_SG_STATUS_REG_3_GATE_STATE                    BIT(16)
+#define ANA_SG_STATUS_REG_3_IPS(x)                        (((x) << 20) & GENMASK(23, 20))
+#define ANA_SG_STATUS_REG_3_IPS_M                         GENMASK(23, 20)
+#define ANA_SG_STATUS_REG_3_IPS_X(x)                      (((x) & GENMASK(23, 20)) >> 20)
+#define ANA_SG_STATUS_REG_3_CONFIG_PENDING                BIT(24)
+
+#define ANA_PORT_VLAN_CFG_GSZ                             0x100
+
+#define ANA_PORT_VLAN_CFG_VLAN_VID_AS_ISDX                BIT(21)
+#define ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA                  BIT(20)
+#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT(x)                 (((x) << 18) & GENMASK(19, 18))
+#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT_M                  GENMASK(19, 18)
+#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT_X(x)               (((x) & GENMASK(19, 18)) >> 18)
+#define ANA_PORT_VLAN_CFG_VLAN_INNER_TAG_ENA              BIT(17)
+#define ANA_PORT_VLAN_CFG_VLAN_TAG_TYPE                   BIT(16)
+#define ANA_PORT_VLAN_CFG_VLAN_DEI                        BIT(15)
+#define ANA_PORT_VLAN_CFG_VLAN_PCP(x)                     (((x) << 12) & GENMASK(14, 12))
+#define ANA_PORT_VLAN_CFG_VLAN_PCP_M                      GENMASK(14, 12)
+#define ANA_PORT_VLAN_CFG_VLAN_PCP_X(x)                   (((x) & GENMASK(14, 12)) >> 12)
+#define ANA_PORT_VLAN_CFG_VLAN_VID(x)                     ((x) & GENMASK(11, 0))
+#define ANA_PORT_VLAN_CFG_VLAN_VID_M                      GENMASK(11, 0)
+
+#define ANA_PORT_DROP_CFG_GSZ                             0x100
+
+#define ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA               BIT(6)
+#define ANA_PORT_DROP_CFG_DROP_S_TAGGED_ENA               BIT(5)
+#define ANA_PORT_DROP_CFG_DROP_C_TAGGED_ENA               BIT(4)
+#define ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA          BIT(3)
+#define ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA          BIT(2)
+#define ANA_PORT_DROP_CFG_DROP_NULL_MAC_ENA               BIT(1)
+#define ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA                BIT(0)
+
+#define ANA_PORT_QOS_CFG_GSZ                              0x100
+
+#define ANA_PORT_QOS_CFG_DP_DEFAULT_VAL                   BIT(8)
+#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL(x)               (((x) << 5) & GENMASK(7, 5))
+#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL_M                GENMASK(7, 5)
+#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL_X(x)             (((x) & GENMASK(7, 5)) >> 5)
+#define ANA_PORT_QOS_CFG_QOS_DSCP_ENA                     BIT(4)
+#define ANA_PORT_QOS_CFG_QOS_PCP_ENA                      BIT(3)
+#define ANA_PORT_QOS_CFG_DSCP_TRANSLATE_ENA               BIT(2)
+#define ANA_PORT_QOS_CFG_DSCP_REWR_CFG(x)                 ((x) & GENMASK(1, 0))
+#define ANA_PORT_QOS_CFG_DSCP_REWR_CFG_M                  GENMASK(1, 0)
+
+#define ANA_PORT_VCAP_CFG_GSZ                             0x100
+
+#define ANA_PORT_VCAP_CFG_S1_ENA                          BIT(14)
+#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA(x)              (((x) << 11) & GENMASK(13, 11))
+#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA_M               GENMASK(13, 11)
+#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA_X(x)            (((x) & GENMASK(13, 11)) >> 11)
+#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA(x)        (((x) << 8) & GENMASK(10, 8))
+#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA_M         GENMASK(10, 8)
+#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA_X(x)      (((x) & GENMASK(10, 8)) >> 8)
+#define ANA_PORT_VCAP_CFG_PAG_VAL(x)                      ((x) & GENMASK(7, 0))
+#define ANA_PORT_VCAP_CFG_PAG_VAL_M                       GENMASK(7, 0)
+
+#define ANA_PORT_VCAP_S1_KEY_CFG_GSZ                      0x100
+#define ANA_PORT_VCAP_S1_KEY_CFG_RSZ                      0x4
+
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG(x)        (((x) << 4) & GENMASK(6, 4))
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG_M         GENMASK(6, 4)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG_X(x)      (((x) & GENMASK(6, 4)) >> 4)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG(x)        (((x) << 2) & GENMASK(3, 2))
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG_M         GENMASK(3, 2)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG_X(x)      (((x) & GENMASK(3, 2)) >> 2)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_OTHER_CFG(x)      ((x) & GENMASK(1, 0))
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_OTHER_CFG_M       GENMASK(1, 0)
+
+#define ANA_PORT_VCAP_S2_CFG_GSZ                          0x100
+
+#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA(x)        (((x) << 17) & GENMASK(18, 17))
+#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA_M         GENMASK(18, 17)
+#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA_X(x)      (((x) & GENMASK(18, 17)) >> 17)
+#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA(x)      (((x) << 15) & GENMASK(16, 15))
+#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA_M       GENMASK(16, 15)
+#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA_X(x)    (((x) & GENMASK(16, 15)) >> 15)
+#define ANA_PORT_VCAP_S2_CFG_S2_ENA                       BIT(14)
+#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS(x)               (((x) << 12) & GENMASK(13, 12))
+#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS_M                GENMASK(13, 12)
+#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS_X(x)             (((x) & GENMASK(13, 12)) >> 12)
+#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS(x)                (((x) << 10) & GENMASK(11, 10))
+#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS_M                 GENMASK(11, 10)
+#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS_X(x)              (((x) & GENMASK(11, 10)) >> 10)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS(x)          (((x) << 8) & GENMASK(9, 8))
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS_M           GENMASK(9, 8)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS_X(x)        (((x) & GENMASK(9, 8)) >> 8)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS(x)           (((x) << 6) & GENMASK(7, 6))
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS_M            GENMASK(7, 6)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS_X(x)         (((x) & GENMASK(7, 6)) >> 6)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG(x)                (((x) << 2) & GENMASK(5, 2))
+#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG_M                 GENMASK(5, 2)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG_X(x)              (((x) & GENMASK(5, 2)) >> 2)
+#define ANA_PORT_VCAP_S2_CFG_S2_OAM_DIS(x)                ((x) & GENMASK(1, 0))
+#define ANA_PORT_VCAP_S2_CFG_S2_OAM_DIS_M                 GENMASK(1, 0)
+
+#define ANA_PORT_PCP_DEI_MAP_GSZ                          0x100
+#define ANA_PORT_PCP_DEI_MAP_RSZ                          0x4
+
+#define ANA_PORT_PCP_DEI_MAP_DP_PCP_DEI_VAL               BIT(3)
+#define ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL(x)           ((x) & GENMASK(2, 0))
+#define ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL_M            GENMASK(2, 0)
+
+#define ANA_PORT_CPU_FWD_CFG_GSZ                          0x100
+
+#define ANA_PORT_CPU_FWD_CFG_CPU_VRAP_REDIR_ENA           BIT(7)
+#define ANA_PORT_CPU_FWD_CFG_CPU_MLD_REDIR_ENA            BIT(6)
+#define ANA_PORT_CPU_FWD_CFG_CPU_IGMP_REDIR_ENA           BIT(5)
+#define ANA_PORT_CPU_FWD_CFG_CPU_IPMC_CTRL_COPY_ENA       BIT(4)
+#define ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA             BIT(3)
+#define ANA_PORT_CPU_FWD_CFG_CPU_ALLBRIDGE_DROP_ENA       BIT(2)
+#define ANA_PORT_CPU_FWD_CFG_CPU_ALLBRIDGE_REDIR_ENA      BIT(1)
+#define ANA_PORT_CPU_FWD_CFG_CPU_OAM_ENA                  BIT(0)
+
+#define ANA_PORT_CPU_FWD_BPDU_CFG_GSZ                     0x100
+
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA(x)        (((x) << 16) & GENMASK(31, 16))
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA_M         GENMASK(31, 16)
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA_X(x)      (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA(x)       ((x) & GENMASK(15, 0))
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA_M        GENMASK(15, 0)
+
+#define ANA_PORT_CPU_FWD_GARP_CFG_GSZ                     0x100
+
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA(x)        (((x) << 16) & GENMASK(31, 16))
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA_M         GENMASK(31, 16)
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA_X(x)      (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_REDIR_ENA(x)       ((x) & GENMASK(15, 0))
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_REDIR_ENA_M        GENMASK(15, 0)
+
+#define ANA_PORT_CPU_FWD_CCM_CFG_GSZ                      0x100
+
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA(x)          (((x) << 16) & GENMASK(31, 16))
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA_M           GENMASK(31, 16)
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA_X(x)        (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_REDIR_ENA(x)         ((x) & GENMASK(15, 0))
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_REDIR_ENA_M          GENMASK(15, 0)
+
+#define ANA_PORT_PORT_CFG_GSZ                             0x100
+
+#define ANA_PORT_PORT_CFG_SRC_MIRROR_ENA                  BIT(15)
+#define ANA_PORT_PORT_CFG_LIMIT_DROP                      BIT(14)
+#define ANA_PORT_PORT_CFG_LIMIT_CPU                       BIT(13)
+#define ANA_PORT_PORT_CFG_LOCKED_PORTMOVE_DROP            BIT(12)
+#define ANA_PORT_PORT_CFG_LOCKED_PORTMOVE_CPU             BIT(11)
+#define ANA_PORT_PORT_CFG_LEARNDROP                       BIT(10)
+#define ANA_PORT_PORT_CFG_LEARNCPU                        BIT(9)
+#define ANA_PORT_PORT_CFG_LEARNAUTO                       BIT(8)
+#define ANA_PORT_PORT_CFG_LEARN_ENA                       BIT(7)
+#define ANA_PORT_PORT_CFG_RECV_ENA                        BIT(6)
+#define ANA_PORT_PORT_CFG_PORTID_VAL(x)                   (((x) << 2) & GENMASK(5, 2))
+#define ANA_PORT_PORT_CFG_PORTID_VAL_M                    GENMASK(5, 2)
+#define ANA_PORT_PORT_CFG_PORTID_VAL_X(x)                 (((x) & GENMASK(5, 2)) >> 2)
+#define ANA_PORT_PORT_CFG_USE_B_DOM_TBL                   BIT(1)
+#define ANA_PORT_PORT_CFG_LSR_MODE                        BIT(0)
+
+#define ANA_PORT_POL_CFG_GSZ                              0x100
+
+#define ANA_PORT_POL_CFG_POL_CPU_REDIR_8021               BIT(19)
+#define ANA_PORT_POL_CFG_POL_CPU_REDIR_IP                 BIT(18)
+#define ANA_PORT_POL_CFG_PORT_POL_ENA                     BIT(17)
+#define ANA_PORT_POL_CFG_QUEUE_POL_ENA(x)                 (((x) << 9) & GENMASK(16, 9))
+#define ANA_PORT_POL_CFG_QUEUE_POL_ENA_M                  GENMASK(16, 9)
+#define ANA_PORT_POL_CFG_QUEUE_POL_ENA_X(x)               (((x) & GENMASK(16, 9)) >> 9)
+#define ANA_PORT_POL_CFG_POL_ORDER(x)                     ((x) & GENMASK(8, 0))
+#define ANA_PORT_POL_CFG_POL_ORDER_M                      GENMASK(8, 0)
+
+#define ANA_PORT_PTP_CFG_GSZ                              0x100
+
+#define ANA_PORT_PTP_CFG_PTP_BACKPLANE_MODE               BIT(0)
+
+#define ANA_PORT_PTP_DLY1_CFG_GSZ                         0x100
+
+#define ANA_PORT_PTP_DLY2_CFG_GSZ                         0x100
+
+#define ANA_PORT_SFID_CFG_GSZ                             0x100
+#define ANA_PORT_SFID_CFG_RSZ                             0x4
+
+#define ANA_PORT_SFID_CFG_SFID_VALID                      BIT(8)
+#define ANA_PORT_SFID_CFG_SFID(x)                         ((x) & GENMASK(7, 0))
+#define ANA_PORT_SFID_CFG_SFID_M                          GENMASK(7, 0)
+
+#define ANA_PFC_PFC_CFG_GSZ                               0x40
+
+#define ANA_PFC_PFC_CFG_RX_PFC_ENA(x)                     (((x) << 2) & GENMASK(9, 2))
+#define ANA_PFC_PFC_CFG_RX_PFC_ENA_M                      GENMASK(9, 2)
+#define ANA_PFC_PFC_CFG_RX_PFC_ENA_X(x)                   (((x) & GENMASK(9, 2)) >> 2)
+#define ANA_PFC_PFC_CFG_FC_LINK_SPEED(x)                  ((x) & GENMASK(1, 0))
+#define ANA_PFC_PFC_CFG_FC_LINK_SPEED_M                   GENMASK(1, 0)
+
+#define ANA_PFC_PFC_TIMER_GSZ                             0x40
+#define ANA_PFC_PFC_TIMER_RSZ                             0x4
+
+#define ANA_IPT_OAM_MEP_CFG_GSZ                           0x8
+
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P(x)                  (((x) << 6) & GENMASK(10, 6))
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P_M                   GENMASK(10, 6)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P_X(x)                (((x) & GENMASK(10, 6)) >> 6)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX(x)                    (((x) << 1) & GENMASK(5, 1))
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_M                     GENMASK(5, 1)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_X(x)                  (((x) & GENMASK(5, 1)) >> 1)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_ENA                   BIT(0)
+
+#define ANA_IPT_IPT_GSZ                                   0x8
+
+#define ANA_IPT_IPT_IPT_CFG(x)                            (((x) << 15) & GENMASK(16, 15))
+#define ANA_IPT_IPT_IPT_CFG_M                             GENMASK(16, 15)
+#define ANA_IPT_IPT_IPT_CFG_X(x)                          (((x) & GENMASK(16, 15)) >> 15)
+#define ANA_IPT_IPT_ISDX_P(x)                             (((x) << 7) & GENMASK(14, 7))
+#define ANA_IPT_IPT_ISDX_P_M                              GENMASK(14, 7)
+#define ANA_IPT_IPT_ISDX_P_X(x)                           (((x) & GENMASK(14, 7)) >> 7)
+#define ANA_IPT_IPT_PPT_IDX(x)                            ((x) & GENMASK(6, 0))
+#define ANA_IPT_IPT_PPT_IDX_M                             GENMASK(6, 0)
+
+#define ANA_PPT_PPT_RSZ                                   0x4
+
+#define ANA_FID_MAP_FID_MAP_RSZ                           0x4
+
+#define ANA_FID_MAP_FID_MAP_FID_C_VAL(x)                  (((x) << 6) & GENMASK(11, 6))
+#define ANA_FID_MAP_FID_MAP_FID_C_VAL_M                   GENMASK(11, 6)
+#define ANA_FID_MAP_FID_MAP_FID_C_VAL_X(x)                (((x) & GENMASK(11, 6)) >> 6)
+#define ANA_FID_MAP_FID_MAP_FID_B_VAL(x)                  ((x) & GENMASK(5, 0))
+#define ANA_FID_MAP_FID_MAP_FID_B_VAL_M                   GENMASK(5, 0)
+
+#define ANA_AGGR_CFG_AC_RND_ENA                           BIT(7)
+#define ANA_AGGR_CFG_AC_DMAC_ENA                          BIT(6)
+#define ANA_AGGR_CFG_AC_SMAC_ENA                          BIT(5)
+#define ANA_AGGR_CFG_AC_IP6_FLOW_LBL_ENA                  BIT(4)
+#define ANA_AGGR_CFG_AC_IP6_TCPUDP_ENA                    BIT(3)
+#define ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA                    BIT(2)
+#define ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA                    BIT(1)
+#define ANA_AGGR_CFG_AC_ISDX_ENA                          BIT(0)
+
+#define ANA_CPUQ_CFG_CPUQ_MLD(x)                          (((x) << 27) & GENMASK(29, 27))
+#define ANA_CPUQ_CFG_CPUQ_MLD_M                           GENMASK(29, 27)
+#define ANA_CPUQ_CFG_CPUQ_MLD_X(x)                        (((x) & GENMASK(29, 27)) >> 27)
+#define ANA_CPUQ_CFG_CPUQ_IGMP(x)                         (((x) << 24) & GENMASK(26, 24))
+#define ANA_CPUQ_CFG_CPUQ_IGMP_M                          GENMASK(26, 24)
+#define ANA_CPUQ_CFG_CPUQ_IGMP_X(x)                       (((x) & GENMASK(26, 24)) >> 24)
+#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL(x)                    (((x) << 21) & GENMASK(23, 21))
+#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL_M                     GENMASK(23, 21)
+#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL_X(x)                  (((x) & GENMASK(23, 21)) >> 21)
+#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE(x)                    (((x) << 18) & GENMASK(20, 18))
+#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE_M                     GENMASK(20, 18)
+#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE_X(x)                  (((x) & GENMASK(20, 18)) >> 18)
+#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE(x)              (((x) << 15) & GENMASK(17, 15))
+#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE_M               GENMASK(17, 15)
+#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE_X(x)            (((x) & GENMASK(17, 15)) >> 15)
+#define ANA_CPUQ_CFG_CPUQ_SRC_COPY(x)                     (((x) << 12) & GENMASK(14, 12))
+#define ANA_CPUQ_CFG_CPUQ_SRC_COPY_M                      GENMASK(14, 12)
+#define ANA_CPUQ_CFG_CPUQ_SRC_COPY_X(x)                   (((x) & GENMASK(14, 12)) >> 12)
+#define ANA_CPUQ_CFG_CPUQ_MAC_COPY(x)                     (((x) << 9) & GENMASK(11, 9))
+#define ANA_CPUQ_CFG_CPUQ_MAC_COPY_M                      GENMASK(11, 9)
+#define ANA_CPUQ_CFG_CPUQ_MAC_COPY_X(x)                   (((x) & GENMASK(11, 9)) >> 9)
+#define ANA_CPUQ_CFG_CPUQ_LRN(x)                          (((x) << 6) & GENMASK(8, 6))
+#define ANA_CPUQ_CFG_CPUQ_LRN_M                           GENMASK(8, 6)
+#define ANA_CPUQ_CFG_CPUQ_LRN_X(x)                        (((x) & GENMASK(8, 6)) >> 6)
+#define ANA_CPUQ_CFG_CPUQ_MIRROR(x)                       (((x) << 3) & GENMASK(5, 3))
+#define ANA_CPUQ_CFG_CPUQ_MIRROR_M                        GENMASK(5, 3)
+#define ANA_CPUQ_CFG_CPUQ_MIRROR_X(x)                     (((x) & GENMASK(5, 3)) >> 3)
+#define ANA_CPUQ_CFG_CPUQ_SFLOW(x)                        ((x) & GENMASK(2, 0))
+#define ANA_CPUQ_CFG_CPUQ_SFLOW_M                         GENMASK(2, 0)
+
+#define ANA_CPUQ_8021_CFG_RSZ                             0x4
+
+#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL(x)                (((x) << 6) & GENMASK(8, 6))
+#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL_M                 GENMASK(8, 6)
+#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL_X(x)              (((x) & GENMASK(8, 6)) >> 6)
+#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL(x)                (((x) << 3) & GENMASK(5, 3))
+#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL_M                 GENMASK(5, 3)
+#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL_X(x)              (((x) & GENMASK(5, 3)) >> 3)
+#define ANA_CPUQ_8021_CFG_CPUQ_CCM_VAL(x)                 ((x) & GENMASK(2, 0))
+#define ANA_CPUQ_8021_CFG_CPUQ_CCM_VAL_M                  GENMASK(2, 0)
+
+#define ANA_DSCP_CFG_RSZ                                  0x4
+
+#define ANA_DSCP_CFG_DP_DSCP_VAL                          BIT(11)
+#define ANA_DSCP_CFG_QOS_DSCP_VAL(x)                      (((x) << 8) & GENMASK(10, 8))
+#define ANA_DSCP_CFG_QOS_DSCP_VAL_M                       GENMASK(10, 8)
+#define ANA_DSCP_CFG_QOS_DSCP_VAL_X(x)                    (((x) & GENMASK(10, 8)) >> 8)
+#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL(x)                (((x) << 2) & GENMASK(7, 2))
+#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL_M                 GENMASK(7, 2)
+#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL_X(x)              (((x) & GENMASK(7, 2)) >> 2)
+#define ANA_DSCP_CFG_DSCP_TRUST_ENA                       BIT(1)
+#define ANA_DSCP_CFG_DSCP_REWR_ENA                        BIT(0)
+
+#define ANA_DSCP_REWR_CFG_RSZ                             0x4
+
+#define ANA_VCAP_RNG_TYPE_CFG_RSZ                         0x4
+
+#define ANA_VCAP_RNG_VAL_CFG_RSZ                          0x4
+
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL(x)          (((x) << 16) & GENMASK(31, 16))
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL_M           GENMASK(31, 16)
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL_X(x)        (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MAX_VAL(x)          ((x) & GENMASK(15, 0))
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MAX_VAL_M           GENMASK(15, 0)
+
+#define ANA_VRAP_CFG_VRAP_VLAN_AWARE_ENA                  BIT(12)
+#define ANA_VRAP_CFG_VRAP_VID(x)                          ((x) & GENMASK(11, 0))
+#define ANA_VRAP_CFG_VRAP_VID_M                           GENMASK(11, 0)
+
+#define ANA_DISCARD_CFG_DROP_TAGGING_ISDX0                BIT(3)
+#define ANA_DISCARD_CFG_DROP_CTRLPROT_ISDX0               BIT(2)
+#define ANA_DISCARD_CFG_DROP_TAGGING_S2_ENA               BIT(1)
+#define ANA_DISCARD_CFG_DROP_CTRLPROT_S2_ENA              BIT(0)
+
+#define ANA_FID_CFG_VID_MC_ENA                            BIT(0)
+
+#define ANA_POL_PIR_CFG_GSZ                               0x20
+
+#define ANA_POL_PIR_CFG_PIR_RATE(x)                       (((x) << 6) & GENMASK(20, 6))
+#define ANA_POL_PIR_CFG_PIR_RATE_M                        GENMASK(20, 6)
+#define ANA_POL_PIR_CFG_PIR_RATE_X(x)                     (((x) & GENMASK(20, 6)) >> 6)
+#define ANA_POL_PIR_CFG_PIR_BURST(x)                      ((x) & GENMASK(5, 0))
+#define ANA_POL_PIR_CFG_PIR_BURST_M                       GENMASK(5, 0)
+
+#define ANA_POL_CIR_CFG_GSZ                               0x20
+
+#define ANA_POL_CIR_CFG_CIR_RATE(x)                       (((x) << 6) & GENMASK(20, 6))
+#define ANA_POL_CIR_CFG_CIR_RATE_M                        GENMASK(20, 6)
+#define ANA_POL_CIR_CFG_CIR_RATE_X(x)                     (((x) & GENMASK(20, 6)) >> 6)
+#define ANA_POL_CIR_CFG_CIR_BURST(x)                      ((x) & GENMASK(5, 0))
+#define ANA_POL_CIR_CFG_CIR_BURST_M                       GENMASK(5, 0)
+
+#define ANA_POL_MODE_CFG_GSZ                              0x20
+
+#define ANA_POL_MODE_CFG_IPG_SIZE(x)                      (((x) << 5) & GENMASK(9, 5))
+#define ANA_POL_MODE_CFG_IPG_SIZE_M                       GENMASK(9, 5)
+#define ANA_POL_MODE_CFG_IPG_SIZE_X(x)                    (((x) & GENMASK(9, 5)) >> 5)
+#define ANA_POL_MODE_CFG_FRM_MODE(x)                      (((x) << 3) & GENMASK(4, 3))
+#define ANA_POL_MODE_CFG_FRM_MODE_M                       GENMASK(4, 3)
+#define ANA_POL_MODE_CFG_FRM_MODE_X(x)                    (((x) & GENMASK(4, 3)) >> 3)
+#define ANA_POL_MODE_CFG_DLB_COUPLED                      BIT(2)
+#define ANA_POL_MODE_CFG_CIR_ENA                          BIT(1)
+#define ANA_POL_MODE_CFG_OVERSHOOT_ENA                    BIT(0)
+
+#define ANA_POL_PIR_STATE_GSZ                             0x20
+
+#define ANA_POL_CIR_STATE_GSZ                             0x20
+
+#define ANA_POL_STATE_GSZ                                 0x20
+
+#define ANA_POL_FLOWC_RSZ                                 0x4
+
+#define ANA_POL_FLOWC_POL_FLOWC                           BIT(0)
+
+#define ANA_POL_HYST_POL_FC_HYST(x)                       (((x) << 4) & GENMASK(9, 4))
+#define ANA_POL_HYST_POL_FC_HYST_M                        GENMASK(9, 4)
+#define ANA_POL_HYST_POL_FC_HYST_X(x)                     (((x) & GENMASK(9, 4)) >> 4)
+#define ANA_POL_HYST_POL_STOP_HYST(x)                     ((x) & GENMASK(3, 0))
+#define ANA_POL_HYST_POL_STOP_HYST_M                      GENMASK(3, 0)
+
+#define ANA_POL_MISC_CFG_POL_CLOSE_ALL                    BIT(1)
+#define ANA_POL_MISC_CFG_POL_LEAK_DIS                     BIT(0)
+
+#endif
diff --git a/include/soc/mscc/ocelot_dev.h b/include/soc/mscc/ocelot_dev.h
new file mode 100644
index 000000000000..0a50d53bbd3f
--- /dev/null
+++ b/include/soc/mscc/ocelot_dev.h
@@ -0,0 +1,275 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_DEV_H_
+#define _MSCC_OCELOT_DEV_H_
+
+#define DEV_CLOCK_CFG                                     0x0
+
+#define DEV_CLOCK_CFG_MAC_TX_RST                          BIT(7)
+#define DEV_CLOCK_CFG_MAC_RX_RST                          BIT(6)
+#define DEV_CLOCK_CFG_PCS_TX_RST                          BIT(5)
+#define DEV_CLOCK_CFG_PCS_RX_RST                          BIT(4)
+#define DEV_CLOCK_CFG_PORT_RST                            BIT(3)
+#define DEV_CLOCK_CFG_PHY_RST                             BIT(2)
+#define DEV_CLOCK_CFG_LINK_SPEED(x)                       ((x) & GENMASK(1, 0))
+#define DEV_CLOCK_CFG_LINK_SPEED_M                        GENMASK(1, 0)
+
+#define DEV_PORT_MISC                                     0x4
+
+#define DEV_PORT_MISC_FWD_ERROR_ENA                       BIT(4)
+#define DEV_PORT_MISC_FWD_PAUSE_ENA                       BIT(3)
+#define DEV_PORT_MISC_FWD_CTRL_ENA                        BIT(2)
+#define DEV_PORT_MISC_DEV_LOOP_ENA                        BIT(1)
+#define DEV_PORT_MISC_HDX_FAST_DIS                        BIT(0)
+
+#define DEV_EVENTS                                        0x8
+
+#define DEV_EEE_CFG                                       0xc
+
+#define DEV_EEE_CFG_EEE_ENA                               BIT(22)
+#define DEV_EEE_CFG_EEE_TIMER_AGE(x)                      (((x) << 15) & GENMASK(21, 15))
+#define DEV_EEE_CFG_EEE_TIMER_AGE_M                       GENMASK(21, 15)
+#define DEV_EEE_CFG_EEE_TIMER_AGE_X(x)                    (((x) & GENMASK(21, 15)) >> 15)
+#define DEV_EEE_CFG_EEE_TIMER_WAKEUP(x)                   (((x) << 8) & GENMASK(14, 8))
+#define DEV_EEE_CFG_EEE_TIMER_WAKEUP_M                    GENMASK(14, 8)
+#define DEV_EEE_CFG_EEE_TIMER_WAKEUP_X(x)                 (((x) & GENMASK(14, 8)) >> 8)
+#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF(x)                  (((x) << 1) & GENMASK(7, 1))
+#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF_M                   GENMASK(7, 1)
+#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF_X(x)                (((x) & GENMASK(7, 1)) >> 1)
+#define DEV_EEE_CFG_PORT_LPI                              BIT(0)
+
+#define DEV_RX_PATH_DELAY                                 0x10
+
+#define DEV_TX_PATH_DELAY                                 0x14
+
+#define DEV_PTP_PREDICT_CFG                               0x18
+
+#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG(x)        (((x) << 4) & GENMASK(11, 4))
+#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG_M         GENMASK(11, 4)
+#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG_X(x)      (((x) & GENMASK(11, 4)) >> 4)
+#define DEV_PTP_PREDICT_CFG_PTP_PHASE_PREDICT_CFG(x)      ((x) & GENMASK(3, 0))
+#define DEV_PTP_PREDICT_CFG_PTP_PHASE_PREDICT_CFG_M       GENMASK(3, 0)
+
+#define DEV_MAC_ENA_CFG                                   0x1c
+
+#define DEV_MAC_ENA_CFG_RX_ENA                            BIT(4)
+#define DEV_MAC_ENA_CFG_TX_ENA                            BIT(0)
+
+#define DEV_MAC_MODE_CFG                                  0x20
+
+#define DEV_MAC_MODE_CFG_FC_WORD_SYNC_ENA                 BIT(8)
+#define DEV_MAC_MODE_CFG_GIGA_MODE_ENA                    BIT(4)
+#define DEV_MAC_MODE_CFG_FDX_ENA                          BIT(0)
+
+#define DEV_MAC_MAXLEN_CFG                                0x24
+
+#define DEV_MAC_TAGS_CFG                                  0x28
+
+#define DEV_MAC_TAGS_CFG_TAG_ID(x)                        (((x) << 16) & GENMASK(31, 16))
+#define DEV_MAC_TAGS_CFG_TAG_ID_M                         GENMASK(31, 16)
+#define DEV_MAC_TAGS_CFG_TAG_ID_X(x)                      (((x) & GENMASK(31, 16)) >> 16)
+#define DEV_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA                 BIT(2)
+#define DEV_MAC_TAGS_CFG_PB_ENA                           BIT(1)
+#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA                     BIT(0)
+
+#define DEV_MAC_ADV_CHK_CFG                               0x2c
+
+#define DEV_MAC_ADV_CHK_CFG_LEN_DROP_ENA                  BIT(0)
+
+#define DEV_MAC_IFG_CFG                                   0x30
+
+#define DEV_MAC_IFG_CFG_RESTORE_OLD_IPG_CHECK             BIT(17)
+#define DEV_MAC_IFG_CFG_REDUCED_TX_IFG                    BIT(16)
+#define DEV_MAC_IFG_CFG_TX_IFG(x)                         (((x) << 8) & GENMASK(12, 8))
+#define DEV_MAC_IFG_CFG_TX_IFG_M                          GENMASK(12, 8)
+#define DEV_MAC_IFG_CFG_TX_IFG_X(x)                       (((x) & GENMASK(12, 8)) >> 8)
+#define DEV_MAC_IFG_CFG_RX_IFG2(x)                        (((x) << 4) & GENMASK(7, 4))
+#define DEV_MAC_IFG_CFG_RX_IFG2_M                         GENMASK(7, 4)
+#define DEV_MAC_IFG_CFG_RX_IFG2_X(x)                      (((x) & GENMASK(7, 4)) >> 4)
+#define DEV_MAC_IFG_CFG_RX_IFG1(x)                        ((x) & GENMASK(3, 0))
+#define DEV_MAC_IFG_CFG_RX_IFG1_M                         GENMASK(3, 0)
+
+#define DEV_MAC_HDX_CFG                                   0x34
+
+#define DEV_MAC_HDX_CFG_BYPASS_COL_SYNC                   BIT(26)
+#define DEV_MAC_HDX_CFG_OB_ENA                            BIT(25)
+#define DEV_MAC_HDX_CFG_WEXC_DIS                          BIT(24)
+#define DEV_MAC_HDX_CFG_SEED(x)                           (((x) << 16) & GENMASK(23, 16))
+#define DEV_MAC_HDX_CFG_SEED_M                            GENMASK(23, 16)
+#define DEV_MAC_HDX_CFG_SEED_X(x)                         (((x) & GENMASK(23, 16)) >> 16)
+#define DEV_MAC_HDX_CFG_SEED_LOAD                         BIT(12)
+#define DEV_MAC_HDX_CFG_RETRY_AFTER_EXC_COL_ENA           BIT(8)
+#define DEV_MAC_HDX_CFG_LATE_COL_POS(x)                   ((x) & GENMASK(6, 0))
+#define DEV_MAC_HDX_CFG_LATE_COL_POS_M                    GENMASK(6, 0)
+
+#define DEV_MAC_DBG_CFG                                   0x38
+
+#define DEV_MAC_DBG_CFG_TBI_MODE                          BIT(4)
+#define DEV_MAC_DBG_CFG_IFG_CRS_EXT_CHK_ENA               BIT(0)
+
+#define DEV_MAC_FC_MAC_LOW_CFG                            0x3c
+
+#define DEV_MAC_FC_MAC_HIGH_CFG                           0x40
+
+#define DEV_MAC_STICKY                                    0x44
+
+#define DEV_MAC_STICKY_RX_IPG_SHRINK_STICKY               BIT(9)
+#define DEV_MAC_STICKY_RX_PREAM_SHRINK_STICKY             BIT(8)
+#define DEV_MAC_STICKY_RX_CARRIER_EXT_STICKY              BIT(7)
+#define DEV_MAC_STICKY_RX_CARRIER_EXT_ERR_STICKY          BIT(6)
+#define DEV_MAC_STICKY_RX_JUNK_STICKY                     BIT(5)
+#define DEV_MAC_STICKY_TX_RETRANSMIT_STICKY               BIT(4)
+#define DEV_MAC_STICKY_TX_JAM_STICKY                      BIT(3)
+#define DEV_MAC_STICKY_TX_FIFO_OFLW_STICKY                BIT(2)
+#define DEV_MAC_STICKY_TX_FRM_LEN_OVR_STICKY              BIT(1)
+#define DEV_MAC_STICKY_TX_ABORT_STICKY                    BIT(0)
+
+#define PCS1G_CFG                                         0x48
+
+#define PCS1G_CFG_LINK_STATUS_TYPE                        BIT(4)
+#define PCS1G_CFG_AN_LINK_CTRL_ENA                        BIT(1)
+#define PCS1G_CFG_PCS_ENA                                 BIT(0)
+
+#define PCS1G_MODE_CFG                                    0x4c
+
+#define PCS1G_MODE_CFG_UNIDIR_MODE_ENA                    BIT(4)
+#define PCS1G_MODE_CFG_SGMII_MODE_ENA                     BIT(0)
+
+#define PCS1G_SD_CFG                                      0x50
+
+#define PCS1G_SD_CFG_SD_SEL                               BIT(8)
+#define PCS1G_SD_CFG_SD_POL                               BIT(4)
+#define PCS1G_SD_CFG_SD_ENA                               BIT(0)
+
+#define PCS1G_ANEG_CFG                                    0x54
+
+#define PCS1G_ANEG_CFG_ADV_ABILITY(x)                     (((x) << 16) & GENMASK(31, 16))
+#define PCS1G_ANEG_CFG_ADV_ABILITY_M                      GENMASK(31, 16)
+#define PCS1G_ANEG_CFG_ADV_ABILITY_X(x)                   (((x) & GENMASK(31, 16)) >> 16)
+#define PCS1G_ANEG_CFG_SW_RESOLVE_ENA                     BIT(8)
+#define PCS1G_ANEG_CFG_ANEG_RESTART_ONE_SHOT              BIT(1)
+#define PCS1G_ANEG_CFG_ANEG_ENA                           BIT(0)
+
+#define PCS1G_ANEG_NP_CFG                                 0x58
+
+#define PCS1G_ANEG_NP_CFG_NP_TX(x)                        (((x) << 16) & GENMASK(31, 16))
+#define PCS1G_ANEG_NP_CFG_NP_TX_M                         GENMASK(31, 16)
+#define PCS1G_ANEG_NP_CFG_NP_TX_X(x)                      (((x) & GENMASK(31, 16)) >> 16)
+#define PCS1G_ANEG_NP_CFG_NP_LOADED_ONE_SHOT              BIT(0)
+
+#define PCS1G_LB_CFG                                      0x5c
+
+#define PCS1G_LB_CFG_RA_ENA                               BIT(4)
+#define PCS1G_LB_CFG_GMII_PHY_LB_ENA                      BIT(1)
+#define PCS1G_LB_CFG_TBI_HOST_LB_ENA                      BIT(0)
+
+#define PCS1G_DBG_CFG                                     0x60
+
+#define PCS1G_DBG_CFG_UDLT                                BIT(0)
+
+#define PCS1G_CDET_CFG                                    0x64
+
+#define PCS1G_CDET_CFG_CDET_ENA                           BIT(0)
+
+#define PCS1G_ANEG_STATUS                                 0x68
+
+#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY(x)               (((x) << 16) & GENMASK(31, 16))
+#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY_M                GENMASK(31, 16)
+#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY_X(x)             (((x) & GENMASK(31, 16)) >> 16)
+#define PCS1G_ANEG_STATUS_PR                              BIT(4)
+#define PCS1G_ANEG_STATUS_PAGE_RX_STICKY                  BIT(3)
+#define PCS1G_ANEG_STATUS_ANEG_COMPLETE                   BIT(0)
+
+#define PCS1G_ANEG_NP_STATUS                              0x6c
+
+#define PCS1G_LINK_STATUS                                 0x70
+
+#define PCS1G_LINK_STATUS_DELAY_VAR(x)                    (((x) << 12) & GENMASK(15, 12))
+#define PCS1G_LINK_STATUS_DELAY_VAR_M                     GENMASK(15, 12)
+#define PCS1G_LINK_STATUS_DELAY_VAR_X(x)                  (((x) & GENMASK(15, 12)) >> 12)
+#define PCS1G_LINK_STATUS_SIGNAL_DETECT                   BIT(8)
+#define PCS1G_LINK_STATUS_LINK_STATUS                     BIT(4)
+#define PCS1G_LINK_STATUS_SYNC_STATUS                     BIT(0)
+
+#define PCS1G_LINK_DOWN_CNT                               0x74
+
+#define PCS1G_STICKY                                      0x78
+
+#define PCS1G_STICKY_LINK_DOWN_STICKY                     BIT(4)
+#define PCS1G_STICKY_OUT_OF_SYNC_STICKY                   BIT(0)
+
+#define PCS1G_DEBUG_STATUS                                0x7c
+
+#define PCS1G_LPI_CFG                                     0x80
+
+#define PCS1G_LPI_CFG_QSGMII_MS_SEL                       BIT(20)
+#define PCS1G_LPI_CFG_RX_LPI_OUT_DIS                      BIT(17)
+#define PCS1G_LPI_CFG_LPI_TESTMODE                        BIT(16)
+#define PCS1G_LPI_CFG_LPI_RX_WTIM(x)                      (((x) << 4) & GENMASK(5, 4))
+#define PCS1G_LPI_CFG_LPI_RX_WTIM_M                       GENMASK(5, 4)
+#define PCS1G_LPI_CFG_LPI_RX_WTIM_X(x)                    (((x) & GENMASK(5, 4)) >> 4)
+#define PCS1G_LPI_CFG_TX_ASSERT_LPIDLE                    BIT(0)
+
+#define PCS1G_LPI_WAKE_ERROR_CNT                          0x84
+
+#define PCS1G_LPI_STATUS                                  0x88
+
+#define PCS1G_LPI_STATUS_RX_LPI_FAIL                      BIT(16)
+#define PCS1G_LPI_STATUS_RX_LPI_EVENT_STICKY              BIT(12)
+#define PCS1G_LPI_STATUS_RX_QUIET                         BIT(9)
+#define PCS1G_LPI_STATUS_RX_LPI_MODE                      BIT(8)
+#define PCS1G_LPI_STATUS_TX_LPI_EVENT_STICKY              BIT(4)
+#define PCS1G_LPI_STATUS_TX_QUIET                         BIT(1)
+#define PCS1G_LPI_STATUS_TX_LPI_MODE                      BIT(0)
+
+#define PCS1G_TSTPAT_MODE_CFG                             0x8c
+
+#define PCS1G_TSTPAT_STATUS                               0x90
+
+#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT(x)                (((x) << 8) & GENMASK(15, 8))
+#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT_M                 GENMASK(15, 8)
+#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT_X(x)              (((x) & GENMASK(15, 8)) >> 8)
+#define PCS1G_TSTPAT_STATUS_JTP_ERR                       BIT(4)
+#define PCS1G_TSTPAT_STATUS_JTP_LOCK                      BIT(0)
+
+#define DEV_PCS_FX100_CFG                                 0x94
+
+#define DEV_PCS_FX100_CFG_SD_SEL                          BIT(26)
+#define DEV_PCS_FX100_CFG_SD_POL                          BIT(25)
+#define DEV_PCS_FX100_CFG_SD_ENA                          BIT(24)
+#define DEV_PCS_FX100_CFG_LOOPBACK_ENA                    BIT(20)
+#define DEV_PCS_FX100_CFG_SWAP_MII_ENA                    BIT(16)
+#define DEV_PCS_FX100_CFG_RXBITSEL(x)                     (((x) << 12) & GENMASK(15, 12))
+#define DEV_PCS_FX100_CFG_RXBITSEL_M                      GENMASK(15, 12)
+#define DEV_PCS_FX100_CFG_RXBITSEL_X(x)                   (((x) & GENMASK(15, 12)) >> 12)
+#define DEV_PCS_FX100_CFG_SIGDET_CFG(x)                   (((x) << 9) & GENMASK(10, 9))
+#define DEV_PCS_FX100_CFG_SIGDET_CFG_M                    GENMASK(10, 9)
+#define DEV_PCS_FX100_CFG_SIGDET_CFG_X(x)                 (((x) & GENMASK(10, 9)) >> 9)
+#define DEV_PCS_FX100_CFG_LINKHYST_TM_ENA                 BIT(8)
+#define DEV_PCS_FX100_CFG_LINKHYSTTIMER(x)                (((x) << 4) & GENMASK(7, 4))
+#define DEV_PCS_FX100_CFG_LINKHYSTTIMER_M                 GENMASK(7, 4)
+#define DEV_PCS_FX100_CFG_LINKHYSTTIMER_X(x)              (((x) & GENMASK(7, 4)) >> 4)
+#define DEV_PCS_FX100_CFG_UNIDIR_MODE_ENA                 BIT(3)
+#define DEV_PCS_FX100_CFG_FEFCHK_ENA                      BIT(2)
+#define DEV_PCS_FX100_CFG_FEFGEN_ENA                      BIT(1)
+#define DEV_PCS_FX100_CFG_PCS_ENA                         BIT(0)
+
+#define DEV_PCS_FX100_STATUS                              0x98
+
+#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP(x)              (((x) << 8) & GENMASK(11, 8))
+#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP_M               GENMASK(11, 8)
+#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP_X(x)            (((x) & GENMASK(11, 8)) >> 8)
+#define DEV_PCS_FX100_STATUS_PCS_ERROR_STICKY             BIT(7)
+#define DEV_PCS_FX100_STATUS_FEF_FOUND_STICKY             BIT(6)
+#define DEV_PCS_FX100_STATUS_SSD_ERROR_STICKY             BIT(5)
+#define DEV_PCS_FX100_STATUS_SYNC_LOST_STICKY             BIT(4)
+#define DEV_PCS_FX100_STATUS_FEF_STATUS                   BIT(2)
+#define DEV_PCS_FX100_STATUS_SIGNAL_DETECT                BIT(1)
+#define DEV_PCS_FX100_STATUS_SYNC_STATUS                  BIT(0)
+
+#endif
diff --git a/include/soc/mscc/ocelot_qsys.h b/include/soc/mscc/ocelot_qsys.h
new file mode 100644
index 000000000000..d8c63aa761be
--- /dev/null
+++ b/include/soc/mscc/ocelot_qsys.h
@@ -0,0 +1,270 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_QSYS_H_
+#define _MSCC_OCELOT_QSYS_H_
+
+#define QSYS_PORT_MODE_RSZ                                0x4
+
+#define QSYS_PORT_MODE_DEQUEUE_DIS                        BIT(1)
+#define QSYS_PORT_MODE_DEQUEUE_LATE                       BIT(0)
+
+#define QSYS_SWITCH_PORT_MODE_RSZ                         0x4
+
+#define QSYS_SWITCH_PORT_MODE_PORT_ENA                    BIT(14)
+#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG(x)             (((x) << 11) & GENMASK(13, 11))
+#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG_M              GENMASK(13, 11)
+#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG_X(x)           (((x) & GENMASK(13, 11)) >> 11)
+#define QSYS_SWITCH_PORT_MODE_YEL_RSRVD                   BIT(10)
+#define QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE           BIT(9)
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA(x)               (((x) << 1) & GENMASK(8, 1))
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA_M                GENMASK(8, 1)
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA_X(x)             (((x) & GENMASK(8, 1)) >> 1)
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_MODE                 BIT(0)
+
+#define QSYS_STAT_CNT_CFG_TX_GREEN_CNT_MODE               BIT(5)
+#define QSYS_STAT_CNT_CFG_TX_YELLOW_CNT_MODE              BIT(4)
+#define QSYS_STAT_CNT_CFG_DROP_GREEN_CNT_MODE             BIT(3)
+#define QSYS_STAT_CNT_CFG_DROP_YELLOW_CNT_MODE            BIT(2)
+#define QSYS_STAT_CNT_CFG_DROP_COUNT_ONCE                 BIT(1)
+#define QSYS_STAT_CNT_CFG_DROP_COUNT_EGRESS               BIT(0)
+
+#define QSYS_EEE_CFG_RSZ                                  0x4
+
+#define QSYS_EEE_THRES_EEE_HIGH_BYTES(x)                  (((x) << 8) & GENMASK(15, 8))
+#define QSYS_EEE_THRES_EEE_HIGH_BYTES_M                   GENMASK(15, 8)
+#define QSYS_EEE_THRES_EEE_HIGH_BYTES_X(x)                (((x) & GENMASK(15, 8)) >> 8)
+#define QSYS_EEE_THRES_EEE_HIGH_FRAMES(x)                 ((x) & GENMASK(7, 0))
+#define QSYS_EEE_THRES_EEE_HIGH_FRAMES_M                  GENMASK(7, 0)
+
+#define QSYS_SW_STATUS_RSZ                                0x4
+
+#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT(x)                  (((x) << 8) & GENMASK(12, 8))
+#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT_M                   GENMASK(12, 8)
+#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT_X(x)                (((x) & GENMASK(12, 8)) >> 8)
+#define QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK(x)                  ((x) & GENMASK(7, 0))
+#define QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK_M                   GENMASK(7, 0)
+
+#define QSYS_QMAP_GSZ                                     0x4
+
+#define QSYS_QMAP_SE_BASE(x)                              (((x) << 5) & GENMASK(12, 5))
+#define QSYS_QMAP_SE_BASE_M                               GENMASK(12, 5)
+#define QSYS_QMAP_SE_BASE_X(x)                            (((x) & GENMASK(12, 5)) >> 5)
+#define QSYS_QMAP_SE_IDX_SEL(x)                           (((x) << 2) & GENMASK(4, 2))
+#define QSYS_QMAP_SE_IDX_SEL_M                            GENMASK(4, 2)
+#define QSYS_QMAP_SE_IDX_SEL_X(x)                         (((x) & GENMASK(4, 2)) >> 2)
+#define QSYS_QMAP_SE_INP_SEL(x)                           ((x) & GENMASK(1, 0))
+#define QSYS_QMAP_SE_INP_SEL_M                            GENMASK(1, 0)
+
+#define QSYS_ISDX_SGRP_GSZ                                0x4
+
+#define QSYS_TIMED_FRAME_ENTRY_GSZ                        0x4
+
+#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT(x)               (((x) << 9) & GENMASK(18, 9))
+#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT_M                GENMASK(18, 9)
+#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT_X(x)             (((x) & GENMASK(18, 9)) >> 9)
+#define QSYS_TFRM_MISC_TIMED_CANCEL_1SHOT                 BIT(8)
+#define QSYS_TFRM_MISC_TIMED_SLOT_MODE_MC                 BIT(7)
+#define QSYS_TFRM_MISC_TIMED_ENTRY_FAST_CNT(x)            ((x) & GENMASK(6, 0))
+#define QSYS_TFRM_MISC_TIMED_ENTRY_FAST_CNT_M             GENMASK(6, 0)
+
+#define QSYS_RED_PROFILE_RSZ                              0x4
+
+#define QSYS_RED_PROFILE_WM_RED_LOW(x)                    (((x) << 8) & GENMASK(15, 8))
+#define QSYS_RED_PROFILE_WM_RED_LOW_M                     GENMASK(15, 8)
+#define QSYS_RED_PROFILE_WM_RED_LOW_X(x)                  (((x) & GENMASK(15, 8)) >> 8)
+#define QSYS_RED_PROFILE_WM_RED_HIGH(x)                   ((x) & GENMASK(7, 0))
+#define QSYS_RED_PROFILE_WM_RED_HIGH_M                    GENMASK(7, 0)
+
+#define QSYS_RES_CFG_GSZ                                  0x8
+
+#define QSYS_RES_STAT_GSZ                                 0x8
+
+#define QSYS_RES_STAT_INUSE(x)                            (((x) << 12) & GENMASK(23, 12))
+#define QSYS_RES_STAT_INUSE_M                             GENMASK(23, 12)
+#define QSYS_RES_STAT_INUSE_X(x)                          (((x) & GENMASK(23, 12)) >> 12)
+#define QSYS_RES_STAT_MAXUSE(x)                           ((x) & GENMASK(11, 0))
+#define QSYS_RES_STAT_MAXUSE_M                            GENMASK(11, 0)
+
+#define QSYS_EVENTS_CORE_EV_FDC(x)                        (((x) << 2) & GENMASK(4, 2))
+#define QSYS_EVENTS_CORE_EV_FDC_M                         GENMASK(4, 2)
+#define QSYS_EVENTS_CORE_EV_FDC_X(x)                      (((x) & GENMASK(4, 2)) >> 2)
+#define QSYS_EVENTS_CORE_EV_FRD(x)                        ((x) & GENMASK(1, 0))
+#define QSYS_EVENTS_CORE_EV_FRD_M                         GENMASK(1, 0)
+
+#define QSYS_QMAXSDU_CFG_0_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_1_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_2_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_3_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_4_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_5_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_6_RSZ                            0x4
+
+#define QSYS_QMAXSDU_CFG_7_RSZ                            0x4
+
+#define QSYS_PREEMPTION_CFG_RSZ                           0x4
+
+#define QSYS_PREEMPTION_CFG_P_QUEUES(x)                   ((x) & GENMASK(7, 0))
+#define QSYS_PREEMPTION_CFG_P_QUEUES_M                    GENMASK(7, 0)
+#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE(x)           (((x) << 8) & GENMASK(9, 8))
+#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE_M            GENMASK(9, 8)
+#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE_X(x)         (((x) & GENMASK(9, 8)) >> 8)
+#define QSYS_PREEMPTION_CFG_STRICT_IPG(x)                 (((x) << 12) & GENMASK(13, 12))
+#define QSYS_PREEMPTION_CFG_STRICT_IPG_M                  GENMASK(13, 12)
+#define QSYS_PREEMPTION_CFG_STRICT_IPG_X(x)               (((x) & GENMASK(13, 12)) >> 12)
+#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE(x)               (((x) << 16) & GENMASK(31, 16))
+#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE_M                GENMASK(31, 16)
+#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE_X(x)             (((x) & GENMASK(31, 16)) >> 16)
+
+#define QSYS_CIR_CFG_GSZ                                  0x80
+
+#define QSYS_CIR_CFG_CIR_RATE(x)                          (((x) << 6) & GENMASK(20, 6))
+#define QSYS_CIR_CFG_CIR_RATE_M                           GENMASK(20, 6)
+#define QSYS_CIR_CFG_CIR_RATE_X(x)                        (((x) & GENMASK(20, 6)) >> 6)
+#define QSYS_CIR_CFG_CIR_BURST(x)                         ((x) & GENMASK(5, 0))
+#define QSYS_CIR_CFG_CIR_BURST_M                          GENMASK(5, 0)
+
+#define QSYS_EIR_CFG_GSZ                                  0x80
+
+#define QSYS_EIR_CFG_EIR_RATE(x)                          (((x) << 7) & GENMASK(21, 7))
+#define QSYS_EIR_CFG_EIR_RATE_M                           GENMASK(21, 7)
+#define QSYS_EIR_CFG_EIR_RATE_X(x)                        (((x) & GENMASK(21, 7)) >> 7)
+#define QSYS_EIR_CFG_EIR_BURST(x)                         (((x) << 1) & GENMASK(6, 1))
+#define QSYS_EIR_CFG_EIR_BURST_M                          GENMASK(6, 1)
+#define QSYS_EIR_CFG_EIR_BURST_X(x)                       (((x) & GENMASK(6, 1)) >> 1)
+#define QSYS_EIR_CFG_EIR_MARK_ENA                         BIT(0)
+
+#define QSYS_SE_CFG_GSZ                                   0x80
+
+#define QSYS_SE_CFG_SE_DWRR_CNT(x)                        (((x) << 6) & GENMASK(9, 6))
+#define QSYS_SE_CFG_SE_DWRR_CNT_M                         GENMASK(9, 6)
+#define QSYS_SE_CFG_SE_DWRR_CNT_X(x)                      (((x) & GENMASK(9, 6)) >> 6)
+#define QSYS_SE_CFG_SE_RR_ENA                             BIT(5)
+#define QSYS_SE_CFG_SE_AVB_ENA                            BIT(4)
+#define QSYS_SE_CFG_SE_FRM_MODE(x)                        (((x) << 2) & GENMASK(3, 2))
+#define QSYS_SE_CFG_SE_FRM_MODE_M                         GENMASK(3, 2)
+#define QSYS_SE_CFG_SE_FRM_MODE_X(x)                      (((x) & GENMASK(3, 2)) >> 2)
+#define QSYS_SE_CFG_SE_EXC_ENA                            BIT(1)
+#define QSYS_SE_CFG_SE_EXC_FWD                            BIT(0)
+
+#define QSYS_SE_DWRR_CFG_GSZ                              0x80
+#define QSYS_SE_DWRR_CFG_RSZ                              0x4
+
+#define QSYS_SE_CONNECT_GSZ                               0x80
+
+#define QSYS_SE_CONNECT_SE_OUTP_IDX(x)                    (((x) << 17) & GENMASK(24, 17))
+#define QSYS_SE_CONNECT_SE_OUTP_IDX_M                     GENMASK(24, 17)
+#define QSYS_SE_CONNECT_SE_OUTP_IDX_X(x)                  (((x) & GENMASK(24, 17)) >> 17)
+#define QSYS_SE_CONNECT_SE_INP_IDX(x)                     (((x) << 9) & GENMASK(16, 9))
+#define QSYS_SE_CONNECT_SE_INP_IDX_M                      GENMASK(16, 9)
+#define QSYS_SE_CONNECT_SE_INP_IDX_X(x)                   (((x) & GENMASK(16, 9)) >> 9)
+#define QSYS_SE_CONNECT_SE_OUTP_CON(x)                    (((x) << 5) & GENMASK(8, 5))
+#define QSYS_SE_CONNECT_SE_OUTP_CON_M                     GENMASK(8, 5)
+#define QSYS_SE_CONNECT_SE_OUTP_CON_X(x)                  (((x) & GENMASK(8, 5)) >> 5)
+#define QSYS_SE_CONNECT_SE_INP_CNT(x)                     (((x) << 1) & GENMASK(4, 1))
+#define QSYS_SE_CONNECT_SE_INP_CNT_M                      GENMASK(4, 1)
+#define QSYS_SE_CONNECT_SE_INP_CNT_X(x)                   (((x) & GENMASK(4, 1)) >> 1)
+#define QSYS_SE_CONNECT_SE_TERMINAL                       BIT(0)
+
+#define QSYS_SE_DLB_SENSE_GSZ                             0x80
+
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO(x)                  (((x) << 11) & GENMASK(13, 11))
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_M                   GENMASK(13, 11)
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_X(x)                (((x) & GENMASK(13, 11)) >> 11)
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT(x)                 (((x) << 7) & GENMASK(10, 7))
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_M                  GENMASK(10, 7)
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_X(x)               (((x) & GENMASK(10, 7)) >> 7)
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT(x)                 (((x) << 3) & GENMASK(6, 3))
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_M                  GENMASK(6, 3)
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_X(x)               (((x) & GENMASK(6, 3)) >> 3)
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_ENA                 BIT(2)
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_ENA                BIT(1)
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_ENA                BIT(0)
+
+#define QSYS_CIR_STATE_GSZ                                0x80
+
+#define QSYS_CIR_STATE_CIR_LVL(x)                         (((x) << 4) & GENMASK(25, 4))
+#define QSYS_CIR_STATE_CIR_LVL_M                          GENMASK(25, 4)
+#define QSYS_CIR_STATE_CIR_LVL_X(x)                       (((x) & GENMASK(25, 4)) >> 4)
+#define QSYS_CIR_STATE_SHP_TIME(x)                        ((x) & GENMASK(3, 0))
+#define QSYS_CIR_STATE_SHP_TIME_M                         GENMASK(3, 0)
+
+#define QSYS_EIR_STATE_GSZ                                0x80
+
+#define QSYS_SE_STATE_GSZ                                 0x80
+
+#define QSYS_SE_STATE_SE_OUTP_LVL(x)                      (((x) << 1) & GENMASK(2, 1))
+#define QSYS_SE_STATE_SE_OUTP_LVL_M                       GENMASK(2, 1)
+#define QSYS_SE_STATE_SE_OUTP_LVL_X(x)                    (((x) & GENMASK(2, 1)) >> 1)
+#define QSYS_SE_STATE_SE_WAS_YEL                          BIT(0)
+
+#define QSYS_HSCH_MISC_CFG_SE_CONNECT_VLD                 BIT(8)
+#define QSYS_HSCH_MISC_CFG_FRM_ADJ(x)                     (((x) << 3) & GENMASK(7, 3))
+#define QSYS_HSCH_MISC_CFG_FRM_ADJ_M                      GENMASK(7, 3)
+#define QSYS_HSCH_MISC_CFG_FRM_ADJ_X(x)                   (((x) & GENMASK(7, 3)) >> 3)
+#define QSYS_HSCH_MISC_CFG_LEAK_DIS                       BIT(2)
+#define QSYS_HSCH_MISC_CFG_QSHP_EXC_ENA                   BIT(1)
+#define QSYS_HSCH_MISC_CFG_PFC_BYP_UPD                    BIT(0)
+
+#define QSYS_TAG_CONFIG_RSZ                               0x4
+
+#define QSYS_TAG_CONFIG_ENABLE                            BIT(0)
+#define QSYS_TAG_CONFIG_LINK_SPEED(x)                     (((x) << 4) & GENMASK(5, 4))
+#define QSYS_TAG_CONFIG_LINK_SPEED_M                      GENMASK(5, 4)
+#define QSYS_TAG_CONFIG_LINK_SPEED_X(x)                   (((x) & GENMASK(5, 4)) >> 4)
+#define QSYS_TAG_CONFIG_INIT_GATE_STATE(x)                (((x) << 8) & GENMASK(15, 8))
+#define QSYS_TAG_CONFIG_INIT_GATE_STATE_M                 GENMASK(15, 8)
+#define QSYS_TAG_CONFIG_INIT_GATE_STATE_X(x)              (((x) & GENMASK(15, 8)) >> 8)
+#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES(x)             (((x) << 16) & GENMASK(23, 16))
+#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_M              GENMASK(23, 16)
+#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_X(x)           (((x) & GENMASK(23, 16)) >> 16)
+
+#define QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM(x)               ((x) & GENMASK(7, 0))
+#define QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM_M                GENMASK(7, 0)
+#define QSYS_TAS_PARAM_CFG_CTRL_ALWAYS_GUARD_BAND_SCH_Q   BIT(8)
+#define QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE             BIT(16)
+
+#define QSYS_PORT_MAX_SDU_RSZ                             0x4
+
+#define QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB(x)         ((x) & GENMASK(15, 0))
+#define QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB_M          GENMASK(15, 0)
+#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH(x)               (((x) << 16) & GENMASK(31, 16))
+#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH_M                GENMASK(31, 16)
+#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH_X(x)             (((x) & GENMASK(31, 16)) >> 16)
+
+#define QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM(x)               ((x) & GENMASK(5, 0))
+#define QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM_M                GENMASK(5, 0)
+#define QSYS_GCL_CFG_REG_1_GATE_STATE(x)                  (((x) << 8) & GENMASK(15, 8))
+#define QSYS_GCL_CFG_REG_1_GATE_STATE_M                   GENMASK(15, 8)
+#define QSYS_GCL_CFG_REG_1_GATE_STATE_X(x)                (((x) & GENMASK(15, 8)) >> 8)
+
+#define QSYS_PARAM_STATUS_REG_3_BASE_TIME_SEC_MSB(x)      ((x) & GENMASK(15, 0))
+#define QSYS_PARAM_STATUS_REG_3_BASE_TIME_SEC_MSB_M       GENMASK(15, 0)
+#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH(x)            (((x) << 16) & GENMASK(31, 16))
+#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_M             GENMASK(31, 16)
+#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_X(x)          (((x) & GENMASK(31, 16)) >> 16)
+
+#define QSYS_PARAM_STATUS_REG_8_CFG_CHG_TIME_SEC_MSB(x)   ((x) & GENMASK(15, 0))
+#define QSYS_PARAM_STATUS_REG_8_CFG_CHG_TIME_SEC_MSB_M    GENMASK(15, 0)
+#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE(x)        (((x) << 16) & GENMASK(23, 16))
+#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE_M         GENMASK(23, 16)
+#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE_X(x)      (((x) & GENMASK(23, 16)) >> 16)
+#define QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING            BIT(24)
+
+#define QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM(x)            ((x) & GENMASK(5, 0))
+#define QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM_M             GENMASK(5, 0)
+#define QSYS_GCL_STATUS_REG_1_GATE_STATE(x)               (((x) << 8) & GENMASK(15, 8))
+#define QSYS_GCL_STATUS_REG_1_GATE_STATE_M                GENMASK(15, 8)
+#define QSYS_GCL_STATUS_REG_1_GATE_STATE_X(x)             (((x) & GENMASK(15, 8)) >> 8)
+
+#endif
-- 
cgit v1.2.3


From 8007880a2ca97c34e7ccd1fcf12daf854b792544 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <zyjzyj2000@gmail.com>
Date: Sat, 14 Dec 2019 10:51:17 +0200
Subject: net/mlx5: limit the function in local scope

The function mlx5_buf_alloc_node is only used by the function in the
local scope. So it is appropriate to limit this function in the local
scope.

Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h | 2 --
 1 file changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 27200dea0297..59cff380f41a 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -928,8 +928,6 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev);
 void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health);
 void mlx5_drain_health_wq(struct mlx5_core_dev *dev);
 void mlx5_trigger_health_work(struct mlx5_core_dev *dev);
-int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size,
-			struct mlx5_frag_buf *buf, int node);
 int mlx5_buf_alloc(struct mlx5_core_dev *dev,
 		   int size, struct mlx5_frag_buf *buf);
 void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf);
-- 
cgit v1.2.3


From dcfea72e79b0aa7a057c8f6024169d86a1bbc84b Mon Sep 17 00:00:00 2001
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Wed, 8 Jan 2020 16:59:02 -0500
Subject: net: introduce skb_list_walk_safe for skb segment walking

As part of the continual effort to remove direct usage of skb->next and
skb->prev, this patch adds a helper for iterating through the
singly-linked variant of skb lists, which are used for lists of GSO
packet. The name "skb_list_..." has been chosen to match the existing
function, "kfree_skb_list, which also operates on these singly-linked
lists, and the "..._walk_safe" part is the same idiom as elsewhere in
the kernel.

This patch removes the helper from wireguard and puts it into
linux/skbuff.h, while making it a bit more robust for general usage. In
particular, parenthesis are added around the macro argument usage, and it
now accounts for trying to iterate through an already-null skb pointer,
which will simply run the iteration zero times. This latter enhancement
means it can be used to replace both do { ... } while and while (...)
open-coded idioms.

This should take care of these three possible usages, which match all
current methods of iterations.

skb_list_walk_safe(segs, skb, next) { ... }
skb_list_walk_safe(skb, skb, next) { ... }
skb_list_walk_safe(segs, skb, segs) { ... }

Gcc appears to generate efficient code for each of these.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/skbuff.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e9133bcf0544..64e5b1be9ff5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1478,6 +1478,11 @@ static inline void skb_mark_not_on_list(struct sk_buff *skb)
 	skb->next = NULL;
 }
 
+/* Iterate through singly-linked GSO fragments of an skb. */
+#define skb_list_walk_safe(first, skb, next)                                   \
+	for ((skb) = (first), (next) = (skb) ? (skb)->next : NULL; (skb);      \
+	     (skb) = (next), (next) = (skb) ? (skb)->next : NULL)
+
 static inline void skb_list_del_init(struct sk_buff *skb)
 {
 	__list_del_entry(&skb->list);
-- 
cgit v1.2.3


From 6181e5cb752e5de9f56fbcee3f0206a2c51f1478 Mon Sep 17 00:00:00 2001
From: Vikas Gupta <vikas.gupta@broadcom.com>
Date: Thu, 2 Jan 2020 21:18:09 +0530
Subject: devlink: add support for reporter recovery completion

It is possible that a reporter recovery completion do not finish
successfully when recovery is triggered via
devlink_health_reporter_recover as recovery could be processed in
different context. In such scenario an error is returned by driver when
recover hook is invoked and successful recovery completion is
intimated later.
Expose devlink recover done API to update recovery stats.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 47f87b2fcf63..453f45cc1519 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1000,6 +1000,8 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 void
 devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
 				     enum devlink_health_reporter_state state);
+void
+devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter);
 
 bool devlink_is_reload_failed(const struct devlink *devlink);
 
-- 
cgit v1.2.3


From 4d776482ecc689bdd68627985ac4cb5a6f325953 Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 7 Jan 2020 21:06:05 -0800
Subject: net: dsa: Get information about stacked DSA protocol

It is possible to stack multiple DSA switches in a way that they are not
part of the tree (disjoint) but the DSA master of a switch is a DSA
slave of another. When that happens switch drivers may have to know this
is the case so as to determine whether their tagging protocol has a
remove chance of working.

This is useful for specific switch drivers such as b53 where devices
have been known to be stacked in the wild without the Broadcom tag
protocol supporting that feature. This allows b53 to continue supporting
those devices by forcing the disabling of Broadcom tags on the outermost
switches if necessary.

The get_tag_protocol() function is therefore updated to gain an
additional enum dsa_tag_protocol argument which denotes the current
tagging protocol used by the DSA master we are attached to, else
DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dsa.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 0c39fed8cd99..63495e3443ac 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -380,7 +380,8 @@ typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid,
 			      bool is_static, void *data);
 struct dsa_switch_ops {
 	enum dsa_tag_protocol (*get_tag_protocol)(struct dsa_switch *ds,
-						  int port);
+						  int port,
+						  enum dsa_tag_protocol mprot);
 
 	int	(*setup)(struct dsa_switch *ds);
 	void	(*teardown)(struct dsa_switch *ds);
-- 
cgit v1.2.3


From 27ae7997a66174cb8afd6a75b3989f5e0c1b9e5a Mon Sep 17 00:00:00 2001
From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 8 Jan 2020 16:35:03 -0800
Subject: bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS

This patch allows the kernel's struct ops (i.e. func ptr) to be
implemented in BPF.  The first use case in this series is the
"struct tcp_congestion_ops" which will be introduced in a
latter patch.

This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
func ptr of a kernel struct.  The attr->attach_btf_id is the btf id
of a kernel struct.  The attr->expected_attach_type is the member
"index" of that kernel struct.  The first member of a struct starts
with member index 0.  That will avoid ambiguity when a kernel struct
has multiple func ptrs with the same func signature.

For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
to implement the "init" func ptr of the "struct tcp_congestion_ops".
The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
of the _running_ kernel.  The attr->expected_attach_type is 3.

The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
by arch_prepare_bpf_trampoline that will be done in the next
patch when introducing BPF_MAP_TYPE_STRUCT_OPS.

"struct bpf_struct_ops" is introduced as a common interface for the kernel
struct that supports BPF_PROG_TYPE_STRUCT_OPS prog.  The supporting kernel
struct will need to implement an instance of the "struct bpf_struct_ops".

The supporting kernel struct also needs to implement a bpf_verifier_ops.
During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
bpf_verifier_ops by searching the attr->attach_btf_id.

A new "btf_struct_access" is also added to the bpf_verifier_ops such
that the supporting kernel struct can optionally provide its own specific
check on accessing the func arg (e.g. provide limited write access).

After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
to initialize some values (e.g. the btf id of the supporting kernel
struct) and it can only be done once the btf_vmlinux is available.

The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog
if the return type of the prog->aux->attach_func_proto is "void".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
---
 include/linux/bpf.h       | 30 ++++++++++++++++++++++++++++++
 include/linux/bpf_types.h |  4 ++++
 include/linux/btf.h       | 34 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h  |  1 +
 4 files changed, 69 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..50f3b20ae284 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -349,6 +349,10 @@ struct bpf_verifier_ops {
 				  const struct bpf_insn *src,
 				  struct bpf_insn *dst,
 				  struct bpf_prog *prog, u32 *target_size);
+	int (*btf_struct_access)(struct bpf_verifier_log *log,
+				 const struct btf_type *t, int off, int size,
+				 enum bpf_access_type atype,
+				 u32 *next_btf_id);
 };
 
 struct bpf_prog_offload_ops {
@@ -668,6 +672,32 @@ struct bpf_array_aux {
 	struct work_struct work;
 };
 
+struct btf_type;
+struct btf_member;
+
+#define BPF_STRUCT_OPS_MAX_NR_MEMBERS 64
+struct bpf_struct_ops {
+	const struct bpf_verifier_ops *verifier_ops;
+	int (*init)(struct btf *btf);
+	int (*check_member)(const struct btf_type *t,
+			    const struct btf_member *member);
+	const struct btf_type *type;
+	const char *name;
+	struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS];
+	u32 type_id;
+};
+
+#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id);
+void bpf_struct_ops_init(struct btf *btf);
+#else
+static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
+{
+	return NULL;
+}
+static inline void bpf_struct_ops_init(struct btf *btf) { }
+#endif
+
 struct bpf_array {
 	struct bpf_map map;
 	u32 elem_size;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 93740b3614d7..fadd243ffa2d 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -65,6 +65,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
 	      struct sk_reuseport_md, struct sk_reuseport_kern)
 #endif
+#if defined(CONFIG_BPF_JIT)
+BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
+	      void *, void *)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 79d4abc2556a..f74a09a7120b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -53,6 +53,18 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 bool btf_type_is_void(const struct btf_type *t);
+s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
+const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
+					       u32 id, u32 *res_id);
+const struct btf_type *btf_type_resolve_ptr(const struct btf *btf,
+					    u32 id, u32 *res_id);
+const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf,
+						 u32 id, u32 *res_id);
+
+#define for_each_member(i, struct_type, member)			\
+	for (i = 0, member = btf_type_member(struct_type);	\
+	     i < btf_type_vlen(struct_type);			\
+	     i++, member++)
 
 static inline bool btf_type_is_ptr(const struct btf_type *t)
 {
@@ -84,6 +96,28 @@ static inline bool btf_type_is_func_proto(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
 }
 
+static inline u16 btf_type_vlen(const struct btf_type *t)
+{
+	return BTF_INFO_VLEN(t->info);
+}
+
+static inline bool btf_type_kflag(const struct btf_type *t)
+{
+	return BTF_INFO_KFLAG(t->info);
+}
+
+static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type,
+					   const struct btf_member *member)
+{
+	return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset)
+					   : 0;
+}
+
+static inline const struct btf_member *btf_type_member(const struct btf_type *t)
+{
+	return (const struct btf_member *)(t + 1);
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7df436da542d..c1eeb3e0e116 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -174,6 +174,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 	BPF_PROG_TYPE_TRACING,
+	BPF_PROG_TYPE_STRUCT_OPS,
 };
 
 enum bpf_attach_type {
-- 
cgit v1.2.3


From 85d33df357b634649ddbe0a20fd2d0fc5732c3cb Mon Sep 17 00:00:00 2001
From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 8 Jan 2020 16:35:05 -0800
Subject: bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS

The patch introduces BPF_MAP_TYPE_STRUCT_OPS.  The map value
is a kernel struct with its func ptr implemented in bpf prog.
This new map is the interface to register/unregister/introspect
a bpf implemented kernel struct.

The kernel struct is actually embedded inside another new struct
(or called the "value" struct in the code).  For example,
"struct tcp_congestion_ops" is embbeded in:
struct bpf_struct_ops_tcp_congestion_ops {
	refcount_t refcnt;
	enum bpf_struct_ops_state state;
	struct tcp_congestion_ops data;  /* <-- kernel subsystem struct here */
}
The map value is "struct bpf_struct_ops_tcp_congestion_ops".
The "bpftool map dump" will then be able to show the
state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g.
number of tcp_sock in the tcp_congestion_ops case).  This "value" struct
is created automatically by a macro.  Having a separate "value" struct
will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding
"void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some
initialization works before registering the struct_ops to the kernel
subsystem).  The libbpf will take care of finding and populating the
"struct bpf_struct_ops_XYZ" from "struct XYZ".

Register a struct_ops to a kernel subsystem:
1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id
   set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the
   running kernel.
   Instead of reusing the attr->btf_value_type_id,
   btf_vmlinux_value_type_id s added such that attr->btf_fd can still be
   used as the "user" btf which could store other useful sysadmin/debug
   info that may be introduced in the furture,
   e.g. creation-date/compiler-details/map-creator...etc.
3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
   in the running kernel btf.  Populate the value of this object.
   The function ptr should be populated with the prog fds.
4. Call BPF_MAP_UPDATE with the object created in (3) as
   the map value.  The key is always "0".

During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
args as an array of u64 is generated.  BPF_MAP_UPDATE also allows
the specific struct_ops to do some final checks in "st_ops->init_member()"
(e.g. ensure all mandatory func ptrs are implemented).
If everything looks good, it will register this kernel struct
to the kernel subsystem.  The map will not allow further update
from this point.

Unregister a struct_ops from the kernel subsystem:
BPF_MAP_DELETE with key "0".

Introspect a struct_ops:
BPF_MAP_LOOKUP_ELEM with key "0".  The map value returned will
have the prog _id_ populated as the func ptr.

The map value state (enum bpf_struct_ops_state) will transit from:
INIT (map created) =>
INUSE (map updated, i.e. reg) =>
TOBEFREE (map value deleted, i.e. unreg)

The kernel subsystem needs to call bpf_struct_ops_get() and
bpf_struct_ops_put() to manage the "refcnt" in the
"struct bpf_struct_ops_XYZ".  This patch uses a separate refcnt
for the purose of tracking the subsystem usage.  Another approach
is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
the subsystem's usage by doing map->refcnt - map->usercnt to filter out
the map-fd/pinned-map usage.  However, that will also tie down the
future semantics of map->refcnt and map->usercnt.

The very first subsystem's refcnt (during reg()) holds one
count to map->refcnt.  When the very last subsystem's refcnt
is gone, it will also release the map->refcnt.  All bpf_prog will be
freed when the map->refcnt reaches 0 (i.e. during map_free()).

Here is how the bpftool map command will look like:
[root@arch-fb-vm1 bpf]# bpftool map show
6: struct_ops  name dctcp  flags 0x0
	key 4B  value 256B  max_entries 1  memlock 4096B
	btf_id 6
[root@arch-fb-vm1 bpf]# bpftool map dump id 6
[{
        "value": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": 1,
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 2,
                "init": 24,
                "release": 0,
                "ssthresh": 25,
                "cong_avoid": 30,
                "set_state": 27,
                "cwnd_event": 28,
                "in_ack_event": 26,
                "undo_cwnd": 29,
                "pkts_acked": 0,
                "min_tso_segs": 0,
                "sndbuf_expand": 0,
                "cong_control": 0,
                "get_info": 0,
                "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                ],
                "owner": 0
            }
        }
    }
]

Misc Notes:
* bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
  It does an inplace update on "*value" instead returning a pointer
  to syscall.c.  Otherwise, it needs a separate copy of "zero" value
  for the BPF_STRUCT_OPS_STATE_INIT to avoid races.

* The bpf_struct_ops_map_delete_elem() is also called without
  preempt_disable() from map_delete_elem().  It is because
  the "->unreg()" may requires sleepable context, e.g.
  the "tcp_unregister_congestion_control()".

* "const" is added to some of the existing "struct btf_func_model *"
  function arg to avoid a compiler warning caused by this patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
---
 include/linux/bpf.h       | 49 +++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/bpf_types.h |  3 +++
 include/linux/btf.h       | 13 +++++++++++++
 include/uapi/linux/bpf.h  |  7 ++++++-
 4 files changed, 69 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 50f3b20ae284..a7bfe8a388c6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -17,6 +17,7 @@
 #include <linux/u64_stats_sync.h>
 #include <linux/refcount.h>
 #include <linux/mutex.h>
+#include <linux/module.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -106,6 +107,7 @@ struct bpf_map {
 	struct btf *btf;
 	struct bpf_map_memory memory;
 	char name[BPF_OBJ_NAME_LEN];
+	u32 btf_vmlinux_value_type_id;
 	bool unpriv_array;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
 	/* 22 bytes hole */
@@ -183,7 +185,8 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
 
 static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
 {
-	return map->btf && map->ops->map_seq_show_elem;
+	return (map->btf_value_type_id || map->btf_vmlinux_value_type_id) &&
+		map->ops->map_seq_show_elem;
 }
 
 int map_check_no_btf(const struct bpf_map *map,
@@ -441,7 +444,8 @@ struct btf_func_model {
  *      fentry = a set of program to run before calling original function
  *      fexit = a set of program to run after original function
  */
-int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
+int arch_prepare_bpf_trampoline(void *image, void *image_end,
+				const struct btf_func_model *m, u32 flags,
 				struct bpf_prog **fentry_progs, int fentry_cnt,
 				struct bpf_prog **fexit_progs, int fexit_cnt,
 				void *orig_call);
@@ -672,6 +676,7 @@ struct bpf_array_aux {
 	struct work_struct work;
 };
 
+struct bpf_struct_ops_value;
 struct btf_type;
 struct btf_member;
 
@@ -681,21 +686,61 @@ struct bpf_struct_ops {
 	int (*init)(struct btf *btf);
 	int (*check_member)(const struct btf_type *t,
 			    const struct btf_member *member);
+	int (*init_member)(const struct btf_type *t,
+			   const struct btf_member *member,
+			   void *kdata, const void *udata);
+	int (*reg)(void *kdata);
+	void (*unreg)(void *kdata);
 	const struct btf_type *type;
+	const struct btf_type *value_type;
 	const char *name;
 	struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS];
 	u32 type_id;
+	u32 value_id;
 };
 
 #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+#define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
 const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id);
 void bpf_struct_ops_init(struct btf *btf);
+bool bpf_struct_ops_get(const void *kdata);
+void bpf_struct_ops_put(const void *kdata);
+int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
+				       void *value);
+static inline bool bpf_try_module_get(const void *data, struct module *owner)
+{
+	if (owner == BPF_MODULE_OWNER)
+		return bpf_struct_ops_get(data);
+	else
+		return try_module_get(owner);
+}
+static inline void bpf_module_put(const void *data, struct module *owner)
+{
+	if (owner == BPF_MODULE_OWNER)
+		bpf_struct_ops_put(data);
+	else
+		module_put(owner);
+}
 #else
 static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
 {
 	return NULL;
 }
 static inline void bpf_struct_ops_init(struct btf *btf) { }
+static inline bool bpf_try_module_get(const void *data, struct module *owner)
+{
+	return try_module_get(owner);
+}
+static inline void bpf_module_put(const void *data, struct module *owner)
+{
+	module_put(owner);
+}
+static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map,
+						     void *key,
+						     void *value)
+{
+	return -EINVAL;
+}
 #endif
 
 struct bpf_array {
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fadd243ffa2d..9f326e6ef885 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -109,3 +109,6 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
+#if defined(CONFIG_BPF_JIT)
+BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
+#endif
diff --git a/include/linux/btf.h b/include/linux/btf.h
index f74a09a7120b..881e9b76ef49 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -7,6 +7,8 @@
 #include <linux/types.h>
 #include <uapi/linux/btf.h>
 
+#define BTF_TYPE_EMIT(type) ((void)(type *)0)
+
 struct btf;
 struct btf_member;
 struct btf_type;
@@ -60,6 +62,10 @@ const struct btf_type *btf_type_resolve_ptr(const struct btf *btf,
 					    u32 id, u32 *res_id);
 const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf,
 						 u32 id, u32 *res_id);
+const struct btf_type *
+btf_resolve_size(const struct btf *btf, const struct btf_type *type,
+		 u32 *type_size, const struct btf_type **elem_type,
+		 u32 *total_nelems);
 
 #define for_each_member(i, struct_type, member)			\
 	for (i = 0, member = btf_type_member(struct_type);	\
@@ -106,6 +112,13 @@ static inline bool btf_type_kflag(const struct btf_type *t)
 	return BTF_INFO_KFLAG(t->info);
 }
 
+static inline u32 btf_member_bit_offset(const struct btf_type *struct_type,
+					const struct btf_member *member)
+{
+	return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset)
+					   : member->offset;
+}
+
 static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type,
 					   const struct btf_member *member)
 {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c1eeb3e0e116..38059880963e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -136,6 +136,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_STACK,
 	BPF_MAP_TYPE_SK_STORAGE,
 	BPF_MAP_TYPE_DEVMAP_HASH,
+	BPF_MAP_TYPE_STRUCT_OPS,
 };
 
 /* Note that tracing related programs such as
@@ -398,6 +399,10 @@ union bpf_attr {
 		__u32	btf_fd;		/* fd pointing to a BTF type data */
 		__u32	btf_key_type_id;	/* BTF type_id of the key */
 		__u32	btf_value_type_id;	/* BTF type_id of the value */
+		__u32	btf_vmlinux_value_type_id;/* BTF type_id of a kernel-
+						   * struct stored as the
+						   * map value
+						   */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -3350,7 +3355,7 @@ struct bpf_map_info {
 	__u32 map_flags;
 	char  name[BPF_OBJ_NAME_LEN];
 	__u32 ifindex;
-	__u32 :32;
+	__u32 btf_vmlinux_value_type_id;
 	__u64 netns_dev;
 	__u64 netns_ino;
 	__u32 btf_id;
-- 
cgit v1.2.3


From 0baf26b0fcd74bbfcef53c5d5e8bad2b99c8d0d2 Mon Sep 17 00:00:00 2001
From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 8 Jan 2020 16:35:08 -0800
Subject: bpf: tcp: Support tcp_congestion_ops in bpf

This patch makes "struct tcp_congestion_ops" to be the first user
of BPF STRUCT_OPS.  It allows implementing a tcp_congestion_ops
in bpf.

The BPF implemented tcp_congestion_ops can be used like
regular kernel tcp-cc through sysctl and setsockopt.  e.g.
[root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion
net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic
net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic
net.ipv4.tcp_congestion_control = bpf_cubic

There has been attempt to move the TCP CC to the user space
(e.g. CCP in TCP).   The common arguments are faster turn around,
get away from long-tail kernel versions in production...etc,
which are legit points.

BPF has been the continuous effort to join both kernel and
userspace upsides together (e.g. XDP to gain the performance
advantage without bypassing the kernel).  The recent BPF
advancements (in particular BTF-aware verifier, BPF trampoline,
BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc)
possible in BPF.  It allows a faster turnaround for testing algorithm
in the production while leveraging the existing (and continue growing)
BPF feature/framework instead of building one specifically for
userspace TCP CC.

This patch allows write access to a few fields in tcp-sock
(in bpf_tcp_ca_btf_struct_access()).

The optional "get_info" is unsupported now.  It can be added
later.  One possible way is to output the info with a btf-id
to describe the content.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com
---
 include/linux/filter.h | 2 ++
 include/net/tcp.h      | 2 ++
 2 files changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 70e6dd960bca..a366a0b64a57 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -843,6 +843,8 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog);
 int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
 			      bpf_aux_classic_check_t trans, bool save_orig);
 void bpf_prog_destroy(struct bpf_prog *fp);
+const struct bpf_func_proto *
+bpf_base_func_proto(enum bpf_func_id func_id);
 
 int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 int sk_attach_bpf(u32 ufd, struct sock *sk);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7df37e2fddca..9dd975be7fdf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1007,6 +1007,7 @@ enum tcp_ca_ack_event_flags {
 #define TCP_CONG_NON_RESTRICTED 0x1
 /* Requires ECN/ECT set on all packets */
 #define TCP_CONG_NEEDS_ECN	0x2
+#define TCP_CONG_MASK	(TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN)
 
 union tcp_cc_info;
 
@@ -1101,6 +1102,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk);
 void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked);
 extern struct tcp_congestion_ops tcp_reno;
 
+struct tcp_congestion_ops *tcp_ca_find(const char *name);
 struct tcp_congestion_ops *tcp_ca_find_key(u32 key);
 u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca);
 #ifdef CONFIG_INET
-- 
cgit v1.2.3


From 206057fe020ac5c037d5e2dd6562a9bd216ec765 Mon Sep 17 00:00:00 2001
From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 8 Jan 2020 16:45:51 -0800
Subject: bpf: Add BPF_FUNC_tcp_send_ack helper

Add a helper to send out a tcp-ack.  It will be used in the later
bpf_dctcp implementation that requires to send out an ack
when the CE state changed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com
---
 include/uapi/linux/bpf.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 38059880963e..2d6a2e572f56 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2837,6 +2837,14 @@ union bpf_attr {
  * 	Return
  * 		On success, the strictly positive length of the string,	including
  * 		the trailing NUL character. On error, a negative value.
+ *
+ * int bpf_tcp_send_ack(void *tp, u32 rcv_nxt)
+ *	Description
+ *		Send out a tcp-ack. *tp* is the in-kernel struct tcp_sock.
+ *		*rcv_nxt* is the ack_seq to be sent out.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2954,7 +2962,8 @@ union bpf_attr {
 	FN(probe_read_user),		\
 	FN(probe_read_kernel),		\
 	FN(probe_read_user_str),	\
-	FN(probe_read_kernel_str),
+	FN(probe_read_kernel_str),	\
+	FN(tcp_send_ack),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
cgit v1.2.3


From f5bfcd953d811dbb8913de36b96b38da6bb62135 Mon Sep 17 00:00:00 2001
From: Andrey Ignatov <rdna@fb.com>
Date: Tue, 7 Jan 2020 17:40:06 -0800
Subject: bpf: Document BPF_F_QUERY_EFFECTIVE flag

Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects
attach_flags what may not be obvious and what may lead to confision.

Specifically attach_flags is returned only for target_fd but if programs
are inherited from an ancestor cgroup then returned attach_flags for
current cgroup may be confusing. For example, two effective programs of
same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in
attach_flags.

Simple repro:
  # bpftool c s /sys/fs/cgroup/path/to/task
  ID       AttachType      AttachFlags     Name
  # bpftool c s /sys/fs/cgroup/path/to/task effective
  ID       AttachType      AttachFlags     Name
  95043    ingress                         tw_ipt_ingress
  95048    ingress                         tw_ingress

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com
---
 include/uapi/linux/bpf.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2d6a2e572f56..52966e758fe5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -359,7 +359,12 @@ enum bpf_attach_type {
 /* Enable memory-mapping BPF map */
 #define BPF_F_MMAPABLE		(1U << 10)
 
-/* flags for BPF_PROG_QUERY */
+/* Flags for BPF_PROG_QUERY. */
+
+/* Query effective (directly attached + inherited from ancestor cgroups)
+ * programs that will be executed for events within a cgroup.
+ * attach_flags with this flag are returned only for directly attached programs.
+ */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
 enum bpf_stack_build_id_status {
-- 
cgit v1.2.3


From e9cdced78dc20c1592c1fb98ed064943007a46c5 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:14 -0800
Subject: net: Make sock protocol value checks more specific

SK_PROTOCOL_MAX is only used in two places, for DECNet and AX.25. The
limits have more to do with the those protocol definitions than they do
with the data type of sk_protocol, so remove SK_PROTOCOL_MAX and use
U8_MAX directly.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sock.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include')

diff --git a/include/net/sock.h b/include/net/sock.h
index 8dff68b4c316..091e55428415 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -458,7 +458,6 @@ struct sock {
 				sk_userlocks : 4,
 				sk_protocol  : 8,
 				sk_type      : 16;
-#define SK_PROTOCOL_MAX U8_MAX
 	u16			sk_gso_max_segs;
 	u8			sk_pacing_shift;
 	unsigned long	        sk_lingertime;
-- 
cgit v1.2.3


From bf9765145b856fa2e238a5b8a54453795ba30ad6 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:15 -0800
Subject: sock: Make sk_protocol a 16-bit value

Match the 16-bit width of skbuff->protocol. Fills an 8-bit hole so
sizeof(struct sock) does not change.

Also take care of BPF field access for sk_type/sk_protocol. Both of them
are now outside the bitfield, so we can use load instructions without
further shifting/masking.

v5 -> v6:
 - update eBPF accessors, too (Intel's kbuild test robot)
v2 -> v3:
 - keep 'sk_type' 2 bytes aligned (Eric)
v1 -> v2:
 - preserve sk_pacing_shift as bit field (Eric)

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sock.h          | 25 +++++--------------------
 include/trace/events/sock.h |  2 +-
 2 files changed, 6 insertions(+), 21 deletions(-)

(limited to 'include')

diff --git a/include/net/sock.h b/include/net/sock.h
index 091e55428415..8766f9bc3e70 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -436,30 +436,15 @@ struct sock {
 	 * Because of non atomicity rules, all
 	 * changes are protected by socket lock.
 	 */
-	unsigned int		__sk_flags_offset[0];
-#ifdef __BIG_ENDIAN_BITFIELD
-#define SK_FL_PROTO_SHIFT  16
-#define SK_FL_PROTO_MASK   0x00ff0000
-
-#define SK_FL_TYPE_SHIFT   0
-#define SK_FL_TYPE_MASK    0x0000ffff
-#else
-#define SK_FL_PROTO_SHIFT  8
-#define SK_FL_PROTO_MASK   0x0000ff00
-
-#define SK_FL_TYPE_SHIFT   16
-#define SK_FL_TYPE_MASK    0xffff0000
-#endif
-
-	unsigned int		sk_padding : 1,
+	u8			sk_padding : 1,
 				sk_kern_sock : 1,
 				sk_no_check_tx : 1,
 				sk_no_check_rx : 1,
-				sk_userlocks : 4,
-				sk_protocol  : 8,
-				sk_type      : 16;
-	u16			sk_gso_max_segs;
+				sk_userlocks : 4;
 	u8			sk_pacing_shift;
+	u16			sk_type;
+	u16			sk_protocol;
+	u16			sk_gso_max_segs;
 	unsigned long	        sk_lingertime;
 	struct proto		*sk_prot_creator;
 	rwlock_t		sk_callback_lock;
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index 51fe9f6719eb..3ff12b90048d 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -147,7 +147,7 @@ TRACE_EVENT(inet_sock_set_state,
 		__field(__u16, sport)
 		__field(__u16, dport)
 		__field(__u16, family)
-		__field(__u8, protocol)
+		__field(__u16, protocol)
 		__array(__u8, saddr, 4)
 		__array(__u8, daddr, 4)
 		__array(__u8, saddr_v6, 16)
-- 
cgit v1.2.3


From faf391c3826cd29feae02078ca2022d2f912f7cc Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:16 -0800
Subject: tcp: Define IPPROTO_MPTCP

To open a MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP),
IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing
IPPROTO numbers mostly map directly to IANA-specified protocol numbers.
MPTCP does not have a protocol number allocated because MPTCP packets
use the TCP protocol number. Use private number not used OTA.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/trace/events/sock.h | 3 ++-
 include/uapi/linux/in.h     | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index 3ff12b90048d..a966d4b5ab37 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -19,7 +19,8 @@
 #define inet_protocol_names		\
 		EM(IPPROTO_TCP)			\
 		EM(IPPROTO_DCCP)		\
-		EMe(IPPROTO_SCTP)
+		EM(IPPROTO_SCTP)		\
+		EMe(IPPROTO_MPTCP)
 
 #define tcp_state_names			\
 		EM(TCP_ESTABLISHED)		\
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index e7ad9d350a28..1521073b6348 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -76,6 +76,8 @@ enum {
 #define IPPROTO_MPLS		IPPROTO_MPLS
   IPPROTO_RAW = 255,		/* Raw IP packets			*/
 #define IPPROTO_RAW		IPPROTO_RAW
+  IPPROTO_MPTCP = 262,		/* Multipath TCP connection		*/
+#define IPPROTO_MPTCP		IPPROTO_MPTCP
   IPPROTO_MAX
 };
 #endif
-- 
cgit v1.2.3


From c74a39c861aeaf6b789b7abdbb3256f54c9fb365 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:17 -0800
Subject: tcp: Add MPTCP option number

TCP option 30 is allocated for MPTCP by the IANA.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/tcp.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7df37e2fddca..85f1d7ff6e8b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -182,6 +182,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define TCPOPT_SACK             5       /* SACK Block */
 #define TCPOPT_TIMESTAMP	8	/* Better RTT estimations/PAWS */
 #define TCPOPT_MD5SIG		19	/* MD5 Signature (RFC2385) */
+#define TCPOPT_MPTCP		30	/* Multipath TCP (RFC6824) */
 #define TCPOPT_FASTOPEN		34	/* Fast open (RFC7413) */
 #define TCPOPT_EXP		254	/* Experimental */
 /* Magic number to be after the option value for sharing TCP
-- 
cgit v1.2.3


From 1323059301c8f36d933876233516245d882346a6 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:18 -0800
Subject: tcp, ulp: Add clone operation to tcp_ulp_ops

If ULP is used on a listening socket, icsk_ulp_ops and icsk_ulp_data are
copied when the listener is cloned. Sometimes the clone is immediately
deleted, which will invoke the release op on the clone and likely
corrupt the listening socket's icsk_ulp_data.

The clone operation is invoked immediately after the clone is copied and
gives the ULP type an opportunity to set up the clone socket and its
icsk_ulp_data.

The MPTCP ULP clone will silently fallback to plain TCP on allocation
failure, so 'clone()' does not need to return an error code.

v6 -> v7:
 - move and rename ulp clone helper to make it inline-friendly
v5 -> v6:
 - clarified MPTCP clone usage in commit message

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/tcp.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 85f1d7ff6e8b..ac52633e7061 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2154,6 +2154,9 @@ struct tcp_ulp_ops {
 	/* diagnostic */
 	int (*get_info)(const struct sock *sk, struct sk_buff *skb);
 	size_t (*get_info_size)(const struct sock *sk);
+	/* clone ulp */
+	void (*clone)(const struct request_sock *req, struct sock *newsk,
+		      const gfp_t priority);
 
 	char		name[TCP_ULP_NAME_MAX];
 	struct module	*owner;
-- 
cgit v1.2.3


From 3ee17bc78e0f3fdeff9890993e8f3a9f5145163b Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:19 -0800
Subject: mptcp: Add MPTCP to skb extensions

Add enum value for MPTCP and update config dependencies

v5 -> v6:
 - fixed '__unused' field size

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/skbuff.h |  3 +++
 include/net/mptcp.h    | 28 ++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)
 create mode 100644 include/net/mptcp.h

(limited to 'include')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 64e5b1be9ff5..f5c27600b410 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4096,6 +4096,9 @@ enum skb_ext_id {
 #endif
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	TC_SKB_EXT,
+#endif
+#if IS_ENABLED(CONFIG_MPTCP)
+	SKB_EXT_MPTCP,
 #endif
 	SKB_EXT_NUM, /* must be last */
 };
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
new file mode 100644
index 000000000000..326043c29c0a
--- /dev/null
+++ b/include/net/mptcp.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Multipath TCP
+ *
+ * Copyright (c) 2017 - 2019, Intel Corporation.
+ */
+
+#ifndef __NET_MPTCP_H
+#define __NET_MPTCP_H
+
+#include <linux/types.h>
+
+/* MPTCP sk_buff extension data */
+struct mptcp_ext {
+	u64		data_ack;
+	u64		data_seq;
+	u32		subflow_seq;
+	u16		data_len;
+	u8		use_map:1,
+			dsn64:1,
+			data_fin:1,
+			use_ack:1,
+			ack64:1,
+			__unused:3;
+	/* one byte hole */
+};
+
+#endif /* __NET_MPTCP_H */
-- 
cgit v1.2.3


From 85712484110df308215077be6ee21c4e57d7dec2 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:20 -0800
Subject: tcp: coalesce/collapse must respect MPTCP extensions

Coalesce and collapse of packets carrying MPTCP extensions is allowed
when the newer packet has no extension or the extensions carried by both
packets are equal.

This allows merging of TSO packet trains and even cross-TSO packets, and
does not require any additional action when moving data into existing
SKBs.

v3 -> v4:
 - allow collapsing, under mptcp_skb_can_collapse() constraint

v5 -> v6:
 - clarify MPTCP skb extensions must always be cleared at allocation
   time

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/mptcp.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/net/tcp.h   |  8 ++++++++
 2 files changed, 65 insertions(+)

(limited to 'include')

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 326043c29c0a..0573ae75c3db 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -8,6 +8,7 @@
 #ifndef __NET_MPTCP_H
 #define __NET_MPTCP_H
 
+#include <linux/skbuff.h>
 #include <linux/types.h>
 
 /* MPTCP sk_buff extension data */
@@ -25,4 +26,60 @@ struct mptcp_ext {
 	/* one byte hole */
 };
 
+#ifdef CONFIG_MPTCP
+
+/* move the skb extension owership, with the assumption that 'to' is
+ * newly allocated
+ */
+static inline void mptcp_skb_ext_move(struct sk_buff *to,
+				      struct sk_buff *from)
+{
+	if (!skb_ext_exist(from, SKB_EXT_MPTCP))
+		return;
+
+	if (WARN_ON_ONCE(to->active_extensions))
+		skb_ext_put(to);
+
+	to->active_extensions = from->active_extensions;
+	to->extensions = from->extensions;
+	from->active_extensions = 0;
+}
+
+static inline bool mptcp_ext_matches(const struct mptcp_ext *to_ext,
+				     const struct mptcp_ext *from_ext)
+{
+	/* MPTCP always clears the ext when adding it to the skb, so
+	 * holes do not bother us here
+	 */
+	return !from_ext ||
+	       (to_ext && from_ext &&
+	        !memcmp(from_ext, to_ext, sizeof(struct mptcp_ext)));
+}
+
+/* check if skbs can be collapsed.
+ * MPTCP collapse is allowed if neither @to or @from carry an mptcp data
+ * mapping, or if the extension of @to is the same as @from.
+ * Collapsing is not possible if @to lacks an extension, but @from carries one.
+ */
+static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
+					  const struct sk_buff *from)
+{
+	return mptcp_ext_matches(skb_ext_find(to, SKB_EXT_MPTCP),
+				 skb_ext_find(from, SKB_EXT_MPTCP));
+}
+
+#else
+
+static inline void mptcp_skb_ext_move(struct sk_buff *to,
+				      const struct sk_buff *from)
+{
+}
+
+static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
+					  const struct sk_buff *from)
+{
+	return true;
+}
+
+#endif /* CONFIG_MPTCP */
 #endif /* __NET_MPTCP_H */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index ac52633e7061..13bc83fab454 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -39,6 +39,7 @@
 #include <net/tcp_states.h>
 #include <net/inet_ecn.h>
 #include <net/dst.h>
+#include <net/mptcp.h>
 
 #include <linux/seq_file.h>
 #include <linux/memcontrol.h>
@@ -978,6 +979,13 @@ static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb)
 	return likely(!TCP_SKB_CB(skb)->eor);
 }
 
+static inline bool tcp_skb_can_collapse(const struct sk_buff *to,
+					const struct sk_buff *from)
+{
+	return likely(tcp_skb_can_collapse_to(to) &&
+		      mptcp_skb_can_collapse(to, from));
+}
+
 /* Events passed to congestion control interface */
 enum tcp_ca_event {
 	CA_EVENT_TX_START,	/* first transmit when no packets in flight */
-- 
cgit v1.2.3


From 35b2c32116091ef87a15c604cac363da8322a288 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Thu, 9 Jan 2020 07:59:21 -0800
Subject: tcp: Export TCP functions and ops struct

MPTCP will make use of tcp_send_mss() and tcp_push() when sending
data to specific TCP subflows.

tcp_request_sock_ipvX_ops and ipvX_specific will be referenced
during TCP subflow creation.

Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/tcp.h | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'include')

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 13bc83fab454..5e4133d09b9d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -330,6 +330,9 @@ int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags);
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		 size_t size, int flags);
+int tcp_send_mss(struct sock *sk, int *size_goal, int flags);
+void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle,
+	      int size_goal);
 void tcp_release_cb(struct sock *sk);
 void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
@@ -2011,6 +2014,11 @@ struct tcp_request_sock_ops {
 			   enum tcp_synack_type synack_type);
 };
 
+extern const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops;
+#if IS_ENABLED(CONFIG_IPV6)
+extern const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops;
+#endif
+
 #ifdef CONFIG_SYN_COOKIES
 static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops,
 					 const struct sock *sk, struct sk_buff *skb,
-- 
cgit v1.2.3


From e66b2f31a068dd67172008459678821a79e4ea24 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 9 Jan 2020 07:59:23 -0800
Subject: tcp: clean ext on tx recycle

Otherwise we will find stray/unexpected/old extensions value on next
iteration.

On tcp_write_xmit() we can end-up splitting an already queued skb in two
parts, via tso_fragment(). The newly created skb can be allocated via
the tx cache and an upper layer will not be aware of it, so that upper
layer cannot set the ext properly.

Resetting the ext on recycle ensures that stale data is not propagated
in to packet headers or elsewhere.

An alternative would be add an additional hook in tso_fragment() or in
sk_stream_alloc_skb() to init the ext for upper layers that need it.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sock.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/sock.h b/include/net/sock.h
index 8766f9bc3e70..432ff73d20f3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1464,6 +1464,7 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 	sk_mem_uncharge(sk, skb->truesize);
 	if (static_branch_unlikely(&tcp_tx_skb_cache_key) &&
 	    !sk->sk_tx_skb_cache && !skb_cloned(skb)) {
+		skb_ext_reset(skb);
 		skb_zcopy_clear(skb, true);
 		sk->sk_tx_skb_cache = skb;
 		return;
-- 
cgit v1.2.3


From 8b69a803814bb8b14155ea60df83f6d57527e69e Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 9 Jan 2020 07:59:24 -0800
Subject: skb: add helpers to allocate ext independently from sk_buff

Currently we can allocate the extension only after the skb,
this change allows the user to do the opposite, will simplify
allocation failure handling from MPTCP.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/skbuff.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f5c27600b410..016b3c4ab99a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4120,6 +4120,9 @@ struct skb_ext {
 	char data[0] __aligned(8);
 };
 
+struct skb_ext *__skb_ext_alloc(void);
+void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id,
+		    struct skb_ext *ext);
 void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id);
 void __skb_ext_del(struct sk_buff *skb, enum skb_ext_id id);
 void __skb_ext_put(struct skb_ext *ext);
-- 
cgit v1.2.3


From 51c39bb1d5d105a02e29aa7960f0a395086e6342 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 9 Jan 2020 22:41:20 -0800
Subject: bpf: Introduce function-by-function verification

New llvm and old llvm with libbpf help produce BTF that distinguish global and
static functions. Unlike arguments of static function the arguments of global
functions cannot be removed or optimized away by llvm. The compiler has to use
exactly the arguments specified in a function prototype. The argument type
information allows the verifier validate each global function independently.
For now only supported argument types are pointer to context and scalars. In
the future pointers to structures, sizes, pointer to packet data can be
supported as well. Consider the following example:

static int f1(int ...)
{
  ...
}

int f3(int b);

int f2(int a)
{
  f1(a) + f3(a);
}

int f3(int b)
{
  ...
}

int main(...)
{
  f1(...) + f2(...) + f3(...);
}

The verifier will start its safety checks from the first global function f2().
It will recursively descend into f1() because it's static. Then it will check
that arguments match for the f3() invocation inside f2(). It will not descend
into f3(). It will finish f2() that has to be successfully verified for all
possible values of 'a'. Then it will proceed with f3(). That function also has
to be safe for all possible values of 'b'. Then it will start subprog 0 (which
is main() function). It will recursively descend into f1() and will skip full
check of f2() and f3(), since they are global. The order of processing global
functions doesn't affect safety, since all global functions must be proven safe
based on their arguments only.

Such function by function verification can drastically improve speed of the
verification and reduce complexity.

Note that the stack limit of 512 still applies to the call chain regardless whether
functions were static or global. The nested level of 8 also still applies. The
same recursion prevention checks are in place as well.

The type information and static/global kind is preserved after the verification
hence in the above example global function f2() and f3() can be replaced later
by equivalent functions with the same types that are loaded and verified later
without affecting safety of this main() program. Such replacement (re-linking)
of global functions is a subject of future patches.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
---
 include/linux/bpf.h          |  7 ++++++-
 include/linux/bpf_verifier.h | 10 ++++++++--
 include/uapi/linux/btf.h     |  6 ++++++
 3 files changed, 20 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a7bfe8a388c6..aed2bc39d72b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -566,6 +566,7 @@ static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
 #endif
 
 struct bpf_func_info_aux {
+	u16 linkage;
 	bool unreliable;
 };
 
@@ -1081,7 +1082,11 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
 			   const char *func_name,
 			   struct btf_func_model *m);
 
-int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog);
+struct bpf_reg_state;
+int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
+			     struct bpf_reg_state *regs);
+int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
+			  struct bpf_reg_state *reg);
 
 struct bpf_prog *bpf_prog_by_id(u32 id);
 
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 26e40de9ef55..5406e6e96585 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -304,11 +304,13 @@ struct bpf_insn_aux_data {
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	int sanitize_stack_off; /* stack slot to be cleared */
-	bool seen; /* this insn was processed by the verifier */
+	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
 	bool zext_dst; /* this insn zero extends dst reg */
 	u8 alu_state; /* used in combination with alu_limit */
-	bool prune_point;
+
+	/* below fields are initialized once */
 	unsigned int orig_idx; /* original instruction index */
+	bool prune_point;
 };
 
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
@@ -379,6 +381,7 @@ struct bpf_verifier_env {
 		int *insn_stack;
 		int cur_stack;
 	} cfg;
+	u32 pass_cnt; /* number of times do_check() was called */
 	u32 subprog_cnt;
 	/* number of instructions analyzed by the verifier */
 	u32 prev_insn_processed, insn_processed;
@@ -428,4 +431,7 @@ bpf_prog_offload_replace_insn(struct bpf_verifier_env *env, u32 off,
 void
 bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt);
 
+int check_ctx_reg(struct bpf_verifier_env *env,
+		  const struct bpf_reg_state *reg, int regno);
+
 #endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index 1a2898c482ee..5a667107ad2c 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -146,6 +146,12 @@ enum {
 	BTF_VAR_GLOBAL_EXTERN = 2,
 };
 
+enum btf_func_linkage {
+	BTF_FUNC_STATIC = 0,
+	BTF_FUNC_GLOBAL = 1,
+	BTF_FUNC_EXTERN = 2,
+};
+
 /* BTF_KIND_VAR is followed by a single "struct btf_var" to describe
  * additional information related to the variable such as its linkage.
  */
-- 
cgit v1.2.3


From 90fbca5952436e7817910b33eb4464ddd77a8964 Mon Sep 17 00:00:00 2001
From: Yishai Hadas <yishaih@mellanox.com>
Date: Thu, 12 Dec 2019 13:09:24 +0200
Subject: net/mlx5: Add Virtio Emulation related device capabilities

Add Virtio Emulation related fields to the device capabilities.

It includes a general bit to indicate whether Virtio Emulation is
supported and the capabilities structure itself.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5d54fccf87fc..c6abaf4f1c55 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -87,6 +87,7 @@ enum {
 enum {
 	MLX5_GENERAL_OBJ_TYPES_CAP_SW_ICM = (1ULL << MLX5_OBJ_TYPE_SW_ICM),
 	MLX5_GENERAL_OBJ_TYPES_CAP_GENEVE_TLV_OPT = (1ULL << 11),
+	MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q = (1ULL << 13),
 };
 
 enum {
@@ -953,6 +954,19 @@ struct mlx5_ifc_device_event_cap_bits {
 	u8         user_unaffiliated_events[4][0x40];
 };
 
+struct mlx5_ifc_device_virtio_emulation_cap_bits {
+	u8         reserved_at_0[0x20];
+
+	u8         reserved_at_20[0x13];
+	u8         log_doorbell_stride[0x5];
+	u8         reserved_at_38[0x3];
+	u8         log_doorbell_bar_size[0x5];
+
+	u8         doorbell_bar_offset[0x40];
+
+	u8         reserved_at_80[0x780];
+};
+
 enum {
 	MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_1_BYTE     = 0x0,
 	MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_2_BYTES    = 0x2,
@@ -2751,6 +2765,7 @@ union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_fpga_cap_bits fpga_cap;
 	struct mlx5_ifc_tls_cap_bits tls_cap;
 	struct mlx5_ifc_device_mem_cap_bits device_mem_cap;
+	struct mlx5_ifc_device_virtio_emulation_cap_bits virtio_emulation_cap;
 	u8         reserved_at_0[0x8000];
 };
 
-- 
cgit v1.2.3


From ca1992c62cadb6c8e1e1b47e197b550f3cd89b76 Mon Sep 17 00:00:00 2001
From: Yishai Hadas <yishaih@mellanox.com>
Date: Thu, 12 Dec 2019 13:09:25 +0200
Subject: net/mlx5: Expose vDPA emulation device capabilities

Expose vDPA emulation device capabilities from the core layer.
It includes reading the capabilities from the firmware and exposing
helper functions to access the data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/mlx5/device.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index cc1c230f10ee..1a1c53f0262d 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1105,6 +1105,7 @@ enum mlx5_cap_type {
 	MLX5_CAP_DEV_MEM,
 	MLX5_CAP_RESERVED_16,
 	MLX5_CAP_TLS,
+	MLX5_CAP_VDPA_EMULATION = 0x13,
 	MLX5_CAP_DEV_EVENT = 0x14,
 	/* NUM OF CAP Types */
 	MLX5_CAP_NUM
@@ -1297,6 +1298,14 @@ enum mlx5_qcam_feature_groups {
 #define MLX5_CAP_DEV_EVENT(mdev, cap)\
 	MLX5_ADDR_OF(device_event_cap, (mdev)->caps.hca_cur[MLX5_CAP_DEV_EVENT], cap)
 
+#define MLX5_CAP_DEV_VDPA_EMULATION(mdev, cap)\
+	MLX5_GET(device_virtio_emulation_cap, \
+		(mdev)->caps.hca_cur[MLX5_CAP_VDPA_EMULATION], cap)
+
+#define MLX5_CAP64_DEV_VDPA_EMULATION(mdev, cap)\
+	MLX5_GET64(device_virtio_emulation_cap, \
+		(mdev)->caps.hca_cur[MLX5_CAP_VDPA_EMULATION], cap)
+
 enum {
 	MLX5_CMD_STAT_OK			= 0x0,
 	MLX5_CMD_STAT_INT_ERR			= 0x1,
-- 
cgit v1.2.3


From 468672b24fbc1c018e192dcc90e887bc9a9b2595 Mon Sep 17 00:00:00 2001
From: Jacob Keller <jacob.e.keller@intel.com>
Date: Thu, 9 Jan 2020 14:46:09 -0800
Subject: devlink: add macro for "fw.psid"

The "fw.psid" devlink info version is documented in devlink-info.rst,
and used by one driver. However, there is no associated macro for this
firmware version like there is for others. Add one now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 453f45cc1519..4e80d9acdb86 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -485,6 +485,8 @@ enum devlink_param_generic_id {
 #define DEVLINK_INFO_VERSION_GENERIC_FW_UNDI	"fw.undi"
 /* NCSI support/handler version */
 #define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI	"fw.ncsi"
+/* FW parameter set id */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_PSID	"fw.psid"
 
 struct devlink_region;
 struct devlink_info_req;
-- 
cgit v1.2.3


From f4bdd7103652fab5ac8b0ed75fa5cbc515b50b8b Mon Sep 17 00:00:00 2001
From: Jacob Keller <jacob.e.keller@intel.com>
Date: Thu, 9 Jan 2020 14:46:10 -0800
Subject: devlink: move devlink documentation to subfolder

Combine the documentation for devlink into a subfolder, and provide an
index.rst file that can be used to generally describe devlink.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 4e80d9acdb86..a6856f1d5d1f 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -564,7 +564,7 @@ struct devlink_trap {
 };
 
 /* All traps must be documented in
- * Documentation/networking/devlink-trap.rst
+ * Documentation/networking/devlink/devlink-trap.rst
  */
 enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
@@ -598,7 +598,7 @@ enum devlink_trap_generic_id {
 };
 
 /* All trap groups must be documented in
- * Documentation/networking/devlink-trap.rst
+ * Documentation/networking/devlink/devlink-trap.rst
  */
 enum devlink_trap_group_generic_id {
 	DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
-- 
cgit v1.2.3


From a442c2c3850dc308ab972f3d10d1077e2c8fd035 Mon Sep 17 00:00:00 2001
From: Jonathan Lemon <jonathan.lemon@gmail.com>
Date: Thu, 9 Jan 2020 11:23:17 -0800
Subject: mlx4: Bump up MAX_MSIX from 64 to 128

On modern hardware with a large number of cpus and using XDP,
the current MSIX limit is insufficient.  Bump the limit in
order to allow more queues.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/mlx4/device.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 36e412c3d657..20372de0b587 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -47,7 +47,7 @@
 #define DEFAULT_UAR_PAGE_SHIFT  12
 
 #define MAX_MSIX_P_PORT		17
-#define MAX_MSIX		64
+#define MAX_MSIX		128
 #define MIN_MSIX_P_PORT		5
 #define MLX4_IS_LEGACY_EQ_MODE(dev_cap) ((dev_cap).num_comp_vectors < \
 					 (dev_cap).num_ports * MIN_MSIX_P_PORT)
-- 
cgit v1.2.3


From c74f16b6034401b17bb1cf549871186a8ece5f92 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd@arndb.de>
Date: Sun, 12 Jan 2020 13:04:43 +0100
Subject: wan: ixp4xx_hss: prepare compile testing

The ixp4xx_hss driver needs the platform data definition and the
system clock rate to be compiled. Move both into a new platform_data
header file.

This is a prerequisite for compile testing, but turning on compile
testing requires further patches to isolate the SoC headers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/platform_data/wan_ixp4xx_hss.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 include/linux/platform_data/wan_ixp4xx_hss.h

(limited to 'include')

diff --git a/include/linux/platform_data/wan_ixp4xx_hss.h b/include/linux/platform_data/wan_ixp4xx_hss.h
new file mode 100644
index 000000000000..d525a0feb9e1
--- /dev/null
+++ b/include/linux/platform_data/wan_ixp4xx_hss.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PLATFORM_DATA_WAN_IXP4XX_HSS_H
+#define __PLATFORM_DATA_WAN_IXP4XX_HSS_H
+
+#include <linux/types.h>
+
+/* Information about built-in HSS (synchronous serial) interfaces */
+struct hss_plat_info {
+	int (*set_clock)(int port, unsigned int clock_type);
+	int (*open)(int port, void *pdev,
+		    void (*set_carrier_cb)(void *pdev, int carrier));
+	void (*close)(int port, void *pdev);
+	u8 txreadyq;
+	u32 timer_freq;
+};
+
+#endif
-- 
cgit v1.2.3


From a41a5b26d29fb0123cd3290dca453857cd8c0c66 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd@arndb.de>
Date: Sun, 12 Jan 2020 13:04:45 +0100
Subject: ixp4xx_eth: move platform_data definition

The platform data is needed to compile the driver as standalone,
so move it to a global location along with similar files.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/platform_data/eth_ixp4xx.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 include/linux/platform_data/eth_ixp4xx.h

(limited to 'include')

diff --git a/include/linux/platform_data/eth_ixp4xx.h b/include/linux/platform_data/eth_ixp4xx.h
new file mode 100644
index 000000000000..6f652ea0c6ae
--- /dev/null
+++ b/include/linux/platform_data/eth_ixp4xx.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PLATFORM_DATA_ETH_IXP4XX
+#define __PLATFORM_DATA_ETH_IXP4XX
+
+#include <linux/types.h>
+
+#define IXP4XX_ETH_NPEA		0x00
+#define IXP4XX_ETH_NPEB		0x10
+#define IXP4XX_ETH_NPEC		0x20
+
+/* Information about built-in Ethernet MAC interfaces */
+struct eth_plat_info {
+	u8 phy;		/* MII PHY ID, 0 - 31 */
+	u8 rxq;		/* configurable, currently 0 - 31 only */
+	u8 txreadyq;
+	u8 hwaddr[6];
+};
+
+#endif
-- 
cgit v1.2.3


From 0eac8ce95bb386838121189b2aa2216cd070f143 Mon Sep 17 00:00:00 2001
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Mon, 13 Jan 2020 11:22:16 +0100
Subject: ptr_ring: add include of linux/mm.h

Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails")
started to use kvmalloc_array and kvfree, which are defined in mm.h,
the previous functions kcalloc and kfree, which are defined in slab.h.

Add the missing include of linux/mm.h.  This went unnoticed as other
include files happened to include mm.h.

Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/ptr_ring.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 0abe9a4fc842..417db0a79a62 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -23,6 +23,7 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 #include <asm/errno.h>
 #endif
 
-- 
cgit v1.2.3


From 579a25a854d482bc9d0f9ab0e07ba32fb66bd9e3 Mon Sep 17 00:00:00 2001
From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Mon, 13 Jan 2020 17:24:09 +0100
Subject: net: stmmac: Initial support for TBS

Adds the initial hooks for TBS support. This needs a 32 byte descriptor
in order for it to work with current HW. Adds all the logic for Enhanced
Descriptors in main core but no HW related logic for now.

Changes from v2:
- Use bitfield for TBS status / support (Jakub)
- Remove unneeded cache alignment (Jakub)
- Fix checkpatch issues

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/stmmac.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 0531afa9b21e..19190c609282 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -139,6 +139,7 @@ struct stmmac_txq_cfg {
 	u32 low_credit;
 	bool use_prio;
 	u32 prio;
+	int tbs_en;
 };
 
 struct plat_stmmacenet_data {
-- 
cgit v1.2.3


From e27f178793de16ca1b421f2c3f4bc3497b2ce723 Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sun, 12 Jan 2020 09:35:38 -0800
Subject: net: phy: Added IRQ print to phylink_bringup_phy()

The information about the PHY attached to the PHYLINK instance is useful
but is missing the IRQ prints that phy_attached_info() adds.
phy_attached_info() is a bit long and it would not be possible to use
phylink_info() anyway.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5932bb8e9c35..3a70b756ac1a 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1131,6 +1131,8 @@ static inline void phy_unlock_mdio_bus(struct phy_device *phydev)
 
 void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
 	__printf(2, 3);
+char *phy_attached_info_irq(struct phy_device *phydev)
+	__malloc;
 void phy_attached_info(struct phy_device *phydev);
 
 /* Clause 22 PHY */
-- 
cgit v1.2.3


From c0e4eadfb8daf2e9557c7450f9b237c08b404419 Mon Sep 17 00:00:00 2001
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 13 Jan 2020 23:31:39 +0100
Subject: net: macsec: move some definitions in a dedicated header

This patch moves some structure, type and identifier definitions into a
MACsec specific header. This patch does not modify how the MACsec code
is running and only move things around. This is a preparation for the
future MACsec hardware offloading support, which will re-use those
definitions outside macsec.c.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/macsec.h | 177 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 177 insertions(+)
 create mode 100644 include/net/macsec.h

(limited to 'include')

diff --git a/include/net/macsec.h b/include/net/macsec.h
new file mode 100644
index 000000000000..e7b41c1043f6
--- /dev/null
+++ b/include/net/macsec.h
@@ -0,0 +1,177 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * MACsec netdev header, used for h/w accelerated implementations.
+ *
+ * Copyright (c) 2015 Sabrina Dubroca <sd@queasysnail.net>
+ */
+#ifndef _NET_MACSEC_H_
+#define _NET_MACSEC_H_
+
+#include <linux/u64_stats_sync.h>
+#include <uapi/linux/if_link.h>
+#include <uapi/linux/if_macsec.h>
+
+typedef u64 __bitwise sci_t;
+
+#define MACSEC_NUM_AN 4 /* 2 bits for the association number */
+
+/**
+ * struct macsec_key - SA key
+ * @id: user-provided key identifier
+ * @tfm: crypto struct, key storage
+ */
+struct macsec_key {
+	u8 id[MACSEC_KEYID_LEN];
+	struct crypto_aead *tfm;
+};
+
+struct macsec_rx_sc_stats {
+	__u64 InOctetsValidated;
+	__u64 InOctetsDecrypted;
+	__u64 InPktsUnchecked;
+	__u64 InPktsDelayed;
+	__u64 InPktsOK;
+	__u64 InPktsInvalid;
+	__u64 InPktsLate;
+	__u64 InPktsNotValid;
+	__u64 InPktsNotUsingSA;
+	__u64 InPktsUnusedSA;
+};
+
+struct macsec_rx_sa_stats {
+	__u32 InPktsOK;
+	__u32 InPktsInvalid;
+	__u32 InPktsNotValid;
+	__u32 InPktsNotUsingSA;
+	__u32 InPktsUnusedSA;
+};
+
+struct macsec_tx_sa_stats {
+	__u32 OutPktsProtected;
+	__u32 OutPktsEncrypted;
+};
+
+struct macsec_tx_sc_stats {
+	__u64 OutPktsProtected;
+	__u64 OutPktsEncrypted;
+	__u64 OutOctetsProtected;
+	__u64 OutOctetsEncrypted;
+};
+
+/**
+ * struct macsec_rx_sa - receive secure association
+ * @active:
+ * @next_pn: packet number expected for the next packet
+ * @lock: protects next_pn manipulations
+ * @key: key structure
+ * @stats: per-SA stats
+ */
+struct macsec_rx_sa {
+	struct macsec_key key;
+	spinlock_t lock;
+	u32 next_pn;
+	refcount_t refcnt;
+	bool active;
+	struct macsec_rx_sa_stats __percpu *stats;
+	struct macsec_rx_sc *sc;
+	struct rcu_head rcu;
+};
+
+struct pcpu_rx_sc_stats {
+	struct macsec_rx_sc_stats stats;
+	struct u64_stats_sync syncp;
+};
+
+struct pcpu_tx_sc_stats {
+	struct macsec_tx_sc_stats stats;
+	struct u64_stats_sync syncp;
+};
+
+/**
+ * struct macsec_rx_sc - receive secure channel
+ * @sci: secure channel identifier for this SC
+ * @active: channel is active
+ * @sa: array of secure associations
+ * @stats: per-SC stats
+ */
+struct macsec_rx_sc {
+	struct macsec_rx_sc __rcu *next;
+	sci_t sci;
+	bool active;
+	struct macsec_rx_sa __rcu *sa[MACSEC_NUM_AN];
+	struct pcpu_rx_sc_stats __percpu *stats;
+	refcount_t refcnt;
+	struct rcu_head rcu_head;
+};
+
+/**
+ * struct macsec_tx_sa - transmit secure association
+ * @active:
+ * @next_pn: packet number to use for the next packet
+ * @lock: protects next_pn manipulations
+ * @key: key structure
+ * @stats: per-SA stats
+ */
+struct macsec_tx_sa {
+	struct macsec_key key;
+	spinlock_t lock;
+	u32 next_pn;
+	refcount_t refcnt;
+	bool active;
+	struct macsec_tx_sa_stats __percpu *stats;
+	struct rcu_head rcu;
+};
+
+/**
+ * struct macsec_tx_sc - transmit secure channel
+ * @active:
+ * @encoding_sa: association number of the SA currently in use
+ * @encrypt: encrypt packets on transmit, or authenticate only
+ * @send_sci: always include the SCI in the SecTAG
+ * @end_station:
+ * @scb: single copy broadcast flag
+ * @sa: array of secure associations
+ * @stats: stats for this TXSC
+ */
+struct macsec_tx_sc {
+	bool active;
+	u8 encoding_sa;
+	bool encrypt;
+	bool send_sci;
+	bool end_station;
+	bool scb;
+	struct macsec_tx_sa __rcu *sa[MACSEC_NUM_AN];
+	struct pcpu_tx_sc_stats __percpu *stats;
+};
+
+/**
+ * struct macsec_secy - MACsec Security Entity
+ * @netdev: netdevice for this SecY
+ * @n_rx_sc: number of receive secure channels configured on this SecY
+ * @sci: secure channel identifier used for tx
+ * @key_len: length of keys used by the cipher suite
+ * @icv_len: length of ICV used by the cipher suite
+ * @validate_frames: validation mode
+ * @operational: MAC_Operational flag
+ * @protect_frames: enable protection for this SecY
+ * @replay_protect: enable packet number checks on receive
+ * @replay_window: size of the replay window
+ * @tx_sc: transmit secure channel
+ * @rx_sc: linked list of receive secure channels
+ */
+struct macsec_secy {
+	struct net_device *netdev;
+	unsigned int n_rx_sc;
+	sci_t sci;
+	u16 key_len;
+	u16 icv_len;
+	enum macsec_validation_type validate_frames;
+	bool operational;
+	bool protect_frames;
+	bool replay_protect;
+	u32 replay_window;
+	struct macsec_tx_sc tx_sc;
+	struct macsec_rx_sc __rcu *rx_sc;
+};
+
+#endif /* _NET_MACSEC_H_ */
-- 
cgit v1.2.3


From 76564261a7db80c5f5c624e0122a28787f266bdf Mon Sep 17 00:00:00 2001
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 13 Jan 2020 23:31:40 +0100
Subject: net: macsec: introduce the macsec_context structure

This patch introduces the macsec_context structure. It will be used
in the kernel to exchange information between the common MACsec
implementation (macsec.c) and the MACsec hardware offloading
implementations. This structure contains pointers to MACsec specific
structures which contain the actual MACsec configuration, and to the
underlying device (phydev for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h          |  2 ++
 include/net/macsec.h         | 21 +++++++++++++++++++++
 include/uapi/linux/if_link.h |  7 +++++++
 3 files changed, 30 insertions(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 3a70b756ac1a..be079a7bb40a 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -332,6 +332,8 @@ struct phy_c45_device_ids {
 	u32 device_ids[8];
 };
 
+struct macsec_context;
+
 /* phy_device: An instance of a PHY
  *
  * drv: Pointer to the driver for this PHY instance
diff --git a/include/net/macsec.h b/include/net/macsec.h
index e7b41c1043f6..0b98803f92ec 100644
--- a/include/net/macsec.h
+++ b/include/net/macsec.h
@@ -174,4 +174,25 @@ struct macsec_secy {
 	struct macsec_rx_sc __rcu *rx_sc;
 };
 
+/**
+ * struct macsec_context - MACsec context for hardware offloading
+ */
+struct macsec_context {
+	struct phy_device *phydev;
+	enum macsec_offload offload;
+
+	struct macsec_secy *secy;
+	struct macsec_rx_sc *rx_sc;
+	struct {
+		unsigned char assoc_num;
+		u8 key[MACSEC_KEYID_LEN];
+		union {
+			struct macsec_rx_sa *rx_sa;
+			struct macsec_tx_sa *tx_sa;
+		};
+	} sa;
+
+	u8 prepare:1;
+};
+
 #endif /* _NET_MACSEC_H_ */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1d69f637c5d6..024af2d1d0af 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -486,6 +486,13 @@ enum macsec_validation_type {
 	MACSEC_VALIDATE_MAX = __MACSEC_VALIDATE_END - 1,
 };
 
+enum macsec_offload {
+	MACSEC_OFFLOAD_OFF = 0,
+	MACSEC_OFFLOAD_PHY = 1,
+	__MACSEC_OFFLOAD_END,
+	MACSEC_OFFLOAD_MAX = __MACSEC_OFFLOAD_END - 1,
+};
+
 /* IPVLAN section */
 enum {
 	IFLA_IPVLAN_UNSPEC,
-- 
cgit v1.2.3


From 0830e20b62ad156f7df5ff5b9c4cea280ebe8fef Mon Sep 17 00:00:00 2001
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 13 Jan 2020 23:31:41 +0100
Subject: net: macsec: introduce MACsec ops

This patch introduces MACsec ops for drivers to support offloading
MACsec operations.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/macsec.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

(limited to 'include')

diff --git a/include/net/macsec.h b/include/net/macsec.h
index 0b98803f92ec..16e7e5061178 100644
--- a/include/net/macsec.h
+++ b/include/net/macsec.h
@@ -195,4 +195,28 @@ struct macsec_context {
 	u8 prepare:1;
 };
 
+/**
+ * struct macsec_ops - MACsec offloading operations
+ */
+struct macsec_ops {
+	/* Device wide */
+	int (*mdo_dev_open)(struct macsec_context *ctx);
+	int (*mdo_dev_stop)(struct macsec_context *ctx);
+	/* SecY */
+	int (*mdo_add_secy)(struct macsec_context *ctx);
+	int (*mdo_upd_secy)(struct macsec_context *ctx);
+	int (*mdo_del_secy)(struct macsec_context *ctx);
+	/* Security channels */
+	int (*mdo_add_rxsc)(struct macsec_context *ctx);
+	int (*mdo_upd_rxsc)(struct macsec_context *ctx);
+	int (*mdo_del_rxsc)(struct macsec_context *ctx);
+	/* Security associations */
+	int (*mdo_add_rxsa)(struct macsec_context *ctx);
+	int (*mdo_upd_rxsa)(struct macsec_context *ctx);
+	int (*mdo_del_rxsa)(struct macsec_context *ctx);
+	int (*mdo_add_txsa)(struct macsec_context *ctx);
+	int (*mdo_upd_txsa)(struct macsec_context *ctx);
+	int (*mdo_del_txsa)(struct macsec_context *ctx);
+};
+
 #endif /* _NET_MACSEC_H_ */
-- 
cgit v1.2.3


From 2e18135845b359f26c37df38ba56565496517c10 Mon Sep 17 00:00:00 2001
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 13 Jan 2020 23:31:42 +0100
Subject: net: phy: add MACsec ops in phy_device

This patch adds a reference to MACsec ops in the phy_device, to allow
PHYs to support offloading MACsec operations. The phydev lock will be
held while calling those helpers.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index be079a7bb40a..2929d0bc307f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -333,6 +333,7 @@ struct phy_c45_device_ids {
 };
 
 struct macsec_context;
+struct macsec_ops;
 
 /* phy_device: An instance of a PHY
  *
@@ -356,6 +357,7 @@ struct macsec_context;
  * attached_dev: The attached enet driver's device instance ptr
  * adjust_link: Callback for the enet controller to respond to
  * changes in the link state.
+ * macsec_ops: MACsec offloading ops.
  *
  * speed, duplex, pause, supported, advertising, lp_advertising,
  * and autoneg are used like in mii_if_info
@@ -455,6 +457,11 @@ struct phy_device {
 
 	void (*phy_link_change)(struct phy_device *, bool up, bool do_carrier);
 	void (*adjust_link)(struct net_device *dev);
+
+#if IS_ENABLED(CONFIG_MACSEC)
+	/* MACsec management functions */
+	const struct macsec_ops *macsec_ops;
+#endif
 };
 #define to_phy_device(d) container_of(to_mdio_device(d), \
 				      struct phy_device, mdio)
-- 
cgit v1.2.3


From dcb780fb279514f268826f2e9f4df3bc75610703 Mon Sep 17 00:00:00 2001
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 13 Jan 2020 23:31:44 +0100
Subject: net: macsec: add nla support for changing the offloading selection

MACsec offloading to underlying hardware devices is disabled by default
(the software implementation is used). This patch adds support for
changing this setting through the MACsec netlink interface. Many checks
are done when enabling offloading on a given MACsec interface as there
are limitations (it must be supported by the hardware, only a single
interface can be offloaded on a given physical device at a time, rules
can't be moved for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_macsec.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_macsec.h b/include/uapi/linux/if_macsec.h
index 98e4d5d7c45c..1d63c43c38cc 100644
--- a/include/uapi/linux/if_macsec.h
+++ b/include/uapi/linux/if_macsec.h
@@ -45,6 +45,7 @@ enum macsec_attrs {
 	MACSEC_ATTR_RXSC_LIST,   /* dump, nested, macsec_rxsc_attrs for each RXSC */
 	MACSEC_ATTR_TXSC_STATS,  /* dump, nested, macsec_txsc_stats_attr */
 	MACSEC_ATTR_SECY_STATS,  /* dump, nested, macsec_secy_stats_attr */
+	MACSEC_ATTR_OFFLOAD,     /* config, nested, macsec_offload_attrs */
 	__MACSEC_ATTR_END,
 	NUM_MACSEC_ATTR = __MACSEC_ATTR_END,
 	MACSEC_ATTR_MAX = __MACSEC_ATTR_END - 1,
@@ -97,6 +98,15 @@ enum macsec_sa_attrs {
 	MACSEC_SA_ATTR_MAX = __MACSEC_SA_ATTR_END - 1,
 };
 
+enum macsec_offload_attrs {
+	MACSEC_OFFLOAD_ATTR_UNSPEC,
+	MACSEC_OFFLOAD_ATTR_TYPE, /* config/dump, u8 0..2 */
+	MACSEC_OFFLOAD_ATTR_PAD,
+	__MACSEC_OFFLOAD_ATTR_END,
+	NUM_MACSEC_OFFLOAD_ATTR = __MACSEC_OFFLOAD_ATTR_END,
+	MACSEC_OFFLOAD_ATTR_MAX = __MACSEC_OFFLOAD_ATTR_END - 1,
+};
+
 enum macsec_nl_commands {
 	MACSEC_CMD_GET_TXSC,
 	MACSEC_CMD_ADD_RXSC,
@@ -108,6 +118,7 @@ enum macsec_nl_commands {
 	MACSEC_CMD_ADD_RXSA,
 	MACSEC_CMD_DEL_RXSA,
 	MACSEC_CMD_UPD_RXSA,
+	MACSEC_CMD_UPD_OFFLOAD,
 };
 
 /* u64 per-RXSC stats */
-- 
cgit v1.2.3


From 5c937de78b39e47ce9924fc4b863c5b727edc328 Mon Sep 17 00:00:00 2001
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Mon, 13 Jan 2020 23:31:47 +0100
Subject: net: macsec: PN wrap callback

Allow to call macsec_pn_wrapped from hardware drivers to notify when a
PN rolls over. Some drivers might used an interrupt to implement this.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/macsec.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/macsec.h b/include/net/macsec.h
index 16e7e5061178..92e43db8b566 100644
--- a/include/net/macsec.h
+++ b/include/net/macsec.h
@@ -219,4 +219,6 @@ struct macsec_ops {
 	int (*mdo_del_txsa)(struct macsec_context *ctx);
 };
 
+void macsec_pn_wrapped(struct macsec_secy *secy, struct macsec_tx_sa *tx_sa);
+
 #endif /* _NET_MACSEC_H_ */
-- 
cgit v1.2.3


From 5eee7bd7e245914e4e050c413dfe864e31805207 Mon Sep 17 00:00:00 2001
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Mon, 13 Jan 2020 18:42:26 -0500
Subject: net: skbuff: disambiguate argument and member for skb_list_walk_safe
 helper

This worked before, because we made all callers name their next pointer
"next". But in trying to be more "drop-in" ready, the silliness here is
revealed. This commit fixes the problem by making the macro argument and
the member use different names.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/skbuff.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 016b3c4ab99a..aaf73b34f72f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1479,9 +1479,9 @@ static inline void skb_mark_not_on_list(struct sk_buff *skb)
 }
 
 /* Iterate through singly-linked GSO fragments of an skb. */
-#define skb_list_walk_safe(first, skb, next)                                   \
-	for ((skb) = (first), (next) = (skb) ? (skb)->next : NULL; (skb);      \
-	     (skb) = (next), (next) = (skb) ? (skb)->next : NULL)
+#define skb_list_walk_safe(first, skb, next_skb)                               \
+	for ((skb) = (first), (next_skb) = (skb) ? (skb)->next : NULL; (skb);  \
+	     (skb) = (next_skb), (next_skb) = (skb) ? (skb)->next : NULL)
 
 static inline void skb_list_del_init(struct sk_buff *skb)
 {
-- 
cgit v1.2.3


From 1e301fd04eaaa5b1e3c202450d86864e6714d783 Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@mellanox.com>
Date: Tue, 14 Jan 2020 13:23:10 +0200
Subject: ipv4: Encapsulate function arguments in a struct

fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
using the passed arguments. Currently, the function takes 11 arguments,
6 of which are attributes of the route being dumped (e.g., prefix, TOS).

The next patch will need the function to also dump to user space an
indication if the route is present in hardware or not. Instead of
passing yet another argument, change the function to take a struct
containing the different route attributes.

v2:
* Name last argument of fib_dump_info()
* Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
  later be passed to fib_alias_hw_flags_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index b9cba41c6d4f..0c071c820e33 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -204,6 +204,15 @@ __be32 fib_result_prefsrc(struct net *net, struct fib_result *res);
 #define FIB_RES_DEV(res)	(FIB_RES_NHC(res)->nhc_dev)
 #define FIB_RES_OIF(res)	(FIB_RES_NHC(res)->nhc_oif)
 
+struct fib_rt_info {
+	struct fib_info		*fi;
+	u32			tb_id;
+	__be32			dst;
+	int			dst_len;
+	u8			tos;
+	u8			type;
+};
+
 struct fib_entry_notifier_info {
 	struct fib_notifier_info info; /* must be first */
 	u32 dst;
-- 
cgit v1.2.3


From 90b93f1b31f86dcde2fa3c57f1ae33d28d87a1f8 Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@mellanox.com>
Date: Tue, 14 Jan 2020 13:23:11 +0200
Subject: ipv4: Add "offload" and "trap" indications to routes

When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.

'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.

The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.

v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         offload:1;            /*    34: 0  1 */
        u8                         trap:1;               /*    34: 1  1 */
        u8                         unused:6;             /*    34: 2  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h           | 4 ++++
 include/uapi/linux/rtnetlink.h | 2 ++
 2 files changed, 6 insertions(+)

(limited to 'include')

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 0c071c820e33..6a1ae49809de 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -211,6 +211,9 @@ struct fib_rt_info {
 	int			dst_len;
 	u8			tos;
 	u8			type;
+	u8			offload:1,
+				trap:1,
+				unused:6;
 };
 
 struct fib_entry_notifier_info {
@@ -473,6 +476,7 @@ int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap,
 void fib_nh_common_release(struct fib_nh_common *nhc);
 
 /* Exported by fib_trie.c */
+void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri);
 void fib_trie_init(void);
 struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 1418a8362bb7..cd43321d20dd 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -309,6 +309,8 @@ enum rt_scope_t {
 #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
 #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
 #define RTM_F_FIB_MATCH	        0x2000	/* return full fib lookup match */
+#define RTM_F_OFFLOAD		0x4000	/* route is offloaded */
+#define RTM_F_TRAP		0x8000	/* route is trapping packets */
 
 /* Reserved table identifiers */
 
-- 
cgit v1.2.3


From bb3c4ab93e44784c1e958bdbba7824bba40f23cd Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@mellanox.com>
Date: Tue, 14 Jan 2020 13:23:12 +0200
Subject: ipv6: Add "offload" and "trap" indications to routes

In a similar fashion to previous patch, add "offload" and "trap"
indication to IPv6 routes.

This is done by using two unused bits in 'struct fib6_info' to hold
these indications. Capable drivers are expected to set these when
processing the various in-kernel route notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip6_fib.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index b579faea41e9..fd60a8ac02ee 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -192,7 +192,9 @@ struct fib6_info {
 					dst_nopolicy:1,
 					dst_host:1,
 					fib6_destroying:1,
-					unused:3;
+					offload:1,
+					trap:1,
+					unused:1;
 
 	struct rcu_head			rcu;
 	struct nexthop			*nh;
@@ -329,6 +331,13 @@ static inline void fib6_info_release(struct fib6_info *f6i)
 		call_rcu(&f6i->rcu, fib6_info_destroy_rcu);
 }
 
+static inline void fib6_info_hw_flags_set(struct fib6_info *f6i, bool offload,
+					  bool trap)
+{
+	f6i->offload = offload;
+	f6i->trap = trap;
+}
+
 enum fib6_walk_state {
 #ifdef CONFIG_IPV6_SUBTREES
 	FWS_S,
-- 
cgit v1.2.3


From 8dcea187088bce5d2f1149294ad109f022653547 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Tue, 14 Jan 2020 19:56:09 +0200
Subject: net: bridge: vlan: add rtm definitions and dump support

This patch adds vlan rtm definitions:
 - NEWVLAN: to be used for creating vlans, setting options and
   notifications
 - DELVLAN: to be used for deleting vlans
 - GETVLAN: used for dumping vlan information

Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_bridge.h | 28 ++++++++++++++++++++++++++++
 include/uapi/linux/rtnetlink.h |  7 +++++++
 2 files changed, 35 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 4a58e3d7de46..4da04f77d9ee 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -165,6 +165,34 @@ struct bridge_stp_xstats {
 	__u64 tx_tcn;
 };
 
+/* Bridge vlan RTM header */
+struct br_vlan_msg {
+	__u8 family;
+	__u8 reserved1;
+	__u16 reserved2;
+	__u32 ifindex;
+};
+
+/* Bridge vlan RTM attributes
+ * [BRIDGE_VLANDB_ENTRY] = {
+ *     [BRIDGE_VLANDB_ENTRY_INFO]
+ *     ...
+ * }
+ */
+enum {
+	BRIDGE_VLANDB_UNSPEC,
+	BRIDGE_VLANDB_ENTRY,
+	__BRIDGE_VLANDB_MAX,
+};
+#define BRIDGE_VLANDB_MAX (__BRIDGE_VLANDB_MAX - 1)
+
+enum {
+	BRIDGE_VLANDB_ENTRY_UNSPEC,
+	BRIDGE_VLANDB_ENTRY_INFO,
+	__BRIDGE_VLANDB_ENTRY_MAX,
+};
+#define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
+
 /* Bridge multicast database attributes
  * [MDBA_MDB] = {
  *     [MDBA_MDB_ENTRY] = {
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cd43321d20dd..b333cff14071 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -171,6 +171,13 @@ enum {
 	RTM_GETLINKPROP,
 #define RTM_GETLINKPROP	RTM_GETLINKPROP
 
+	RTM_NEWVLAN = 112,
+#define RTM_NEWNVLAN	RTM_NEWVLAN
+	RTM_DELVLAN,
+#define RTM_DELVLAN	RTM_DELVLAN
+	RTM_GETVLAN,
+#define RTM_GETVLAN	RTM_GETVLAN
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
-- 
cgit v1.2.3


From 0ab558795184f87b532363e262357160d25f5d64 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Tue, 14 Jan 2020 19:56:12 +0200
Subject: net: bridge: vlan: add rtm range support

Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes
RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress
similar vlans into ranges. This will be also used when per-vlan options
are introduced and vlans' options match, they will be put into a single
range which is encapsulated in one netlink attribute. We need to run
similar checks as br_process_vlan_info() does because these ranges will
be used for options setting and they'll be able to skip
br_process_vlan_info().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_bridge.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 4da04f77d9ee..ac38f0b674b8 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -189,6 +189,7 @@ enum {
 enum {
 	BRIDGE_VLANDB_ENTRY_UNSPEC,
 	BRIDGE_VLANDB_ENTRY_INFO,
+	BRIDGE_VLANDB_ENTRY_RANGE,
 	__BRIDGE_VLANDB_ENTRY_MAX,
 };
 #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
-- 
cgit v1.2.3


From cf5bddb95cbe5340e486e84d46cf2f0bb324078c Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Tue, 14 Jan 2020 19:56:13 +0200
Subject: net: bridge: vlan: add rtnetlink group and notify support

Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
and add support for sending vlan notifications (both single and ranges).
No functional changes intended, the notification support will be used by
later patches.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/rtnetlink.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index b333cff14071..4a8c5b745157 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -730,6 +730,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_IPV6_MROUTE_R	RTNLGRP_IPV6_MROUTE_R
 	RTNLGRP_NEXTHOP,
 #define RTNLGRP_NEXTHOP		RTNLGRP_NEXTHOP
+	RTNLGRP_BRVLAN,
+#define RTNLGRP_BRVLAN		RTNLGRP_BRVLAN
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
-- 
cgit v1.2.3


From 8482941f09067da42f9c3362e15bfb3f3c19d610 Mon Sep 17 00:00:00 2001
From: Yonghong Song <yhs@fb.com>
Date: Tue, 14 Jan 2020 19:50:02 -0800
Subject: bpf: Add bpf_send_signal_thread() helper

Commit 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
added helper bpf_send_signal() which permits bpf program to
send a signal to the current process. The signal may be
delivered to any threads in the process.

We found a use case where sending the signal to the current
thread is more preferable.
  - A bpf program will collect the stack trace and then
    send signal to the user application.
  - The user application will add some thread specific
    information to the just collected stack trace for
    later analysis.

If bpf_send_signal() is used, user application will need
to check whether the thread receiving the signal matches
the thread collecting the stack by checking thread id.
If not, it will need to send signal to another thread
through pthread_kill().

This patch proposed a new helper bpf_send_signal_thread(),
which sends the signal to the thread corresponding to
the current kernel task. This way, user space is guaranteed that
bpf_program execution context and user space signal handling
context are the same thread.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com
---
 include/uapi/linux/bpf.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 52966e758fe5..a88837d4dd18 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2714,7 +2714,8 @@ union bpf_attr {
  *
  * int bpf_send_signal(u32 sig)
  *	Description
- *		Send signal *sig* to the current task.
+ *		Send signal *sig* to the process of the current task.
+ *		The signal may be delivered to any of this process's threads.
  *	Return
  *		0 on success or successfully queued.
  *
@@ -2850,6 +2851,19 @@ union bpf_attr {
  *	Return
  *		0 on success, or a negative error in case of failure.
  *
+ * int bpf_send_signal_thread(u32 sig)
+ *	Description
+ *		Send signal *sig* to the thread corresponding to the current task.
+ *	Return
+ *		0 on success or successfully queued.
+ *
+ *		**-EBUSY** if work queue under nmi is full.
+ *
+ *		**-EINVAL** if *sig* is invalid.
+ *
+ *		**-EPERM** if no permission to send the *sig*.
+ *
+ *		**-EAGAIN** if bpf program can try again.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2968,7 +2982,8 @@ union bpf_attr {
 	FN(probe_read_kernel),		\
 	FN(probe_read_user_str),	\
 	FN(probe_read_kernel_str),	\
-	FN(tcp_send_ack),
+	FN(tcp_send_ack),		\
+	FN(send_signal_thread),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
cgit v1.2.3


From 600a87490ff9823d065fc15e86c709e707033ecc Mon Sep 17 00:00:00 2001
From: Alain Michaud <alainm@chromium.org>
Date: Tue, 7 Jan 2020 00:43:17 +0000
Subject: Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.

MGMT command is added to receive the list of blocked keys from
user-space.

The list is used to:
1) Block keys from being distributed by the device during
   the ke distribution phase of SMP.
2) Filter out any keys that were previously saved so
   they are no longer used.

Signed-off-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 include/net/bluetooth/hci_core.h | 10 ++++++++++
 include/net/bluetooth/mgmt.h     | 17 +++++++++++++++++
 2 files changed, 27 insertions(+)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index faebe3859931..89ecf0a80aa1 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -118,6 +118,13 @@ struct bt_uuid {
 	u8 svc_hint;
 };
 
+struct blocked_key {
+	struct list_head list;
+	struct rcu_head rcu;
+	u8 type;
+	u8 val[16];
+};
+
 struct smp_csrk {
 	bdaddr_t bdaddr;
 	u8 bdaddr_type;
@@ -397,6 +404,7 @@ struct hci_dev {
 	struct list_head	le_conn_params;
 	struct list_head	pend_le_conns;
 	struct list_head	pend_le_reports;
+	struct list_head	blocked_keys;
 
 	struct hci_dev_stats	stat;
 
@@ -1123,6 +1131,8 @@ struct smp_irk *hci_find_irk_by_addr(struct hci_dev *hdev, bdaddr_t *bdaddr,
 struct smp_irk *hci_add_irk(struct hci_dev *hdev, bdaddr_t *bdaddr,
 			    u8 addr_type, u8 val[16], bdaddr_t *rpa);
 void hci_remove_irk(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 addr_type);
+bool hci_is_blocked_key(struct hci_dev *hdev, u8 type, u8 val[16]);
+void hci_blocked_keys_clear(struct hci_dev *hdev);
 void hci_smp_irks_clear(struct hci_dev *hdev);
 
 bool hci_bdaddr_is_paired(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type);
diff --git a/include/net/bluetooth/mgmt.h b/include/net/bluetooth/mgmt.h
index 9cee7ddc6741..a90666af05bd 100644
--- a/include/net/bluetooth/mgmt.h
+++ b/include/net/bluetooth/mgmt.h
@@ -654,6 +654,23 @@ struct mgmt_cp_set_phy_confguration {
 } __packed;
 #define MGMT_SET_PHY_CONFIGURATION_SIZE	4
 
+#define MGMT_OP_SET_BLOCKED_KEYS	0x0046
+
+#define HCI_BLOCKED_KEY_TYPE_LINKKEY	0x00
+#define HCI_BLOCKED_KEY_TYPE_LTK	0x01
+#define HCI_BLOCKED_KEY_TYPE_IRK	0x02
+
+struct mgmt_blocked_key_info {
+	__u8 type;
+	__u8 val[16];
+} __packed;
+
+struct mgmt_cp_set_blocked_keys {
+	__le16 key_count;
+	struct mgmt_blocked_key_info keys[0];
+} __packed;
+#define MGMT_OP_SET_BLOCKED_KEYS_SIZE 2
+
 #define MGMT_EV_CMD_COMPLETE		0x0001
 struct mgmt_ev_cmd_complete {
 	__le16	opcode;
-- 
cgit v1.2.3


From 4de0fc599eb936d37542f819e931ba3fd8e435ca Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date: Wed, 15 Jan 2020 13:02:11 -0800
Subject: Bluetooth: Add definitions for CIS connections

These adds the HCI definitions for handling CIS connections along with
ISO data packets.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 include/net/bluetooth/hci.h | 159 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 158 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 07b6ecedc6ce..6293bdd7d862 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -27,6 +27,7 @@
 
 #define HCI_MAX_ACL_SIZE	1024
 #define HCI_MAX_SCO_SIZE	255
+#define HCI_MAX_ISO_SIZE	251
 #define HCI_MAX_EVENT_SIZE	260
 #define HCI_MAX_FRAME_SIZE	(HCI_MAX_ACL_SIZE + 4)
 
@@ -303,6 +304,7 @@ enum {
 #define HCI_ACLDATA_PKT		0x02
 #define HCI_SCODATA_PKT		0x03
 #define HCI_EVENT_PKT		0x04
+#define HCI_ISODATA_PKT		0x05
 #define HCI_DIAG_PKT		0xf0
 #define HCI_VENDOR_PKT		0xff
 
@@ -352,6 +354,15 @@ enum {
 #define ACL_ACTIVE_BCAST	0x04
 #define ACL_PICO_BCAST		0x08
 
+/* ISO PB flags */
+#define ISO_START		0x00
+#define ISO_CONT		0x01
+#define ISO_SINGLE		0x02
+#define ISO_END			0x03
+
+/* ISO TS flags */
+#define ISO_TS			0x01
+
 /* Baseband links */
 #define SCO_LINK	0x00
 #define ACL_LINK	0x01
@@ -359,6 +370,7 @@ enum {
 /* Low Energy links do not have defined link type. Use invented one */
 #define LE_LINK		0x80
 #define AMP_LINK	0x81
+#define ISO_LINK	0x82
 #define INVALID_LINK	0xff
 
 /* LMP features */
@@ -440,6 +452,8 @@ enum {
 #define HCI_LE_PHY_2M			0x01
 #define HCI_LE_PHY_CODED		0x08
 #define HCI_LE_CHAN_SEL_ALG2		0x40
+#define HCI_LE_CIS_MASTER		0x10
+#define HCI_LE_CIS_SLAVE		0x20
 
 /* Connection modes */
 #define HCI_CM_ACTIVE	0x0000
@@ -1718,6 +1732,86 @@ struct hci_cp_le_set_adv_set_rand_addr {
 	bdaddr_t  bdaddr;
 } __packed;
 
+#define HCI_OP_LE_READ_BUFFER_SIZE_V2	0x2060
+struct hci_rp_le_read_buffer_size_v2 {
+	__u8    status;
+	__le16  acl_mtu;
+	__u8    acl_max_pkt;
+	__le16  iso_mtu;
+	__u8    iso_max_pkt;
+} __packed;
+
+#define HCI_OP_LE_READ_ISO_TX_SYNC		0x2061
+struct hci_cp_le_read_iso_tx_sync {
+	__le16  handle;
+} __packed;
+
+struct hci_rp_le_read_iso_tx_sync {
+	__u8    status;
+	__le16  handle;
+	__le16  seq;
+	__le32  imestamp;
+	__u8    offset[3];
+} __packed;
+
+#define HCI_OP_LE_SET_CIG_PARAMS		0x2062
+struct hci_cis_params {
+	__u8    cis_id;
+	__le16  m_sdu;
+	__le16  s_sdu;
+	__u8    m_phy;
+	__u8    s_phy;
+	__u8    m_rtn;
+	__u8    s_rtn;
+} __packed;
+
+struct hci_cp_le_set_cig_params {
+	__u8    cig_id;
+	__u8    m_interval[3];
+	__u8    s_interval[3];
+	__u8    sca;
+	__u8    packing;
+	__u8    framing;
+	__le16  m_latency;
+	__le16  s_latency;
+	__u8    num_cis;
+	struct hci_cis_params cis[0];
+} __packed;
+
+struct hci_rp_le_set_cig_params {
+	__u8    status;
+	__u8    cig_id;
+	__u8    num_handles;
+	__le16  handle[0];
+} __packed;
+
+#define HCI_OP_LE_CREATE_CIS			0x2064
+struct hci_cis {
+	__le16  cis_handle;
+	__le16  acl_handle;
+} __packed;
+
+struct hci_cp_le_create_cis {
+	__u8    num_cis;
+	struct hci_cis cis[0];
+} __packed;
+
+#define HCI_OP_LE_REMOVE_CIG			0x2065
+struct hci_cp_le_remove_cig {
+	__u8    cig_id;
+} __packed;
+
+#define HCI_OP_LE_ACCEPT_CIS			0x2066
+struct hci_cp_le_accept_cis {
+	__le16  handle;
+} __packed;
+
+#define HCI_OP_LE_REJECT_CIS			0x2067
+struct hci_cp_le_reject_cis {
+	__le16  handle;
+	__u8    reason;
+} __packed;
+
 /* ---- HCI Events ---- */
 #define HCI_EV_INQUIRY_COMPLETE		0x01
 
@@ -2189,7 +2283,7 @@ struct hci_ev_le_direct_adv_info {
 #define HCI_EV_LE_PHY_UPDATE_COMPLETE	0x0c
 struct hci_ev_le_phy_update_complete {
 	__u8  status;
-	__u16 handle;
+	__le16 handle;
 	__u8  tx_phy;
 	__u8  rx_phy;
 } __packed;
@@ -2234,6 +2328,34 @@ struct hci_evt_le_ext_adv_set_term {
 	__u8	num_evts;
 } __packed;
 
+#define HCI_EVT_LE_CIS_ESTABLISHED	0x19
+struct hci_evt_le_cis_established {
+	__u8  status;
+	__le16 handle;
+	__u8  cig_sync_delay[3];
+	__u8  cis_sync_delay[3];
+	__u8  m_latency[3];
+	__u8  s_latency[3];
+	__u8  m_phy;
+	__u8  s_phy;
+	__u8  nse;
+	__u8  m_bn;
+	__u8  s_bn;
+	__u8  m_ft;
+	__u8  s_ft;
+	__le16 m_mtu;
+	__le16 s_mtu;
+	__le16 interval;
+} __packed;
+
+#define HCI_EVT_LE_CIS_REQ		0x1a
+struct hci_evt_le_cis_req {
+	__le16 acl_handle;
+	__le16 cis_handle;
+	__u8  cig_id;
+	__u8  cis_id;
+} __packed;
+
 #define HCI_EV_VENDOR			0xff
 
 /* Internal events generated by Bluetooth stack */
@@ -2262,6 +2384,7 @@ struct hci_ev_si_security {
 #define HCI_EVENT_HDR_SIZE   2
 #define HCI_ACL_HDR_SIZE     4
 #define HCI_SCO_HDR_SIZE     3
+#define HCI_ISO_HDR_SIZE     4
 
 struct hci_command_hdr {
 	__le16	opcode;		/* OCF & OGF */
@@ -2283,6 +2406,30 @@ struct hci_sco_hdr {
 	__u8	dlen;
 } __packed;
 
+struct hci_iso_hdr {
+	__le16	handle;
+	__le16	dlen;
+	__u8	data[0];
+} __packed;
+
+/* ISO data packet status flags */
+#define HCI_ISO_STATUS_VALID	0x00
+#define HCI_ISO_STATUS_INVALID	0x01
+#define HCI_ISO_STATUS_NOP	0x02
+
+#define HCI_ISO_DATA_HDR_SIZE	4
+struct hci_iso_data_hdr {
+	__le16	sn;
+	__le16	slen;
+};
+
+#define HCI_ISO_TS_DATA_HDR_SIZE 8
+struct hci_iso_ts_data_hdr {
+	__le32	ts;
+	__le16	sn;
+	__le16	slen;
+};
+
 static inline struct hci_event_hdr *hci_event_hdr(const struct sk_buff *skb)
 {
 	return (struct hci_event_hdr *) skb->data;
@@ -2308,4 +2455,14 @@ static inline struct hci_sco_hdr *hci_sco_hdr(const struct sk_buff *skb)
 #define hci_handle(h)		(h & 0x0fff)
 #define hci_flags(h)		(h >> 12)
 
+/* ISO handle and flags pack/unpack */
+#define hci_iso_flags_pb(f)		(f & 0x0003)
+#define hci_iso_flags_ts(f)		((f >> 2) & 0x0001)
+#define hci_iso_flags_pack(pb, ts)	((pb & 0x03) | ((ts & 0x01) << 2))
+
+/* ISO data length and flags pack/unpack */
+#define hci_iso_data_len_pack(h, f)	((__u16) ((h) | ((f) << 14)))
+#define hci_iso_data_len(h)		((h) & 0x3fff)
+#define hci_iso_data_flags(h)		((h) >> 14)
+
 #endif /* __HCI_H */
-- 
cgit v1.2.3


From f9a619db7c137b7c2dec0414d8deb8ec762ae8f9 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date: Wed, 15 Jan 2020 13:02:17 -0800
Subject: Bluetooth: monitor: Add support for ISO packets

This enables passing ISO packets to the monitor socket.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 include/net/bluetooth/hci_mon.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_mon.h b/include/net/bluetooth/hci_mon.h
index 240786b04a46..2d5fcda1bcd0 100644
--- a/include/net/bluetooth/hci_mon.h
+++ b/include/net/bluetooth/hci_mon.h
@@ -49,6 +49,8 @@ struct hci_mon_hdr {
 #define HCI_MON_CTRL_CLOSE	15
 #define HCI_MON_CTRL_COMMAND	16
 #define HCI_MON_CTRL_EVENT	17
+#define HCI_MON_ISO_TX_PKT	18
+#define HCI_MON_ISO_RX_PKT	19
 
 struct hci_mon_new_index {
 	__u8		type;
-- 
cgit v1.2.3


From cb4d03ab499d4c040f4ab6fd4389d2b49f42b5a5 Mon Sep 17 00:00:00 2001
From: Brian Vazquez <brianvv@google.com>
Date: Wed, 15 Jan 2020 10:43:01 -0800
Subject: bpf: Add generic support for lookup batch op

This commit introduces generic support for the bpf_map_lookup_batch.
This implementation can be used by almost all the bpf maps since its core
implementation is relying on the existing map_get_next_key and
map_lookup_elem. The bpf syscall subcommand introduced is:

  BPF_MAP_LOOKUP_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
         __aligned_u64   in_batch;       /* start batch,
                                          * NULL to start from beginning
                                          */
         __aligned_u64   out_batch;      /* output: next start batch */
         __aligned_u64   keys;
         __aligned_u64   values;
         __u32           count;          /* input/output:
                                          * input: # of key/value
                                          * elements
                                          * output: # of filled elements
                                          */
         __u32           map_fd;
         __u64           elem_flags;
         __u64           flags;
  } batch;

in_batch/out_batch are opaque values use to communicate between
user/kernel space, in_batch/out_batch must be of key_size length.

To start iterating from the beginning in_batch must be null,
count is the # of key/value elements to retrieve. Note that the 'keys'
buffer must be a buffer of key_size * count size and the 'values' buffer
must be value_size * count, where value_size must be aligned to 8 bytes
by userspace if it's dealing with percpu maps. 'count' will contain the
number of keys/values successfully retrieved. Note that 'count' is an
input/output variable and it can contain a lower value after a call.

If there's no more entries to retrieve, ENOENT will be returned. If error
is ENOENT, count might be > 0 in case it copied some values but there were
no more entries to retrieve.

Note that if the return code is an error and not -EFAULT,
count indicates the number of elements successfully processed.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com
---
 include/linux/bpf.h      |  5 +++++
 include/uapi/linux/bpf.h | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aed2bc39d72b..807744ecaa5a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -44,6 +44,8 @@ struct bpf_map_ops {
 	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
 	void (*map_release_uref)(struct bpf_map *map);
 	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
+	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
+				union bpf_attr __user *uattr);
 
 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
@@ -982,6 +984,9 @@ void *bpf_map_area_alloc(u64 size, int numa_node);
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
+int  generic_map_lookup_batch(struct bpf_map *map,
+			      const union bpf_attr *attr,
+			      union bpf_attr __user *uattr);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a88837d4dd18..0721b2c6bb8c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -107,6 +107,7 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
 	BPF_BTF_GET_NEXT_ID,
+	BPF_MAP_LOOKUP_BATCH,
 };
 
 enum bpf_map_type {
@@ -420,6 +421,23 @@ union bpf_attr {
 		__u64		flags;
 	};
 
+	struct { /* struct used by BPF_MAP_*_BATCH commands */
+		__aligned_u64	in_batch;	/* start batch,
+						 * NULL to start from beginning
+						 */
+		__aligned_u64	out_batch;	/* output: next start batch */
+		__aligned_u64	keys;
+		__aligned_u64	values;
+		__u32		count;		/* input/output:
+						 * input: # of key/value
+						 * elements
+						 * output: # of filled elements
+						 */
+		__u32		map_fd;
+		__u64		elem_flags;
+		__u64		flags;
+	} batch;
+
 	struct { /* anonymous struct used by BPF_PROG_LOAD command */
 		__u32		prog_type;	/* one of enum bpf_prog_type */
 		__u32		insn_cnt;
-- 
cgit v1.2.3


From aa2e93b8e58e18442edfb2427446732415bc215e Mon Sep 17 00:00:00 2001
From: Brian Vazquez <brianvv@google.com>
Date: Wed, 15 Jan 2020 10:43:02 -0800
Subject: bpf: Add generic support for update and delete batch ops

This commit adds generic support for update and delete batch ops that
can be used for almost all the bpf maps. These commands share the same
UAPI attr that lookup and lookup_and_delete batch ops use and the
syscall commands are:

  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The main difference between update/delete and lookup batch ops is that
for update/delete keys/values must be specified for userspace and
because of that, neither in_batch nor out_batch are used.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com
---
 include/linux/bpf.h      | 10 ++++++++++
 include/uapi/linux/bpf.h |  2 ++
 2 files changed, 12 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 807744ecaa5a..05466ad6cf1c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -46,6 +46,10 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
 	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
 				union bpf_attr __user *uattr);
+	int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
+				union bpf_attr __user *uattr);
+	int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
+				union bpf_attr __user *uattr);
 
 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
@@ -987,6 +991,12 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 int  generic_map_lookup_batch(struct bpf_map *map,
 			      const union bpf_attr *attr,
 			      union bpf_attr __user *uattr);
+int  generic_map_update_batch(struct bpf_map *map,
+			      const union bpf_attr *attr,
+			      union bpf_attr __user *uattr);
+int  generic_map_delete_batch(struct bpf_map *map,
+			      const union bpf_attr *attr,
+			      union bpf_attr __user *uattr);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0721b2c6bb8c..d5320b0be459 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -108,6 +108,8 @@ enum bpf_cmd {
 	BPF_MAP_FREEZE,
 	BPF_BTF_GET_NEXT_ID,
 	BPF_MAP_LOOKUP_BATCH,
+	BPF_MAP_UPDATE_BATCH,
+	BPF_MAP_DELETE_BATCH,
 };
 
 enum bpf_map_type {
-- 
cgit v1.2.3


From 057996380a42bb64ccc04383cfa9c0ace4ea11f0 Mon Sep 17 00:00:00 2001
From: Yonghong Song <yhs@fb.com>
Date: Wed, 15 Jan 2020 10:43:04 -0800
Subject: bpf: Add batch ops to all htab bpf map

htab can't use generic batch support due some problematic behaviours
inherent to the data structre, i.e. while iterating the bpf map  a
concurrent program might delete the next entry that batch was about to
use, in that case there's no easy solution to retrieve the next entry,
the issue has been discussed multiple times (see [1] and [2]).

The only way hmap can be traversed without the problem previously
exposed is by making sure that the map is traversing entire buckets.
This commit implements those strict requirements for hmap, the
implementation follows the same interaction that generic support with
some exceptions:

 - If keys/values buffer are not big enough to traverse a bucket,
   ENOSPC will be returned.
 - out_batch contains the value of the next bucket in the iteration, not
   the next key, but this is transparent for the user since the user
   should never use out_batch for other than bpf batch syscalls.

This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new
command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete
batch ops it is possible to use the generic implementations.

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com
---
 include/linux/bpf.h      | 3 +++
 include/uapi/linux/bpf.h | 1 +
 2 files changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 05466ad6cf1c..3517e32149a4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -46,6 +46,9 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
 	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
 				union bpf_attr __user *uattr);
+	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
+					   const union bpf_attr *attr,
+					   union bpf_attr __user *uattr);
 	int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
 				union bpf_attr __user *uattr);
 	int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d5320b0be459..033d90a2282d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -108,6 +108,7 @@ enum bpf_cmd {
 	BPF_MAP_FREEZE,
 	BPF_BTF_GET_NEXT_ID,
 	BPF_MAP_LOOKUP_BATCH,
+	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
 };
-- 
cgit v1.2.3


From c320e527e1548305f31d95ec405140b04aed25f5 Mon Sep 17 00:00:00 2001
From: Moni Shoua <monis@mellanox.com>
Date: Wed, 15 Jan 2020 14:43:31 +0200
Subject: IB: Allow calls to ib_umem_get from kernel ULPs

So far the assumption was that ib_umem_get() and ib_umem_odp_get()
are called from flows that start in UVERBS and therefore has a user
context. This assumption restricts flows that are initiated by ULPs
and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device
directly by relying on the fact that both UVERBS and ULPs sets that
field correctly.

Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_umem.h     | 4 ++--
 include/rdma/ib_umem_odp.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

(limited to 'include')

diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 753f54e17e0a..e3518fd6b95b 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -69,7 +69,7 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 
 #ifdef CONFIG_INFINIBAND_USER_MEM
 
-struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
+struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 			    size_t size, int access);
 void ib_umem_release(struct ib_umem *umem);
 int ib_umem_page_count(struct ib_umem *umem);
@@ -83,7 +83,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 
 #include <linux/err.h>
 
-static inline struct ib_umem *ib_umem_get(struct ib_udata *udata,
+static inline struct ib_umem *ib_umem_get(struct ib_device *device,
 					  unsigned long addr, size_t size,
 					  int access)
 {
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 81429acc8257..64314ff76612 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -114,9 +114,9 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp)
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 
 struct ib_umem_odp *
-ib_umem_odp_get(struct ib_udata *udata, unsigned long addr, size_t size,
+ib_umem_odp_get(struct ib_device *device, unsigned long addr, size_t size,
 		int access, const struct mmu_interval_notifier_ops *ops);
-struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_udata *udata,
+struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
 					       int access);
 struct ib_umem_odp *
 ib_umem_odp_alloc_child(struct ib_umem_odp *root_umem, unsigned long addr,
@@ -134,7 +134,7 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 start_offset,
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 static inline struct ib_umem_odp *
-ib_umem_odp_get(struct ib_udata *udata, unsigned long addr, size_t size,
+ib_umem_odp_get(struct ib_device *device, unsigned long addr, size_t size,
 		int access, const struct mmu_interval_notifier_ops *ops)
 {
 	return ERR_PTR(-EINVAL);
-- 
cgit v1.2.3


From 33006bd4f37f7d2c3d1cf0268b4f327b5fdc2558 Mon Sep 17 00:00:00 2001
From: Moni Shoua <monis@mellanox.com>
Date: Wed, 15 Jan 2020 14:43:32 +0200
Subject: IB/core: Introduce ib_reg_user_mr

Add ib_reg_user_mr() for kernel ULPs to register user MRs.

The common use case that uses this function is a userspace application
that allocates memory for HCA access but the responsibility to register
the memory at the HCA is on an kernel ULP. This ULP that acts as an agent
for the userspace application.

This function is intended to be used without a user context so vendor
drivers need to be aware of calling reg_user_mr() device operation with
udata equal to NULL.

Among all drivers, i40iw is the only driver which relies on presence
of udata, so check udata existence for that driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_verbs.h | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'include')

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5608e14e3aad..170d5ec95b79 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4153,6 +4153,12 @@ static inline void ib_dma_free_coherent(struct ib_device *dev,
 	dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
 }
 
+/* ib_reg_user_mr - register a memory region for virtual addresses from kernel
+ * space. This function should be called when 'current' is the owning MM.
+ */
+struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
+			     u64 virt_addr, int mr_access_flags);
+
 /**
  * ib_dereg_mr_user - Deregisters a memory region and removes it from the
  *   HCA translation table.
-- 
cgit v1.2.3


From 87d8069f6b028793254ddd0a66df1d7b6d79b450 Mon Sep 17 00:00:00 2001
From: Moni Shoua <monis@mellanox.com>
Date: Wed, 15 Jan 2020 14:43:33 +0200
Subject: IB/core: Add interface to advise_mr for kernel users

Allow ULPs to call advise_mr, so they can control ODP regions
in the same way as user space applications.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_verbs.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 170d5ec95b79..e2cc62217cc2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4159,6 +4159,9 @@ static inline void ib_dma_free_coherent(struct ib_device *dev,
 struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			     u64 virt_addr, int mr_access_flags);
 
+/* ib_advise_mr -  give an advice about an address range in a memory region */
+int ib_advise_mr(struct ib_pd *pd, enum ib_uverbs_advise_mr_advice advice,
+		 u32 flags, struct ib_sge *sg_list, u32 num_sge);
 /**
  * ib_dereg_mr_user - Deregisters a memory region and removes it from the
  *   HCA translation table.
-- 
cgit v1.2.3


From 4a7faaf4add3108522512c16c9ee08fb2ad4525e Mon Sep 17 00:00:00 2001
From: Jeremy Sowden <jeremy@azazel.net>
Date: Wed, 1 Jan 2020 13:41:32 +0000
Subject: netfilter: nft_bitwise: correct uapi header comment.

The comment documenting how bitwise expressions work includes a table
which summarizes the mask and xor arguments combined to express the
supported boolean operations.  However, the row for OR:

 mask    xor
 0       x

is incorrect.

  dreg = (sreg & 0) ^ x

is not equivalent to:

  dreg = sreg | x

What the code actually does is:

  dreg = (sreg & ~x) ^ x

Update the documentation to match.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index e237ecbdcd8a..dd4611767933 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -501,7 +501,7 @@ enum nft_immediate_attributes {
  *
  * 		mask	xor
  * NOT:		1	1
- * OR:		0	x
+ * OR:		~x	x
  * XOR:		1	x
  * AND:		x	0
  */
-- 
cgit v1.2.3


From 445db8d09659eb27bcd5920cb91d91686f0197d0 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 5 Jan 2020 22:00:57 +0100
Subject: netfilter: flowtable: remove dying bit, use teardown bit instead

The dying bit removes the conntrack entry if the netdev that owns this
flow is going down. Instead, use the teardown mechanism to push back the
flow to conntrack to let the classic software path decide what to do
with it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_flow_table.h | 5 -----
 1 file changed, 5 deletions(-)

(limited to 'include')

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 415b8f49d150..4ad924d5f983 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -85,7 +85,6 @@ struct flow_offload_tuple_rhash {
 
 #define FLOW_OFFLOAD_SNAT	0x1
 #define FLOW_OFFLOAD_DNAT	0x2
-#define FLOW_OFFLOAD_DYING	0x4
 #define FLOW_OFFLOAD_TEARDOWN	0x8
 #define FLOW_OFFLOAD_HW		0x10
 #define FLOW_OFFLOAD_HW_DYING	0x20
@@ -134,10 +133,6 @@ int nf_flow_table_init(struct nf_flowtable *flow_table);
 void nf_flow_table_free(struct nf_flowtable *flow_table);
 
 void flow_offload_teardown(struct flow_offload *flow);
-static inline void flow_offload_dead(struct flow_offload *flow)
-{
-	flow->flags |= FLOW_OFFLOAD_DYING;
-}
 
 int nf_flow_snat_port(const struct flow_offload *flow,
 		      struct sk_buff *skb, unsigned int thoff,
-- 
cgit v1.2.3


From 355a8b13f87a8964ebe785b065f1388a1bd00c7e Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 5 Jan 2020 20:41:15 +0100
Subject: netfilter: flowtable: use atomic bitwise operations for flow flags

Originally, all flow flag bits were set on only from the workqueue. With
the introduction of the flow teardown state and hardware offload this is
no longer true. Let's be safe and use atomic bitwise operation to
operation with flow flags.

Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_flow_table.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

(limited to 'include')

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 4ad924d5f983..5a10e28c3e40 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -83,12 +83,14 @@ struct flow_offload_tuple_rhash {
 	struct flow_offload_tuple	tuple;
 };
 
-#define FLOW_OFFLOAD_SNAT	0x1
-#define FLOW_OFFLOAD_DNAT	0x2
-#define FLOW_OFFLOAD_TEARDOWN	0x8
-#define FLOW_OFFLOAD_HW		0x10
-#define FLOW_OFFLOAD_HW_DYING	0x20
-#define FLOW_OFFLOAD_HW_DEAD	0x40
+enum nf_flow_flags {
+	NF_FLOW_SNAT,
+	NF_FLOW_DNAT,
+	NF_FLOW_TEARDOWN,
+	NF_FLOW_HW,
+	NF_FLOW_HW_DYING,
+	NF_FLOW_HW_DEAD,
+};
 
 enum flow_offload_type {
 	NF_FLOW_OFFLOAD_UNSPEC	= 0,
@@ -98,7 +100,7 @@ enum flow_offload_type {
 struct flow_offload {
 	struct flow_offload_tuple_rhash		tuplehash[FLOW_OFFLOAD_DIR_MAX];
 	struct nf_conn				*ct;
-	u16					flags;
+	unsigned long				flags;
 	u16					type;
 	u32					timeout;
 	struct rcu_head				rcu_head;
-- 
cgit v1.2.3


From a5449cdcaac5c78d62b8bea8f79158071f23da01 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 7 Jan 2020 09:56:27 +0100
Subject: netfilter: flowtable: add nf_flowtable_hw_offload() helper function

This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that
the flowtable hardware offload is enabled.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_flow_table.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 5a10e28c3e40..9ee1eaeaab04 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -47,6 +47,11 @@ struct nf_flowtable {
 	possible_net_t			net;
 };
 
+static inline bool nf_flowtable_hw_offload(struct nf_flowtable *flowtable)
+{
+	return flowtable->flags & NF_FLOWTABLE_HW_OFFLOAD;
+}
+
 enum flow_offload_tuple_dir {
 	FLOW_OFFLOAD_DIR_ORIGINAL = IP_CT_DIR_ORIGINAL,
 	FLOW_OFFLOAD_DIR_REPLY = IP_CT_DIR_REPLY,
-- 
cgit v1.2.3


From f698fe40829b21088d323c8b0a7c626571528fc6 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 6 Jan 2020 12:56:47 +0100
Subject: netfilter: flowtable: refresh flow if hardware offload fails

If nf_flow_offload_add() fails to add the flow to hardware, then the
NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable
software path.

If flowtable hardware offload is enabled, this patch enqueues a new
request to offload this flow to hardware.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_flow_table.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 9ee1eaeaab04..e0f709d9d547 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -95,6 +95,7 @@ enum nf_flow_flags {
 	NF_FLOW_HW,
 	NF_FLOW_HW_DYING,
 	NF_FLOW_HW_DEAD,
+	NF_FLOW_HW_REFRESH,
 };
 
 enum flow_offload_type {
-- 
cgit v1.2.3


From 9d1f979986c2e29632b6a8f7a8ef8b3c7d24a48c Mon Sep 17 00:00:00 2001
From: Jeremy Sowden <jeremy@azazel.net>
Date: Wed, 15 Jan 2020 20:05:51 +0000
Subject: netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.

Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops.  It describes the type of
operation an expression contains.  Currently, it only has one value:
NFT_BITWISE_BOOL.  More values will be added later to implement shifts.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index dd4611767933..0cddf357281f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -484,6 +484,16 @@ enum nft_immediate_attributes {
 };
 #define NFTA_IMMEDIATE_MAX	(__NFTA_IMMEDIATE_MAX - 1)
 
+/**
+ * enum nft_bitwise_ops - nf_tables bitwise operations
+ *
+ * @NFT_BITWISE_BOOL: mask-and-xor operation used to implement NOT, AND, OR and
+ *                    XOR boolean operations
+ */
+enum nft_bitwise_ops {
+	NFT_BITWISE_BOOL,
+};
+
 /**
  * enum nft_bitwise_attributes - nf_tables bitwise expression netlink attributes
  *
@@ -492,6 +502,7 @@ enum nft_immediate_attributes {
  * @NFTA_BITWISE_LEN: length of operands (NLA_U32)
  * @NFTA_BITWISE_MASK: mask value (NLA_NESTED: nft_data_attributes)
  * @NFTA_BITWISE_XOR: xor value (NLA_NESTED: nft_data_attributes)
+ * @NFTA_BITWISE_OP: type of operation (NLA_U32: nft_bitwise_ops)
  *
  * The bitwise expression performs the following operation:
  *
@@ -512,6 +523,7 @@ enum nft_bitwise_attributes {
 	NFTA_BITWISE_LEN,
 	NFTA_BITWISE_MASK,
 	NFTA_BITWISE_XOR,
+	NFTA_BITWISE_OP,
 	__NFTA_BITWISE_MAX
 };
 #define NFTA_BITWISE_MAX	(__NFTA_BITWISE_MAX - 1)
-- 
cgit v1.2.3


From 779f725e142cd987a69296c0eb7d416ee3ba57dd Mon Sep 17 00:00:00 2001
From: Jeremy Sowden <jeremy@azazel.net>
Date: Wed, 15 Jan 2020 20:05:56 +0000
Subject: netfilter: bitwise: add NFTA_BITWISE_DATA attribute.

Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift.  It is not used by boolean
operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 0cddf357281f..8bef0620bc4f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -503,6 +503,8 @@ enum nft_bitwise_ops {
  * @NFTA_BITWISE_MASK: mask value (NLA_NESTED: nft_data_attributes)
  * @NFTA_BITWISE_XOR: xor value (NLA_NESTED: nft_data_attributes)
  * @NFTA_BITWISE_OP: type of operation (NLA_U32: nft_bitwise_ops)
+ * @NFTA_BITWISE_DATA: argument for non-boolean operations
+ *                     (NLA_NESTED: nft_data_attributes)
  *
  * The bitwise expression performs the following operation:
  *
@@ -524,6 +526,7 @@ enum nft_bitwise_attributes {
 	NFTA_BITWISE_MASK,
 	NFTA_BITWISE_XOR,
 	NFTA_BITWISE_OP,
+	NFTA_BITWISE_DATA,
 	__NFTA_BITWISE_MAX
 };
 #define NFTA_BITWISE_MAX	(__NFTA_BITWISE_MAX - 1)
-- 
cgit v1.2.3


From 567d746b55bc66d3800c9ae91d50f0c5deb2fd93 Mon Sep 17 00:00:00 2001
From: Jeremy Sowden <jeremy@azazel.net>
Date: Wed, 15 Jan 2020 20:05:57 +0000
Subject: netfilter: bitwise: add support for shifts.

Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR.  Extend it to do shifts as well.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 8bef0620bc4f..261864736b26 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -489,9 +489,13 @@ enum nft_immediate_attributes {
  *
  * @NFT_BITWISE_BOOL: mask-and-xor operation used to implement NOT, AND, OR and
  *                    XOR boolean operations
+ * @NFT_BITWISE_LSHIFT: left-shift operation
+ * @NFT_BITWISE_RSHIFT: right-shift operation
  */
 enum nft_bitwise_ops {
 	NFT_BITWISE_BOOL,
+	NFT_BITWISE_LSHIFT,
+	NFT_BITWISE_RSHIFT,
 };
 
 /**
@@ -506,11 +510,12 @@ enum nft_bitwise_ops {
  * @NFTA_BITWISE_DATA: argument for non-boolean operations
  *                     (NLA_NESTED: nft_data_attributes)
  *
- * The bitwise expression performs the following operation:
+ * The bitwise expression supports boolean and shift operations.  It implements
+ * the boolean operations by performing the following operation:
  *
  * dreg = (sreg & mask) ^ xor
  *
- * which allow to express all bitwise operations:
+ * with these mask and xor values:
  *
  * 		mask	xor
  * NOT:		1	1
-- 
cgit v1.2.3


From f397464eb7c25bda903ec8b9cf5701e72a1f7b16 Mon Sep 17 00:00:00 2001
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Mon, 7 Oct 2019 10:29:46 +0300
Subject: net/mlx5: Add structures layout for new MCAM access reg groups

MCAM has 3 access_reg_groups (0-2). Defines data structures in order to
read and parse access_reg_groups #1 and #2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c6abaf4f1c55..43cdf9211747 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8832,6 +8832,28 @@ struct mlx5_ifc_mcam_access_reg_bits {
 	u8         regs_31_to_0[0x20];
 };
 
+struct mlx5_ifc_mcam_access_reg_bits1 {
+	u8         regs_127_to_96[0x20];
+
+	u8         regs_95_to_64[0x20];
+
+	u8         regs_63_to_32[0x20];
+
+	u8         regs_31_to_0[0x20];
+};
+
+struct mlx5_ifc_mcam_access_reg_bits2 {
+	u8         regs_127_to_99[0x1d];
+	u8         mirc[0x1];
+	u8         regs_97_to_96[0x2];
+
+	u8         regs_95_to_64[0x20];
+
+	u8         regs_63_to_32[0x20];
+
+	u8         regs_31_to_0[0x20];
+};
+
 struct mlx5_ifc_mcam_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         feature_group[0x8];
@@ -8842,6 +8864,8 @@ struct mlx5_ifc_mcam_reg_bits {
 
 	union {
 		struct mlx5_ifc_mcam_access_reg_bits access_regs;
+		struct mlx5_ifc_mcam_access_reg_bits1 access_regs1;
+		struct mlx5_ifc_mcam_access_reg_bits2 access_regs2;
 		u8         reserved_at_0[0x80];
 	} mng_access_reg_cap_mask;
 
-- 
cgit v1.2.3


From 932ef155117cc5caf1108bd27664dab974ba6e89 Mon Sep 17 00:00:00 2001
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Mon, 7 Oct 2019 10:31:42 +0300
Subject: net/mlx5: Read MCAM register groups 1 and 2

On load, Driver caches MCAM (Management Capabilities Mask Register)
registers. in addition to the only MCAM register group (0) the driver
already reads, here we add support for reading groups 1 and 2.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/device.h | 14 +++++++++++++-
 include/linux/mlx5/driver.h |  2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 1a1c53f0262d..0e62c3db45e5 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1121,6 +1121,9 @@ enum mlx5_pcam_feature_groups {
 
 enum mlx5_mcam_reg_groups {
 	MLX5_MCAM_REGS_FIRST_128                    = 0x0,
+	MLX5_MCAM_REGS_0x9080_0x90FF                = 0x1,
+	MLX5_MCAM_REGS_0x9100_0x917F                = 0x2,
+	MLX5_MCAM_REGS_NUM                          = 0x3,
 };
 
 enum mlx5_mcam_feature_groups {
@@ -1269,7 +1272,16 @@ enum mlx5_qcam_feature_groups {
 	MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg)
 
 #define MLX5_CAP_MCAM_REG(mdev, reg) \
-	MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_access_reg_cap_mask.access_regs.reg)
+	MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \
+		 mng_access_reg_cap_mask.access_regs.reg)
+
+#define MLX5_CAP_MCAM_REG1(mdev, reg) \
+	MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \
+		 mng_access_reg_cap_mask.access_regs1.reg)
+
+#define MLX5_CAP_MCAM_REG2(mdev, reg) \
+	MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \
+		 mng_access_reg_cap_mask.access_regs2.reg)
 
 #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \
 	MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 27200dea0297..54431256af42 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -684,7 +684,7 @@ struct mlx5_core_dev {
 		u32 hca_cur[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
 		u32 hca_max[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
 		u32 pcam[MLX5_ST_SZ_DW(pcam_reg)];
-		u32 mcam[MLX5_ST_SZ_DW(mcam_reg)];
+		u32 mcam[MLX5_MCAM_REGS_NUM][MLX5_ST_SZ_DW(mcam_reg)];
 		u32 fpga[MLX5_ST_SZ_DW(fpga_cap)];
 		u32 qcam[MLX5_ST_SZ_DW(qcam_reg)];
 		u8  embedded_cpu;
-- 
cgit v1.2.3


From bab58ba10ecfa39c46d280d2acbca6054e1e863d Mon Sep 17 00:00:00 2001
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Mon, 7 Oct 2019 10:30:32 +0300
Subject: net/mlx5: Add structures and defines for MIRC register

Add needed structures, layouts and defines for MIRC (Management Image
Re-activation Control) register. This structure will be used for the FSM
reactivation flow in the downstream patches.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h   | 1 +
 include/linux/mlx5/mlx5_ifc.h | 8 ++++++++
 2 files changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 54431256af42..7848b9858587 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -145,6 +145,7 @@ enum {
 	MLX5_REG_MCC		 = 0x9062,
 	MLX5_REG_MCDA		 = 0x9063,
 	MLX5_REG_MCAM		 = 0x907f,
+	MLX5_REG_MIRC		 = 0x9162,
 };
 
 enum mlx5_qpts_trust_state {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 43cdf9211747..a133583c3e4f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -9471,6 +9471,13 @@ struct mlx5_ifc_mcda_reg_bits {
 	u8         data[0][0x20];
 };
 
+struct mlx5_ifc_mirc_reg_bits {
+	u8         reserved_at_0[0x18];
+	u8         status_code[0x8];
+
+	u8         reserved_at_20[0x20];
+};
+
 union mlx5_ifc_ports_control_registers_document_bits {
 	struct mlx5_ifc_bufferx_reg_bits bufferx_reg;
 	struct mlx5_ifc_eth_2819_cntrs_grp_data_layout_bits eth_2819_cntrs_grp_data_layout;
@@ -9526,6 +9533,7 @@ union mlx5_ifc_ports_control_registers_document_bits {
 	struct mlx5_ifc_mcqi_reg_bits mcqi_reg;
 	struct mlx5_ifc_mcc_reg_bits mcc_reg;
 	struct mlx5_ifc_mcda_reg_bits mcda_reg;
+	struct mlx5_ifc_mirc_reg_bits mirc_reg;
 	u8         reserved_at_0[0x60e0];
 };
 
-- 
cgit v1.2.3


From 609b82727f719b41b50440c4028d48d0b2e04913 Mon Sep 17 00:00:00 2001
From: Aya Levin <ayal@mellanox.com>
Date: Mon, 4 Nov 2019 14:51:55 +0200
Subject: net/mlx5: Expose resource dump register mapping

Add new register enumeration for resource dump. Add layout mapping for
resource dump: access command and response.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h   |   1 +
 include/linux/mlx5/mlx5_ifc.h | 130 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 7848b9858587..c821fa4d7475 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -146,6 +146,7 @@ enum {
 	MLX5_REG_MCDA		 = 0x9063,
 	MLX5_REG_MCAM		 = 0x907f,
 	MLX5_REG_MIRC		 = 0x9162,
+	MLX5_REG_RESOURCE_DUMP   = 0xC000,
 };
 
 enum mlx5_qpts_trust_state {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index a133583c3e4f..6fe0431e11ec 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -823,7 +823,9 @@ struct mlx5_ifc_qos_cap_bits {
 struct mlx5_ifc_debug_cap_bits {
 	u8         core_dump_general[0x1];
 	u8         core_dump_qp[0x1];
-	u8         reserved_at_2[0x1e];
+	u8         reserved_at_2[0x7];
+	u8         resource_dump[0x1];
+	u8         reserved_at_a[0x16];
 
 	u8         reserved_at_20[0x2];
 	u8         stall_detect[0x1];
@@ -1767,6 +1769,132 @@ struct mlx5_ifc_resize_field_select_bits {
 	u8         resize_field_select[0x20];
 };
 
+struct mlx5_ifc_resource_dump_bits {
+	u8         more_dump[0x1];
+	u8         inline_dump[0x1];
+	u8         reserved_at_2[0xa];
+	u8         seq_num[0x4];
+	u8         segment_type[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         vhca_id[0x10];
+
+	u8         index1[0x20];
+
+	u8         index2[0x20];
+
+	u8         num_of_obj1[0x10];
+	u8         num_of_obj2[0x10];
+
+	u8         reserved_at_a0[0x20];
+
+	u8         device_opaque[0x40];
+
+	u8         mkey[0x20];
+
+	u8         size[0x20];
+
+	u8         address[0x40];
+
+	u8         inline_data[52][0x20];
+};
+
+struct mlx5_ifc_resource_dump_menu_record_bits {
+	u8         reserved_at_0[0x4];
+	u8         num_of_obj2_supports_active[0x1];
+	u8         num_of_obj2_supports_all[0x1];
+	u8         must_have_num_of_obj2[0x1];
+	u8         support_num_of_obj2[0x1];
+	u8         num_of_obj1_supports_active[0x1];
+	u8         num_of_obj1_supports_all[0x1];
+	u8         must_have_num_of_obj1[0x1];
+	u8         support_num_of_obj1[0x1];
+	u8         must_have_index2[0x1];
+	u8         support_index2[0x1];
+	u8         must_have_index1[0x1];
+	u8         support_index1[0x1];
+	u8         segment_type[0x10];
+
+	u8         segment_name[4][0x20];
+
+	u8         index1_name[4][0x20];
+
+	u8         index2_name[4][0x20];
+};
+
+struct mlx5_ifc_resource_dump_segment_header_bits {
+	u8         length_dw[0x10];
+	u8         segment_type[0x10];
+};
+
+struct mlx5_ifc_resource_dump_command_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         segment_called[0x10];
+	u8         vhca_id[0x10];
+
+	u8         index1[0x20];
+
+	u8         index2[0x20];
+
+	u8         num_of_obj1[0x10];
+	u8         num_of_obj2[0x10];
+};
+
+struct mlx5_ifc_resource_dump_error_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x10];
+	u8         syndrome_id[0x10];
+
+	u8         reserved_at_40[0x40];
+
+	u8         error[8][0x20];
+};
+
+struct mlx5_ifc_resource_dump_info_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x18];
+	u8         dump_version[0x8];
+
+	u8         hw_version[0x20];
+
+	u8         fw_version[0x20];
+};
+
+struct mlx5_ifc_resource_dump_menu_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x10];
+	u8         num_of_records[0x10];
+
+	struct mlx5_ifc_resource_dump_menu_record_bits record[0];
+};
+
+struct mlx5_ifc_resource_dump_resource_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+
+	u8         reserved_at_20[0x20];
+
+	u8         index1[0x20];
+
+	u8         index2[0x20];
+
+	u8         payload[0][0x20];
+};
+
+struct mlx5_ifc_resource_dump_terminate_segment_bits {
+	struct mlx5_ifc_resource_dump_segment_header_bits segment_header;
+};
+
+struct mlx5_ifc_menu_resource_dump_response_bits {
+	struct mlx5_ifc_resource_dump_info_segment_bits info;
+	struct mlx5_ifc_resource_dump_command_segment_bits cmd;
+	struct mlx5_ifc_resource_dump_menu_segment_bits menu;
+	struct mlx5_ifc_resource_dump_terminate_segment_bits terminate;
+};
+
 enum {
 	MLX5_MODIFY_FIELD_SELECT_MODIFY_FIELD_SELECT_CQ_PERIOD     = 0x1,
 	MLX5_MODIFY_FIELD_SELECT_MODIFY_FIELD_SELECT_CQ_MAX_COUNT  = 0x2,
-- 
cgit v1.2.3


From 31d8bde1c8812c9b44065dcd98e554488c6a98d2 Mon Sep 17 00:00:00 2001
From: Hamdan Igbaria <hamdani@mellanox.com>
Date: Thu, 9 Jan 2020 13:26:53 +0200
Subject: net/mlx5: Add copy header action struct layout

Add definition for copy header action, copy action is used
to copy header fields from source to destination.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6fe0431e11ec..23613a6ea51c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -5609,6 +5609,21 @@ struct mlx5_ifc_add_action_in_bits {
 	u8         data[0x20];
 };
 
+struct mlx5_ifc_copy_action_in_bits {
+	u8         action_type[0x4];
+	u8         src_field[0xc];
+	u8         reserved_at_10[0x3];
+	u8         src_offset[0x5];
+	u8         reserved_at_18[0x3];
+	u8         length[0x5];
+
+	u8         reserved_at_20[0x4];
+	u8         dst_field[0xc];
+	u8         reserved_at_30[0x3];
+	u8         dst_offset[0x5];
+	u8         reserved_at_38[0x8];
+};
+
 union mlx5_ifc_set_action_in_add_action_in_auto_bits {
 	struct mlx5_ifc_set_action_in_bits set_action_in;
 	struct mlx5_ifc_add_action_in_bits add_action_in;
@@ -5618,6 +5633,7 @@ union mlx5_ifc_set_action_in_add_action_in_auto_bits {
 enum {
 	MLX5_ACTION_TYPE_SET   = 0x1,
 	MLX5_ACTION_TYPE_ADD   = 0x2,
+	MLX5_ACTION_TYPE_COPY  = 0x3,
 };
 
 enum {
-- 
cgit v1.2.3


From 822e114b50641d3b57d2eb30939e60d8b4758288 Mon Sep 17 00:00:00 2001
From: Paul Blakey <paulb@mellanox.com>
Date: Mon, 1 Apr 2019 13:31:32 +0300
Subject: net/mlx5: Add mlx5_ifc definitions for connection tracking support

Add the required hardware definitions to mlx5_ifc:
ignore_flow_level, registers, copy_header, and fwd_and_modify cap.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Sholomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

(limited to 'include')

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 23613a6ea51c..e9c165ffe3f9 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -375,8 +375,17 @@ struct mlx5_ifc_flow_table_fields_supported_bits {
 	u8	   outer_esp_spi[0x1];
 	u8         reserved_at_58[0x2];
 	u8         bth_dst_qp[0x1];
+	u8         reserved_at_5b[0x5];
 
-	u8         reserved_at_5b[0x25];
+	u8         reserved_at_60[0x18];
+	u8         metadata_reg_c_7[0x1];
+	u8         metadata_reg_c_6[0x1];
+	u8         metadata_reg_c_5[0x1];
+	u8         metadata_reg_c_4[0x1];
+	u8         metadata_reg_c_3[0x1];
+	u8         metadata_reg_c_2[0x1];
+	u8         metadata_reg_c_1[0x1];
+	u8         metadata_reg_c_0[0x1];
 };
 
 struct mlx5_ifc_flow_table_prop_layout_bits {
@@ -401,7 +410,8 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
 	u8	   reformat_l3_tunnel_to_l2[0x1];
 	u8	   reformat_l2_to_l3_tunnel[0x1];
 	u8	   reformat_and_modify_action[0x1];
-	u8         reserved_at_15[0x2];
+	u8	   ignore_flow_level[0x1];
+	u8         reserved_at_16[0x1];
 	u8	   table_miss_action_domain[0x1];
 	u8         termination_table[0x1];
 	u8         reserved_at_19[0x7];
@@ -722,7 +732,9 @@ enum {
 
 struct mlx5_ifc_flow_table_eswitch_cap_bits {
 	u8      fdb_to_vport_reg_c_id[0x8];
-	u8      reserved_at_8[0xf];
+	u8      reserved_at_8[0xd];
+	u8      fdb_modify_header_fwd_to_table[0x1];
+	u8      reserved_at_16[0x1];
 	u8      flow_source[0x1];
 	u8      reserved_at_18[0x2];
 	u8      multi_fdb_encap[0x1];
@@ -4141,7 +4153,8 @@ struct mlx5_ifc_set_fte_in_bits {
 	u8         reserved_at_a0[0x8];
 	u8         table_id[0x18];
 
-	u8         reserved_at_c0[0x18];
+	u8         ignore_flow_level[0x1];
+	u8         reserved_at_c1[0x17];
 	u8         modify_enable_mask[0x8];
 
 	u8         reserved_at_e0[0x20];
@@ -5627,6 +5640,7 @@ struct mlx5_ifc_copy_action_in_bits {
 union mlx5_ifc_set_action_in_add_action_in_auto_bits {
 	struct mlx5_ifc_set_action_in_bits set_action_in;
 	struct mlx5_ifc_add_action_in_bits add_action_in;
+	struct mlx5_ifc_copy_action_in_bits copy_action_in;
 	u8         reserved_at_0[0x40];
 };
 
@@ -5669,6 +5683,8 @@ enum {
 	MLX5_ACTION_IN_FIELD_METADATA_REG_C_3  = 0x54,
 	MLX5_ACTION_IN_FIELD_METADATA_REG_C_4  = 0x55,
 	MLX5_ACTION_IN_FIELD_METADATA_REG_C_5  = 0x56,
+	MLX5_ACTION_IN_FIELD_METADATA_REG_C_6  = 0x57,
+	MLX5_ACTION_IN_FIELD_METADATA_REG_C_7  = 0x58,
 	MLX5_ACTION_IN_FIELD_OUT_TCP_SEQ_NUM   = 0x59,
 	MLX5_ACTION_IN_FIELD_OUT_TCP_ACK_NUM   = 0x5B,
 };
-- 
cgit v1.2.3


From a58837f52d432f32995b1c00e803cc4db18762d3 Mon Sep 17 00:00:00 2001
From: Aya Levin <ayal@mellanox.com>
Date: Mon, 30 Dec 2019 14:22:57 +0200
Subject: net/mlx5e: Expose FEC feilds and related capability bit

Introduce 50G per lane FEC modes capability bit and newly supported
fields in PPLM register which allow this configuration.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index e9c165ffe3f9..2ab4562b4851 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8581,6 +8581,18 @@ struct mlx5_ifc_pplm_reg_bits {
 	u8	   fec_override_admin_50g[0x4];
 	u8	   fec_override_admin_25g[0x4];
 	u8	   fec_override_admin_10g_40g[0x4];
+
+	u8         fec_override_cap_400g_8x[0x10];
+	u8         fec_override_cap_200g_4x[0x10];
+
+	u8         fec_override_cap_100g_2x[0x10];
+	u8         fec_override_cap_50g_1x[0x10];
+
+	u8         fec_override_admin_400g_8x[0x10];
+	u8         fec_override_admin_200g_4x[0x10];
+
+	u8         fec_override_admin_100g_2x[0x10];
+	u8         fec_override_admin_50g_1x[0x10];
 };
 
 struct mlx5_ifc_ppcnt_reg_bits {
@@ -8907,7 +8919,9 @@ struct mlx5_ifc_mpegc_reg_bits {
 };
 
 struct mlx5_ifc_pcam_enhanced_features_bits {
-	u8         reserved_at_0[0x6d];
+	u8         reserved_at_0[0x68];
+	u8         fec_50G_per_lane_in_pplm[0x1];
+	u8         reserved_at_69[0x4];
 	u8         rx_icrc_encapsulated_counter[0x1];
 	u8	   reserved_at_6e[0x4];
 	u8         ptys_extended_ethernet[0x1];
-- 
cgit v1.2.3


From 827a8cb2dd2b72848652b2a425bba3262808ff44 Mon Sep 17 00:00:00 2001
From: Aharon Landau <aharonl@mellanox.com>
Date: Mon, 16 Dec 2019 12:50:13 +0200
Subject: net/mlx5e: Add discard counters per priority

Add counters that count (per priority) the number of received
packets that dropped due to lack of buffers on a physical port. If
this counter is increasing, it implies that the adapter is
congested and cannot absorb the traffic coming from the network.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2ab4562b4851..ee0a34d66c7c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2180,7 +2180,9 @@ struct mlx5_ifc_eth_per_prio_grp_data_layout_bits {
 
 	u8         rx_pause_transition_low[0x20];
 
-	u8         reserved_at_3c0[0x40];
+	u8         rx_discards_high[0x20];
+
+	u8         rx_discards_low[0x20];
 
 	u8         device_stall_minor_watermark_cnt_high[0x20];
 
-- 
cgit v1.2.3


From 61dc7b0141c51f5fa4aed97e49f9cf102ec51479 Mon Sep 17 00:00:00 2001
From: Paul Blakey <paulb@mellanox.com>
Date: Thu, 14 Nov 2019 16:59:58 +0200
Subject: net/mlx5: Refactor mlx5_create_auto_grouped_flow_table

Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/fs.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

(limited to 'include')

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 4e5b84e66822..a3f8b63839de 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -145,25 +145,25 @@ mlx5_get_flow_vport_acl_namespace(struct mlx5_core_dev *dev,
 				  enum mlx5_flow_namespace_type type,
 				  int vport);
 
-struct mlx5_flow_table *
-mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
-				    int prio,
-				    int num_flow_table_entries,
-				    int max_num_groups,
-				    u32 level,
-				    u32 flags);
-
 struct mlx5_flow_table_attr {
 	int prio;
 	int max_fte;
 	u32 level;
 	u32 flags;
+
+	struct {
+		int max_num_groups;
+	} autogroup;
 };
 
 struct mlx5_flow_table *
 mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
 		       struct mlx5_flow_table_attr *ft_attr);
 
+struct mlx5_flow_table *
+mlx5_create_auto_grouped_flow_table(struct mlx5_flow_namespace *ns,
+				    struct mlx5_flow_table_attr *ft_attr);
+
 struct mlx5_flow_table *
 mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 			     int prio,
-- 
cgit v1.2.3


From 5281a0c909194c477656e89401ac11dd7b29ad2d Mon Sep 17 00:00:00 2001
From: Paul Blakey <paulb@mellanox.com>
Date: Tue, 23 Jul 2019 11:43:57 +0300
Subject: net/mlx5: fs_core: Introduce unmanaged flow tables

Currently, Most of the steering tree is statically declared ahead of time,
with steering prios instances allocated for each fdb chain to assign max
number of levels for each of them. This allows fs_core to manage the
connections and  levels of the flow tables hierarcy to prevent loops, but
restricts us with the number of supported chains and priorities.

Introduce unmananged flow tables, allowing the user to manage the flow
table connections. A unamanged table is detached from the fs_core flow
table hierarcy, and is only connected back to the hierarchy by explicit
FTEs forward actions.

This will be used together with firmware that supports ignoring the flow
table levels to increase the number of supported chains and prios.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/fs.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index a3f8b63839de..de2c838bae90 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -48,6 +48,7 @@ enum {
 	MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT = BIT(0),
 	MLX5_FLOW_TABLE_TUNNEL_EN_DECAP = BIT(1),
 	MLX5_FLOW_TABLE_TERMINATION = BIT(2),
+	MLX5_FLOW_TABLE_UNMANAGED = BIT(3),
 };
 
 #define LEFTOVERS_RULE_NUM	 2
@@ -150,6 +151,7 @@ struct mlx5_flow_table_attr {
 	int max_fte;
 	u32 level;
 	u32 flags;
+	struct mlx5_flow_table *next_ft;
 
 	struct {
 		int max_num_groups;
-- 
cgit v1.2.3


From ff189b43568216c6211e9e7ddd9026cb8295e744 Mon Sep 17 00:00:00 2001
From: Paul Blakey <paulb@mellanox.com>
Date: Sun, 5 Jan 2020 15:15:54 +0200
Subject: net/mlx5: Add ignore level support fwd to table rules

If user sets ignore flow level flag on a rule, that rule can point to
a flow table of any level, including those with levels equal or less
than the level of the flow table it is added on.

This with unamanged tables will be used to create a FDB chain/prio
hierarchy much larger than currently supported level range.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/fs.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index de2c838bae90..81f393fb7d96 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -196,6 +196,7 @@ struct mlx5_fs_vlan {
 
 enum {
 	FLOW_ACT_NO_APPEND = BIT(0),
+	FLOW_ACT_IGNORE_FLOW_LEVEL = BIT(1),
 };
 
 struct mlx5_flow_act {
-- 
cgit v1.2.3


From 79cdb0aaea8b5478db34afa1d4d5ecc808689a67 Mon Sep 17 00:00:00 2001
From: Paul Blakey <paulb@mellanox.com>
Date: Thu, 14 Nov 2019 17:02:59 +0200
Subject: net/mlx5: Allow creating autogroups with reserved entries

Exclude the last n entries for an autogrouped flow table.

Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
a miss group enabling custom actions on FT miss.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/fs.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 81f393fb7d96..4cae16016b2b 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -155,6 +155,7 @@ struct mlx5_flow_table_attr {
 
 	struct {
 		int max_num_groups;
+		int num_reserved_entries;
 	} autogroup;
 };
 
-- 
cgit v1.2.3


From 75ccae62cb8d42a619323a85c577107b8b37d797 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke@redhat.com>
Date: Thu, 16 Jan 2020 16:14:44 +0100
Subject: xdp: Move devmap bulk queue into struct net_device
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances"), changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit.  To avoid other fields of
struct net_device moving to different cache lines, we also move a couple of
other members around.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.

After this patch, the relevant part of struct net_device looks like this,
according to pahole:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
	unsigned int               num_tx_queues;        /*   904     4 */
	unsigned int               real_num_tx_queues;   /*   908     4 */
	struct Qdisc *             qdisc;                /*   912     8 */
	unsigned int               tx_queue_len;         /*   920     4 */
	spinlock_t                 tx_global_lock;       /*   924     4 */
	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct hlist_head  qdisc_hash[16];               /*   960   128 */
	/* --- cacheline 17 boundary (1088 bytes) --- */
	struct timer_list  watchdog_timer;               /*  1088    40 */

	/* XXX last struct has 4 bytes of padding */

	int                        watchdog_timeo;       /*  1128     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head   todo_list;                    /*  1136    16 */
	/* --- cacheline 18 boundary (1152 bytes) --- */

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
---
 include/linux/netdevice.h  | 13 ++++++++-----
 include/trace/events/xdp.h |  2 +-
 2 files changed, 9 insertions(+), 6 deletions(-)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2741aa35bec6..5ec3537fbdb1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -876,6 +876,7 @@ enum bpf_netdev_command {
 struct bpf_prog_offload_ops;
 struct netlink_ext_ack;
 struct xdp_umem;
+struct xdp_dev_bulk_queue;
 
 struct netdev_bpf {
 	enum bpf_netdev_command command;
@@ -1986,12 +1987,10 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
-#ifdef CONFIG_NET_SCHED
-	DECLARE_HASHTABLE	(qdisc_hash, 4);
-#endif
 	unsigned int		tx_queue_len;
 	spinlock_t		tx_global_lock;
-	int			watchdog_timeo;
+
+	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
 
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_cpus_map;
@@ -2001,11 +2000,15 @@ struct net_device {
 	struct mini_Qdisc __rcu	*miniq_egress;
 #endif
 
+#ifdef CONFIG_NET_SCHED
+	DECLARE_HASHTABLE	(qdisc_hash, 4);
+#endif
 	/* These may be needed for future network-power-down code. */
 	struct timer_list	watchdog_timer;
+	int			watchdog_timeo;
 
-	int __percpu		*pcpu_refcnt;
 	struct list_head	todo_list;
+	int __percpu		*pcpu_refcnt;
 
 	struct list_head	link_watch_list;
 
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index a7378bcd9928..72bad13d4a3c 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 	),
 
 	TP_fast_assign(
-		__entry->map_id		= map->id;
+		__entry->map_id		= map ? map->id : 0;
 		__entry->act		= XDP_REDIRECT;
 		__entry->map_index	= map_index;
 		__entry->drops		= drops;
-- 
cgit v1.2.3


From 1d233886dd904edbf239eeffe435c3308ae97625 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke@redhat.com>
Date: Thu, 16 Jan 2020 16:14:45 +0100
Subject: xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exit (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().

Since this leaves less reason to have the non-map redirect code in a
separate function, so we get rid of the xdp_do_redirect_slow() function
entirely. This does lose us the tracepoint disambiguation, but fortunately
the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
entry structures. This means both can contain a map index, so we can just
amend the tracepoint definitions so we always emit the xdp_redirect(_err)
tracepoints, but with the map ID only populated if a map is present. This
means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).

Since the flush functions are no longer map-specific, rename the flush()
functions to drop _map from their names. One of the renamed functions is
the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
keep from having to update all drivers, use a #define to keep the old name
working, and only update the virtual drivers in this patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
---
 include/linux/bpf.h        |  13 +++++-
 include/linux/filter.h     |  10 ++++-
 include/trace/events/xdp.h | 101 ++++++++++++++++++++-------------------------
 3 files changed, 63 insertions(+), 61 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3517e32149a4..8e3b8f4ad183 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1056,7 +1056,9 @@ struct sk_buff;
 
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(void);
+void __dev_flush(void);
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1169,13 +1171,20 @@ static inline struct net_device  *__dev_map_hash_lookup_elem(struct bpf_map *map
 	return NULL;
 }
 
-static inline void __dev_map_flush(void)
+static inline void __dev_flush(void)
 {
 }
 
 struct xdp_buff;
 struct bpf_dtab_netdev;
 
+static inline
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	return 0;
+}
+
 static inline
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a366a0b64a57..f349e2c0884c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -918,7 +918,7 @@ static inline int xdp_ok_fwd_dev(const struct net_device *fwd,
 	return 0;
 }
 
-/* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
+/* The pair of xdp_do_redirect and xdp_do_flush MUST be called in the
  * same cpu context. Further for best results no more than a single map
  * for the do_redirect/do_flush pair should be used. This limitation is
  * because we only track one map and force a flush when the map changes.
@@ -929,7 +929,13 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 int xdp_do_redirect(struct net_device *dev,
 		    struct xdp_buff *xdp,
 		    struct bpf_prog *prog);
-void xdp_do_flush_map(void);
+void xdp_do_flush(void);
+
+/* The xdp_do_flush_map() helper has been renamed to drop the _map suffix, as
+ * it is no longer only flushing maps. Keep this define for compatibility
+ * until all drivers are updated - do not use xdp_do_flush_map() in new code!
+ */
+#define xdp_do_flush_map xdp_do_flush
 
 void bpf_warn_invalid_xdp_action(u32 act);
 
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 72bad13d4a3c..b680973687b4 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -79,14 +79,26 @@ TRACE_EVENT(xdp_bulk_tx,
 		  __entry->sent, __entry->drops, __entry->err)
 );
 
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+	struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
+#define devmap_ifindex(tgt, map)				\
+	(((map->map_type == BPF_MAP_TYPE_DEVMAP ||	\
+		  map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)) ? \
+	  ((struct _bpf_dtab_netdev *)tgt)->dev->ifindex : 0)
+
 DECLARE_EVENT_CLASS(xdp_redirect_template,
 
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
 
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
+	TP_ARGS(dev, xdp, tgt, err, map, index),
 
 	TP_STRUCT__entry(
 		__field(int, prog_id)
@@ -103,90 +115,65 @@ DECLARE_EVENT_CLASS(xdp_redirect_template,
 		__entry->act		= XDP_REDIRECT;
 		__entry->ifindex	= dev->ifindex;
 		__entry->err		= err;
-		__entry->to_ifindex	= to_ifindex;
+		__entry->to_ifindex	= map ? devmap_ifindex(tgt, map) :
+						index;
 		__entry->map_id		= map ? map->id : 0;
-		__entry->map_index	= map_index;
+		__entry->map_index	= map ? index : 0;
 	),
 
-	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d",
+	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
+		  " map_id=%d map_index=%d",
 		  __entry->prog_id,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->ifindex, __entry->to_ifindex,
-		  __entry->err)
+		  __entry->err, __entry->map_id, __entry->map_index)
 );
 
 DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
 DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
 #define _trace_xdp_redirect(dev, xdp, to)		\
-	 trace_xdp_redirect(dev, xdp, to, 0, NULL, 0);
+	 trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to);
 
 #define _trace_xdp_redirect_err(dev, xdp, to, err)	\
-	 trace_xdp_redirect_err(dev, xdp, to, err, NULL, 0);
+	 trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to);
+
+#define _trace_xdp_redirect_map(dev, xdp, to, map, index)		\
+	 trace_xdp_redirect(dev, xdp, to, 0, map, index);
+
+#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err)	\
+	 trace_xdp_redirect_err(dev, xdp, to, err, map, index);
 
-DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map,
+/* not used anymore, but kept around so as not to break old programs */
+DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
-	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
-		  " map_id=%d map_index=%d",
-		  __entry->prog_id,
-		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
-		  __entry->ifindex, __entry->to_ifindex,
-		  __entry->err,
-		  __entry->map_id, __entry->map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
-DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
+DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
-	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
-		  " map_id=%d map_index=%d",
-		  __entry->prog_id,
-		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
-		  __entry->ifindex, __entry->to_ifindex,
-		  __entry->err,
-		  __entry->map_id, __entry->map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
-#ifndef __DEVMAP_OBJ_TYPE
-#define __DEVMAP_OBJ_TYPE
-struct _bpf_dtab_netdev {
-	struct net_device *dev;
-};
-#endif /* __DEVMAP_OBJ_TYPE */
-
-#define devmap_ifindex(fwd, map)				\
-	((map->map_type == BPF_MAP_TYPE_DEVMAP ||		\
-	  map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ?		\
-	  ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)
-
-#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)		\
-	 trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map),	\
-				0, map, idx)
-
-#define _trace_xdp_redirect_map_err(dev, xdp, fwd, map, idx, err)	\
-	 trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map),	\
-				    err, map, idx)
-
 TRACE_EVENT(xdp_cpumap_kthread,
 
 	TP_PROTO(int map_id, unsigned int processed,  unsigned int drops,
-- 
cgit v1.2.3


From 58aa94f922c1b44e0919d1814d2eab5b9e8bf945 Mon Sep 17 00:00:00 2001
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 16 Jan 2020 16:14:46 +0100
Subject: devmap: Adjust tracepoint for map-less queue flush
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that we don't have a reference to a devmap when flushing the device
bulk queue, let's change the the devmap_xmit tracepoint to remote the
map_id and map_index fields entirely. Rearrange the fields so 'drops' and
'sent' stay in the same position in the tracepoint struct, to make it
possible for the xdp_monitor utility to read both the old and the new
format.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
---
 include/trace/events/xdp.h | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

(limited to 'include')

diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index b680973687b4..b95d65e8c628 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -246,43 +246,38 @@ TRACE_EVENT(xdp_cpumap_enqueue,
 
 TRACE_EVENT(xdp_devmap_xmit,
 
-	TP_PROTO(const struct bpf_map *map, u32 map_index,
-		 int sent, int drops,
-		 const struct net_device *from_dev,
-		 const struct net_device *to_dev, int err),
+	TP_PROTO(const struct net_device *from_dev,
+		 const struct net_device *to_dev,
+		 int sent, int drops, int err),
 
-	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
+	TP_ARGS(from_dev, to_dev, sent, drops, err),
 
 	TP_STRUCT__entry(
-		__field(int, map_id)
+		__field(int, from_ifindex)
 		__field(u32, act)
-		__field(u32, map_index)
+		__field(int, to_ifindex)
 		__field(int, drops)
 		__field(int, sent)
-		__field(int, from_ifindex)
-		__field(int, to_ifindex)
 		__field(int, err)
 	),
 
 	TP_fast_assign(
-		__entry->map_id		= map ? map->id : 0;
+		__entry->from_ifindex	= from_dev->ifindex;
 		__entry->act		= XDP_REDIRECT;
-		__entry->map_index	= map_index;
+		__entry->to_ifindex	= to_dev->ifindex;
 		__entry->drops		= drops;
 		__entry->sent		= sent;
-		__entry->from_ifindex	= from_dev->ifindex;
-		__entry->to_ifindex	= to_dev->ifindex;
 		__entry->err		= err;
 	),
 
 	TP_printk("ndo_xdp_xmit"
-		  " map_id=%d map_index=%d action=%s"
+		  " from_ifindex=%d to_ifindex=%d action=%s"
 		  " sent=%d drops=%d"
-		  " from_ifindex=%d to_ifindex=%d err=%d",
-		  __entry->map_id, __entry->map_index,
+		  " err=%d",
+		  __entry->from_ifindex, __entry->to_ifindex,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->sent, __entry->drops,
-		  __entry->from_ifindex, __entry->to_ifindex, __entry->err)
+		  __entry->err)
 );
 
 /* Expect users already include <net/xdp.h>, but not xdp_priv.h */
-- 
cgit v1.2.3


From 080bb352fad00d04995102f681b134e3754bfb6e Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 15 Jan 2020 20:48:50 -0800
Subject: net: phy: Maintain MDIO device and bus statistics

We maintain global statistics for an entire MDIO bus, as well as broken
down, per MDIO bus address statistics. Given that it is possible for
MDIO devices such as switches to access MDIO bus addresses for which
there is not a mdio_device instance created (therefore not a a
corresponding device directory in sysfs either), we also maintain
per-address statistics under the statistics folder. The layout looks
like this:

/sys/class/mdio_bus/../statistics/
	transfers
	errrors
	writes
	reads
	transfers_<addr>
	errors_<addr>
	writes_<addr>
	reads_<addr>

When a mdio_device instance is registered, a statistics/ folder is
created with the tranfers, errors, writes and reads attributes which
point to the appropriate MDIO bus statistics structure.

Statistics are 64-bit unsigned quantities and maintained through the
u64_stats_sync.h helper functions.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 2929d0bc307f..99a87f02667f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -22,6 +22,7 @@
 #include <linux/timer.h>
 #include <linux/workqueue.h>
 #include <linux/mod_devicetable.h>
+#include <linux/u64_stats_sync.h>
 
 #include <linux/atomic.h>
 
@@ -212,6 +213,15 @@ struct sfp_bus;
 struct sfp_upstream_ops;
 struct sk_buff;
 
+struct mdio_bus_stats {
+	u64_stats_t transfers;
+	u64_stats_t errors;
+	u64_stats_t writes;
+	u64_stats_t reads;
+	/* Must be last, add new statistics above */
+	struct u64_stats_sync syncp;
+};
+
 /*
  * The Bus class for PHYs.  Devices which provide access to
  * PHYs should register using this structure
@@ -224,6 +234,7 @@ struct mii_bus {
 	int (*read)(struct mii_bus *bus, int addr, int regnum);
 	int (*write)(struct mii_bus *bus, int addr, int regnum, u16 val);
 	int (*reset)(struct mii_bus *bus);
+	struct mdio_bus_stats stats[PHY_MAX_ADDR];
 
 	/*
 	 * A lock to ensure that only one thing can read/write
-- 
cgit v1.2.3


From 56f200c78ce4d94680a27a1ce97a29ebeb4f23e1 Mon Sep 17 00:00:00 2001
From: Guillaume Nault <gnault@redhat.com>
Date: Thu, 16 Jan 2020 21:16:46 +0100
Subject: netns: Constify exported functions

Mark function parameters as 'const' where possible.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/net_namespace.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

(limited to 'include')

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index b8ceaf0cd997..854d39ef1ca3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -347,9 +347,9 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #endif
 
 int peernet2id_alloc(struct net *net, struct net *peer, gfp_t gfp);
-int peernet2id(struct net *net, struct net *peer);
-bool peernet_has_id(struct net *net, struct net *peer);
-struct net *get_net_ns_by_id(struct net *net, int id);
+int peernet2id(const struct net *net, struct net *peer);
+bool peernet_has_id(const struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(const struct net *net, int id);
 
 struct pernet_operations {
 	struct list_head list;
@@ -427,7 +427,7 @@ static inline void unregister_net_sysctl_table(struct ctl_table_header *header)
 }
 #endif
 
-static inline int rt_genid_ipv4(struct net *net)
+static inline int rt_genid_ipv4(const struct net *net)
 {
 	return atomic_read(&net->ipv4.rt_genid);
 }
@@ -459,7 +459,7 @@ static inline void rt_genid_bump_all(struct net *net)
 	rt_genid_bump_ipv6(net);
 }
 
-static inline int fnhe_genid(struct net *net)
+static inline int fnhe_genid(const struct net *net)
 {
 	return atomic_read(&net->fnhe_genid);
 }
-- 
cgit v1.2.3


From 95f0ead8f04bec18e474594ef585f3734bd85b4c Mon Sep 17 00:00:00 2001
From: Amit Cohen <amitc@mellanox.com>
Date: Sun, 19 Jan 2020 15:00:48 +0200
Subject: devlink: Add non-routable packet trap

Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index a6856f1d5d1f..08b757753e1c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -591,6 +591,7 @@ enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_REJECT_ROUTE,
 	DEVLINK_TRAP_GENERIC_ID_IPV4_LPM_UNICAST_MISS,
 	DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS,
+	DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE,
 
 	/* Add new generic trap IDs above */
 	__DEVLINK_TRAP_GENERIC_ID_MAX,
@@ -659,6 +660,8 @@ enum devlink_trap_group_generic_id {
 	"ipv4_lpm_miss"
 #define DEVLINK_TRAP_GENERIC_NAME_IPV6_LPM_UNICAST_MISS \
 	"ipv6_lpm_miss"
+#define DEVLINK_TRAP_GENERIC_NAME_NON_ROUTABLE \
+	"non_routable_packet"
 
 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
 	"l2_drops"
-- 
cgit v1.2.3


From 13c056ec7d006b11557cebd9f1803edd646d2876 Mon Sep 17 00:00:00 2001
From: Amit Cohen <amitc@mellanox.com>
Date: Sun, 19 Jan 2020 15:00:54 +0200
Subject: devlink: Add tunnel generic packet traps

Add packet traps that can report packets that were dropped during tunnel
decapsulation.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 08b757753e1c..455282a4b714 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -592,6 +592,7 @@ enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_IPV4_LPM_UNICAST_MISS,
 	DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS,
 	DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE,
+	DEVLINK_TRAP_GENERIC_ID_DECAP_ERROR,
 
 	/* Add new generic trap IDs above */
 	__DEVLINK_TRAP_GENERIC_ID_MAX,
@@ -605,6 +606,7 @@ enum devlink_trap_group_generic_id {
 	DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
 	DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS,
 	DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS,
+	DEVLINK_TRAP_GROUP_GENERIC_ID_TUNNEL_DROPS,
 
 	/* Add new generic trap group IDs above */
 	__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX,
@@ -662,6 +664,8 @@ enum devlink_trap_group_generic_id {
 	"ipv6_lpm_miss"
 #define DEVLINK_TRAP_GENERIC_NAME_NON_ROUTABLE \
 	"non_routable_packet"
+#define DEVLINK_TRAP_GENERIC_NAME_DECAP_ERROR \
+	"decap_error"
 
 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
 	"l2_drops"
@@ -669,6 +673,8 @@ enum devlink_trap_group_generic_id {
 	"l3_drops"
 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \
 	"buffer_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_TUNNEL_DROPS \
+	"tunnel_drops"
 
 #define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \
 	{								      \
-- 
cgit v1.2.3


From c3cae4916e57d2f0364d5e7769218547fb1b7c60 Mon Sep 17 00:00:00 2001
From: Amit Cohen <amitc@mellanox.com>
Date: Sun, 19 Jan 2020 15:00:58 +0200
Subject: devlink: Add overlay source MAC is multicast trap

Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 455282a4b714..2813fd06ee89 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -593,6 +593,7 @@ enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_IPV6_LPM_UNICAST_MISS,
 	DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE,
 	DEVLINK_TRAP_GENERIC_ID_DECAP_ERROR,
+	DEVLINK_TRAP_GENERIC_ID_OVERLAY_SMAC_MC,
 
 	/* Add new generic trap IDs above */
 	__DEVLINK_TRAP_GENERIC_ID_MAX,
@@ -666,6 +667,8 @@ enum devlink_trap_group_generic_id {
 	"non_routable_packet"
 #define DEVLINK_TRAP_GENERIC_NAME_DECAP_ERROR \
 	"decap_error"
+#define DEVLINK_TRAP_GENERIC_NAME_OVERLAY_SMAC_MC \
+	"overlay_smac_is_mc"
 
 #define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
 	"l2_drops"
-- 
cgit v1.2.3


From 2ab1d925aa4c0c179dd1eb492e8c03536972707b Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sun, 19 Jan 2020 14:31:55 +0100
Subject: net: phy: add generic ndo_do_ioctl handler phy_do_ioctl

A number of network drivers has the same glue code to use phy_mii_ioctl
as ndo_do_ioctl handler. So let's add such a generic ndo_do_ioctl
handler to phylib.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 99a87f02667f..be6b3a1b03da 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1242,6 +1242,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
 int phy_ethtool_ksettings_set(struct phy_device *phydev,
 			      const struct ethtool_link_ksettings *cmd);
 int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);
+int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
 void phy_request_interrupt(struct phy_device *phydev);
 void phy_free_interrupt(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
-- 
cgit v1.2.3


From 3231e5d2228a2078ce5982d63ea9a617e4972c00 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 20 Jan 2020 22:16:07 +0100
Subject: net: phy: rename phy_do_ioctl to phy_do_ioctl_running

We just added phy_do_ioctl, but it turned out that we need another
version of this function that doesn't check whether net_device is
running. So rename phy_do_ioctl to phy_do_ioctl_running.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index be6b3a1b03da..f6e714da37d8 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1242,7 +1242,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
 int phy_ethtool_ksettings_set(struct phy_device *phydev,
 			      const struct ethtool_link_ksettings *cmd);
 int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);
-int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
+int phy_do_ioctl_running(struct net_device *dev, struct ifreq *ifr, int cmd);
 void phy_request_interrupt(struct phy_device *phydev);
 void phy_free_interrupt(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
-- 
cgit v1.2.3


From bbbf8430afe6906abbf879352fe10d24d380e588 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 20 Jan 2020 22:17:11 +0100
Subject: net: phy: add new version of phy_do_ioctl

Add a new version of phy_do_ioctl that doesn't check whether net_device
is running. It will typically be used if suitable drivers attach the
PHY in probe already.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/phy.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index f6e714da37d8..c570e162e05e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1242,6 +1242,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
 int phy_ethtool_ksettings_set(struct phy_device *phydev,
 			      const struct ethtool_link_ksettings *cmd);
 int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);
+int phy_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
 int phy_do_ioctl_running(struct net_device *dev, struct ifreq *ifr, int cmd);
 void phy_request_interrupt(struct phy_device *phydev);
 void phy_free_interrupt(struct phy_device *phydev);
-- 
cgit v1.2.3


From f362e5fe0f1f9774b28c5b4c9b07b0168575cf6e Mon Sep 17 00:00:00 2001
From: Martin Schiller <ms@dev.tdt.de>
Date: Tue, 21 Jan 2020 07:00:33 +0100
Subject: wan/hdlc_x25: make lapb params configurable

This enables you to configure mode (DTE/DCE), Modulo, Window, T1, T2, N2 via
sethdlc (which needs to be patched as well).

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/hdlc/ioctl.h | 9 +++++++++
 include/uapi/linux/if.h         | 1 +
 2 files changed, 10 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/hdlc/ioctl.h b/include/uapi/linux/hdlc/ioctl.h
index 0fe4238e8246..b06341acab5e 100644
--- a/include/uapi/linux/hdlc/ioctl.h
+++ b/include/uapi/linux/hdlc/ioctl.h
@@ -79,6 +79,15 @@ typedef struct {
     unsigned int timeout;
 } cisco_proto;
 
+typedef struct {
+	unsigned short dce; /* 1 for DCE (network side) operation */
+	unsigned int modulo; /* modulo (8 = basic / 128 = extended) */
+	unsigned int window; /* frame window size */
+	unsigned int t1; /* timeout t1 */
+	unsigned int t2; /* timeout t2 */
+	unsigned int n2; /* frame retry counter */
+} x25_hdlc_proto;
+
 /* PPP doesn't need any info now - supply length = 0 to ioctl */
 
 #endif /* __ASSEMBLY__ */
diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 4bf33344aab1..be714cd8c826 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -213,6 +213,7 @@ struct if_settings {
 		fr_proto		__user *fr;
 		fr_proto_pvc		__user *fr_pvc;
 		fr_proto_pvc_info	__user *fr_pvc_info;
+		x25_hdlc_proto		__user *x25;
 
 		/* interface settings */
 		sync_serial_settings	__user *sync;
-- 
cgit v1.2.3


From 43a825afc91e2b06af1e8e7422198e759c2c5e20 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Mon, 20 Jan 2020 10:29:17 +0100
Subject: xsk, net: Make sock_def_readable() have external linkage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XDP sockets use the default implementation of struct sock's
sk_data_ready callback, which is sock_def_readable(). This function
is called in the XDP socket fast-path, and involves a retpoline. By
letting sock_def_readable() have external linkage, and being called
directly, the retpoline can be avoided.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com
---
 include/net/sock.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/sock.h b/include/net/sock.h
index 8dff68b4c316..0891c55f1e82 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2612,4 +2612,6 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
 	return false;
 }
 
+void sock_def_readable(struct sock *sk);
+
 #endif	/* _SOCK_H */
-- 
cgit v1.2.3


From be8704ff07d2374bcc5c675526f95e70c6459683 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Mon, 20 Jan 2020 16:53:46 -0800
Subject: bpf: Introduce dynamic program extensions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Introduce dynamic program extensions. The users can load additional BPF
functions and replace global functions in previously loaded BPF programs while
these programs are executing.

Global functions are verified individually by the verifier based on their types only.
Hence the global function in the new program which types match older function can
safely replace that corresponding function.

This new function/program is called 'an extension' of old program. At load time
the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function
to be replaced. The BPF program type is derived from the target program into
extension program. Technically bpf_verifier_ops is copied from target program.
The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
The extension program can call the same bpf helper functions as target program.
Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
types. The verifier allows only one level of replacement. Meaning that the
extension program cannot recursively extend an extension. That also means that
the maximum stack size is increasing from 512 to 1024 bytes and maximum
function nesting level from 8 to 16. The programs don't always consume that
much. The stack usage is determined by the number of on-stack variables used by
the program. The verifier could have enforced 512 limit for combined original
plus extension program, but it makes for difficult user experience. The main
use case for extensions is to provide generic mechanism to plug external
programs into policy program or function call chaining.

BPF trampoline is used to track both fentry/fexit and program extensions
because both are using the same nop slot at the beginning of every BPF
function. Attaching fentry/fexit to a function that was replaced is not
allowed. The opposite is true as well. Replacing a function that currently
being analyzed with fentry/fexit is not allowed. The executable page allocated
by BPF trampoline is not used by program extensions. This inefficiency will be
optimized in future patches.

Function by function verification of global function supports scalars and
pointer to context only. Hence program extensions are supported for such class
of global functions only. In the future the verifier will be extended with
support to pointers to structures, arrays with sizes, etc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
---
 include/linux/bpf.h       | 10 +++++++++-
 include/linux/bpf_types.h |  2 ++
 include/linux/btf.h       |  5 +++++
 include/uapi/linux/bpf.h  |  1 +
 4 files changed, 17 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8e3b8f4ad183..05d16615054c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -465,7 +465,8 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
 enum bpf_tramp_prog_type {
 	BPF_TRAMP_FENTRY,
 	BPF_TRAMP_FEXIT,
-	BPF_TRAMP_MAX
+	BPF_TRAMP_MAX,
+	BPF_TRAMP_REPLACE, /* more than MAX */
 };
 
 struct bpf_trampoline {
@@ -480,6 +481,11 @@ struct bpf_trampoline {
 		void *addr;
 		bool ftrace_managed;
 	} func;
+	/* if !NULL this is BPF_PROG_TYPE_EXT program that extends another BPF
+	 * program by replacing one of its functions. func.addr is the address
+	 * of the function it replaced.
+	 */
+	struct bpf_prog *extension_prog;
 	/* list of BPF programs using this trampoline */
 	struct hlist_head progs_hlist[BPF_TRAMP_MAX];
 	/* Number of attached programs. A counter per kind. */
@@ -1107,6 +1113,8 @@ int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
 			     struct bpf_reg_state *regs);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			  struct bpf_reg_state *reg);
+int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
+			 struct btf *btf, const struct btf_type *t);
 
 struct bpf_prog *bpf_prog_by_id(u32 id);
 
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9f326e6ef885..c81d4ece79a4 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -68,6 +68,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
 #if defined(CONFIG_BPF_JIT)
 BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
 	      void *, void *)
+BPF_PROG_TYPE(BPF_PROG_TYPE_EXT, bpf_extension,
+	      void *, void *)
 #endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 881e9b76ef49..5c1ea99b480f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -107,6 +107,11 @@ static inline u16 btf_type_vlen(const struct btf_type *t)
 	return BTF_INFO_VLEN(t->info);
 }
 
+static inline u16 btf_func_linkage(const struct btf_type *t)
+{
+	return BTF_INFO_VLEN(t->info);
+}
+
 static inline bool btf_type_kflag(const struct btf_type *t)
 {
 	return BTF_INFO_KFLAG(t->info);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 033d90a2282d..e81628eb059c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -180,6 +180,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 	BPF_PROG_TYPE_TRACING,
 	BPF_PROG_TYPE_STRUCT_OPS,
+	BPF_PROG_TYPE_EXT,
 };
 
 enum bpf_attach_type {
-- 
cgit v1.2.3


From 5576b991e9c1a11d2cc21c4b94fc75ec27603896 Mon Sep 17 00:00:00 2001
From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 22 Jan 2020 15:36:46 -0800
Subject: bpf: Add BPF_FUNC_jiffies64

This patch adds a helper to read the 64bit jiffies.  It will be used
in a later patch to implement the bpf_cubic.c.

The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup().  Other cases could be considered together
with map_gen_lookup() if needed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
---
 include/linux/bpf.h      | 1 +
 include/uapi/linux/bpf.h | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 05d16615054c..a9687861fd7e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1414,6 +1414,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
 extern const struct bpf_func_proto bpf_strtol_proto;
 extern const struct bpf_func_proto bpf_strtoul_proto;
 extern const struct bpf_func_proto bpf_tcp_sock_proto;
+extern const struct bpf_func_proto bpf_jiffies64_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e81628eb059c..f1d74a2bd234 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2886,6 +2886,12 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * u64 bpf_jiffies64(void)
+ *	Description
+ *		Obtain the 64bit jiffies
+ *	Return
+ *		The 64 bit jiffies
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3005,7 +3011,8 @@ union bpf_attr {
 	FN(probe_read_user_str),	\
 	FN(probe_read_kernel_str),	\
 	FN(tcp_send_ack),		\
-	FN(send_signal_thread),
+	FN(send_signal_thread),		\
+	FN(jiffies64),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
cgit v1.2.3


From 84bf557fb02f5924c109a21a160ffc353d878487 Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:24 +0530
Subject: net: sched: pie: move common code to pie.h

This patch moves macros, structures and small functions common
to PIE and FQ-PIE (to be added in a future commit) from the file
net/sched/sch_pie.c to the header file include/net/pie.h.
All the moved functions are made inline.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 include/net/pie.h

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
new file mode 100644
index 000000000000..440213ec83eb
--- /dev/null
+++ b/include/net/pie.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __NET_SCHED_PIE_H
+#define __NET_SCHED_PIE_H
+
+#include <linux/ktime.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/inet_ecn.h>
+#include <net/pkt_sched.h>
+
+#define QUEUE_THRESHOLD 16384
+#define DQCOUNT_INVALID -1
+#define DTIME_INVALID 0xffffffffffffffff
+#define MAX_PROB 0xffffffffffffffff
+#define PIE_SCALE 8
+
+/* parameters used */
+struct pie_params {
+	psched_time_t target;	/* user specified target delay in pschedtime */
+	u32 tupdate;		/* timer frequency (in jiffies) */
+	u32 limit;		/* number of packets that can be enqueued */
+	u32 alpha;		/* alpha and beta are between 0 and 32 */
+	u32 beta;		/* and are used for shift relative to 1 */
+	bool ecn;		/* true if ecn is enabled */
+	bool bytemode;		/* to scale drop early prob based on pkt size */
+	u8 dq_rate_estimator;	/* to calculate delay using Little's law */
+};
+
+/* variables used */
+struct pie_vars {
+	u64 prob;		/* probability but scaled by u64 limit. */
+	psched_time_t burst_time;
+	psched_time_t qdelay;
+	psched_time_t qdelay_old;
+	u64 dq_count;		/* measured in bytes */
+	psched_time_t dq_tstamp;	/* drain rate */
+	u64 accu_prob;		/* accumulated drop probability */
+	u32 avg_dq_rate;	/* bytes per pschedtime tick,scaled */
+	u32 qlen_old;		/* in bytes */
+	u8 accu_prob_overflows;	/* overflows of accu_prob */
+};
+
+/* statistics gathering */
+struct pie_stats {
+	u32 packets_in;		/* total number of packets enqueued */
+	u32 dropped;		/* packets dropped due to pie_action */
+	u32 overlimit;		/* dropped due to lack of space in queue */
+	u32 maxq;		/* maximum queue size */
+	u32 ecn_mark;		/* packets marked with ECN */
+};
+
+/* private skb vars */
+struct pie_skb_cb {
+	psched_time_t enqueue_time;
+};
+
+static inline void pie_params_init(struct pie_params *params)
+{
+	params->alpha = 2;
+	params->beta = 20;
+	params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC);	/* 15 ms */
+	params->limit = 1000;	/* default of 1000 packets */
+	params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC);	/* 15 ms */
+	params->ecn = false;
+	params->bytemode = false;
+	params->dq_rate_estimator = false;
+}
+
+static inline void pie_vars_init(struct pie_vars *vars)
+{
+	vars->dq_count = DQCOUNT_INVALID;
+	vars->dq_tstamp = DTIME_INVALID;
+	vars->accu_prob = 0;
+	vars->avg_dq_rate = 0;
+	/* default of 150 ms in pschedtime */
+	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
+	vars->accu_prob_overflows = 0;
+}
+
+static inline struct pie_skb_cb *get_pie_cb(const struct sk_buff *skb)
+{
+	qdisc_cb_private_validate(skb, sizeof(struct pie_skb_cb));
+	return (struct pie_skb_cb *)qdisc_skb_cb(skb)->data;
+}
+
+static inline psched_time_t pie_get_enqueue_time(const struct sk_buff *skb)
+{
+	return get_pie_cb(skb)->enqueue_time;
+}
+
+static inline void pie_set_enqueue_time(struct sk_buff *skb)
+{
+	get_pie_cb(skb)->enqueue_time = psched_get_time();
+}
+
+#endif
-- 
cgit v1.2.3


From 805a5a23a4c4ee126e781bd14a1deb242395e817 Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:25 +0530
Subject: pie: use U64_MAX to denote (2^64 - 1)

Use the U64_MAX macro to denote the constant (2^64 - 1).

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index 440213ec83eb..7ef375db5bab 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -10,8 +10,8 @@
 
 #define QUEUE_THRESHOLD 16384
 #define DQCOUNT_INVALID -1
-#define DTIME_INVALID 0xffffffffffffffff
-#define MAX_PROB 0xffffffffffffffff
+#define DTIME_INVALID U64_MAX
+#define MAX_PROB U64_MAX
 #define PIE_SCALE 8
 
 /* parameters used */
-- 
cgit v1.2.3


From cf4eeee5ff56180b525bfb6a204071216ca4000a Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:26 +0530
Subject: pie: rearrange macros in order of length

Rearrange macros in order of length and align the values to
improve readability.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index 7ef375db5bab..397c7abf0879 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -8,11 +8,11 @@
 #include <net/inet_ecn.h>
 #include <net/pkt_sched.h>
 
-#define QUEUE_THRESHOLD 16384
-#define DQCOUNT_INVALID -1
-#define DTIME_INVALID U64_MAX
-#define MAX_PROB U64_MAX
-#define PIE_SCALE 8
+#define MAX_PROB	U64_MAX
+#define DTIME_INVALID	U64_MAX
+#define QUEUE_THRESHOLD	16384
+#define DQCOUNT_INVALID	-1
+#define PIE_SCALE	8
 
 /* parameters used */
 struct pie_params {
-- 
cgit v1.2.3


From 1dbfc5e071db3f5acc3c7c87a564bf57b838cf49 Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:27 +0530
Subject: pie: use u8 instead of bool in pie_vars

Linux best practice recommends using u8 for true/false values in
structures.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index 397c7abf0879..f9c6a44bdb0c 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -21,8 +21,8 @@ struct pie_params {
 	u32 limit;		/* number of packets that can be enqueued */
 	u32 alpha;		/* alpha and beta are between 0 and 32 */
 	u32 beta;		/* and are used for shift relative to 1 */
-	bool ecn;		/* true if ecn is enabled */
-	bool bytemode;		/* to scale drop early prob based on pkt size */
+	u8 ecn;			/* true if ecn is enabled */
+	u8 bytemode;		/* to scale drop early prob based on pkt size */
 	u8 dq_rate_estimator;	/* to calculate delay using Little's law */
 };
 
-- 
cgit v1.2.3


From 2dfb1952a9a1fde0b515f58605c11902e69415bf Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:28 +0530
Subject: pie: rearrange structure members and their initializations

Rearrange the members of the structure such that closely
referenced members appear together and/or fit in the same
cacheline. Also, change the order of their initializations to
match the order in which they appear in the structure.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index f9c6a44bdb0c..ec0fbe98ec2f 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -28,13 +28,13 @@ struct pie_params {
 
 /* variables used */
 struct pie_vars {
-	u64 prob;		/* probability but scaled by u64 limit. */
-	psched_time_t burst_time;
 	psched_time_t qdelay;
 	psched_time_t qdelay_old;
-	u64 dq_count;		/* measured in bytes */
+	psched_time_t burst_time;
 	psched_time_t dq_tstamp;	/* drain rate */
+	u64 prob;		/* probability but scaled by u64 limit. */
 	u64 accu_prob;		/* accumulated drop probability */
+	u64 dq_count;		/* measured in bytes */
 	u32 avg_dq_rate;	/* bytes per pschedtime tick,scaled */
 	u32 qlen_old;		/* in bytes */
 	u8 accu_prob_overflows;	/* overflows of accu_prob */
@@ -45,8 +45,8 @@ struct pie_stats {
 	u32 packets_in;		/* total number of packets enqueued */
 	u32 dropped;		/* packets dropped due to pie_action */
 	u32 overlimit;		/* dropped due to lack of space in queue */
-	u32 maxq;		/* maximum queue size */
 	u32 ecn_mark;		/* packets marked with ECN */
+	u32 maxq;		/* maximum queue size */
 };
 
 /* private skb vars */
@@ -56,11 +56,11 @@ struct pie_skb_cb {
 
 static inline void pie_params_init(struct pie_params *params)
 {
-	params->alpha = 2;
-	params->beta = 20;
+	params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC);	/* 15 ms */
 	params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC);	/* 15 ms */
 	params->limit = 1000;	/* default of 1000 packets */
-	params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC);	/* 15 ms */
+	params->alpha = 2;
+	params->beta = 20;
 	params->ecn = false;
 	params->bytemode = false;
 	params->dq_rate_estimator = false;
@@ -68,12 +68,12 @@ static inline void pie_params_init(struct pie_params *params)
 
 static inline void pie_vars_init(struct pie_vars *vars)
 {
-	vars->dq_count = DQCOUNT_INVALID;
+	/* default of 150 ms in pschedtime */
+	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
 	vars->dq_tstamp = DTIME_INVALID;
 	vars->accu_prob = 0;
+	vars->dq_count = DQCOUNT_INVALID;
 	vars->avg_dq_rate = 0;
-	/* default of 150 ms in pschedtime */
-	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
 	vars->accu_prob_overflows = 0;
 }
 
-- 
cgit v1.2.3


From b42a3d7c7cfff3555d7057c20dbbe57fe75d77c0 Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:29 +0530
Subject: pie: improve comments and commenting style

Improve the comments along with the commenting style used to
describe the members of the structures and their initial values
in the init functions.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 85 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 58 insertions(+), 27 deletions(-)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index ec0fbe98ec2f..51a1984c2dce 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -14,42 +14,74 @@
 #define DQCOUNT_INVALID	-1
 #define PIE_SCALE	8
 
-/* parameters used */
+/**
+ * struct pie_params - contains pie parameters
+ * @target:		target delay in pschedtime
+ * @tudpate:		interval at which drop probability is calculated
+ * @limit:		total number of packets that can be in the queue
+ * @alpha:		parameter to control drop probability
+ * @beta:		parameter to control drop probability
+ * @ecn:		is ECN marking of packets enabled
+ * @bytemode:		is drop probability scaled based on pkt size
+ * @dq_rate_estimator:	is Little's law used for qdelay calculation
+ */
 struct pie_params {
-	psched_time_t target;	/* user specified target delay in pschedtime */
-	u32 tupdate;		/* timer frequency (in jiffies) */
-	u32 limit;		/* number of packets that can be enqueued */
-	u32 alpha;		/* alpha and beta are between 0 and 32 */
-	u32 beta;		/* and are used for shift relative to 1 */
-	u8 ecn;			/* true if ecn is enabled */
-	u8 bytemode;		/* to scale drop early prob based on pkt size */
-	u8 dq_rate_estimator;	/* to calculate delay using Little's law */
+	psched_time_t target;
+	u32 tupdate;
+	u32 limit;
+	u32 alpha;
+	u32 beta;
+	u8 ecn;
+	u8 bytemode;
+	u8 dq_rate_estimator;
 };
 
-/* variables used */
+/**
+ * struct pie_vars - contains pie variables
+ * @qdelay:			current queue delay
+ * @qdelay_old:			queue delay in previous qdelay calculation
+ * @burst_time:			burst time allowance
+ * @dq_tstamp:			timestamp at which dq rate was last calculated
+ * @prob:			drop probability
+ * @accu_prob:			accumulated drop probability
+ * @dq_count:			number of bytes dequeued in a measurement cycle
+ * @avg_dq_rate:		calculated average dq rate
+ * @qlen_old:			queue length during previous qdelay calculation
+ * @accu_prob_overflows:	number of times accu_prob overflows
+ */
 struct pie_vars {
 	psched_time_t qdelay;
 	psched_time_t qdelay_old;
 	psched_time_t burst_time;
-	psched_time_t dq_tstamp;	/* drain rate */
-	u64 prob;		/* probability but scaled by u64 limit. */
-	u64 accu_prob;		/* accumulated drop probability */
-	u64 dq_count;		/* measured in bytes */
-	u32 avg_dq_rate;	/* bytes per pschedtime tick,scaled */
-	u32 qlen_old;		/* in bytes */
-	u8 accu_prob_overflows;	/* overflows of accu_prob */
+	psched_time_t dq_tstamp;
+	u64 prob;
+	u64 accu_prob;
+	u64 dq_count;
+	u32 avg_dq_rate;
+	u32 qlen_old;
+	u8 accu_prob_overflows;
 };
 
-/* statistics gathering */
+/**
+ * struct pie_stats - contains pie stats
+ * @packets_in:	total number of packets enqueued
+ * @dropped:	packets dropped due to pie action
+ * @overlimit:	packets dropped due to lack of space in queue
+ * @ecn_mark:	packets marked with ECN
+ * @maxq:	maximum queue size
+ */
 struct pie_stats {
-	u32 packets_in;		/* total number of packets enqueued */
-	u32 dropped;		/* packets dropped due to pie_action */
-	u32 overlimit;		/* dropped due to lack of space in queue */
-	u32 ecn_mark;		/* packets marked with ECN */
-	u32 maxq;		/* maximum queue size */
+	u32 packets_in;
+	u32 dropped;
+	u32 overlimit;
+	u32 ecn_mark;
+	u32 maxq;
 };
 
-/* private skb vars */
+/**
+ * struct pie_skb_cb - contains private skb vars
+ * @enqueue_time:	timestamp when the packet is enqueued
+ */
 struct pie_skb_cb {
 	psched_time_t enqueue_time;
 };
@@ -58,7 +90,7 @@ static inline void pie_params_init(struct pie_params *params)
 {
 	params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC);	/* 15 ms */
 	params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC);	/* 15 ms */
-	params->limit = 1000;	/* default of 1000 packets */
+	params->limit = 1000;
 	params->alpha = 2;
 	params->beta = 20;
 	params->ecn = false;
@@ -68,8 +100,7 @@ static inline void pie_params_init(struct pie_params *params)
 
 static inline void pie_vars_init(struct pie_vars *vars)
 {
-	/* default of 150 ms in pschedtime */
-	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
+	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC); /* 150 ms */
 	vars->dq_tstamp = DTIME_INVALID;
 	vars->accu_prob = 0;
 	vars->dq_count = DQCOUNT_INVALID;
-- 
cgit v1.2.3


From 5205ea00cda1ac23cebfb97dfccca84722d58dfe Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:32 +0530
Subject: net: sched: pie: export symbols to be reused by FQ-PIE

This patch makes the drop_early(), calculate_probability() and
pie_process_dequeue() functions generic enough to be used by
both PIE and FQ-PIE (to be added in a future commit). The major
change here is in the way the functions take in arguments. This
patch exports these functions and makes FQ-PIE dependent on
sch_pie.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index 51a1984c2dce..90f5db3d29e7 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -124,4 +124,13 @@ static inline void pie_set_enqueue_time(struct sk_buff *skb)
 	get_pie_cb(skb)->enqueue_time = psched_get_time();
 }
 
+bool pie_drop_early(struct Qdisc *sch, struct pie_params *params,
+		    struct pie_vars *vars, u32 qlen, u32 packet_size);
+
+void pie_process_dequeue(struct sk_buff *skb, struct pie_params *params,
+			 struct pie_vars *vars, u32 qlen);
+
+void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
+			       u32 qlen);
+
 #endif
-- 
cgit v1.2.3


From ec97ecf1ebe485a17cd8395a5f35e6b80b57665a Mon Sep 17 00:00:00 2001
From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Date: Wed, 22 Jan 2020 23:52:33 +0530
Subject: net: sched: add Flow Queue PIE packet scheduler

Principles:
  - Packets are classified on flows.
  - This is a Stochastic model (as we use a hash, several flows might
                                be hashed to the same slot)
  - Each flow has a PIE managed queue.
  - Flows are linked onto two (Round Robin) lists,
    so that new flows have priority on old ones.
  - For a given flow, packets are not reordered.
  - Drops during enqueue only.
  - ECN capability is off by default.
  - ECN threshold (if ECN is enabled) is at 10% by default.
  - Uses timestamps to calculate queue delay by default.

Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
                    [ target TIME ] [ tupdate TIME ]
                    [ alpha NUMBER ] [ beta NUMBER ]
                    [ quantum BYTES ] [ memory_limit BYTES ]
                    [ ecnprob PERCENTAGE ] [ [no]ecn ]
                    [ [no]bytemode ] [ [no_]dq_rate_estimator ]

defaults:
  limit: 10240 packets, flows: 1024
  target: 15 ms, tupdate: 15 ms (in jiffies)
  alpha: 1/8, beta : 5/4
  quantum: device MTU, memory_limit: 32 Mb
  ecnprob: 10%, ecn: off
  bytemode: off, dq_rate_estimator: off

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pie.h              |  2 ++
 include/uapi/linux/pkt_sched.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

(limited to 'include')

diff --git a/include/net/pie.h b/include/net/pie.h
index 90f5db3d29e7..fd5a37cb7993 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -81,9 +81,11 @@ struct pie_stats {
 /**
  * struct pie_skb_cb - contains private skb vars
  * @enqueue_time:	timestamp when the packet is enqueued
+ * @mem_usage:		size of the skb during enqueue
  */
 struct pie_skb_cb {
 	psched_time_t enqueue_time;
+	u32 mem_usage;
 };
 
 static inline void pie_params_init(struct pie_params *params)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index bf5a5b1dfb0b..bbe791b24168 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -971,6 +971,37 @@ struct tc_pie_xstats {
 	__u32 ecn_mark;			/* packets marked with ecn*/
 };
 
+/* FQ PIE */
+enum {
+	TCA_FQ_PIE_UNSPEC,
+	TCA_FQ_PIE_LIMIT,
+	TCA_FQ_PIE_FLOWS,
+	TCA_FQ_PIE_TARGET,
+	TCA_FQ_PIE_TUPDATE,
+	TCA_FQ_PIE_ALPHA,
+	TCA_FQ_PIE_BETA,
+	TCA_FQ_PIE_QUANTUM,
+	TCA_FQ_PIE_MEMORY_LIMIT,
+	TCA_FQ_PIE_ECN_PROB,
+	TCA_FQ_PIE_ECN,
+	TCA_FQ_PIE_BYTEMODE,
+	TCA_FQ_PIE_DQ_RATE_ESTIMATOR,
+	__TCA_FQ_PIE_MAX
+};
+#define TCA_FQ_PIE_MAX   (__TCA_FQ_PIE_MAX - 1)
+
+struct tc_fq_pie_xstats {
+	__u32 packets_in;	/* total number of packets enqueued */
+	__u32 dropped;		/* packets dropped due to fq_pie_action */
+	__u32 overlimit;	/* dropped due to lack of space in queue */
+	__u32 overmemory;	/* dropped due to lack of memory in queue */
+	__u32 ecn_mark;		/* packets marked with ecn */
+	__u32 new_flow_count;	/* count of new flows created by packets */
+	__u32 new_flows_len;	/* count of flows in new list */
+	__u32 old_flows_len;	/* count of flows in old list */
+	__u32 memory_usage;	/* total memory across all queues */
+};
+
 /* CBS */
 struct tc_cbs_qopt {
 	__u8 offload;
-- 
cgit v1.2.3


From a5d29ae226812ae620f58a0726525d8af9b1d751 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Fri, 24 Jan 2020 13:40:21 +0200
Subject: net: bridge: vlan: add basic option setting support

This patch adds support for option modification of single vlans and
ranges. It allows to only modify options, i.e. skip create/delete by
using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range
option changes we try to pack the notifications as much as possible.

v2: do full port (all vlans) notification only when creating/deleting
    vlans for compatibility, rework the range detection when changing
    options, add more verbose extack errors and check if a vlan should
    be used (br_vlan_should_use checks)

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_bridge.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index ac38f0b674b8..c8b87ad52c5b 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -130,6 +130,7 @@ enum {
 #define BRIDGE_VLAN_INFO_RANGE_BEGIN	(1<<3) /* VLAN is start of vlan range */
 #define BRIDGE_VLAN_INFO_RANGE_END	(1<<4) /* VLAN is end of vlan range */
 #define BRIDGE_VLAN_INFO_BRENTRY	(1<<5) /* Global bridge VLAN entry */
+#define BRIDGE_VLAN_INFO_ONLY_OPTS	(1<<6) /* Skip create/delete/flags */
 
 struct bridge_vlan_info {
 	__u16 flags;
-- 
cgit v1.2.3


From a580c76d534c7360ba68042b19cb255e8420e987 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Fri, 24 Jan 2020 13:40:22 +0200
Subject: net: bridge: vlan: add per-vlan state

The first per-vlan option added is state, it is needed for EVPN and for
per-vlan STP. The state allows to control the forwarding on per-vlan
basis. The vlan state is considered only if the port state is forwarding
in order to avoid conflicts and be consistent. br_allowed_egress is
called only when the state is forwarding, but the ingress case is a bit
more complicated due to the fact that we may have the transition between
port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
allow the bridge to learn from the packet after vlan filtering and it will
be dropped after that. Also to optimize the pvid state check we keep a
copy in the vlan group to avoid one lookup. The state members are
modified with *_ONCE() to annotate the lockless access.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/if_bridge.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index c8b87ad52c5b..42f7ca38ad80 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -191,6 +191,7 @@ enum {
 	BRIDGE_VLANDB_ENTRY_UNSPEC,
 	BRIDGE_VLANDB_ENTRY_INFO,
 	BRIDGE_VLANDB_ENTRY_RANGE,
+	BRIDGE_VLANDB_ENTRY_STATE,
 	__BRIDGE_VLANDB_ENTRY_MAX,
 };
 #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1)
-- 
cgit v1.2.3


From f870fa0b5768842cb4690c1c11f19f28b731ae6d Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Tue, 21 Jan 2020 16:56:15 -0800
Subject: mptcp: Add MPTCP socket stubs

Implements the infrastructure for MPTCP sockets.

MPTCP sockets open one in-kernel TCP socket per subflow. These subflow
sockets are only managed by the MPTCP socket that owns them and are not
visible from userspace. This commit allows a userspace program to open
an MPTCP socket with:

  sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

The resulting socket is simply a wrapper around a single regular TCP
socket, without any of the MPTCP protocol implemented over the wire.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/mptcp.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

(limited to 'include')

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 0573ae75c3db..98ba22379117 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -28,6 +28,8 @@ struct mptcp_ext {
 
 #ifdef CONFIG_MPTCP
 
+void mptcp_init(void);
+
 /* move the skb extension owership, with the assumption that 'to' is
  * newly allocated
  */
@@ -70,6 +72,10 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
 
 #else
 
+static inline void mptcp_init(void)
+{
+}
+
 static inline void mptcp_skb_ext_move(struct sk_buff *to,
 				      const struct sk_buff *from)
 {
@@ -82,4 +88,14 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
 }
 
 #endif /* CONFIG_MPTCP */
+
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+int mptcpv6_init(void);
+#elif IS_ENABLED(CONFIG_IPV6)
+static inline int mptcpv6_init(void)
+{
+	return 0;
+}
+#endif
+
 #endif /* __NET_MPTCP_H */
-- 
cgit v1.2.3


From eda7acddf8080bb2d022a8d4b8b2345eb80c63ec Mon Sep 17 00:00:00 2001
From: Peter Krystad <peter.krystad@linux.intel.com>
Date: Tue, 21 Jan 2020 16:56:16 -0800
Subject: mptcp: Handle MPTCP TCP options

Add hooks to parse and format the MP_CAPABLE option.

This option is handled according to MPTCP version 0 (RFC6824).
MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in
coordination with related code changes.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/tcp.h | 18 ++++++++++++++++++
 include/net/mptcp.h | 18 ++++++++++++++++++
 2 files changed, 36 insertions(+)

(limited to 'include')

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index ca6f01531e64..52798ab00394 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -78,6 +78,16 @@ struct tcp_sack_block {
 #define TCP_SACK_SEEN     (1 << 0)   /*1 = peer is SACK capable, */
 #define TCP_DSACK_SEEN    (1 << 2)   /*1 = DSACK was received from peer*/
 
+#if IS_ENABLED(CONFIG_MPTCP)
+struct mptcp_options_received {
+	u64	sndr_key;
+	u64	rcvr_key;
+	u8	mp_capable : 1,
+		mp_join : 1,
+		dss : 1;
+};
+#endif
+
 struct tcp_options_received {
 /*	PAWS/RTTM data	*/
 	int	ts_recent_stamp;/* Time we stored ts_recent (for aging) */
@@ -95,6 +105,9 @@ struct tcp_options_received {
 	u8	num_sacks;	/* Number of SACK blocks		*/
 	u16	user_mss;	/* mss requested by user in ioctl	*/
 	u16	mss_clamp;	/* Maximal mss, negotiated at connection setup */
+#if IS_ENABLED(CONFIG_MPTCP)
+	struct mptcp_options_received	mptcp;
+#endif
 };
 
 static inline void tcp_clear_options(struct tcp_options_received *rx_opt)
@@ -104,6 +117,11 @@ static inline void tcp_clear_options(struct tcp_options_received *rx_opt)
 #if IS_ENABLED(CONFIG_SMC)
 	rx_opt->smc_ok = 0;
 #endif
+#if IS_ENABLED(CONFIG_MPTCP)
+	rx_opt->mptcp.mp_capable = 0;
+	rx_opt->mptcp.mp_join = 0;
+	rx_opt->mptcp.dss = 0;
+#endif
 }
 
 /* This is the max number of SACKS that we'll generate and process. It's safe
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 98ba22379117..3daec2ceb3ff 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -9,6 +9,7 @@
 #define __NET_MPTCP_H
 
 #include <linux/skbuff.h>
+#include <linux/tcp.h>
 #include <linux/types.h>
 
 /* MPTCP sk_buff extension data */
@@ -26,10 +27,22 @@ struct mptcp_ext {
 	/* one byte hole */
 };
 
+struct mptcp_out_options {
+#if IS_ENABLED(CONFIG_MPTCP)
+	u16 suboptions;
+	u64 sndr_key;
+	u64 rcvr_key;
+#endif
+};
+
 #ifdef CONFIG_MPTCP
 
 void mptcp_init(void);
 
+void mptcp_parse_option(const unsigned char *ptr, int opsize,
+			struct tcp_options_received *opt_rx);
+void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts);
+
 /* move the skb extension owership, with the assumption that 'to' is
  * newly allocated
  */
@@ -76,6 +89,11 @@ static inline void mptcp_init(void)
 {
 }
 
+static inline void mptcp_parse_option(const unsigned char *ptr, int opsize,
+				      struct tcp_options_received *opt_rx)
+{
+}
+
 static inline void mptcp_skb_ext_move(struct sk_buff *to,
 				      const struct sk_buff *from)
 {
-- 
cgit v1.2.3


From 2303f994b3e187091fd08148066688b08f837efc Mon Sep 17 00:00:00 2001
From: Peter Krystad <peter.krystad@linux.intel.com>
Date: Tue, 21 Jan 2020 16:56:17 -0800
Subject: mptcp: Associate MPTCP context with TCP socket

Use ULP to associate a subflow_context structure with each TCP subflow
socket. Creating these sockets requires new bind and connect functions
to make sure ULP is set up immediately when the subflow sockets are
created.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/tcp.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 52798ab00394..877947475814 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -397,6 +397,9 @@ struct tcp_sock {
 	u32	mtu_info; /* We received an ICMP_FRAG_NEEDED / ICMPV6_PKT_TOOBIG
 			   * while socket was owned by user.
 			   */
+#if IS_ENABLED(CONFIG_MPTCP)
+	bool	is_mptcp;
+#endif
 
 #ifdef CONFIG_TCP_MD5SIG
 /* TCP AF-Specific parts; only used by MD5 Signature support so far */
-- 
cgit v1.2.3


From cec37a6e41aae7bf3df9a3da783380a4d9325fd8 Mon Sep 17 00:00:00 2001
From: Peter Krystad <peter.krystad@linux.intel.com>
Date: Tue, 21 Jan 2020 16:56:18 -0800
Subject: mptcp: Handle MP_CAPABLE options for outgoing connections

Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request,
to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to
the final ACK of the three-way handshake.

Use the .sk_rx_dst_set() handler in the subflow proto to capture when the
responding SYN-ACK is received and notify the MPTCP connection layer.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/tcp.h |  3 +++
 include/net/mptcp.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

(limited to 'include')

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 877947475814..e9ee06d887fa 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -137,6 +137,9 @@ struct tcp_request_sock {
 	const struct tcp_request_sock_ops *af_specific;
 	u64				snt_synack; /* first SYNACK sent time */
 	bool				tfo_listener;
+#if IS_ENABLED(CONFIG_MPTCP)
+	bool				is_mptcp;
+#endif
 	u32				txhash;
 	u32				rcv_isn;
 	u32				snt_isn;
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 3daec2ceb3ff..eabc57c3fde4 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -39,8 +39,27 @@ struct mptcp_out_options {
 
 void mptcp_init(void);
 
+static inline bool sk_is_mptcp(const struct sock *sk)
+{
+	return tcp_sk(sk)->is_mptcp;
+}
+
+static inline bool rsk_is_mptcp(const struct request_sock *req)
+{
+	return tcp_rsk(req)->is_mptcp;
+}
+
 void mptcp_parse_option(const unsigned char *ptr, int opsize,
 			struct tcp_options_received *opt_rx);
+bool mptcp_syn_options(struct sock *sk, unsigned int *size,
+		       struct mptcp_out_options *opts);
+void mptcp_rcv_synsent(struct sock *sk);
+bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
+			  struct mptcp_out_options *opts);
+bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
+			       unsigned int *size, unsigned int remaining,
+			       struct mptcp_out_options *opts);
+
 void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts);
 
 /* move the skb extension owership, with the assumption that 'to' is
@@ -89,11 +108,47 @@ static inline void mptcp_init(void)
 {
 }
 
+static inline bool sk_is_mptcp(const struct sock *sk)
+{
+	return false;
+}
+
+static inline bool rsk_is_mptcp(const struct request_sock *req)
+{
+	return false;
+}
+
 static inline void mptcp_parse_option(const unsigned char *ptr, int opsize,
 				      struct tcp_options_received *opt_rx)
 {
 }
 
+static inline bool mptcp_syn_options(struct sock *sk, unsigned int *size,
+				     struct mptcp_out_options *opts)
+{
+	return false;
+}
+
+static inline void mptcp_rcv_synsent(struct sock *sk)
+{
+}
+
+static inline bool mptcp_synack_options(const struct request_sock *req,
+					unsigned int *size,
+					struct mptcp_out_options *opts)
+{
+	return false;
+}
+
+static inline bool mptcp_established_options(struct sock *sk,
+					     struct sk_buff *skb,
+					     unsigned int *size,
+					     unsigned int remaining,
+					     struct mptcp_out_options *opts)
+{
+	return false;
+}
+
 static inline void mptcp_skb_ext_move(struct sk_buff *to,
 				      const struct sk_buff *from)
 {
@@ -107,6 +162,8 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
 
 #endif /* CONFIG_MPTCP */
 
+void mptcp_handle_ipv6_mapped(struct sock *sk, bool mapped);
+
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 int mptcpv6_init(void);
 #elif IS_ENABLED(CONFIG_IPV6)
-- 
cgit v1.2.3


From 6d0060f600adfddaa43fefb96b6b12643331961e Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Tue, 21 Jan 2020 16:56:23 -0800
Subject: mptcp: Write MPTCP DSS headers to outgoing data packets

Per-packet metadata required to write the MPTCP DSS option is written to
the skb_ext area. One write to the socket may contain more than one
packet of data, which is copied to page fragments and mapped in to MPTCP
DSS segments with size determined by the available page fragments and
the maximum mapping length allowed by the MPTCP specification. If
do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok -
the later skbs can either have duplicated DSS mapping information or
none at all, and the receiver can handle that.

The current implementation uses the subflow frag cache and tcp
sendpages to avoid excessive code duplication. More work is required to
ensure that it works correctly under memory pressure and to support
MPTCP-level retransmissions.

The MPTCP DSS checksum is not yet implemented.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/mptcp.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index eabc57c3fde4..06dcc665135e 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -32,6 +32,7 @@ struct mptcp_out_options {
 	u16 suboptions;
 	u64 sndr_key;
 	u64 rcvr_key;
+	struct mptcp_ext ext_copy;
 #endif
 };
 
-- 
cgit v1.2.3


From 648ef4b88673dadb8463bf0d4b10fbf33d55def8 Mon Sep 17 00:00:00 2001
From: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date: Tue, 21 Jan 2020 16:56:24 -0800
Subject: mptcp: Implement MPTCP receive path

Parses incoming DSS options and populates outgoing MPTCP ACK
fields. MPTCP fields are parsed from the TCP option header and placed in
an skb extension, allowing the upper MPTCP layer to access MPTCP
options after the skb has gone through the TCP stack.

The subflow implements its own data_ready() ops, which ensures that the
pending data is in sequence - according to MPTCP seq number - dropping
out-of-seq skbs. The DATA_READY bit flag is set if this is the case.
This allows the MPTCP socket layer to determine if more data is
available without having to consult the individual subflows.

It additionally validates the current mapping and propagates EoF events
to the connection socket.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/tcp.h | 10 ++++++++++
 include/net/mptcp.h |  8 ++++++++
 2 files changed, 18 insertions(+)

(limited to 'include')

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index e9ee06d887fa..0d00dad4b85d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -82,9 +82,19 @@ struct tcp_sack_block {
 struct mptcp_options_received {
 	u64	sndr_key;
 	u64	rcvr_key;
+	u64	data_ack;
+	u64	data_seq;
+	u32	subflow_seq;
+	u16	data_len;
 	u8	mp_capable : 1,
 		mp_join : 1,
 		dss : 1;
+	u8	use_map:1,
+		dsn64:1,
+		data_fin:1,
+		use_ack:1,
+		ack64:1,
+		__unused:3;
 };
 #endif
 
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 06dcc665135e..8619c1fca741 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -60,6 +60,8 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
 bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       unsigned int *size, unsigned int remaining,
 			       struct mptcp_out_options *opts);
+void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb,
+			    struct tcp_options_received *opt_rx);
 
 void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts);
 
@@ -150,6 +152,12 @@ static inline bool mptcp_established_options(struct sock *sk,
 	return false;
 }
 
+static inline void mptcp_incoming_options(struct sock *sk,
+					  struct sk_buff *skb,
+					  struct tcp_options_received *opt_rx)
+{
+}
+
 static inline void mptcp_skb_ext_move(struct sk_buff *to,
 				      const struct sk_buff *from)
 {
-- 
cgit v1.2.3


From cc7972ea1932335e0a0ee00ac8a24b3e8304630d Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Tue, 21 Jan 2020 16:56:31 -0800
Subject: mptcp: parse and emit MP_CAPABLE option according to v1 spec

This implements MP_CAPABLE options parsing and writing according
to RFC 6824 bis / RFC 8684: MPTCP v1.

Local key is sent on syn/ack, and both keys are sent on 3rd ack.
MP_CAPABLE messages len are updated accordingly. We need the skbuff to
correctly emit the above, so we push the skbuff struct as an argument
all the way from tcp code to the relevant mptcp callbacks.

When processing incoming MP_CAPABLE + data, build a full blown DSS-like
map info, to simplify later processing.  On child socket creation, we
need to record the remote key, if available.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/tcp.h |  3 ++-
 include/net/mptcp.h | 17 ++++++++++-------
 2 files changed, 12 insertions(+), 8 deletions(-)

(limited to 'include')

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 0d00dad4b85d..4e2124607d32 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -94,7 +94,8 @@ struct mptcp_options_received {
 		data_fin:1,
 		use_ack:1,
 		ack64:1,
-		__unused:3;
+		mpc_map:1,
+		__unused:2;
 };
 #endif
 
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 8619c1fca741..27627e2d1bc2 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -23,7 +23,8 @@ struct mptcp_ext {
 			data_fin:1,
 			use_ack:1,
 			ack64:1,
-			__unused:3;
+			mpc_map:1,
+			__unused:2;
 	/* one byte hole */
 };
 
@@ -50,10 +51,10 @@ static inline bool rsk_is_mptcp(const struct request_sock *req)
 	return tcp_rsk(req)->is_mptcp;
 }
 
-void mptcp_parse_option(const unsigned char *ptr, int opsize,
-			struct tcp_options_received *opt_rx);
-bool mptcp_syn_options(struct sock *sk, unsigned int *size,
-		       struct mptcp_out_options *opts);
+void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr,
+			int opsize, struct tcp_options_received *opt_rx);
+bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
+		       unsigned int *size, struct mptcp_out_options *opts);
 void mptcp_rcv_synsent(struct sock *sk);
 bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
 			  struct mptcp_out_options *opts);
@@ -121,12 +122,14 @@ static inline bool rsk_is_mptcp(const struct request_sock *req)
 	return false;
 }
 
-static inline void mptcp_parse_option(const unsigned char *ptr, int opsize,
+static inline void mptcp_parse_option(const struct sk_buff *skb,
+				      const unsigned char *ptr, int opsize,
 				      struct tcp_options_received *opt_rx)
 {
 }
 
-static inline bool mptcp_syn_options(struct sock *sk, unsigned int *size,
+static inline bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
+				     unsigned int *size,
 				     struct mptcp_out_options *opts)
 {
 	return false;
-- 
cgit v1.2.3


From e42f1ac626e7f799717d006e0f8393b6d6f9fc8c Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw@strlen.de>
Date: Fri, 24 Jan 2020 16:04:02 -0800
Subject: mptcp: do not inherit inet proto ops

We need to initialise the struct ourselves, else we expose tcp-specific
callbacks such as tcp_splice_read which will then trigger splat because
the socket is an mptcp one:

BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478

CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3
Call Trace:
 tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612
 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674
 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791
 do_splice+0x1259/0x1560 fs/splice.c:1205

To prevent build error with ipv6, add the recv/sendmsg function
declaration to ipv6.h.  The functions are already accessible "thanks"
to retpoline related work, but they are currently only made visible
by socket.c specific INDIRECT_CALLABLE macros.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ipv6.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 4e95f6df508c..cec1a54401f2 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1113,6 +1113,9 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 
 int inet6_hash_connect(struct inet_timewait_death_row *death_row,
 			      struct sock *sk);
+int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
+		  int flags);
 
 /*
  * reassembly.c
-- 
cgit v1.2.3


From ef6aadcc76c97e25f62adc4e9d19684d3e5d0b87 Mon Sep 17 00:00:00 2001
From: Petr Machata <petrm@mellanox.com>
Date: Fri, 24 Jan 2020 15:23:06 +0200
Subject: net: sched: Make TBF Qdisc offloadable

Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying
and dumping of TBF Qdisc.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h     | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5ec3537fbdb1..11bdf6cb30bd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -850,6 +850,7 @@ enum tc_setup_type {
 	TC_SETUP_QDISC_TAPRIO,
 	TC_SETUP_FT,
 	TC_SETUP_QDISC_ETS,
+	TC_SETUP_QDISC_TBF,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 47b115e2012a..ce036492986a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -854,4 +854,26 @@ struct tc_ets_qopt_offload {
 	};
 };
 
+enum tc_tbf_command {
+	TC_TBF_REPLACE,
+	TC_TBF_DESTROY,
+	TC_TBF_STATS,
+};
+
+struct tc_tbf_qopt_offload_replace_params {
+	struct psched_ratecfg rate;
+	u32 max_size;
+	struct gnet_stats_queue *qstats;
+};
+
+struct tc_tbf_qopt_offload {
+	enum tc_tbf_command command;
+	u32 handle;
+	u32 parent;
+	union {
+		struct tc_tbf_qopt_offload_replace_params replace_params;
+		struct tc_qopt_offload_stats stats;
+	};
+};
+
 #endif
-- 
cgit v1.2.3


From e9b4e606c2289d6610113253922bb8c9ac7f68b0 Mon Sep 17 00:00:00 2001
From: Jiri Olsa <jolsa@kernel.org>
Date: Thu, 23 Jan 2020 17:15:07 +0100
Subject: bpf: Allow to resolve bpf trampoline and dispatcher in unwind

When unwinding the stack we need to identify each address
to successfully continue. Adding latch tree to keep trampolines
for quick lookup during the unwind.

The patch uses first 48 bytes for latch tree node, leaving 4048
bytes from the rest of the page for trampoline or dispatcher
generated code.

It's still enough not to affect trampoline and dispatcher progs
maximum counts.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200123161508.915203-3-jolsa@kernel.org
---
 include/linux/bpf.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a9687861fd7e..8e9ad3943cd9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -525,7 +525,6 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
-void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_INIT(name) {			\
 	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
 	.func = &name##func,				\
@@ -557,6 +556,13 @@ void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_PTR(name) (&name)
 void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
 				struct bpf_prog *to);
+struct bpf_image {
+	struct latch_tree_node tnode;
+	unsigned char data[];
+};
+#define BPF_IMAGE_SIZE (PAGE_SIZE - sizeof(struct bpf_image))
+bool is_bpf_image_address(unsigned long address);
+void *bpf_image_alloc(void);
 #else
 static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
@@ -578,6 +584,10 @@ static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
 					      struct bpf_prog *from,
 					      struct bpf_prog *to) {}
+static inline bool is_bpf_image_address(unsigned long address)
+{
+	return false;
+}
 #endif
 
 struct bpf_func_info_aux {
-- 
cgit v1.2.3


From 32efcc06d2a15fa87585614d12d6c2308cc2d3f3 Mon Sep 17 00:00:00 2001
From: Abdul Kabbani <akabbani@google.com>
Date: Fri, 24 Jan 2020 16:34:02 -0500
Subject: tcp: export count for rehash attempts

Using IPv6 flow-label to swiftly route around avoid congested or
disconnected network path can greatly improve TCP reliability.

This patch adds SNMP counters and a OPT_STATS counter to track both
host-level and connection-level statistics. Network administrators
can use these counters to evaluate the impact of this new ability better.

Export count for rehash attempts to
1) two SNMP counters: TcpTimeoutRehash (rehash due to timeouts),
   and TcpDuplicateDataRehash (rehash due to receiving duplicate
   packets)
2) Timestamping API SOF_TIMESTAMPING_OPT_STATS.

Signed-off-by: Abdul Kabbani <akabbani@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/tcp.h       | 2 ++
 include/uapi/linux/snmp.h | 2 ++
 include/uapi/linux/tcp.h  | 1 +
 3 files changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 4e2124607d32..1cf73e6f85ca 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -386,6 +386,8 @@ struct tcp_sock {
 #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
 #endif
 
+	u16 timeout_rehash;	/* Timeout-triggered rehash attempts */
+
 	u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */
 
 /* Receiver side RTT estimation */
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 7eee233e78d2..7d91f4debc48 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -285,6 +285,8 @@ enum
 	LINUX_MIB_TCPRCVQDROP,			/* TCPRcvQDrop */
 	LINUX_MIB_TCPWQUEUETOOBIG,		/* TCPWqueueTooBig */
 	LINUX_MIB_TCPFASTOPENPASSIVEALTKEY,	/* TCPFastOpenPassiveAltKey */
+	LINUX_MIB_TCPTIMEOUTREHASH,		/* TCPTimeoutRehash */
+	LINUX_MIB_TCPDUPLICATEDATAREHASH,	/* TCPDuplicateDataRehash */
 	__LINUX_MIB_MAX
 };
 
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index d87184e673ca..fd9eb8f6bcae 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -311,6 +311,7 @@ enum {
 	TCP_NLA_DSACK_DUPS,	/* DSACK blocks received */
 	TCP_NLA_REORD_SEEN,	/* reordering events seen */
 	TCP_NLA_SRTT,		/* smoothed RTT in usecs */
+	TCP_NLA_TIMEOUT_REHASH, /* Timeout-triggered rehash attempts */
 };
 
 /* for TCP_MD5SIG socket option */
-- 
cgit v1.2.3


From 7b225d0b5c6dda5fefab578175f210c6fc7e389a Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 22 Jan 2020 00:17:52 +0100
Subject: netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute

Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
  nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        | 14 +++++++++++++-
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index fe7c50acc681..504c0aa93805 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -231,6 +231,7 @@ struct nft_userdata {
  *	struct nft_set_elem - generic representation of set elements
  *
  *	@key: element key
+ *	@key_end: closing element key
  *	@priv: element private data and extensions
  */
 struct nft_set_elem {
@@ -238,6 +239,10 @@ struct nft_set_elem {
 		u32		buf[NFT_DATA_VALUE_MAXLEN / sizeof(u32)];
 		struct nft_data	val;
 	} key;
+	union {
+		u32		buf[NFT_DATA_VALUE_MAXLEN / sizeof(u32)];
+		struct nft_data	val;
+	} key_end;
 	void			*priv;
 };
 
@@ -502,6 +507,7 @@ void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set);
  *	enum nft_set_extensions - set extension type IDs
  *
  *	@NFT_SET_EXT_KEY: element key
+ *	@NFT_SET_EXT_KEY_END: upper bound element key, for ranges
  *	@NFT_SET_EXT_DATA: mapping data
  *	@NFT_SET_EXT_FLAGS: element flags
  *	@NFT_SET_EXT_TIMEOUT: element timeout
@@ -513,6 +519,7 @@ void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set);
  */
 enum nft_set_extensions {
 	NFT_SET_EXT_KEY,
+	NFT_SET_EXT_KEY_END,
 	NFT_SET_EXT_DATA,
 	NFT_SET_EXT_FLAGS,
 	NFT_SET_EXT_TIMEOUT,
@@ -606,6 +613,11 @@ static inline struct nft_data *nft_set_ext_key(const struct nft_set_ext *ext)
 	return nft_set_ext(ext, NFT_SET_EXT_KEY);
 }
 
+static inline struct nft_data *nft_set_ext_key_end(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_KEY_END);
+}
+
 static inline struct nft_data *nft_set_ext_data(const struct nft_set_ext *ext)
 {
 	return nft_set_ext(ext, NFT_SET_EXT_DATA);
@@ -655,7 +667,7 @@ static inline struct nft_object **nft_set_ext_obj(const struct nft_set_ext *ext)
 
 void *nft_set_elem_init(const struct nft_set *set,
 			const struct nft_set_ext_tmpl *tmpl,
-			const u32 *key, const u32 *data,
+			const u32 *key, const u32 *key_end, const u32 *data,
 			u64 timeout, u64 expiration, gfp_t gfp);
 void nft_set_elem_destroy(const struct nft_set *set, void *elem,
 			  bool destroy_expr);
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 261864736b26..c13106496bd2 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -370,6 +370,7 @@ enum nft_set_elem_flags {
  * @NFTA_SET_ELEM_USERDATA: user data (NLA_BINARY)
  * @NFTA_SET_ELEM_EXPR: expression (NLA_NESTED: nft_expr_attributes)
  * @NFTA_SET_ELEM_OBJREF: stateful object reference (NLA_STRING)
+ * @NFTA_SET_ELEM_KEY_END: closing key value (NLA_NESTED: nft_data)
  */
 enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_UNSPEC,
@@ -382,6 +383,7 @@ enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_EXPR,
 	NFTA_SET_ELEM_PAD,
 	NFTA_SET_ELEM_OBJREF,
+	NFTA_SET_ELEM_KEY_END,
 	__NFTA_SET_ELEM_MAX
 };
 #define NFTA_SET_ELEM_MAX	(__NFTA_SET_ELEM_MAX - 1)
-- 
cgit v1.2.3


From f3a2181e16f1dcbf5446ed43f6b5d9f56c459f85 Mon Sep 17 00:00:00 2001
From: Stefano Brivio <sbrivio@redhat.com>
Date: Wed, 22 Jan 2020 00:17:53 +0100
Subject: netfilter: nf_tables: Support for sets with multiple ranged fields

Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

  NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
  NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		4
  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		2

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |  8 ++++++++
 include/uapi/linux/netfilter/nf_tables.h | 15 +++++++++++++++
 2 files changed, 23 insertions(+)

(limited to 'include')

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 504c0aa93805..4170c033d461 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -264,11 +264,15 @@ struct nft_set_iter {
  *	@klen: key length
  *	@dlen: data length
  *	@size: number of set elements
+ *	@field_len: length of each field in concatenation, bytes
+ *	@field_count: number of concatenated fields in element
  */
 struct nft_set_desc {
 	unsigned int		klen;
 	unsigned int		dlen;
 	unsigned int		size;
+	u8			field_len[NFT_REG32_COUNT];
+	u8			field_count;
 };
 
 /**
@@ -409,6 +413,8 @@ void nft_unregister_set(struct nft_set_type *type);
  * 	@dtype: data type (verdict or numeric type defined by userspace)
  * 	@objtype: object type (see NFT_OBJECT_* definitions)
  * 	@size: maximum set size
+ *	@field_len: length of each field in concatenation, bytes
+ *	@field_count: number of concatenated fields in element
  *	@use: number of rules references to this set
  * 	@nelems: number of elements
  * 	@ndeact: number of deactivated elements queued for removal
@@ -435,6 +441,8 @@ struct nft_set {
 	u32				dtype;
 	u32				objtype;
 	u32				size;
+	u8				field_len[NFT_REG32_COUNT];
+	u8				field_count;
 	u32				use;
 	atomic_t			nelems;
 	u32				ndeact;
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index c13106496bd2..065218a20bb7 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -48,6 +48,7 @@ enum nft_registers {
 
 #define NFT_REG_SIZE	16
 #define NFT_REG32_SIZE	4
+#define NFT_REG32_COUNT	(NFT_REG32_15 - NFT_REG32_00 + 1)
 
 /**
  * enum nft_verdicts - nf_tables internal verdicts
@@ -301,14 +302,28 @@ enum nft_set_policies {
  * enum nft_set_desc_attributes - set element description
  *
  * @NFTA_SET_DESC_SIZE: number of elements in set (NLA_U32)
+ * @NFTA_SET_DESC_CONCAT: description of field concatenation (NLA_NESTED)
  */
 enum nft_set_desc_attributes {
 	NFTA_SET_DESC_UNSPEC,
 	NFTA_SET_DESC_SIZE,
+	NFTA_SET_DESC_CONCAT,
 	__NFTA_SET_DESC_MAX
 };
 #define NFTA_SET_DESC_MAX	(__NFTA_SET_DESC_MAX - 1)
 
+/**
+ * enum nft_set_field_attributes - attributes of concatenated fields
+ *
+ * @NFTA_SET_FIELD_LEN: length of single field, in bits (NLA_U32)
+ */
+enum nft_set_field_attributes {
+	NFTA_SET_FIELD_UNSPEC,
+	NFTA_SET_FIELD_LEN,
+	__NFTA_SET_FIELD_MAX
+};
+#define NFTA_SET_FIELD_MAX	(__NFTA_SET_FIELD_MAX - 1)
+
 /**
  * enum nft_set_attributes - nf_tables set netlink attributes
  *
-- 
cgit v1.2.3


From 2092767168f0681aa03727448b801600a364c013 Mon Sep 17 00:00:00 2001
From: Stefano Brivio <sbrivio@redhat.com>
Date: Wed, 22 Jan 2020 00:17:54 +0100
Subject: bitmap: Introduce bitmap_cut(): cut bits and shift remaining

The new bitmap function bitmap_cut() copies bits from source to
destination by removing the region specified by parameters first
and cut, and remapping the bits above the cut region by right
shifting them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/bitmap.h | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index ff335b22f23c..f0f3a9fffa6a 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -53,6 +53,7 @@
  *  bitmap_find_next_zero_area_off(buf, len, pos, n, mask)  as above
  *  bitmap_shift_right(dst, src, n, nbits)      *dst = *src >> n
  *  bitmap_shift_left(dst, src, n, nbits)       *dst = *src << n
+ *  bitmap_cut(dst, src, first, n, nbits)       Cut n bits from first, copy rest
  *  bitmap_replace(dst, old, new, mask, nbits)  *dst = (*old & ~(*mask)) | (*new & *mask)
  *  bitmap_remap(dst, src, old, new, nbits)     *dst = map(old, new)(src)
  *  bitmap_bitremap(oldbit, old, new, nbits)    newbit = map(old, new)(oldbit)
@@ -133,6 +134,9 @@ extern void __bitmap_shift_right(unsigned long *dst, const unsigned long *src,
 				unsigned int shift, unsigned int nbits);
 extern void __bitmap_shift_left(unsigned long *dst, const unsigned long *src,
 				unsigned int shift, unsigned int nbits);
+extern void bitmap_cut(unsigned long *dst, const unsigned long *src,
+		       unsigned int first, unsigned int cut,
+		       unsigned int nbits);
 extern int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
 			const unsigned long *bitmap2, unsigned int nbits);
 extern void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
-- 
cgit v1.2.3


From 3c4287f62044a90e73a561aa05fc46e62da173da Mon Sep 17 00:00:00 2001
From: Stefano Brivio <sbrivio@redhat.com>
Date: Wed, 22 Jan 2020 00:17:55 +0100
Subject: nf_tables: Add set type for arbitrary concatenation of ranges

This new set type allows for intervals in concatenated fields,
which are expressed in the usual way, that is, simple byte
concatenation with padding to 32 bits for single fields, and
given as ranges by specifying start and end elements containing,
each, the full concatenation of start and end values for the
single fields.

Ranges are expanded to composing netmasks, for each field: these
are inserted as rules in per-field lookup tables. Bits to be
classified are divided in 4-bit groups, and for each group, the
lookup table contains 4^2 buckets, representing all the possible
values of a bit group. This approach was inspired by the Grouper
algorithm:
	http://www.cse.usf.edu/~ligatti/projects/grouper/

Matching is performed by a sequence of AND operations between
bucket values, with buckets selected according to the value of
packet bits, for each group. The result of this sequence tells
us which rules matched for a given field.

In order to concatenate several ranged fields, per-field rules
are mapped using mapping arrays, one per field, that specify
which rules should be considered while matching the next field.
The mapping array for the last field contains a reference to
the element originally inserted.

The notes in nft_set_pipapo.c cover the algorithm in deeper
detail.

A pure hash-based approach is of no use here, as ranges need
to be classified. An implementation based on "proxying" the
existing red-black tree set type, creating a tree for each
field, was considered, but deemed impractical due to the fact
that elements would need to be shared between trees, at least
as long as we want to keep UAPI changes to a minimum.

A stand-alone implementation of this algorithm is available at:
	https://pipapo.lameexcu.se
together with notes about possible future optimisations
(in pipapo.c).

This algorithm was designed with data locality in mind, and can
be highly optimised for SIMD instruction sets, as the bulk of
the matching work is done with repetitive, simple bitwise
operations.

At this point, without further optimisations, nft_concat_range.sh
reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
L2$):

TEST: performance
  net,port                                                      [ OK ]
    baseline (drop from netdev hook):              10190076pps
    baseline hash (non-ranged entries):             6179564pps
    baseline rbtree (match on first field only):    2950341pps
    set with  1000 full, ranged entries:            2304165pps
  port,net                                                      [ OK ]
    baseline (drop from netdev hook):              10143615pps
    baseline hash (non-ranged entries):             6135776pps
    baseline rbtree (match on first field only):    4311934pps
    set with   100 full, ranged entries:            4131471pps
  net6,port                                                     [ OK ]
    baseline (drop from netdev hook):               9730404pps
    baseline hash (non-ranged entries):             4809557pps
    baseline rbtree (match on first field only):    1501699pps
    set with  1000 full, ranged entries:            1092557pps
  port,proto                                                    [ OK ]
    baseline (drop from netdev hook):              10812426pps
    baseline hash (non-ranged entries):             6929353pps
    baseline rbtree (match on first field only):    3027105pps
    set with 30000 full, ranged entries:             284147pps
  net6,port,mac                                                 [ OK ]
    baseline (drop from netdev hook):               9660114pps
    baseline hash (non-ranged entries):             3778877pps
    baseline rbtree (match on first field only):    3179379pps
    set with    10 full, ranged entries:            2082880pps
  net6,port,mac,proto                                           [ OK ]
    baseline (drop from netdev hook):               9718324pps
    baseline hash (non-ranged entries):             3799021pps
    baseline rbtree (match on first field only):    1506689pps
    set with  1000 full, ranged entries:             783810pps
  net,mac                                                       [ OK ]
    baseline (drop from netdev hook):              10190029pps
    baseline hash (non-ranged entries):             5172218pps
    baseline rbtree (match on first field only):    2946863pps
    set with  1000 full, ranged entries:            1279122pps

v4:
 - fix build for 32-bit architectures: 64-bit division needs
   div_u64() (kbuild test robot <lkp@intel.com>)
v3:
 - rework interface for field length specification,
   NFT_SET_SUBKEY disappears and information is stored in
   description
 - remove scratch area to store closing element of ranges,
   as elements now come with an actual attribute to specify
   the upper range limit (Pablo Neira Ayuso)
 - also remove pointer to 'start' element from mapping table,
   closing key is now accessible via extension data
 - use bytes right away instead of bits for field lengths,
   this way we can also double the inner loop of the lookup
   function to take care of upper and lower bits in a single
   iteration (minor performance improvement)
 - make it clearer that set operations are actually atomic
   API-wise, but we can't e.g. implement flush() as one-shot
   action
 - fix type for 'dup' in nft_pipapo_insert(), check for
   duplicates only in the next generation, and in general take
   care of differentiating generation mask cases depending on
   the operation (Pablo Neira Ayuso)
 - report C implementation matching rate in commit message, so
   that AVX2 implementation can be compared (Pablo Neira Ayuso)
v2:
 - protect access to scratch maps in nft_pipapo_lookup() with
   local_bh_disable/enable() (Florian Westphal)
 - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
   already implied (Florian Westphal)
 - explain why partial allocation failures don't need handling
   in pipapo_realloc_scratch(), rename 'm' to clone and update
   related kerneldoc to make it clear we're not operating on
   the live copy (Florian Westphal)
 - add expicit check for priv->start_elem in
   nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
   with a NULL start element, and also zero it out in every
   operation that might make it invalid, so that insertion
   doesn't proceed with an invalid element (Florian Westphal)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables_core.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 2656155b4069..29e7e1021267 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -74,6 +74,7 @@ extern struct nft_set_type nft_set_hash_type;
 extern struct nft_set_type nft_set_hash_fast_type;
 extern struct nft_set_type nft_set_rbtree_type;
 extern struct nft_set_type nft_set_bitmap_type;
+extern struct nft_set_type nft_set_pipapo_type;
 
 struct nft_expr;
 struct nft_regs;
-- 
cgit v1.2.3


From 2e24cd755552350b94a7617617c6877b8cbcb701 Mon Sep 17 00:00:00 2001
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu, 23 Jan 2020 16:26:18 -0800
Subject: net_sched: fix ops->bind_class() implementations

The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().

In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.

Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.

Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/pkt_cls.h     | 33 +++++++++++++++++++--------------
 include/net/sch_generic.h |  3 ++-
 2 files changed, 21 insertions(+), 15 deletions(-)

(limited to 'include')

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ce036492986a..a972244ab193 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -141,31 +141,38 @@ __cls_set_class(unsigned long *clp, unsigned long cl)
 	return xchg(clp, cl);
 }
 
-static inline unsigned long
-cls_set_class(struct Qdisc *q, unsigned long *clp, unsigned long cl)
+static inline void
+__tcf_bind_filter(struct Qdisc *q, struct tcf_result *r, unsigned long base)
 {
-	unsigned long old_cl;
+	unsigned long cl;
 
-	sch_tree_lock(q);
-	old_cl = __cls_set_class(clp, cl);
-	sch_tree_unlock(q);
-	return old_cl;
+	cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
+	cl = __cls_set_class(&r->class, cl);
+	if (cl)
+		q->ops->cl_ops->unbind_tcf(q, cl);
 }
 
 static inline void
 tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
 {
 	struct Qdisc *q = tp->chain->block->q;
-	unsigned long cl;
 
 	/* Check q as it is not set for shared blocks. In that case,
 	 * setting class is not supported.
 	 */
 	if (!q)
 		return;
-	cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
-	cl = cls_set_class(q, &r->class, cl);
-	if (cl)
+	sch_tree_lock(q);
+	__tcf_bind_filter(q, r, base);
+	sch_tree_unlock(q);
+}
+
+static inline void
+__tcf_unbind_filter(struct Qdisc *q, struct tcf_result *r)
+{
+	unsigned long cl;
+
+	if ((cl = __cls_set_class(&r->class, 0)) != 0)
 		q->ops->cl_ops->unbind_tcf(q, cl);
 }
 
@@ -173,12 +180,10 @@ static inline void
 tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
 {
 	struct Qdisc *q = tp->chain->block->q;
-	unsigned long cl;
 
 	if (!q)
 		return;
-	if ((cl = __cls_set_class(&r->class, 0)) != 0)
-		q->ops->cl_ops->unbind_tcf(q, cl);
+	__tcf_unbind_filter(q, r);
 }
 
 struct tcf_exts {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fceddf89592a..151208704ed2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -318,7 +318,8 @@ struct tcf_proto_ops {
 					  void *type_data);
 	void			(*hw_del)(struct tcf_proto *tp,
 					  void *type_data);
-	void			(*bind_class)(void *, u32, unsigned long);
+	void			(*bind_class)(void *, u32, unsigned long,
+					      void *, unsigned long);
 	void *			(*tmplt_create)(struct net *net,
 						struct tcf_chain *chain,
 						struct nlattr **tca,
-- 
cgit v1.2.3


From 3b33583265ed3b0ae76eddbabf9d038b4076d1a9 Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Sat, 25 Jan 2020 11:26:42 +0100
Subject: net: Add fraglist GRO/GSO feature flags

This adds new Fraglist GRO/GSO feature flags. They will be used
to configure fraglist GRO/GSO what will be implemented with some
followup paches.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdev_features.h | 6 +++++-
 include/linux/netdevice.h       | 1 +
 include/linux/skbuff.h          | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 4b19c544c59a..b239507da2a0 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -53,8 +53,9 @@ enum {
 	NETIF_F_GSO_ESP_BIT,		/* ... ESP with TSO */
 	NETIF_F_GSO_UDP_BIT,		/* ... UFO, deprecated except tuntap */
 	NETIF_F_GSO_UDP_L4_BIT,		/* ... UDP payload GSO (not UFO) */
+	NETIF_F_GSO_FRAGLIST_BIT,		/* ... Fraglist GSO */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_UDP_L4_BIT,
+		NETIF_F_GSO_FRAGLIST_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
@@ -80,6 +81,7 @@ enum {
 
 	NETIF_F_GRO_HW_BIT,		/* Hardware Generic receive offload */
 	NETIF_F_HW_TLS_RECORD_BIT,	/* Offload TLS record */
+	NETIF_F_GRO_FRAGLIST_BIT,	/* Fraglist GRO */
 
 	/*
 	 * Add your fresh new feature above and remember to update
@@ -150,6 +152,8 @@ enum {
 #define NETIF_F_GSO_UDP_L4	__NETIF_F(GSO_UDP_L4)
 #define NETIF_F_HW_TLS_TX	__NETIF_F(HW_TLS_TX)
 #define NETIF_F_HW_TLS_RX	__NETIF_F(HW_TLS_RX)
+#define NETIF_F_GRO_FRAGLIST	__NETIF_F(GRO_FRAGLIST)
+#define NETIF_F_GSO_FRAGLIST	__NETIF_F(GSO_FRAGLIST)
 
 /* Finds the next feature with the highest number of the range of start till 0.
  */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 78e9c6c1b131..fcc76b890f50 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4570,6 +4570,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 26beae7db264..23aaaf08e1e9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -592,6 +592,8 @@ enum {
 	SKB_GSO_UDP = 1 << 16,
 
 	SKB_GSO_UDP_L4 = 1 << 17,
+
+	SKB_GSO_FRAGLIST = 1 << 18,
 };
 
 #if BITS_PER_LONG > 32
-- 
cgit v1.2.3


From 1a3c998f3a27ab6ecf56bdbb17e27e55fd6d47cd Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Sat, 25 Jan 2020 11:26:43 +0100
Subject: net: Add a netdev software feature set that defaults to off.

The previous patch added the NETIF_F_GRO_FRAGLIST feature.
This is a software feature that should default to off.
Current software features default to on, so add a new
feature set that defaults to off.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdev_features.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b239507da2a0..34d050bb1ae6 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -230,6 +230,9 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start)
 /* changeable features with no special hardware requirements */
 #define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)
 
+/* Changeable features with no special hardware requirements that defaults to off. */
+#define NETIF_F_SOFT_FEATURES_OFF	NETIF_F_GRO_FRAGLIST
+
 #define NETIF_F_VLAN_FEATURES	(NETIF_F_HW_VLAN_CTAG_FILTER | \
 				 NETIF_F_HW_VLAN_CTAG_RX | \
 				 NETIF_F_HW_VLAN_CTAG_TX | \
-- 
cgit v1.2.3


From 3a1296a38d0cf62bffb9a03c585cbd5dbf15d596 Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Sat, 25 Jan 2020 11:26:44 +0100
Subject: net: Support GRO/GSO fraglist chaining.

This patch adds the core functions to chain/unchain
GSO skbs at the frag_list pointer. This also adds
a new GSO type SKB_GSO_FRAGLIST and a is_flist
flag to napi_gro_cb which indicates that this
flow will be GROed by fraglist chaining.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h | 4 +++-
 include/linux/skbuff.h    | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fcc76b890f50..20445f94eb1c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2326,7 +2326,8 @@ struct napi_gro_cb {
 	/* Number of gro_receive callbacks this packet already went through */
 	u8 recursion_counter:4;
 
-	/* 1 bit hole */
+	/* GRO is done by frag_list pointer chaining. */
+	u8	is_flist:1;
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
@@ -2694,6 +2695,7 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 int netdev_get_name(struct net *net, char *name, int ifindex);
 int dev_restart(struct net_device *dev);
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb);
 
 static inline unsigned int skb_gro_offset(const struct sk_buff *skb)
 {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 23aaaf08e1e9..3d13a4b717e9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3535,6 +3535,8 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
+struct sk_buff *skb_segment_list(struct sk_buff *skb, netdev_features_t features,
+				 unsigned int offset);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
 int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci);
-- 
cgit v1.2.3


From 9fd1ff5d2ac7181844735806b0a703c942365291 Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Sat, 25 Jan 2020 11:26:45 +0100
Subject: udp: Support UDP fraglist GRO/GSO.

This patch extends UDP GRO to support fraglist GRO/GSO
by using the previously introduced infrastructure.
If the feature is enabled, all UDP packets are going to
fraglist GRO (local input and forward).

After validating the csum,  we mark ip_summed as
CHECKSUM_UNNECESSARY for fraglist GRO packets to
make sure that the csum is not touched.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/udp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/udp.h b/include/net/udp.h
index bad74f780831..44e0e52b585c 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -167,7 +167,7 @@ typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
 				     __be16 dport);
 
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
-				struct udphdr *uh, udp_lookup_t lookup);
+				struct udphdr *uh, struct sock *sk);
 int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
-- 
cgit v1.2.3


From 93642e14bd50e59b11cf6389ce3fc243e932777a Mon Sep 17 00:00:00 2001
From: Jiri Pirko <jiri@mellanox.com>
Date: Sat, 25 Jan 2020 12:17:08 +0100
Subject: net: introduce dev_net notifier register/unregister variants

Introduce dev_net variants of netdev notifier register/unregister functions
and allow per-net notifier to follow the netdevice into the namespace it is
moved to.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 20445f94eb1c..4626188a754b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -939,6 +939,11 @@ struct netdev_name_node {
 int netdev_name_node_alt_create(struct net_device *dev, const char *name);
 int netdev_name_node_alt_destroy(struct net_device *dev, const char *name);
 
+struct netdev_net_notifier {
+	struct list_head list;
+	struct notifier_block *nb;
+};
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1793,6 +1798,10 @@ enum netdev_priv_flags {
  *
  *	@wol_enabled:	Wake-on-LAN is enabled
  *
+ *	@net_notifier_list:	List of per-net netdev notifier block
+ *				that follow this device when it is moved
+ *				to another network namespace.
+ *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
  */
@@ -2085,6 +2094,8 @@ struct net_device {
 	struct lock_class_key	addr_list_lock_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+
+	struct list_head	net_notifier_list;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -2529,6 +2540,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb);
 int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb);
 int unregister_netdevice_notifier_net(struct net *net,
 				      struct notifier_block *nb);
+int register_netdevice_notifier_dev_net(struct net_device *dev,
+					struct notifier_block *nb,
+					struct netdev_net_notifier *nn);
+int unregister_netdevice_notifier_dev_net(struct net_device *dev,
+					  struct notifier_block *nb,
+					  struct netdev_net_notifier *nn);
 
 struct netdev_notifier_info {
 	struct net_device	*dev;
-- 
cgit v1.2.3


From a85dd3a5170c8812cd835ea968ccadf0ebf1648e Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 25 Jan 2020 13:42:14 +0100
Subject: net: remove eth_change_mtu

All usage of this function was removed three years ago, and the
function was marked as deprecated:
a52ad514fdf3 ("net: deprecate eth_change_mtu, remove usage")
So I think we can remove it now.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/etherdevice.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index f6564b572d77..8801f1f986e5 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -43,7 +43,6 @@ __be16 eth_header_parse_protocol(const struct sk_buff *skb);
 int eth_prepare_mac_addr_change(struct net_device *dev, void *p);
 void eth_commit_mac_addr_change(struct net_device *dev, void *p);
 int eth_mac_addr(struct net_device *dev, void *p);
-int eth_change_mtu(struct net_device *dev, int new_mtu);
 int eth_validate_addr(struct net_device *dev);
 
 struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
-- 
cgit v1.2.3


From 6a94b8ccf6b77f005ab1b36a878e1d81df0c033e Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Sun, 26 Jan 2020 23:11:04 +0100
Subject: ethtool: provide message mask with DEBUG_GET request

Implement DEBUG_GET request to get debugging settings for a device. At the
moment, only message mask corresponding to message level as reported by
ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)

As part of the implementation, provide symbolic names for message mask bits
as ETH_SS_MSG_CLASSES string set.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h            | 56 ++++++++++++++++++++++++++----------
 include/uapi/linux/ethtool.h         |  2 ++
 include/uapi/linux/ethtool_netlink.h | 14 +++++++++
 3 files changed, 57 insertions(+), 15 deletions(-)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4626188a754b..a9c6b5c61d27 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3913,22 +3913,48 @@ void netif_device_attach(struct net_device *dev);
  */
 
 enum {
-	NETIF_MSG_DRV		= 0x0001,
-	NETIF_MSG_PROBE		= 0x0002,
-	NETIF_MSG_LINK		= 0x0004,
-	NETIF_MSG_TIMER		= 0x0008,
-	NETIF_MSG_IFDOWN	= 0x0010,
-	NETIF_MSG_IFUP		= 0x0020,
-	NETIF_MSG_RX_ERR	= 0x0040,
-	NETIF_MSG_TX_ERR	= 0x0080,
-	NETIF_MSG_TX_QUEUED	= 0x0100,
-	NETIF_MSG_INTR		= 0x0200,
-	NETIF_MSG_TX_DONE	= 0x0400,
-	NETIF_MSG_RX_STATUS	= 0x0800,
-	NETIF_MSG_PKTDATA	= 0x1000,
-	NETIF_MSG_HW		= 0x2000,
-	NETIF_MSG_WOL		= 0x4000,
+	NETIF_MSG_DRV_BIT,
+	NETIF_MSG_PROBE_BIT,
+	NETIF_MSG_LINK_BIT,
+	NETIF_MSG_TIMER_BIT,
+	NETIF_MSG_IFDOWN_BIT,
+	NETIF_MSG_IFUP_BIT,
+	NETIF_MSG_RX_ERR_BIT,
+	NETIF_MSG_TX_ERR_BIT,
+	NETIF_MSG_TX_QUEUED_BIT,
+	NETIF_MSG_INTR_BIT,
+	NETIF_MSG_TX_DONE_BIT,
+	NETIF_MSG_RX_STATUS_BIT,
+	NETIF_MSG_PKTDATA_BIT,
+	NETIF_MSG_HW_BIT,
+	NETIF_MSG_WOL_BIT,
+
+	/* When you add a new bit above, update netif_msg_class_names array
+	 * in net/ethtool/common.c
+	 */
+	NETIF_MSG_CLASS_COUNT,
 };
+/* Both ethtool_ops interface and internal driver implementation use u32 */
+static_assert(NETIF_MSG_CLASS_COUNT <= 32);
+
+#define __NETIF_MSG_BIT(bit)	((u32)1 << (bit))
+#define __NETIF_MSG(name)	__NETIF_MSG_BIT(NETIF_MSG_ ## name ## _BIT)
+
+#define NETIF_MSG_DRV		__NETIF_MSG(DRV)
+#define NETIF_MSG_PROBE		__NETIF_MSG(PROBE)
+#define NETIF_MSG_LINK		__NETIF_MSG(LINK)
+#define NETIF_MSG_TIMER		__NETIF_MSG(TIMER)
+#define NETIF_MSG_IFDOWN	__NETIF_MSG(IFDOWN)
+#define NETIF_MSG_IFUP		__NETIF_MSG(IFUP)
+#define NETIF_MSG_RX_ERR	__NETIF_MSG(RX_ERR)
+#define NETIF_MSG_TX_ERR	__NETIF_MSG(TX_ERR)
+#define NETIF_MSG_TX_QUEUED	__NETIF_MSG(TX_QUEUED)
+#define NETIF_MSG_INTR		__NETIF_MSG(INTR)
+#define NETIF_MSG_TX_DONE	__NETIF_MSG(TX_DONE)
+#define NETIF_MSG_RX_STATUS	__NETIF_MSG(RX_STATUS)
+#define NETIF_MSG_PKTDATA	__NETIF_MSG(PKTDATA)
+#define NETIF_MSG_HW		__NETIF_MSG(HW)
+#define NETIF_MSG_WOL		__NETIF_MSG(WOL)
 
 #define netif_msg_drv(p)	((p)->msg_enable & NETIF_MSG_DRV)
 #define netif_msg_probe(p)	((p)->msg_enable & NETIF_MSG_PROBE)
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 116bcbf09c74..456fb2aa0fad 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -594,6 +594,7 @@ struct ethtool_pauseparam {
  * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
  * @ETH_SS_PHY_TUNABLES: PHY tunable names
  * @ETH_SS_LINK_MODES: link mode names
+ * @ETH_SS_MSG_CLASSES: debug message class names
  */
 enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
@@ -606,6 +607,7 @@ enum ethtool_stringset {
 	ETH_SS_PHY_STATS,
 	ETH_SS_PHY_TUNABLES,
 	ETH_SS_LINK_MODES,
+	ETH_SS_MSG_CLASSES,
 
 	/* add new constants above here */
 	ETH_SS_COUNT
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 02f82f42a889..0d39e04567cb 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -20,6 +20,7 @@ enum {
 	ETHTOOL_MSG_LINKMODES_GET,
 	ETHTOOL_MSG_LINKMODES_SET,
 	ETHTOOL_MSG_LINKSTATE_GET,
+	ETHTOOL_MSG_DEBUG_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -35,6 +36,7 @@ enum {
 	ETHTOOL_MSG_LINKMODES_GET_REPLY,
 	ETHTOOL_MSG_LINKMODES_NTF,
 	ETHTOOL_MSG_LINKSTATE_GET_REPLY,
+	ETHTOOL_MSG_DEBUG_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -195,6 +197,18 @@ enum {
 	ETHTOOL_A_LINKSTATE_MAX = __ETHTOOL_A_LINKSTATE_CNT - 1
 };
 
+/* DEBUG */
+
+enum {
+	ETHTOOL_A_DEBUG_UNSPEC,
+	ETHTOOL_A_DEBUG_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_DEBUG_MSGMASK,		/* bitset */
+
+	/* add new constants above here */
+	__ETHTOOL_A_DEBUG_CNT,
+	ETHTOOL_A_DEBUG_MAX = __ETHTOOL_A_DEBUG_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From e54d04e3afead22d8e7d6edaaac487a1205bac39 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Sun, 26 Jan 2020 23:11:07 +0100
Subject: ethtool: set message mask with DEBUG_SET request

Implement DEBUG_SET netlink request to set debugging settings for a device.
At the moment, only message mask corresponding to message level as set by
ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 0d39e04567cb..8f0b6fd277d5 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -21,6 +21,7 @@ enum {
 	ETHTOOL_MSG_LINKMODES_SET,
 	ETHTOOL_MSG_LINKSTATE_GET,
 	ETHTOOL_MSG_DEBUG_GET,
+	ETHTOOL_MSG_DEBUG_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
-- 
cgit v1.2.3


From 0bda7af39d2ba6ef83bd18e26bcb027f1630004e Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Sun, 26 Jan 2020 23:11:10 +0100
Subject: ethtool: add DEBUG_NTF notification

Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message
mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message
or ETHTOOL_SMSGLVL ioctl request.

The notification message has the same format as reply to DEBUG_GET request.
As with other ethtool notifications, netlink requests only trigger the
notification if the mask is actually changed while ioctl request trigger it
whenever the request results in calling the ethtool_ops handler.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 8f0b6fd277d5..67a06b94bf28 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -38,6 +38,7 @@ enum {
 	ETHTOOL_MSG_LINKMODES_NTF,
 	ETHTOOL_MSG_LINKSTATE_GET_REPLY,
 	ETHTOOL_MSG_DEBUG_GET_REPLY,
+	ETHTOOL_MSG_DEBUG_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
-- 
cgit v1.2.3


From 51ea22b04ea0c210f9ce87b8a600965dbe476bc2 Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Sun, 26 Jan 2020 23:11:13 +0100
Subject: ethtool: provide WoL settings with WOL_GET request

Implement WOL_GET request to get wake-on-lan settings for a device,
traditionally available via ETHTOOL_GWOL ioctl request.

As part of the implementation, provide symbolic names for wake-on-line
modes as ETH_SS_WOL_MODES string set.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool.h         |  4 ++++
 include/uapi/linux/ethtool_netlink.h | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 456fb2aa0fad..4295ebfa2f91 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -595,6 +595,7 @@ struct ethtool_pauseparam {
  * @ETH_SS_PHY_TUNABLES: PHY tunable names
  * @ETH_SS_LINK_MODES: link mode names
  * @ETH_SS_MSG_CLASSES: debug message class names
+ * @ETH_SS_WOL_MODES: wake-on-lan modes
  */
 enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
@@ -608,6 +609,7 @@ enum ethtool_stringset {
 	ETH_SS_PHY_TUNABLES,
 	ETH_SS_LINK_MODES,
 	ETH_SS_MSG_CLASSES,
+	ETH_SS_WOL_MODES,
 
 	/* add new constants above here */
 	ETH_SS_COUNT
@@ -1695,6 +1697,8 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define WAKE_MAGICSECURE	(1 << 6) /* only meaningful if WAKE_MAGIC */
 #define WAKE_FILTER		(1 << 7)
 
+#define WOL_MODE_COUNT		8
+
 /* L2-L4 network traffic flow types */
 #define	TCP_V4_FLOW	0x01	/* hash or spec (tcp_ip4_spec) */
 #define	UDP_V4_FLOW	0x02	/* hash or spec (udp_ip4_spec) */
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 67a06b94bf28..dcc5c32dc018 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -22,6 +22,7 @@ enum {
 	ETHTOOL_MSG_LINKSTATE_GET,
 	ETHTOOL_MSG_DEBUG_GET,
 	ETHTOOL_MSG_DEBUG_SET,
+	ETHTOOL_MSG_WOL_GET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -39,6 +40,7 @@ enum {
 	ETHTOOL_MSG_LINKSTATE_GET_REPLY,
 	ETHTOOL_MSG_DEBUG_GET_REPLY,
 	ETHTOOL_MSG_DEBUG_NTF,
+	ETHTOOL_MSG_WOL_GET_REPLY,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -211,6 +213,19 @@ enum {
 	ETHTOOL_A_DEBUG_MAX = __ETHTOOL_A_DEBUG_CNT - 1
 };
 
+/* WOL */
+
+enum {
+	ETHTOOL_A_WOL_UNSPEC,
+	ETHTOOL_A_WOL_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_WOL_MODES,			/* bitset */
+	ETHTOOL_A_WOL_SOPASS,			/* binary */
+
+	/* add new constants above here */
+	__ETHTOOL_A_WOL_CNT,
+	ETHTOOL_A_WOL_MAX = __ETHTOOL_A_WOL_CNT - 1
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
-- 
cgit v1.2.3


From 8d425b19b305a77cb1ba67b84e7c1ca547de032f Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Sun, 26 Jan 2020 23:11:16 +0100
Subject: ethtool: set wake-on-lan settings with WOL_SET request

Implement WOL_SET netlink request to set wake-on-lan settings. This is
equivalent to ETHTOOL_SWOL ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index dcc5c32dc018..59de35695521 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -23,6 +23,7 @@ enum {
 	ETHTOOL_MSG_DEBUG_GET,
 	ETHTOOL_MSG_DEBUG_SET,
 	ETHTOOL_MSG_WOL_GET,
+	ETHTOOL_MSG_WOL_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
-- 
cgit v1.2.3


From 67bffa79231f15ba372007972563cc8aa5e48bfa Mon Sep 17 00:00:00 2001
From: Michal Kubecek <mkubecek@suse.cz>
Date: Sun, 26 Jan 2020 23:11:19 +0100
Subject: ethtool: add WOL_NTF notification

Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
ETHTOOL_SWOL ioctl request.

As notifications can be received by anyone, do not include SecureOn(tm)
password in notification messages.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/uapi/linux/ethtool_netlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 59de35695521..7e0b460f872c 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -42,6 +42,7 @@ enum {
 	ETHTOOL_MSG_DEBUG_GET_REPLY,
 	ETHTOOL_MSG_DEBUG_NTF,
 	ETHTOOL_MSG_WOL_GET_REPLY,
+	ETHTOOL_MSG_WOL_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
-- 
cgit v1.2.3


From 41c0d917d11edf9a34461f56e14f067aadb36101 Mon Sep 17 00:00:00 2001
From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Date: Mon, 27 Jan 2020 04:56:25 -0500
Subject: devlink: add macro for "fw.roce"

Add definition and documentation for the new generic info "fw.roce".

v2: Remove board.nvm_cfg since fw.psid is similar.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5e46c24bb6e6..ce5cea428fdc 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -487,6 +487,8 @@ enum devlink_param_generic_id {
 #define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI	"fw.ncsi"
 /* FW parameter set id */
 #define DEVLINK_INFO_VERSION_GENERIC_FW_PSID	"fw.psid"
+/* RoCE FW version */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_ROCE	"fw.roce"
 
 struct devlink_region;
 struct devlink_info_req;
-- 
cgit v1.2.3


From 2924e0699963b839f88f8c4e855929ea49185870 Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:07 +0200
Subject: qed: FW 8.42.2.0 Internal ram offsets modifications

IRO stands for internal RAM offsets. Updating the FW binary produces
different iro offsets. This file contains the different values,
and a new representation of the values.
Update the FW version

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/common_hsi.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
index 03f59a28fefd..3f437e826a4c 100644
--- a/include/linux/qed/common_hsi.h
+++ b/include/linux/qed/common_hsi.h
@@ -109,8 +109,8 @@
 #define MAX_NUM_LL2_TX_STATS_COUNTERS	48
 
 #define FW_MAJOR_VERSION	8
-#define FW_MINOR_VERSION        37
-#define FW_REVISION_VERSION     7
+#define FW_MINOR_VERSION	42
+#define FW_REVISION_VERSION	2
 #define FW_ENGINEERING_VERSION	0
 
 /***********************/
-- 
cgit v1.2.3


From 997af5df230e3288ec1f5b332955f9be643e450b Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:12 +0200
Subject: qed: FW 8.42.2.0 Additional ll2 type

LL2 queues were a limited resource due to FW constraints.
This FW introduced a new resource which is a context based ll2 queue
(memory on host). The additional ll2 queues are required for RDMA SRIOV.
The code refers to the previous ll2 queues as ram-based or legacy, and the
new queues as ctx-based.
This change decreased the "legacy" ram-based queues therefore the first ll2
queue used for iWARP was converted to the ctx-based ll2 queue.
This feature also exposed a bug in the DIRECT_REG_WR64 macro implementation
which didn't have an effect in other use cases.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/common_hsi.h | 15 +++++++++++++--
 include/linux/qed/qed_if.h     |  4 +++-
 include/linux/qed/qed_ll2_if.h |  7 +++++++
 3 files changed, 23 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
index 3f437e826a4c..a2b7826b36f0 100644
--- a/include/linux/qed/common_hsi.h
+++ b/include/linux/qed/common_hsi.h
@@ -105,8 +105,15 @@
 
 #define CORE_SPQE_PAGE_SIZE_BYTES	4096
 
-#define MAX_NUM_LL2_RX_QUEUES		48
-#define MAX_NUM_LL2_TX_STATS_COUNTERS	48
+/* Number of LL2 RAM based queues */
+#define MAX_NUM_LL2_RX_RAM_QUEUES 32
+
+/* Number of LL2 context based queues */
+#define MAX_NUM_LL2_RX_CTX_QUEUES 208
+#define MAX_NUM_LL2_RX_QUEUES \
+	(MAX_NUM_LL2_RX_RAM_QUEUES + MAX_NUM_LL2_RX_CTX_QUEUES)
+
+#define MAX_NUM_LL2_TX_STATS_COUNTERS  48
 
 #define FW_MAJOR_VERSION	8
 #define FW_MINOR_VERSION	42
@@ -340,6 +347,10 @@
 #define DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD		(DQ_PWM_OFFSET_TCM16_BASE + 1)
 #define DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD		(DQ_PWM_OFFSET_TCM16_BASE + 3)
 
+/* DQ_DEMS_AGG_VAL_BASE */
+#define DQ_PWM_OFFSET_TCM_LL2_PROD_UPDATE \
+	(DQ_PWM_OFFSET_TCM32_BASE + DQ_TCM_AGG_VAL_SEL_REG9 - 4)
+
 #define	DQ_REGION_SHIFT			(12)
 
 /* DPM */
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index b5db1ee96d78..9bcb2f419004 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -463,7 +463,7 @@ enum qed_db_rec_space {
 
 #define DIRECT_REG_RD(reg_addr) readl((void __iomem *)(reg_addr))
 
-#define DIRECT_REG_WR64(reg_addr, val) writeq((u32)val,	\
+#define DIRECT_REG_WR64(reg_addr, val) writeq((u64)val,	\
 					      (void __iomem *)(reg_addr))
 
 #define QED_COALESCE_MAX 0x1FF
@@ -1177,6 +1177,8 @@ struct qed_common_ops {
 #define GET_FIELD(value, name) \
 	(((value) >> (name ## _SHIFT)) & name ## _MASK)
 
+#define DB_ADDR_SHIFT(addr) ((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
+
 /* Debug print definitions */
 #define DP_ERR(cdev, fmt, ...)					\
 	do {							\
diff --git a/include/linux/qed/qed_ll2_if.h b/include/linux/qed/qed_ll2_if.h
index 5eb022953aca..1313c34d9a68 100644
--- a/include/linux/qed/qed_ll2_if.h
+++ b/include/linux/qed/qed_ll2_if.h
@@ -52,6 +52,12 @@ enum qed_ll2_conn_type {
 	QED_LL2_TYPE_ROCE,
 	QED_LL2_TYPE_IWARP,
 	QED_LL2_TYPE_RESERVED3,
+	MAX_QED_LL2_CONN_TYPE
+};
+
+enum qed_ll2_rx_conn_type {
+	QED_LL2_RX_TYPE_LEGACY,
+	QED_LL2_RX_TYPE_CTX,
 	MAX_QED_LL2_RX_CONN_TYPE
 };
 
@@ -165,6 +171,7 @@ struct qed_ll2_cbs {
 };
 
 struct qed_ll2_acquire_data_inputs {
+	enum qed_ll2_rx_conn_type rx_conn_type;
 	enum qed_ll2_conn_type conn_type;
 	u16 mtu;
 	u16 rx_num_desc;
-- 
cgit v1.2.3


From 1392d19ff1d6ddd370cefa73b552a0262f9c35ea Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:13 +0200
Subject: qed: Add abstraction for different hsi values per chip

The number of BTB blocks was modified to be different between the two chip
flavors supported (BB/K2) as a result, this lead to a re-write of selecting
the default hsi value based on the chip.
This patch creates a lookup table for hsi values per chip rather than
ask again and again for every value.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/common_hsi.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
index a2b7826b36f0..718ce72e5965 100644
--- a/include/linux/qed/common_hsi.h
+++ b/include/linux/qed/common_hsi.h
@@ -663,8 +663,8 @@
 #define PBF_MAX_CMD_LINES	3328
 
 /* Number of BTB blocks. Each block is 256B. */
-#define BTB_MAX_BLOCKS		1440
-
+#define BTB_MAX_BLOCKS_BB 1440
+#define BTB_MAX_BLOCKS_K2 1840
 /*****************/
 /* PRS CONSTANTS */
 /*****************/
-- 
cgit v1.2.3


From 6459d93619b5bc21f775e7eb12bc4d051743d7aa Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:14 +0200
Subject: qed: FW 8.42.2.0 iscsi/fcoe changes

- Remove struct iscsi_slow_path_hdr and field fw_cid from several structs
- Remove struct iscsi_spe_func_dstry
- Remove fields pbe_page_size_log and pbl_page_size_log from struct
  iscsi_conn_offload_param

Signed-off-by: Manish Rangankar <manish.rangankar@marvell.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@marvell.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/iscsi_common.h   | 64 ++++++++++++++------------------------
 include/linux/qed/storage_common.h |  3 +-
 2 files changed, 26 insertions(+), 41 deletions(-)

(limited to 'include')

diff --git a/include/linux/qed/iscsi_common.h b/include/linux/qed/iscsi_common.h
index 66aba505ec56..2f0a771a9176 100644
--- a/include/linux/qed/iscsi_common.h
+++ b/include/linux/qed/iscsi_common.h
@@ -999,7 +999,6 @@ struct iscsi_conn_offload_params {
 	struct regpair r2tq_pbl_addr;
 	struct regpair xhq_pbl_addr;
 	struct regpair uhq_pbl_addr;
-	__le32 initial_ack;
 	__le16 physical_q0;
 	__le16 physical_q1;
 	u8 flags;
@@ -1011,10 +1010,10 @@ struct iscsi_conn_offload_params {
 #define ISCSI_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT	2
 #define ISCSI_CONN_OFFLOAD_PARAMS_RESERVED1_MASK	0x1F
 #define ISCSI_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT	3
-	u8 pbl_page_size_log;
-	u8 pbe_page_size_log;
 	u8 default_cq;
+	__le16 reserved0;
 	__le32 stat_sn;
+	__le32 initial_ack;
 };
 
 /* iSCSI connection statistics */
@@ -1029,25 +1028,14 @@ struct iscsi_conn_stats_params {
 	__le32 reserved;
 };
 
-/* spe message header */
-struct iscsi_slow_path_hdr {
-	u8 op_code;
-	u8 flags;
-#define ISCSI_SLOW_PATH_HDR_RESERVED0_MASK	0xF
-#define ISCSI_SLOW_PATH_HDR_RESERVED0_SHIFT	0
-#define ISCSI_SLOW_PATH_HDR_LAYER_CODE_MASK	0x7
-#define ISCSI_SLOW_PATH_HDR_LAYER_CODE_SHIFT	4
-#define ISCSI_SLOW_PATH_HDR_RESERVED1_MASK	0x1
-#define ISCSI_SLOW_PATH_HDR_RESERVED1_SHIFT	7
-};
 
 /* iSCSI connection update params passed by driver to FW in ISCSI update
  *ramrod.
  */
 struct iscsi_conn_update_ramrod_params {
-	struct iscsi_slow_path_hdr hdr;
+	__le16 reserved0;
 	__le16 conn_id;
-	__le32 fw_cid;
+	__le32 reserved1;
 	u8 flags;
 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK		0x1
 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT		0
@@ -1065,7 +1053,7 @@ struct iscsi_conn_update_ramrod_params {
 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_DIF_ON_IMM_EN_SHIFT	6
 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_LUN_MAPPER_EN_MASK	0x1
 #define ISCSI_CONN_UPDATE_RAMROD_PARAMS_LUN_MAPPER_EN_SHIFT	7
-	u8 reserved0[3];
+	u8 reserved3[3];
 	__le32 max_seq_size;
 	__le32 max_send_pdu_length;
 	__le32 max_recv_pdu_length;
@@ -1251,22 +1239,22 @@ enum iscsi_ramrod_cmd_id {
 
 /* iSCSI connection termination request */
 struct iscsi_spe_conn_mac_update {
-	struct iscsi_slow_path_hdr hdr;
+	__le16 reserved0;
 	__le16 conn_id;
-	__le32 fw_cid;
+	__le32 reserved1;
 	__le16 remote_mac_addr_lo;
 	__le16 remote_mac_addr_mid;
 	__le16 remote_mac_addr_hi;
-	u8 reserved0[2];
+	u8 reserved2[2];
 };
 
 /* iSCSI and TCP connection (Option 1) offload params passed by driver to FW in
  * iSCSI offload ramrod.
  */
 struct iscsi_spe_conn_offload {
-	struct iscsi_slow_path_hdr hdr;
+	__le16 reserved0;
 	__le16 conn_id;
-	__le32 fw_cid;
+	__le32 reserved1;
 	struct iscsi_conn_offload_params iscsi;
 	struct tcp_offload_params tcp;
 };
@@ -1275,44 +1263,36 @@ struct iscsi_spe_conn_offload {
  * iSCSI offload ramrod.
  */
 struct iscsi_spe_conn_offload_option2 {
-	struct iscsi_slow_path_hdr hdr;
+	__le16 reserved0;
 	__le16 conn_id;
-	__le32 fw_cid;
+	__le32 reserved1;
 	struct iscsi_conn_offload_params iscsi;
 	struct tcp_offload_params_opt2 tcp;
 };
 
 /* iSCSI collect connection statistics request */
 struct iscsi_spe_conn_statistics {
-	struct iscsi_slow_path_hdr hdr;
+	__le16 reserved0;
 	__le16 conn_id;
-	__le32 fw_cid;
+	__le32 reserved1;
 	u8 reset_stats;
-	u8 reserved0[7];
+	u8 reserved2[7];
 	struct regpair stats_cnts_addr;
 };
 
 /* iSCSI connection termination request */
 struct iscsi_spe_conn_termination {
-	struct iscsi_slow_path_hdr hdr;
+	__le16 reserved0;
 	__le16 conn_id;
-	__le32 fw_cid;
+	__le32 reserved1;
 	u8 abortive;
-	u8 reserved0[7];
+	u8 reserved2[7];
 	struct regpair queue_cnts_addr;
 	struct regpair query_params_addr;
 };
 
-/* iSCSI firmware function destroy parameters */
-struct iscsi_spe_func_dstry {
-	struct iscsi_slow_path_hdr hdr;
-	__le16 reserved0;
-	__le32 reserved1;
-};
-
 /* iSCSI firmware function init parameters */
 struct iscsi_spe_func_init {
-	struct iscsi_slow_path_hdr hdr;
 	__le16 half_way_close_timeout;
 	u8 num_sq_pages_in_ring;
 	u8 num_r2tq_pages_in_ring;
@@ -1324,8 +1304,12 @@ struct iscsi_spe_func_init {
 #define ISCSI_SPE_FUNC_INIT_RESERVED0_MASK	0x7F
 #define ISCSI_SPE_FUNC_INIT_RESERVED0_SHIFT	1
 	struct iscsi_debug_modes debug_mode;
-	__le16 reserved1;
-	__le32 reserved2;
+	u8 params;
+#define ISCSI_SPE_FUNC_INIT_MAX_SYN_RT_MASK	0xF
+#define ISCSI_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT	0
+#define ISCSI_SPE_FUNC_INIT_RESERVED1_MASK	0xF
+#define ISCSI_SPE_FUNC_INIT_RESERVED1_SHIFT	4
+	u8 reserved2[7];
 	struct scsi_init_func_params func_params;
 	struct scsi_init_func_queues q_params;
 };
diff --git a/include/linux/qed/storage_common.h b/include/linux/qed/storage_common.h
index 505c0b48a761..9a973ffbbff5 100644
--- a/include/linux/qed/storage_common.h
+++ b/include/linux/qed/storage_common.h
@@ -107,8 +107,9 @@ struct scsi_drv_cmdq {
 struct scsi_init_func_params {
 	__le16 num_tasks;
 	u8 log_page_size;
+	u8 log_page_size_conn;
 	u8 debug_mode;
-	u8 reserved2[12];
+	u8 reserved2[11];
 };
 
 /* SCSI RQ/CQ/CMDQ firmware function init parameters */
-- 
cgit v1.2.3


From 0500a70d6e071040ffdaadebb966986afa83c5e9 Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:15 +0200
Subject: qed: FW 8.42.2.0 HSI changes

This patch contains several HSI changes. The changes are part of
features like RDMA VF and OVS, the patch also contains a fix to
how the init code determines if the dmae is ready to be used.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/common_hsi.h | 21 +++++++-----
 include/linux/qed/eth_common.h | 78 ++++++++++++++++++++++++++++++------------
 2 files changed, 68 insertions(+), 31 deletions(-)

(limited to 'include')

diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h
index 718ce72e5965..2c4737e6694a 100644
--- a/include/linux/qed/common_hsi.h
+++ b/include/linux/qed/common_hsi.h
@@ -76,7 +76,6 @@
 
 #define FW_ASSERT_GENERAL_ATTN_IDX		32
 
-#define MAX_PINNED_CCFC				32
 
 /* Queue Zone sizes in bytes */
 #define TSTORM_QZONE_SIZE	8
@@ -139,10 +138,10 @@
 #define MAX_NUM_VFS	(MAX_NUM_VFS_K2)
 
 #define MAX_NUM_FUNCTIONS_BB	(MAX_NUM_PFS_BB + MAX_NUM_VFS_BB)
-#define MAX_NUM_FUNCTIONS	(MAX_NUM_PFS + MAX_NUM_VFS)
 
 #define MAX_FUNCTION_NUMBER_BB	(MAX_NUM_PFS + MAX_NUM_VFS_BB)
-#define MAX_FUNCTION_NUMBER	(MAX_NUM_PFS + MAX_NUM_VFS)
+#define MAX_FUNCTION_NUMBER_K2  (MAX_NUM_PFS + MAX_NUM_VFS_K2)
+#define MAX_NUM_FUNCTIONS	(MAX_FUNCTION_NUMBER_K2)
 
 #define MAX_NUM_VPORTS_K2	(208)
 #define MAX_NUM_VPORTS_BB	(160)
@@ -229,6 +228,7 @@
 #define DQ_XCM_TOE_TX_BD_PROD_CMD		DQ_XCM_AGG_VAL_SEL_WORD4
 #define DQ_XCM_TOE_MORE_TO_SEND_SEQ_CMD		DQ_XCM_AGG_VAL_SEL_REG3
 #define DQ_XCM_TOE_LOCAL_ADV_WND_SEQ_CMD	DQ_XCM_AGG_VAL_SEL_REG4
+#define DQ_XCM_ROCE_ACK_EDPM_DORQ_SEQ_CMD	DQ_XCM_AGG_VAL_SEL_WORD5
 
 /* UCM agg val selection (HW) */
 #define	DQ_UCM_AGG_VAL_SEL_WORD0	0
@@ -406,6 +406,7 @@
 
 /* Number of Protocol Indices per Status Block */
 #define PIS_PER_SB_E4	12
+#define MAX_PIS_PER_SB	PIS_PER_SB
 
 #define CAU_HC_STOPPED_STATE	3
 #define CAU_HC_DISABLE_STATE	4
@@ -436,8 +437,6 @@
 #define IGU_MEM_PBA_MSIX_RESERVED_UPPER	0x03ff
 
 #define IGU_CMD_INT_ACK_BASE		0x0400
-#define IGU_CMD_INT_ACK_UPPER		(IGU_CMD_INT_ACK_BASE +	\
-					 MAX_TOT_SB_PER_PATH - 1)
 #define IGU_CMD_INT_ACK_RESERVED_UPPER	0x05ff
 
 #define IGU_CMD_ATTN_BIT_UPD_UPPER	0x05f0
@@ -450,8 +449,6 @@
 #define IGU_REG_SISR_MDPC_WOMASK_UPPER		0x05f6
 
 #define IGU_CMD_PROD_UPD_BASE			0x0600
-#define IGU_CMD_PROD_UPD_UPPER			(IGU_CMD_PROD_UPD_BASE +\
-						 MAX_TOT_SB_PER_PATH - 1)
 #define IGU_CMD_PROD_UPD_RESERVED_UPPER		0x07ff
 
 /*****************/
@@ -741,6 +738,8 @@ enum protocol_type {
 	PROTOCOLID_PREROCE,
 	PROTOCOLID_COMMON,
 	PROTOCOLID_RESERVED1,
+	PROTOCOLID_RDMA,
+	PROTOCOLID_SCSI,
 	MAX_PROTOCOL_TYPE
 };
 
@@ -761,6 +760,10 @@ union rdma_eqe_data {
 	struct rdma_eqe_destroy_qp rdma_destroy_qp_data;
 };
 
+struct tstorm_queue_zone {
+	__le32 reserved[2];
+};
+
 /* Ustorm Queue Zone */
 struct ustorm_eth_queue_zone {
 	struct coalescing_timeset int_coalescing_timeset;
@@ -883,8 +886,8 @@ struct db_l2_dpm_data {
 #define DB_L2_DPM_DATA_RESERVED0_SHIFT 27
 #define DB_L2_DPM_DATA_SGE_NUM_MASK	0x7
 #define DB_L2_DPM_DATA_SGE_NUM_SHIFT	28
-#define DB_L2_DPM_DATA_GFS_SRC_EN_MASK	0x1
-#define DB_L2_DPM_DATA_GFS_SRC_EN_SHIFT	31
+#define DB_L2_DPM_DATA_TGFS_SRC_EN_MASK  0x1
+#define DB_L2_DPM_DATA_TGFS_SRC_EN_SHIFT 31
 };
 
 /* Structure for SGE in a DPM doorbell of type DPM_L2_BD */
diff --git a/include/linux/qed/eth_common.h b/include/linux/qed/eth_common.h
index d9416ad5ef59..95f5fd615852 100644
--- a/include/linux/qed/eth_common.h
+++ b/include/linux/qed/eth_common.h
@@ -38,9 +38,11 @@
 /********************/
 
 #define ETH_HSI_VER_MAJOR		3
-#define ETH_HSI_VER_MINOR		10
+#define ETH_HSI_VER_MINOR		11
 
-#define ETH_HSI_VER_NO_PKT_LEN_TUNN	5
+#define ETH_HSI_VER_NO_PKT_LEN_TUNN         5
+/* Maximum number of pinned L2 connections (CIDs) */
+#define ETH_PINNED_CONN_MAX_NUM             32
 
 #define ETH_CACHE_LINE_SIZE		64
 #define ETH_RX_CQE_GAP			32
@@ -61,6 +63,7 @@
 #define ETH_TX_MIN_BDS_PER_TUNN_IPV6_WITH_EXT_PKT	3
 #define ETH_TX_MIN_BDS_PER_IPV6_WITH_EXT_PKT		2
 #define ETH_TX_MIN_BDS_PER_PKT_W_LOOPBACK_MODE		2
+#define ETH_TX_MIN_BDS_PER_PKT_W_VPORT_FORWARDING	4
 #define ETH_TX_MAX_NON_LSO_PKT_LEN		(9700 - (4 + 4 + 12 + 8))
 #define ETH_TX_MAX_LSO_HDR_BYTES			510
 #define ETH_TX_LSO_WINDOW_BDS_NUM			(18 - 1)
@@ -75,9 +78,8 @@
 #define ETH_NUM_STATISTIC_COUNTERS_QUAD_VF_ZONE \
 	(ETH_NUM_STATISTIC_COUNTERS - 3 * MAX_NUM_VFS / 4)
 
-/* Maximum number of buffers, used for RX packet placement */
 #define ETH_RX_MAX_BUFF_PER_PKT		5
-#define ETH_RX_BD_THRESHOLD		12
+#define ETH_RX_BD_THRESHOLD             16
 
 /* Num of MAC/VLAN filters */
 #define ETH_NUM_MAC_FILTERS		512
@@ -96,24 +98,24 @@
 #define ETH_RSS_ENGINE_NUM_BB		127
 
 /* TPA constants */
-#define ETH_TPA_MAX_AGGS_NUM		64
-#define ETH_TPA_CQE_START_LEN_LIST_SIZE	ETH_RX_MAX_BUFF_PER_PKT
-#define ETH_TPA_CQE_CONT_LEN_LIST_SIZE	6
-#define ETH_TPA_CQE_END_LEN_LIST_SIZE	4
+#define ETH_TPA_MAX_AGGS_NUM                64
+#define ETH_TPA_CQE_START_BW_LEN_LIST_SIZE  2
+#define ETH_TPA_CQE_CONT_LEN_LIST_SIZE      6
+#define ETH_TPA_CQE_END_LEN_LIST_SIZE       4
 
 /* Control frame check constants */
-#define ETH_CTL_FRAME_ETH_TYPE_NUM	4
+#define ETH_CTL_FRAME_ETH_TYPE_NUM        4
 
 /* GFS constants */
 #define ETH_GFT_TRASHCAN_VPORT         0x1FF	/* GFT drop flow vport number */
 
 /* Destination port mode */
-enum dest_port_mode {
-	DEST_PORT_PHY,
-	DEST_PORT_LOOPBACK,
-	DEST_PORT_PHY_LOOPBACK,
-	DEST_PORT_DROP,
-	MAX_DEST_PORT_MODE
+enum dst_port_mode {
+	DST_PORT_PHY,
+	DST_PORT_LOOPBACK,
+	DST_PORT_PHY_LOOPBACK,
+	DST_PORT_DROP,
+	MAX_DST_PORT_MODE
 };
 
 /* Ethernet address type */
@@ -167,8 +169,8 @@ struct eth_tx_data_2nd_bd {
 #define ETH_TX_DATA_2ND_BD_TUNN_INNER_L2_HDR_SIZE_W_SHIFT	0
 #define ETH_TX_DATA_2ND_BD_TUNN_INNER_ETH_TYPE_MASK		0x3
 #define ETH_TX_DATA_2ND_BD_TUNN_INNER_ETH_TYPE_SHIFT		4
-#define ETH_TX_DATA_2ND_BD_DEST_PORT_MODE_MASK			0x3
-#define ETH_TX_DATA_2ND_BD_DEST_PORT_MODE_SHIFT			6
+#define ETH_TX_DATA_2ND_BD_DST_PORT_MODE_MASK			0x3
+#define ETH_TX_DATA_2ND_BD_DST_PORT_MODE_SHIFT			6
 #define ETH_TX_DATA_2ND_BD_START_BD_MASK			0x1
 #define ETH_TX_DATA_2ND_BD_START_BD_SHIFT			8
 #define ETH_TX_DATA_2ND_BD_TUNN_TYPE_MASK			0x3
@@ -244,8 +246,9 @@ struct eth_fast_path_rx_reg_cqe {
 	struct eth_tunnel_parsing_flags tunnel_pars_flags;
 	u8 bd_num;
 	u8 reserved;
-	__le16 flow_id;
-	u8 reserved1[11];
+	__le16 reserved2;
+	__le32 flow_id_or_resource_id;
+	u8 reserved1[7];
 	struct eth_pmd_flow_flags pmd_flags;
 };
 
@@ -296,9 +299,10 @@ struct eth_fast_path_rx_tpa_start_cqe {
 	struct eth_tunnel_parsing_flags tunnel_pars_flags;
 	u8 tpa_agg_index;
 	u8 header_len;
-	__le16 ext_bd_len_list[ETH_TPA_CQE_START_LEN_LIST_SIZE];
-	__le16 flow_id;
-	u8 reserved;
+	__le16 bw_ext_bd_len_list[ETH_TPA_CQE_START_BW_LEN_LIST_SIZE];
+	__le16 reserved2;
+	__le32 flow_id_or_resource_id;
+	u8 reserved[3];
 	struct eth_pmd_flow_flags pmd_flags;
 };
 
@@ -407,6 +411,29 @@ struct eth_tx_3rd_bd {
 	struct eth_tx_data_3rd_bd data;
 };
 
+/* The parsing information data for the forth tx bd of a given packet. */
+struct eth_tx_data_4th_bd {
+	u8 dst_vport_id;
+	u8 reserved4;
+	__le16 bitfields;
+#define ETH_TX_DATA_4TH_BD_DST_VPORT_ID_VALID_MASK  0x1
+#define ETH_TX_DATA_4TH_BD_DST_VPORT_ID_VALID_SHIFT 0
+#define ETH_TX_DATA_4TH_BD_RESERVED1_MASK           0x7F
+#define ETH_TX_DATA_4TH_BD_RESERVED1_SHIFT          1
+#define ETH_TX_DATA_4TH_BD_START_BD_MASK            0x1
+#define ETH_TX_DATA_4TH_BD_START_BD_SHIFT           8
+#define ETH_TX_DATA_4TH_BD_RESERVED2_MASK           0x7F
+#define ETH_TX_DATA_4TH_BD_RESERVED2_SHIFT          9
+	__le16 reserved3;
+};
+
+/* The forth tx bd of a given packet */
+struct eth_tx_4th_bd {
+	struct regpair addr; /* Single continuous buffer */
+	__le16 nbytes; /* Number of bytes in this BD */
+	struct eth_tx_data_4th_bd data; /* Parsing information data */
+};
+
 /* Complementary information for the regular tx bd of a given packet */
 struct eth_tx_data_bd {
 	__le16 reserved0;
@@ -431,6 +458,7 @@ union eth_tx_bd_types {
 	struct eth_tx_1st_bd first_bd;
 	struct eth_tx_2nd_bd second_bd;
 	struct eth_tx_3rd_bd third_bd;
+	struct eth_tx_4th_bd fourth_bd;
 	struct eth_tx_bd reg_bd;
 };
 
@@ -443,6 +471,12 @@ enum eth_tx_tunn_type {
 	MAX_ETH_TX_TUNN_TYPE
 };
 
+/* Mstorm Queue Zone */
+struct mstorm_eth_queue_zone {
+	struct eth_rx_prod_data rx_producers;
+	__le32 reserved[3];
+};
+
 /* Ystorm Queue Zone */
 struct xstorm_eth_queue_zone {
 	struct coalescing_timeset int_coalescing_timeset;
-- 
cgit v1.2.3


From 8a52bbab39c9791480cbae86c69ad0d47f62972e Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:17 +0200
Subject: qed: Debug feature: ilt and mdump

Part of the FW drop includes new debug capabilities implemented in the
qed_debug file. This patch dumps additional information during ethtool -d
for better debugging. The data dumped is the ilt (internal logical table)
and information gathered by the management firmware incase there was a
crash and driver was not able to extract the information (mdump).

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/qed_if.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index 9bcb2f419004..1b27c22d39af 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -159,6 +159,7 @@ struct qed_dcbx_get {
 enum qed_nvm_images {
 	QED_NVM_IMAGE_ISCSI_CFG,
 	QED_NVM_IMAGE_FCOE_CFG,
+	QED_NVM_IMAGE_MDUMP,
 	QED_NVM_IMAGE_NVM_CFG1,
 	QED_NVM_IMAGE_DEFAULT_CFG,
 	QED_NVM_IMAGE_NVM_META,
-- 
cgit v1.2.3


From 2d22bc8354b15abe413dff76cfe0f7aeb88ef9aa Mon Sep 17 00:00:00 2001
From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 27 Jan 2020 15:26:19 +0200
Subject: qed: FW 8.42.2.0 debug features

Add to debug dump more information on the platform it was collected
from (pci func, path id).
Provide human readable reg fifo erros.

Removed static debug arrays from HSI Functions, and move them to
the hwfn.

Some structures were slightly changed (removing reserved chip id
for example) which lead to many long initializations being modified
with one parameter less during initialization. This leads to
some long diffs that don't really change anything.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/qed/qed_if.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include')

diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index 1b27c22d39af..8f29e0d8a7b3 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -1178,6 +1178,15 @@ struct qed_common_ops {
 #define GET_FIELD(value, name) \
 	(((value) >> (name ## _SHIFT)) & name ## _MASK)
 
+#define GET_MFW_FIELD(name, field) \
+	(((name) & (field ## _MASK)) >> (field ## _OFFSET))
+
+#define SET_MFW_FIELD(name, field, value)				 \
+	do {								 \
+		(name) &= ~(field ## _MASK);				 \
+		(name) |= (((value) << (field ## _OFFSET)) & (field ## _MASK));\
+	} while (0)
+
 #define DB_ADDR_SHIFT(addr) ((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
 
 /* Debug print definitions */
-- 
cgit v1.2.3


From 6cd021a58c18a1731f7e47f83e172c0c302d65e5 Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Mon, 27 Jan 2020 15:40:31 -0500
Subject: udp: segment looped gso packets correctly

Multicast and broadcast packets can be looped from egress to ingress
pre segmentation with dev_loopback_xmit. That function unconditionally
sets ip_summed to CHECKSUM_UNNECESSARY.

udp_rcv_segment segments gso packets in the udp rx path. Segmentation
usually executes on egress, and does not expect packets of this type.
__udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
the offsets are not correct for gso_make_checksum.

UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
to the correct pseudo header checksum. Reset ip_summed to this type.
(CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/udp.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/net/udp.h b/include/net/udp.h
index 44e0e52b585c..4a180f2a13e3 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -476,6 +476,9 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 	if (!inet_get_convert_csum(sk))
 		features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 
+	if (skb->pkt_type == PACKET_LOOPBACK)
+		skb->ip_summed = CHECKSUM_PARTIAL;
+
 	/* the GSO CB lays after the UDP one, no need to save and restore any
 	 * CB fragment
 	 */
-- 
cgit v1.2.3