From 24426654ed3ae83d1127511891fb782c54f49203 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:17:17 -0700 Subject: bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from the sk_setsockopt(). The number of supported optnames are increasing ever and so as the duplicated code. One issue in reusing sk_setsockopt() is that the bpf prog has already acquired the sk lock. This patch adds a has_current_bpf_ctx() to tell if the sk_setsockopt() is called from a bpf prog. The bpf prog calling bpf_setsockopt() is either running in_task() or in_serving_softirq(). Both cases have the current->bpf_ctx initialized. Thus, the has_current_bpf_ctx() only needs to test !!current->bpf_ctx. This patch also adds sockopt_{lock,release}_sock() helpers for sk_setsockopt() to use. These helpers will test has_current_bpf_ctx() before acquiring/releasing the lock. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Note on the change in sock_setbindtodevice(). sockopt_lock_sock() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061717.4175589-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/sock.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index 05a1bbdf5805..352b9458fdc6 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1749,6 +1749,9 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) } } +void sockopt_lock_sock(struct sock *sk); +void sockopt_release_sock(struct sock *sk); + /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it * from under us. It essentially blocks any incoming -- cgit v1.2.3 From e42c7beee71d0d84a6193357e3525d0cf2a3e168 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:17:23 -0700 Subject: bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt() When bpf program calling bpf_setsockopt(SOL_SOCKET), it could be run in softirq and doesn't make sense to do the capable check. There was a similar situation in bpf_setsockopt(TCP_CONGESTION). In commit 8d650cdedaab ("tcp: fix tcp_set_congestion_control() use from bpf hook"), tcp_set_congestion_control(..., cap_net_admin) was added to skip the cap check for bpf prog. This patch adds sockopt_ns_capable() and sockopt_capable() for the sk_setsockopt() to use. They will consider the has_current_bpf_ctx() before doing the ns_capable() and capable() test. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Suggested-by: Stanislav Fomichev Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061723.4175820-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/sock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index 352b9458fdc6..13089d88a2e2 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1751,6 +1751,8 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) void sockopt_lock_sock(struct sock *sk); void sockopt_release_sock(struct sock *sk); +bool sockopt_ns_capable(struct user_namespace *ns, int cap); +bool sockopt_capable(int cap); /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it -- cgit v1.2.3 From 29003875bd5bab262a29d1c6e76a2124bd07e4c2 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:18:04 -0700 Subject: bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt() After the prep work in the previous patches, this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET) and reuses them from sk_setsockopt(). The sock ptr test is added to the SO_RCVLOWAT because the sk->sk_socket could be NULL in some of the bpf hooks. The existing optname white-list is refactored into a new function sol_socket_setsockopt(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061804.4178920-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/sock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index 13089d88a2e2..ee44b424d952 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1828,6 +1828,8 @@ void sock_pfree(struct sk_buff *skb); #define sock_edemux sock_efree #endif +int sk_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); -- cgit v1.2.3 From 0c751f7071ef98d334ed06ca3f8f4cc1f7458cf5 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:18:19 -0700 Subject: bpf: Change bpf_setsockopt(SOL_TCP) to reuse do_tcp_setsockopt() After the prep work in the previous patches, this patch removes all the dup code from bpf_setsockopt(SOL_TCP) and reuses the do_tcp_setsockopt(). The existing optname white-list is refactored into a new function sol_tcp_setsockopt(). The sol_tcp_setsockopt() also calls the bpf_sol_tcp_setsockopt() to handle the TCP_BPF_XXX specific optnames. bpf_setsockopt(TCP_SAVE_SYN) now also allows a value 2 to save the eth header also and it comes for free from do_tcp_setsockopt(). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061819.4180146-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/tcp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/tcp.h b/include/net/tcp.h index d10962b9f0d0..c03a50c72f40 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -405,6 +405,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); bool tcp_bpf_bypass_getsockopt(int level, int optname); +int do_tcp_setsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); void tcp_set_keepalive(struct sock *sk, int val); -- cgit v1.2.3 From ee7f1e1302f5cb29168f70827c12855f1d8c9845 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:18:26 -0700 Subject: bpf: Change bpf_setsockopt(SOL_IP) to reuse do_ip_setsockopt() After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IP) and reuses the implementation in do_ip_setsockopt(). The existing optname white-list is refactored into a new function sol_ip_setsockopt(). NOTE, the current bpf_setsockopt(IP_TOS) is quite different from the the do_ip_setsockopt(IP_TOS). For example, it does not take the INET_ECN_MASK into the account for tcp and also does not adjust sk->sk_priority. It looks like the current bpf_setsockopt(IP_TOS) was referencing the IPV6_TCLASS implementation instead of IP_TOS. This patch tries to rectify that by using the do_ip_setsockopt(IP_TOS). While this is a behavior change, the do_ip_setsockopt(IP_TOS) behavior is arguably what the user is expecting. At least, the INET_ECN_MASK bits should be masked out for tcp. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061826.4180990-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/ip.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/ip.h b/include/net/ip.h index 1c979fd1904c..34fa5b0f0a0e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -743,6 +743,8 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk, int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc, bool allow_ipv6); DECLARE_STATIC_KEY_FALSE(ip4_min_ttl); +int do_ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, -- cgit v1.2.3 From 75b64b68ee3f9fe90ad8f21e5c5c92de58abf725 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:18:34 -0700 Subject: bpf: Change bpf_setsockopt(SOL_IPV6) to reuse do_ipv6_setsockopt() After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IPV6) and reuses the implementation in do_ipv6_setsockopt(). ipv6 could be compiled as a module. Like how other code solved it with stubs in ipv6_stubs.h, this patch adds the do_ipv6_setsockopt to the ipv6_bpf_stub. The current bpf_setsockopt(IPV6_TCLASS) does not take the INET_ECN_MASK into the account for tcp. The do_ipv6_setsockopt(IPV6_TCLASS) will handle it correctly. The existing optname white-list is refactored into a new function sol_ipv6_setsockopt(). After this last SOL_IPV6 dup code removal, the __bpf_setsockopt() is simplified enough that the extra "{ }" around the if statement can be removed. Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061834.4181198-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/ipv6.h | 2 ++ include/net/ipv6_stubs.h | 2 ++ 2 files changed, 4 insertions(+) (limited to 'include/net') diff --git a/include/net/ipv6.h b/include/net/ipv6.h index de9dcc5652c4..c110d9032083 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1156,6 +1156,8 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6, */ DECLARE_STATIC_KEY_FALSE(ip6_min_hopcount); +int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, + unsigned int optlen); int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ipv6_getsockopt(struct sock *sk, int level, int optname, diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h index 45e0339be6fa..8692698b01cf 100644 --- a/include/net/ipv6_stubs.h +++ b/include/net/ipv6_stubs.h @@ -81,6 +81,8 @@ struct ipv6_bpf_stub { const struct in6_addr *daddr, __be16 dport, int dif, int sdif, struct udp_table *tbl, struct sk_buff *skb); + int (*ipv6_setsockopt)(struct sock *sk, int level, int optname, + sockptr_t optval, unsigned int optlen); }; extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly; -- cgit v1.2.3 From 6dadbe4bac68309eb46ab0f30e8ff47a789df49a Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:28:53 -0700 Subject: bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument Similar to the earlier patch that changes sk_getsockopt() to take the sockptr_t argument . This patch also changes do_ipv6_getsockopt() to take the sockptr_t argument such that a latter patch can make bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt(). Note on the change in ip6_mc_msfget(). This function is to return an array of sockaddr_storage in optval. This function is shared between ipv6_get_msfilter() and compat_ipv6_get_msfilter(). However, the sockaddr_storage is stored at different offset of the optval because of the difference between group_filter and compat_group_filter. Thus, a new 'ss_offset' argument is added to ip6_mc_msfget(). Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002853.2892532-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/net') diff --git a/include/net/ipv6.h b/include/net/ipv6.h index c110d9032083..a4f24573ed7a 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1209,7 +1209,7 @@ int ip6_mc_source(int add, int omode, struct sock *sk, int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf, struct sockaddr_storage *list); int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf, - struct sockaddr_storage __user *p); + sockptr_t optval, size_t ss_offset); #ifdef CONFIG_PROC_FS int ac6_proc_init(struct net *net); -- cgit v1.2.3 From 65ddc82d3b96be5555a36de4e2b4547433a00532 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:29:12 -0700 Subject: bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt() This patch changes bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt(). It removes all duplicated code from bpf_getsockopt(SOL_SOCKET). Before this patch, there were some optnames available to bpf_setsockopt(SOL_SOCKET) but missing in bpf_getsockopt(SOL_SOCKET). It surprises users from time to time. For example, SO_REUSEADDR, SO_KEEPALIVE, SO_RCVLOWAT, and SO_MAX_PACING_RATE. This patch automatically closes this gap without duplicating more code. The only exception is SO_BINDTODEVICE because it needs to acquire a blocking lock. Thus, SO_BINDTODEVICE is not supported. Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002912.2894040-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/sock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/sock.h b/include/net/sock.h index ee44b424d952..ea7965524133 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1833,6 +1833,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname, int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); +int sk_getsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, sockptr_t optlen); int sock_getsockopt(struct socket *sock, int level, int op, char __user *optval, int __user *optlen); int sock_gettstamp(struct socket *sock, void __user *userstamp, -- cgit v1.2.3 From 273b7f0fb44847c41814a59901977be284daa447 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:29:18 -0700 Subject: bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() This patch changes bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt(). It removes the duplicated code from bpf_getsockopt(SOL_TCP). Before this patch, there were some optnames available to bpf_setsockopt(SOL_TCP) but missing in bpf_getsockopt(SOL_TCP). For example, TCP_NODELAY, TCP_MAXSEG, TCP_KEEPIDLE, TCP_KEEPINTVL, and a few more. It surprises users from time to time. This patch automatically closes this gap without duplicating more code. bpf_getsockopt(TCP_SAVED_SYN) does not free the saved_syn, so it stays in sol_tcp_sockopt(). For string name value like TCP_CONGESTION, bpf expects it is always null terminated, so sol_tcp_sockopt() decrements optlen by one before calling do_tcp_getsockopt() and the 'if (optlen < saved_optlen) memset(..,0,..);' in __bpf_getsockopt() will always do a null termination. Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002918.2894511-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/tcp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/tcp.h b/include/net/tcp.h index c03a50c72f40..735e957f7f4b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -402,6 +402,8 @@ void tcp_init_sock(struct sock *sk); void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb); __poll_t tcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait); +int do_tcp_getsockopt(struct sock *sk, int level, + int optname, sockptr_t optval, sockptr_t optlen); int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); bool tcp_bpf_bypass_getsockopt(int level, int optname); -- cgit v1.2.3 From fd969f25fe24be515278d28cbf86dde39be8a495 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:29:25 -0700 Subject: bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() This patch changes bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() and remove the duplicated code. Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002925.2895416-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/ip.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/net') diff --git a/include/net/ip.h b/include/net/ip.h index 34fa5b0f0a0e..038097c2a152 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -747,6 +747,8 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); +int do_ip_getsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, sockptr_t optlen); int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); int ip_ra_control(struct sock *sk, unsigned char on, -- cgit v1.2.3 From 38566ec06f52250c4abaa7447aae676e0b881c46 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:29:31 -0700 Subject: bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() This patch changes bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt(). It removes the duplicated code from bpf_getsockopt(SOL_IPV6). This also makes bpf_getsockopt(SOL_IPV6) supporting the same set of optnames as in bpf_setsockopt(SOL_IPV6). In particular, this adds IPV6_AUTOFLOWLABEL support to bpf_getsockopt(SOL_IPV6). ipv6 could be compiled as a module. Like how other code solved it with stubs in ipv6_stubs.h, this patch adds the do_ipv6_getsockopt to the ipv6_bpf_stub. Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002931.2896218-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/net/ipv6.h | 2 ++ include/net/ipv6_stubs.h | 2 ++ 2 files changed, 4 insertions(+) (limited to 'include/net') diff --git a/include/net/ipv6.h b/include/net/ipv6.h index a4f24573ed7a..d664ba5812d8 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1160,6 +1160,8 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval unsigned int optlen); int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); +int do_ipv6_getsockopt(struct sock *sk, int level, int optname, + sockptr_t optval, sockptr_t optlen); int ipv6_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h index 8692698b01cf..c48186bf4737 100644 --- a/include/net/ipv6_stubs.h +++ b/include/net/ipv6_stubs.h @@ -83,6 +83,8 @@ struct ipv6_bpf_stub { struct sk_buff *skb); int (*ipv6_setsockopt)(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); + int (*ipv6_getsockopt)(struct sock *sk, int level, int optname, + sockptr_t optval, sockptr_t optlen); }; extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly; -- cgit v1.2.3