From 936a9180e81763a368adb92b6235edd5d5a67ca1 Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Sat, 16 Mar 2019 14:41:30 +0100 Subject: packets: Always register packet sk in the same order [ Upstream commit a4dc6a49156b1f8d6e17251ffda17c9e6a5db78a ] When using fanouts with AF_PACKET, the demux functions such as fanout_demux_cpu will return an index in the fanout socket array, which corresponds to the selected socket. The ordering of this array depends on the order the sockets were added to a given fanout group, so for FANOUT_CPU this means sockets are bound to cpus in the order they are configured, which is OK. However, when stopping then restarting the interface these sockets are bound to, the sockets are reassigned to the fanout group in the reverse order, due to the fact that they were inserted at the head of the interface's AF_PACKET socket list. This means that traffic that was directed to the first socket in the fanout group is now directed to the last one after an interface restart. In the case of FANOUT_CPU, traffic from CPU0 will be directed to the socket that used to receive traffic from the last CPU after an interface restart. This commit introduces a helper to add a socket at the tail of a list, then uses it to register AF_PACKET sockets. Note that this changes the order in which sockets are listed in /proc and with sock_diag. Fixes: dc99f600698d ("packet: Add fanout support") Signed-off-by: Maxime Chevallier Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/sock.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index 15bb04dec40e..116308632fae 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -650,6 +650,12 @@ static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list) hlist_add_head_rcu(&sk->sk_node, list); } +static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list) +{ + sock_hold(sk); + hlist_add_tail_rcu(&sk->sk_node, list); +} + static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) { hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list); -- cgit v1.2.3 From 4f05457010d17012a3af167539f54c68ce6a68cc Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 18 Mar 2019 19:47:00 +0800 Subject: sctp: get sctphdr by offset in sctp_compute_cksum [ Upstream commit 273160ffc6b993c7c91627f5a84799c66dfe4dee ] sctp_hdr(skb) only works when skb->transport_header is set properly. But in Netfilter, skb->transport_header for ipv6 is not guaranteed to be right value for sctphdr. It would cause to fail to check the checksum for sctp packets. So fix it by using offset, which is always right in all places. v1->v2: - Fix the changelog. Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code") Reported-by: Li Shuang Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/sctp/checksum.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/sctp/checksum.h b/include/net/sctp/checksum.h index 4a5b9a306c69..803fc26ef0ba 100644 --- a/include/net/sctp/checksum.h +++ b/include/net/sctp/checksum.h @@ -60,7 +60,7 @@ static inline __wsum sctp_csum_combine(__wsum csum, __wsum csum2, static inline __le32 sctp_compute_cksum(const struct sk_buff *skb, unsigned int offset) { - struct sctphdr *sh = sctp_hdr(skb); + struct sctphdr *sh = (struct sctphdr *)(skb->data + offset); __le32 ret, old = sh->checksum; const struct skb_checksum_ops ops = { .update = sctp_csum_update, -- cgit v1.2.3 From 3085d41e89f0a268842e3a66b8436faed79d504f Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 8 Mar 2019 11:32:04 -0800 Subject: tracing: kdb: Fix ftdump to not sleep [ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ] As reported back in 2016-11 [1], the "ftdump" kdb command triggers a BUG for "sleeping function called from invalid context". kdb's "ftdump" command wants to call ring_buffer_read_prepare() in atomic context. A very simple solution for this is to add allocation flags to ring_buffer_read_prepare() so kdb can call it without triggering the allocation error. This patch does that. Note that in the original email thread about this, it was suggested that perhaps the solution for kdb was to either preallocate the buffer ahead of time or create our own iterator. I'm hoping that this alternative of adding allocation flags to ring_buffer_read_prepare() can be considered since it means I don't need to duplicate more of the core trace code into "trace_kdb.c" (for either creating my own iterator or re-preparing a ring allocator whose memory was already allocated). NOTE: another option for kdb is to actually figure out how to make it reuse the existing ftrace_dump() function and totally eliminate the duplication. This sounds very appealing and actually works (the "sr z" command can be seen to properly dump the ftrace buffer). The downside here is that ftrace_dump() fully consumes the trace buffer. Unless that is changed I'd rather not use it because it means "ftdump | grep xyz" won't be very useful to search the ftrace buffer since it will throw away the whole trace on the first grep. A future patch to dump only the last few lines of the buffer will also be hard to implement. [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org Reported-by: Brian Norris Signed-off-by: Douglas Anderson Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin --- include/linux/ring_buffer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 19d0778ec382..121c8f99ecdd 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -125,7 +125,7 @@ ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts, unsigned long *lost_events); struct ring_buffer_iter * -ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu); +ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu, gfp_t flags); void ring_buffer_read_prepare_sync(void); void ring_buffer_read_start(struct ring_buffer_iter *iter); void ring_buffer_read_finish(struct ring_buffer_iter *iter); -- cgit v1.2.3 From 553be48d7459b569a83e676daaa6ba69f32cf288 Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Thu, 7 Mar 2019 16:31:28 -0800 Subject: include/linux/relay.h: fix percpu annotation in struct rchan [ Upstream commit 62461ac2e5b6520b6d65fc6d7d7b4b8df4b848d8 ] The percpu member of this structure is declared as: struct ... ** __percpu member; So its type is: __percpu pointer to pointer to struct ... But looking at how it's used, its type should be: pointer to __percpu pointer to struct ... and it should thus be declared as: struct ... * __percpu *member; So fix the placement of '__percpu' in the definition of this structures. This silents a few Sparse's warnings like: warning: incorrect type in initializer (different address spaces) expected void const [noderef] *__vpp_verify got struct sched_domain ** Link: http://lkml.kernel.org/r/20190118144902.79065-1-luc.vanoostenryck@gmail.com Fixes: 017c59c042d01 ("relay: Use per CPU constructs for the relay channel buffer pointers") Signed-off-by: Luc Van Oostenryck Cc: Jens Axboe Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/relay.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/relay.h b/include/linux/relay.h index 68c1448e56bb..2560f8706408 100644 --- a/include/linux/relay.h +++ b/include/linux/relay.h @@ -65,7 +65,7 @@ struct rchan struct kref kref; /* channel refcount */ void *private_data; /* for user-defined data */ size_t last_toobig; /* tried to log event > subbuf size */ - struct rchan_buf ** __percpu buf; /* per-cpu channel buffers */ + struct rchan_buf * __percpu *buf; /* per-cpu channel buffers */ int is_global; /* One global buffer ? */ struct list_head list; /* for channel list */ struct dentry *parent; /* parent dentry passed to open */ -- cgit v1.2.3 From acb5aefd789bd138c5a0616581efe0ccf1087781 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 8 Feb 2019 14:48:03 +0100 Subject: genirq: Avoid summation loops for /proc/stat [ Upstream commit 1136b0728969901a091f0471968b2b76ed14d9ad ] Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. This can be largely avoided for interrupts which are not marked as 'PER_CPU' interrupts by simply adding a per interrupt summation counter which is incremented along with the per interrupt per cpu counter. The PER_CPU interrupts need to avoid that and use only per cpu accounting because they share the interrupt number and the interrupt descriptor and concurrent updates would conflict or require unwanted synchronization. Reported-by: Waiman Long Signed-off-by: Thomas Gleixner Reviewed-by: Waiman Long Reviewed-by: Marc Zyngier Reviewed-by: Davidlohr Bueso Cc: Matthew Wilcox Cc: Andrew Morton Cc: Alexey Dobriyan Cc: Kees Cook Cc: linux-fsdevel@vger.kernel.org Cc: Davidlohr Bueso Cc: Miklos Szeredi Cc: Daniel Colascione Cc: Dave Chinner Cc: Randy Dunlap Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de 8<------------- v2: Undo the unintentional layout change of struct irq_desc. include/linux/irqdesc.h | 1 + kernel/irq/chip.c | 12 ++++++++++-- kernel/irq/internals.h | 8 +++++++- kernel/irq/irqdesc.c | 7 ++++++- 4 files changed, 24 insertions(+), 4 deletions(-) Signed-off-by: Sasha Levin --- include/linux/irqdesc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h index c9be57931b58..bb5547a83daf 100644 --- a/include/linux/irqdesc.h +++ b/include/linux/irqdesc.h @@ -61,6 +61,7 @@ struct irq_desc { unsigned int core_internal_state__do_not_mess_with_it; unsigned int depth; /* nested irq disables */ unsigned int wake_depth; /* nested wake enables */ + unsigned int tot_count; unsigned int irq_count; /* For detecting broken IRQs */ unsigned long last_unhandled; /* Aging timer for unhandled count */ unsigned int irqs_unhandled; -- cgit v1.2.3 From 2ae06da84a63d16efffac07215beb0cf12f0c21a Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 11 Jan 2019 14:46:15 +0100 Subject: netfilter: physdev: relax br_netfilter dependency [ Upstream commit 8e2f311a68494a6677c1724bdcb10bada21af37c ] Following command: iptables -D FORWARD -m physdev ... causes connectivity loss in some setups. Reason is that iptables userspace will probe kernel for the module revision of the physdev patch, and physdev has an artificial dependency on br_netfilter (xt_physdev use makes no sense unless a br_netfilter module is loaded). This causes the "phydev" module to be loaded, which in turn enables the "call-iptables" infrastructure. bridged packets might then get dropped by the iptables ruleset. The better fix would be to change the "call-iptables" defaults to 0 and enforce explicit setting to 1, but that breaks backwards compatibility. This does the next best thing: add a request_module call to checkentry. This was a stray '-D ... -m physdev' won't activate br_netfilter anymore. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- include/net/netfilter/br_netfilter.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h index 0b0c35c37125..238d1b83a45a 100644 --- a/include/net/netfilter/br_netfilter.h +++ b/include/net/netfilter/br_netfilter.h @@ -48,7 +48,6 @@ static inline struct rtable *bridge_parent_rtable(const struct net_device *dev) } struct net_device *setup_pre_routing(struct sk_buff *skb); -void br_netfilter_enable(void); #if IS_ENABLED(CONFIG_IPV6) int br_validate_ipv6(struct net *net, struct sk_buff *skb); -- cgit v1.2.3 From 7eceaf5bbfc0ecc0efc89b1f7318e44ea87391cc Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Fri, 5 Apr 2019 18:38:45 -0700 Subject: lib/string.c: implement a basic bcmp [ Upstream commit 5f074f3e192f10c9fade898b9b3b8812e3d83342 ] A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works. This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tailcail to memcmp to unbreak the build. This routine can be further optimized in the future. Other ideas discussed: * A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly. * -ffreestanding also is used sporadically throughout the kernel. * -fno-builtin-bcmp doesn't work when doing LTO. Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18eedcff13 Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers Reported-by: Nathan Chancellor Reported-by: Adhemerval Zanella Suggested-by: Arnd Bergmann Suggested-by: James Y Knight Suggested-by: Masahiro Yamada Suggested-by: Nathan Chancellor Suggested-by: Rasmus Villemoes Acked-by: Steven Rostedt (VMware) Reviewed-by: Nathan Chancellor Tested-by: Nathan Chancellor Reviewed-by: Masahiro Yamada Reviewed-by: Andy Shevchenko Cc: David Laight Cc: Rasmus Villemoes Cc: Namhyung Kim Cc: Greg Kroah-Hartman Cc: Alexander Shishkin Cc: Dan Williams Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/string.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/string.h b/include/linux/string.h index 60042e5e88ff..42eed573ebb6 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -111,6 +111,9 @@ extern void * memscan(void *,int,__kernel_size_t); #ifndef __HAVE_ARCH_MEMCMP extern int memcmp(const void *,const void *,__kernel_size_t); #endif +#ifndef __HAVE_ARCH_BCMP +extern int bcmp(const void *,const void *,__kernel_size_t); +#endif #ifndef __HAVE_ARCH_MEMCHR extern void * memchr(const void *,int,__kernel_size_t); #endif -- cgit v1.2.3 From 5f5d628adc1dd5542f7d1a08b2bbd808c1186c7e Mon Sep 17 00:00:00 2001 From: Stephen Suryaputra Date: Mon, 1 Apr 2019 09:17:32 -0400 Subject: vrf: check accept_source_route on the original netdevice [ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ] Configuration check to accept source route IP options should be made on the incoming netdevice when the skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev. v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern). Signed-off-by: Stephen Suryaputra Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/ip.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/ip.h b/include/net/ip.h index f06cd30bb44c..a3c1b9dfc9a1 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -580,7 +580,7 @@ int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp, unsigned char __user *data, int optlen); void ip_options_undo(struct ip_options *opt); void ip_forward_options(struct sk_buff *skb); -int ip_options_rcv_srr(struct sk_buff *skb); +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev); /* * Functions provided by ip_sockglue.c -- cgit v1.2.3 From 9a739f1ad0b12f73afc176b2788c6d0f4e9cc739 Mon Sep 17 00:00:00 2001 From: Yuval Avnery Date: Mon, 11 Mar 2019 06:18:24 +0200 Subject: net/mlx5e: Add a lock on tir list [ Upstream commit 80a2a9026b24c6bd34b8d58256973e22270bedec ] Refresh tirs is looping over a global list of tirs while netdevs are adding and removing tirs from that list. That is why a lock is required. Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Yuval Avnery Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- include/linux/mlx5/driver.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 859fd209603a..509e99076c57 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -578,6 +578,8 @@ enum mlx5_pci_status { }; struct mlx5_td { + /* protects tirs list changes while tirs refresh */ + struct mutex list_lock; struct list_head tirs_list; u32 tdn; }; -- cgit v1.2.3 From 6996763856e1fb27ccae260e41fd73a3fff56678 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 27 Mar 2019 08:21:30 -0700 Subject: netns: provide pure entropy for net_hash_mix() [ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ] net_hash_mix() currently uses kernel address of a struct net, and is used in many places that could be used to reveal this address to a patient attacker, thus defeating KASLR, for the typical case (initial net namespace, &init_net is not dynamically allocated) I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first. Also provide entropy regardless of CONFIG_NET_NS. Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet Reported-by: Amit Klein Reported-by: Benny Pinkas Cc: Pavel Emelyanov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/net_namespace.h | 1 + include/net/netns/hash.h | 15 ++------------- 2 files changed, 3 insertions(+), 13 deletions(-) (limited to 'include') diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index c05db6ff2515..0cdafe3935a6 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -53,6 +53,7 @@ struct net { */ spinlock_t rules_mod_lock; + u32 hash_mix; atomic64_t cookie_gen; struct list_head list; /* list of network namespaces */ diff --git a/include/net/netns/hash.h b/include/net/netns/hash.h index 69a6715d9f3f..a347b2f9e748 100644 --- a/include/net/netns/hash.h +++ b/include/net/netns/hash.h @@ -1,21 +1,10 @@ #ifndef __NET_NS_HASH_H__ #define __NET_NS_HASH_H__ -#include - -struct net; +#include static inline u32 net_hash_mix(const struct net *net) { -#ifdef CONFIG_NET_NS - /* - * shift this right to eliminate bits, that are - * always zeroed - */ - - return (u32)(((unsigned long)net) >> L1_CACHE_SHIFT); -#else - return 0; -#endif + return net->hash_mix; } #endif -- cgit v1.2.3 From a25d4ede3f6fc3b35da90c2ac22a5081c25a75d8 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 5 Apr 2019 18:38:53 -0700 Subject: include/linux/bitrev.h: fix constant bitrev commit 6147e136ff5071609b54f18982dea87706288e21 upstream. clang points out with hundreds of warnings that the bitrev macros have a problem with constant input: drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^ Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other. The obvious fix is to rename one of the variables, so this adds an extra '_'. It seems we got away with this because - there are only a few drivers using bitrev macros - usually there are no constant arguments to those - when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance. In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing). Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann Reviewed-by: Nick Desaulniers Cc: Zhao Qiang Cc: Yalin Wang Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/bitrev.h | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) (limited to 'include') diff --git a/include/linux/bitrev.h b/include/linux/bitrev.h index fb790b8449c1..333e42cf08de 100644 --- a/include/linux/bitrev.h +++ b/include/linux/bitrev.h @@ -31,32 +31,32 @@ static inline u32 __bitrev32(u32 x) #define __constant_bitrev32(x) \ ({ \ - u32 __x = x; \ - __x = (__x >> 16) | (__x << 16); \ - __x = ((__x & (u32)0xFF00FF00UL) >> 8) | ((__x & (u32)0x00FF00FFUL) << 8); \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = (___x >> 16) | (___x << 16); \ + ___x = ((___x & (u32)0xFF00FF00UL) >> 8) | ((___x & (u32)0x00FF00FFUL) << 8); \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ }) #define __constant_bitrev16(x) \ ({ \ - u16 __x = x; \ - __x = (__x >> 8) | (__x << 8); \ - __x = ((__x & (u16)0xF0F0U) >> 4) | ((__x & (u16)0x0F0FU) << 4); \ - __x = ((__x & (u16)0xCCCCU) >> 2) | ((__x & (u16)0x3333U) << 2); \ - __x = ((__x & (u16)0xAAAAU) >> 1) | ((__x & (u16)0x5555U) << 1); \ - __x; \ + u16 ___x = x; \ + ___x = (___x >> 8) | (___x << 8); \ + ___x = ((___x & (u16)0xF0F0U) >> 4) | ((___x & (u16)0x0F0FU) << 4); \ + ___x = ((___x & (u16)0xCCCCU) >> 2) | ((___x & (u16)0x3333U) << 2); \ + ___x = ((___x & (u16)0xAAAAU) >> 1) | ((___x & (u16)0x5555U) << 1); \ + ___x; \ }) #define __constant_bitrev8(x) \ ({ \ - u8 __x = x; \ - __x = (__x >> 4) | (__x << 4); \ - __x = ((__x & (u8)0xCCU) >> 2) | ((__x & (u8)0x33U) << 2); \ - __x = ((__x & (u8)0xAAU) >> 1) | ((__x & (u8)0x55U) << 1); \ - __x; \ + u8 ___x = x; \ + ___x = (___x >> 4) | (___x << 4); \ + ___x = ((___x & (u8)0xCCU) >> 2) | ((___x & (u8)0x33U) << 2); \ + ___x = ((___x & (u8)0xAAU) >> 1) | ((___x & (u8)0x55U) << 1); \ + ___x; \ }) #define bitrev32(x) \ -- cgit v1.2.3 From 208d25a7ae49746d7ff08efcea6cc4e190fa6cfb Mon Sep 17 00:00:00 2001 From: Cornelia Huck Date: Mon, 8 Apr 2019 14:33:22 +0200 Subject: virtio: Honour 'may_reduce_num' in vring_create_virtqueue commit cf94db21905333e610e479688add629397a4b384 upstream. vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified. However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.) Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified. While at it, fix a typo in the usage instructions. Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck Signed-off-by: Michael S. Tsirkin Reviewed-by: Halil Pasic Reviewed-by: Jens Freimann Signed-off-by: Greg Kroah-Hartman --- include/linux/virtio_ring.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index e8d36938f09a..b38c1871b735 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -62,7 +62,7 @@ struct virtqueue; /* * Creates a virtqueue and allocates the descriptor ring. If * may_reduce_num is set, then this may allocate a smaller ring than - * expected. The caller should query virtqueue_get_ring_size to learn + * expected. The caller should query virtqueue_get_vring_size to learn * the actual size of the ring. */ struct virtqueue *vring_create_virtqueue(unsigned int index, -- cgit v1.2.3 From 057a0da1899f00a4ac9a4c4c452cf2cf652bdbf0 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Fri, 1 Mar 2019 10:57:57 +0800 Subject: appletalk: Fix use-after-free in atalk_proc_exit [ Upstream commit 6377f787aeb945cae7abbb6474798de129e1f3ac ] KASAN report this: BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806 CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667 atalk_proc_exit+0x18/0x820 [appletalk] atalk_exit+0xf/0x5a [appletalk] __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff Allocated by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2739 [inline] slab_alloc mm/slub.c:2747 [inline] kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752 kmem_cache_zalloc include/linux/slab.h:730 [inline] __proc_create+0x30f/0xa20 fs/proc/generic.c:408 proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469 0xffffffffc10c01bb 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook mm/slub.c:1436 [inline] slab_free mm/slub.c:2986 [inline] kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002 pde_put+0x6e/0x80 fs/proc/generic.c:647 remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684 0xffffffffc10c031c 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881f41fe500 which belongs to the cache proc_dir_entry of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff8881f41fe500, ffff8881f41fe600) The buggy address belongs to the page: page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00 raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb It should check the return value of atalk_proc_init fails, otherwise atalk_exit will trgger use-after-free in pde_subdir_find while unload the module.This patch fix error cleanup path of atalk_init Reported-by: Hulk Robot Signed-off-by: YueHaibing Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/linux/atalk.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/atalk.h b/include/linux/atalk.h index 73fd8b7e9534..716d53799d1f 100644 --- a/include/linux/atalk.h +++ b/include/linux/atalk.h @@ -150,7 +150,7 @@ extern int sysctl_aarp_retransmit_limit; extern int sysctl_aarp_resolve_time; #ifdef CONFIG_SYSCTL -extern void atalk_register_sysctl(void); +extern int atalk_register_sysctl(void); extern void atalk_unregister_sysctl(void); #else #define atalk_register_sysctl() do { } while(0) -- cgit v1.2.3 From 477a4484730aa09d346e13cb34b660971a5d8a04 Mon Sep 17 00:00:00 2001 From: Pi-Hsun Shih Date: Wed, 13 Mar 2019 11:44:33 -0700 Subject: include/linux/swap.h: use offsetof() instead of custom __swapoffset macro [ Upstream commit a4046c06be50a4f01d435aa7fe57514818e6cc82 ] Use offsetof() to calculate offset of a field to take advantage of compiler built-in version when possible, and avoid UBSAN warning when compiling with Clang: UBSAN: Undefined behaviour in mm/swapfile.c:3010:38 member access within null pointer of type 'union swap_header' CPU: 6 PID: 1833 Comm: swapon Tainted: G S 4.19.23 #43 Call trace: dump_backtrace+0x0/0x194 show_stack+0x20/0x2c __dump_stack+0x20/0x28 dump_stack+0x70/0x94 ubsan_epilogue+0x14/0x44 ubsan_type_mismatch_common+0xf4/0xfc __ubsan_handle_type_mismatch_v1+0x34/0x54 __se_sys_swapon+0x654/0x1084 __arm64_sys_swapon+0x1c/0x24 el0_svc_common+0xa8/0x150 el0_svc_compat_handler+0x2c/0x38 el0_svc_compat+0x8/0x18 Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.org Signed-off-by: Pi-Hsun Shih Acked-by: Michal Hocko Reviewed-by: Andrew Morton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/swap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/swap.h b/include/linux/swap.h index 55ff5593c193..2228907d08ff 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -135,9 +135,9 @@ struct swap_extent { /* * Max bad pages in the new format.. */ -#define __swapoffset(x) ((unsigned long)&((union swap_header *)0)->x) #define MAX_SWAP_BADPAGES \ - ((__swapoffset(magic.magic) - __swapoffset(info.badpages)) / sizeof(int)) + ((offsetof(union swap_header, magic.magic) - \ + offsetof(union swap_header, info.badpages)) / sizeof(int)) enum { SWP_USED = (1 << 0), /* is slot in swap_info[] used? */ -- cgit v1.2.3 From 3be15cd448e5c5ebff3a8f7f9077db8b26697afa Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 6 Mar 2019 11:52:36 +0100 Subject: appletalk: Fix compile regression [ Upstream commit 27da0d2ef998e222a876c0cec72aa7829a626266 ] A bugfix just broke compilation of appletalk when CONFIG_SYSCTL is disabled: In file included from net/appletalk/ddp.c:65: net/appletalk/ddp.c: In function 'atalk_init': include/linux/atalk.h:164:34: error: expected expression before 'do' #define atalk_register_sysctl() do { } while(0) ^~ net/appletalk/ddp.c:1934:7: note: in expansion of macro 'atalk_register_sysctl' rc = atalk_register_sysctl(); This is easier to avoid by using conventional inline functions as stubs rather than macros. The header already has inline functions for other purposes, so I'm changing over all the macros for consistency. Fixes: 6377f787aeb9 ("appletalk: Fix use-after-free in atalk_proc_exit") Signed-off-by: Arnd Bergmann Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/linux/atalk.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/atalk.h b/include/linux/atalk.h index 716d53799d1f..af43ed404ff4 100644 --- a/include/linux/atalk.h +++ b/include/linux/atalk.h @@ -153,16 +153,26 @@ extern int sysctl_aarp_resolve_time; extern int atalk_register_sysctl(void); extern void atalk_unregister_sysctl(void); #else -#define atalk_register_sysctl() do { } while(0) -#define atalk_unregister_sysctl() do { } while(0) +static inline int atalk_register_sysctl(void) +{ + return 0; +} +static inline void atalk_unregister_sysctl(void) +{ +} #endif #ifdef CONFIG_PROC_FS extern int atalk_proc_init(void); extern void atalk_proc_exit(void); #else -#define atalk_proc_init() ({ 0; }) -#define atalk_proc_exit() do { } while(0) +static inline int atalk_proc_init(void) +{ + return 0; +} +static inline void atalk_proc_exit(void) +{ +} #endif /* CONFIG_PROC_FS */ #endif /* __LINUX_ATALK_H__ */ -- cgit v1.2.3 From c9c83bb2adc468d745dc0dd19ce44814e3696951 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sun, 24 Feb 2019 01:49:52 +0900 Subject: x86/kprobes: Verify stack frame on kretprobe commit 3ff9c075cc767b3060bdac12da72fc94dd7da1b8 upstream. Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Cc: Linus Torvalds Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- include/linux/kprobes.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index e23392517db9..cb527c78de9f 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -197,6 +197,7 @@ struct kretprobe_instance { struct kretprobe *rp; kprobe_opcode_t *ret_addr; struct task_struct *task; + void *fp; char data[0]; }; -- cgit v1.2.3 From aaee29edc09ac67c3efe9236e43bbc225ef1c5db Mon Sep 17 00:00:00 2001 From: Peter Oskolkov Date: Fri, 26 Apr 2019 08:41:05 -0700 Subject: net: IP defrag: encapsulate rbtree defrag code into callable functions [ Upstream commit c23f35d19db3b36ffb9e04b08f1d91565d15f84f ] This is a refactoring patch: without changing runtime behavior, it moves rbtree-related code from IPv4-specific files/functions into .h/.c defrag files shared with IPv6 defragmentation code. Signed-off-by: Peter Oskolkov Cc: Eric Dumazet Cc: Florian Westphal Cc: Tom Herbert Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/inet_frag.h | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index a3812e9c8fee..c2c724abde57 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -76,8 +76,8 @@ struct inet_frag_queue { struct timer_list timer; spinlock_t lock; atomic_t refcnt; - struct sk_buff *fragments; /* Used in IPv6. */ - struct rb_root rb_fragments; /* Used in IPv4. */ + struct sk_buff *fragments; /* used in 6lopwpan IPv6. */ + struct rb_root rb_fragments; /* Used in IPv4/IPv6. */ struct sk_buff *fragments_tail; struct sk_buff *last_run_head; ktime_t stamp; @@ -152,4 +152,16 @@ static inline void add_frag_mem_limit(struct netns_frags *nf, long val) extern const u8 ip_frag_ecn_table[16]; +/* Return values of inet_frag_queue_insert() */ +#define IPFRAG_OK 0 +#define IPFRAG_DUP 1 +#define IPFRAG_OVERLAP 2 +int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, + int offset, int end); +void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, + struct sk_buff *parent); +void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, + void *reasm_data); +struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q); + #endif -- cgit v1.2.3 From 33336cdde18bc81095fdf01481cb35767c2af2d1 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 26 Apr 2019 08:41:06 -0700 Subject: ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module [ Upstream commit 70b095c84326640eeacfd69a411db8fc36e8ab1a ] IPV6=m DEFRAG_IPV6=m CONNTRACK=y yields: net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get': net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable' net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6' Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params ip6_frag_init and ip6_expire_frag_queue so it would be needed to force IPV6=y too. This patch gets rid of the 'followup linker error' by removing the dependency of ipv6.ko symbols from netfilter ipv6 defrag. Shared code is placed into a header, then used from both. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- include/net/ipv6.h | 29 -------------- include/net/ipv6_frag.h | 104 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+), 29 deletions(-) create mode 100644 include/net/ipv6_frag.h (limited to 'include') diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 7cb100d25bb5..168009eef5e4 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -511,35 +511,6 @@ static inline bool ipv6_prefix_equal(const struct in6_addr *addr1, } #endif -struct inet_frag_queue; - -enum ip6_defrag_users { - IP6_DEFRAG_LOCAL_DELIVER, - IP6_DEFRAG_CONNTRACK_IN, - __IP6_DEFRAG_CONNTRACK_IN = IP6_DEFRAG_CONNTRACK_IN + USHRT_MAX, - IP6_DEFRAG_CONNTRACK_OUT, - __IP6_DEFRAG_CONNTRACK_OUT = IP6_DEFRAG_CONNTRACK_OUT + USHRT_MAX, - IP6_DEFRAG_CONNTRACK_BRIDGE_IN, - __IP6_DEFRAG_CONNTRACK_BRIDGE_IN = IP6_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX, -}; - -void ip6_frag_init(struct inet_frag_queue *q, const void *a); -extern const struct rhashtable_params ip6_rhash_params; - -/* - * Equivalent of ipv4 struct ip - */ -struct frag_queue { - struct inet_frag_queue q; - - int iif; - unsigned int csum; - __u16 nhoffset; - u8 ecn; -}; - -void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq); - static inline bool ipv6_addr_any(const struct in6_addr *a) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 diff --git a/include/net/ipv6_frag.h b/include/net/ipv6_frag.h new file mode 100644 index 000000000000..6ced1e6899b6 --- /dev/null +++ b/include/net/ipv6_frag.h @@ -0,0 +1,104 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _IPV6_FRAG_H +#define _IPV6_FRAG_H +#include +#include +#include +#include + +enum ip6_defrag_users { + IP6_DEFRAG_LOCAL_DELIVER, + IP6_DEFRAG_CONNTRACK_IN, + __IP6_DEFRAG_CONNTRACK_IN = IP6_DEFRAG_CONNTRACK_IN + USHRT_MAX, + IP6_DEFRAG_CONNTRACK_OUT, + __IP6_DEFRAG_CONNTRACK_OUT = IP6_DEFRAG_CONNTRACK_OUT + USHRT_MAX, + IP6_DEFRAG_CONNTRACK_BRIDGE_IN, + __IP6_DEFRAG_CONNTRACK_BRIDGE_IN = IP6_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX, +}; + +/* + * Equivalent of ipv4 struct ip + */ +struct frag_queue { + struct inet_frag_queue q; + + int iif; + __u16 nhoffset; + u8 ecn; +}; + +#if IS_ENABLED(CONFIG_IPV6) +static inline void ip6frag_init(struct inet_frag_queue *q, const void *a) +{ + struct frag_queue *fq = container_of(q, struct frag_queue, q); + const struct frag_v6_compare_key *key = a; + + q->key.v6 = *key; + fq->ecn = 0; +} + +static inline u32 ip6frag_key_hashfn(const void *data, u32 len, u32 seed) +{ + return jhash2(data, + sizeof(struct frag_v6_compare_key) / sizeof(u32), seed); +} + +static inline u32 ip6frag_obj_hashfn(const void *data, u32 len, u32 seed) +{ + const struct inet_frag_queue *fq = data; + + return jhash2((const u32 *)&fq->key.v6, + sizeof(struct frag_v6_compare_key) / sizeof(u32), seed); +} + +static inline int +ip6frag_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr) +{ + const struct frag_v6_compare_key *key = arg->key; + const struct inet_frag_queue *fq = ptr; + + return !!memcmp(&fq->key, key, sizeof(*key)); +} + +static inline void +ip6frag_expire_frag_queue(struct net *net, struct frag_queue *fq) +{ + struct net_device *dev = NULL; + struct sk_buff *head; + + rcu_read_lock(); + spin_lock(&fq->q.lock); + + if (fq->q.flags & INET_FRAG_COMPLETE) + goto out; + + inet_frag_kill(&fq->q); + + dev = dev_get_by_index_rcu(net, fq->iif); + if (!dev) + goto out; + + __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS); + __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); + + /* Don't send error if the first segment did not arrive. */ + head = fq->q.fragments; + if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head) + goto out; + + head->dev = dev; + skb_get(head); + spin_unlock(&fq->q.lock); + + icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); + kfree_skb(head); + goto out_rcu_unlock; + +out: + spin_unlock(&fq->q.lock); +out_rcu_unlock: + rcu_read_unlock(); + inet_frag_put(&fq->q); +} +#endif +#endif -- cgit v1.2.3 From eccf76e1fa34ae926ce2b73965c3017399a07920 Mon Sep 17 00:00:00 2001 From: Peter Oskolkov Date: Fri, 26 Apr 2019 08:41:07 -0700 Subject: net: IP6 defrag: use rbtrees for IPv6 defrag [ Upstream commit d4289fcc9b16b89619ee1c54f829e05e56de8b9a ] Currently, IPv6 defragmentation code drops non-last fragments that are smaller than 1280 bytes: see commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu") This behavior is not specified in IPv6 RFCs and appears to break compatibility with some IPv6 implemenations, as reported here: https://www.spinics.net/lists/netdev/msg543846.html This patch re-uses common IP defragmentation queueing and reassembly code in IPv6, removing the 1280 byte restriction. Signed-off-by: Peter Oskolkov Reported-by: Tom Herbert Cc: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/ipv6_frag.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/ipv6_frag.h b/include/net/ipv6_frag.h index 6ced1e6899b6..28aa9b30aece 100644 --- a/include/net/ipv6_frag.h +++ b/include/net/ipv6_frag.h @@ -82,8 +82,15 @@ ip6frag_expire_frag_queue(struct net *net, struct frag_queue *fq) __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); /* Don't send error if the first segment did not arrive. */ - head = fq->q.fragments; - if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head) + if (!(fq->q.flags & INET_FRAG_FIRST_IN)) + goto out; + + /* sk_buff::dev and sk_buff::rbnode are unionized. So we + * pull the head out of the tree in order to be able to + * deal with head->dev. + */ + head = inet_frag_pull_head(&fq->q); + if (!head) goto out; head->dev = dev; -- cgit v1.2.3 From 51a27f037fb328d55389252ae67bcc2b9d70e7ac Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Thu, 16 Mar 2017 16:40:21 -0700 Subject: kasan: add a prototype of task_struct to avoid warning commit 5be9b730b09c45c358bbfe7f51d254e306cccc07 upstream. Add a prototype of task_struct to fix below warning on arm64. In file included from arch/arm64/kernel/probes/kprobes.c:19:0: include/linux/kasan.h:81:132: error: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] static inline void kasan_unpoison_task_stack(struct task_struct *task) {} As same as other types (kmem_cache, page, and vm_struct) this adds a prototype of task_struct data structure on top of kasan.h. [arnd] A related warning was fixed before, but now appears in a different line in the same file in v4.11-rc2. The patch from Masami Hiramatsu still seems appropriate, so let's take his version. Fixes: 71af2ed5eeea ("kasan, sched/headers: Remove from ") Link: https://patchwork.kernel.org/patch/9569839/ Link: http://lkml.kernel.org/r/20170313141517.3397802-1-arnd@arndb.de Signed-off-by: Arnd Bergmann Signed-off-by: Masami Hiramatsu Acked-by: Alexander Potapenko Acked-by: Andrey Ryabinin Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Andrey Konovalov Signed-off-by: Greg Kroah-Hartman --- include/linux/kasan.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 820c0ad54a01..c9df9e180610 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -7,6 +7,7 @@ struct kmem_cache; struct page; struct vm_struct; +struct task_struct; #ifdef CONFIG_KASAN -- cgit v1.2.3 From ff5ca554b16608b2a35665210e62b5afb4429f51 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 16 Jan 2018 17:34:00 +0100 Subject: caif: reduce stack size with KASAN commit ce6289661b14a8b391d90db918c91b6d6da6540a upstream. When CONFIG_KASAN is set, we can use relatively large amounts of kernel stack space: net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=] This adds convenience wrappers around cfpkt_extr_head(), which is responsible for most of the stack growth. With those wrapper functions, gcc apparently starts reusing the stack slots for each instance, thus avoiding the problem. Signed-off-by: Arnd Bergmann Signed-off-by: David S. Miller Signed-off-by: Andrey Konovalov Signed-off-by: Greg Kroah-Hartman --- include/net/caif/cfpkt.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) (limited to 'include') diff --git a/include/net/caif/cfpkt.h b/include/net/caif/cfpkt.h index fe328c52c46b..801489bb14c3 100644 --- a/include/net/caif/cfpkt.h +++ b/include/net/caif/cfpkt.h @@ -32,6 +32,33 @@ void cfpkt_destroy(struct cfpkt *pkt); */ int cfpkt_extr_head(struct cfpkt *pkt, void *data, u16 len); +static inline u8 cfpkt_extr_head_u8(struct cfpkt *pkt) +{ + u8 tmp; + + cfpkt_extr_head(pkt, &tmp, 1); + + return tmp; +} + +static inline u16 cfpkt_extr_head_u16(struct cfpkt *pkt) +{ + __le16 tmp; + + cfpkt_extr_head(pkt, &tmp, 2); + + return le16_to_cpu(tmp); +} + +static inline u32 cfpkt_extr_head_u32(struct cfpkt *pkt) +{ + __le32 tmp; + + cfpkt_extr_head(pkt, &tmp, 4); + + return le32_to_cpu(tmp); +} + /* * Peek header from packet. * Reads data from packet without changing packet. -- cgit v1.2.3 From 3c65c7660b6ebd9a8bdb0fa1583f4aeaac1bbec6 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Fri, 19 Apr 2019 13:52:38 -0400 Subject: USB: core: Fix bug caused by duplicate interface PM usage counter commit c2b71462d294cf517a0bc6e4fd6424d7cee5596f upstream. The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active: URB 0000000031fb463e submitted while active WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface. Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound. This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation! As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative. It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this. Signed-off-by: Alan Stern Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com CC: Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/usb.h b/include/linux/usb.h index 346665a0c49d..9b5ca59271d9 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -129,7 +129,6 @@ enum usb_interface_condition { * @dev: driver model's view of this device * @usb_dev: if an interface is bound to the USB major, this will point * to the sysfs representation for that device. - * @pm_usage_cnt: PM usage counter for this interface * @reset_ws: Used for scheduling resets from atomic context. * @resetting_device: USB core reset the device, so use alt setting 0 as * current; needs bandwidth alloc after reset. @@ -186,7 +185,6 @@ struct usb_interface { struct device dev; /* interface specific device info */ struct device *usb_dev; - atomic_t pm_usage_cnt; /* usage counter for autosuspend */ struct work_struct reset_ws; /* for resets in atomic context */ }; #define to_usb_interface(d) container_of(d, struct usb_interface, dev) -- cgit v1.2.3 From ef29106690b019d5f3060245aab79dfa9fdc0ab5 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Fri, 29 Mar 2019 22:46:49 +0100 Subject: linux/kernel.h: Use parentheses around argument in u64_to_user_ptr() [ Upstream commit a0fe2c6479aab5723239b315ef1b552673f434a3 ] Use parentheses around uses of the argument in u64_to_user_ptr() to ensure that the cast doesn't apply to part of the argument. There are existing uses of the macro of the form u64_to_user_ptr(A + B) which expands to (void __user *)(uintptr_t)A + B (the cast applies to the first operand of the addition, the addition is a pointer addition). This happens to still work as intended, the semantic difference doesn't cause a difference in behavior. But I want to use u64_to_user_ptr() with a ternary operator in the argument, like so: u64_to_user_ptr(A ? B : C) This currently doesn't work as intended. Signed-off-by: Jann Horn Signed-off-by: Borislav Petkov Reviewed-by: Mukesh Ojha Cc: Andrei Vagin Cc: Andrew Morton Cc: Dan Carpenter Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jani Nikula Cc: Kees Cook Cc: Masahiro Yamada Cc: NeilBrown Cc: Peter Zijlstra Cc: Qiaowei Ren Cc: Thomas Gleixner Cc: x86-ml Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com Signed-off-by: Sasha Levin --- include/linux/kernel.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 61054f12be7c..d83fc669eeac 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -55,8 +55,8 @@ #define u64_to_user_ptr(x) ( \ { \ - typecheck(u64, x); \ - (void __user *)(uintptr_t)x; \ + typecheck(u64, (x)); \ + (void __user *)(uintptr_t)(x); \ } \ ) -- cgit v1.2.3 From 604d7b594c6d18582650dd06b201643b15202232 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 11 Apr 2019 10:14:59 -0700 Subject: mm: add 'try_get_page()' helper function [ Upstream commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 ] This is the same as the traditional 'get_page()' function, but instead of unconditionally incrementing the reference count of the page, it only does so if the count was "safe". It returns whether the reference count was incremented (and is marked __must_check, since the caller obviously has to be aware of it). Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. NOTE! The count access for the "safety" check is inherently racy, but that doesn't matter since the buffer we use is basically half the range of the reference count (ie we look at the sign of the count). Acked-by: Matthew Wilcox Cc: Jann Horn Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/mm.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/mm.h b/include/linux/mm.h index 11a5a46ce72b..e3c8d40a18b5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -777,6 +777,15 @@ static inline void get_page(struct page *page) get_zone_device_page(page); } +static inline __must_check bool try_get_page(struct page *page) +{ + page = compound_head(page); + if (WARN_ON_ONCE(page_ref_count(page) <= 0)) + return false; + page_ref_inc(page); + return true; +} + static inline void put_page(struct page *page) { page = compound_head(page); -- cgit v1.2.3 From 745f5c5f2ac14ac1cbb7fe3cbdc893c9d1af1356 Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Wed, 24 Apr 2019 22:19:17 +0200 Subject: Bluetooth: Align minimum encryption key size for LE and BR/EDR connections commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream. The minimum encryption key size for LE connections is 56 bits and to align LE with BR/EDR, enforce 56 bits of minimum encryption key size for BR/EDR connections as well. Signed-off-by: Marcel Holtmann Signed-off-by: Johan Hedberg Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- include/net/bluetooth/hci_core.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 4931787193c3..57a7dba49d29 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -176,6 +176,9 @@ struct adv_info { #define HCI_MAX_SHORT_NAME_LENGTH 10 +/* Min encryption key size to match with SMP */ +#define HCI_MIN_ENC_KEY_SIZE 7 + /* Default LE RPA expiry time, 15 minutes */ #define HCI_DEFAULT_RPA_TIMEOUT (15 * 60) -- cgit v1.2.3 From c6693781ddaf21dd3746bd74ba0c66e013782b06 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Fri, 8 Sep 2017 16:14:33 -0700 Subject: bitops: avoid integer overflow in GENMASK(_ULL) commit c32ee3d9abd284b4fcaacc250b101f93829c7bae upstream. GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically results in an integer overflow. clang raises a warning if the overflow occurs in a preprocessor expression. Clear the low-order bits through a substraction instead of the left-shift to avoid the overflow. (akpm: no change in .text size in my testing) Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/bitops.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index a83c822c35c2..8fbe259b197c 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -19,10 +19,11 @@ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. */ #define GENMASK(h, l) \ - (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) + (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) #define GENMASK_ULL(h, l) \ - (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) + (((~0ULL) - (1ULL << (l)) + 1) & \ + (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) extern unsigned int __sw_hweight8(unsigned int w); extern unsigned int __sw_hweight16(unsigned int w); -- cgit v1.2.3 From b995196b9da4e2486d50e132539c848a60ea88da Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 19 Jun 2018 13:53:08 +0100 Subject: locking/atomics, asm-generic: Move some macros from to a new file commit 8bd9cb51daac89337295b6f037b0486911e1b408 upstream. In preparation for implementing the asm-generic atomic bitops in terms of atomic_long_*(), we need to prevent implementations from pulling in . A common reason for this include is for the BITS_PER_BYTE definition, so move this and some other BIT() and masking macros into a new header file, . Signed-off-by: Will Deacon Acked-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/bitops.h | 22 +--------------------- include/linux/bits.h | 26 ++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 21 deletions(-) create mode 100644 include/linux/bits.h (limited to 'include') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 8fbe259b197c..d4b167fc9ecb 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -1,29 +1,9 @@ #ifndef _LINUX_BITOPS_H #define _LINUX_BITOPS_H #include +#include -#ifdef __KERNEL__ -#define BIT(nr) (1UL << (nr)) -#define BIT_ULL(nr) (1ULL << (nr)) -#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) -#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) -#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG)) -#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) -#define BITS_PER_BYTE 8 #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) -#endif - -/* - * Create a contiguous bitmask starting at bit position @l and ending at - * position @h. For example - * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. - */ -#define GENMASK(h, l) \ - (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) - -#define GENMASK_ULL(h, l) \ - (((~0ULL) - (1ULL << (l)) + 1) & \ - (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) extern unsigned int __sw_hweight8(unsigned int w); extern unsigned int __sw_hweight16(unsigned int w); diff --git a/include/linux/bits.h b/include/linux/bits.h new file mode 100644 index 000000000000..2b7b532c1d51 --- /dev/null +++ b/include/linux/bits.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_BITS_H +#define __LINUX_BITS_H +#include + +#define BIT(nr) (1UL << (nr)) +#define BIT_ULL(nr) (1ULL << (nr)) +#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) +#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) +#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG)) +#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) +#define BITS_PER_BYTE 8 + +/* + * Create a contiguous bitmask starting at bit position @l and ending at + * position @h. For example + * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. + */ +#define GENMASK(h, l) \ + (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) + +#define GENMASK_ULL(h, l) \ + (((~0ULL) - (1ULL << (l)) + 1) & \ + (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) + +#endif /* __LINUX_BITS_H */ -- cgit v1.2.3 From 822e5d5358bb945c5a22f7de50de307c8a782dbe Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Tue, 25 Sep 2018 14:38:18 +0200 Subject: x86/speculation: Apply IBPB more strictly to avoid cross-process data leak commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream. Currently, IBPB is only issued in cases when switching into a non-dumpable process, the rationale being to protect such 'important and security sensitive' processess (such as GPG) from data leaking into a different userspace process via spectre v2. This is however completely insufficient to provide proper userspace-to-userpace spectrev2 protection, as any process can poison branch buffers before being scheduled out, and the newly scheduled process immediately becomes spectrev2 victim. In order to minimize the performance impact (for usecases that do require spectrev2 protection), issue the barrier only in cases when switching between processess where the victim can't be ptraced by the potential attacker (as in such cases, the attacker doesn't have to bother with branch buffers at all). [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably fine-grained ] Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch") Originally-by: Tim Chen Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Josh Poimboeuf Cc: Andrea Arcangeli Cc: "WoodhouseDavid" Cc: Andi Kleen Cc: "SchauflerCasey" Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/ptrace.h | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index d53a23100401..58ae371556bc 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -60,14 +60,17 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead); #define PTRACE_MODE_READ 0x01 #define PTRACE_MODE_ATTACH 0x02 #define PTRACE_MODE_NOAUDIT 0x04 -#define PTRACE_MODE_FSCREDS 0x08 -#define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_FSCREDS 0x08 +#define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_SCHED 0x20 +#define PTRACE_MODE_IBPB 0x40 /* shorthands for READ/ATTACH and FSCREDS/REALCREDS combinations */ #define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_READ_REALCREDS (PTRACE_MODE_READ | PTRACE_MODE_REALCREDS) #define PTRACE_MODE_ATTACH_FSCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS) +#define PTRACE_MODE_SPEC_IBPB (PTRACE_MODE_ATTACH_REALCREDS | PTRACE_MODE_IBPB) /** * ptrace_may_access - check whether the caller is permitted to access @@ -85,6 +88,20 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead); */ extern bool ptrace_may_access(struct task_struct *task, unsigned int mode); +/** + * ptrace_may_access - check whether the caller is permitted to access + * a target task. + * @task: target task + * @mode: selects type of access and caller credentials + * + * Returns true on success, false on denial. + * + * Similar to ptrace_may_access(). Only to be called from context switch + * code. Does not call into audit and the regular LSM hooks due to locking + * constraints. + */ +extern bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode); + static inline int ptrace_reparented(struct task_struct *child) { return !same_thread_group(child->real_parent, child->parent); -- cgit v1.2.3 From c803409910a6e2ca70c3edd557aeda0055827f7a Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Fri, 10 May 2019 00:46:25 +0100 Subject: sched: Add sched_smt_active() Add the sched_smt_active() function needed for some x86 speculation mitigations. This was introduced upstream by commits 1b568f0aabf2 "sched/core: Optimize SCHED_SMT", ba2591a5993e "sched/smt: Update sched_smt_present at runtime", c5511d03ec09 "sched/smt: Make sched_smt_present track topology", and 321a874a7ef8 "sched/smt: Expose sched_smt_present static key". The upstream implementation uses the static_key_{disable,enable}_cpuslocked() functions, which aren't practical to backport. Signed-off-by: Ben Hutchings Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Peter Zijlstra (Intel) Cc: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- include/linux/sched/smt.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 include/linux/sched/smt.h (limited to 'include') diff --git a/include/linux/sched/smt.h b/include/linux/sched/smt.h new file mode 100644 index 000000000000..5209c268c6fd --- /dev/null +++ b/include/linux/sched/smt.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_SCHED_SMT_H +#define _LINUX_SCHED_SMT_H + +#include + +#ifdef CONFIG_SCHED_SMT +extern atomic_t sched_smt_present; + +static __always_inline bool sched_smt_active(void) +{ + return atomic_read(&sched_smt_present); +} +#else +static inline bool sched_smt_active(void) { return false; } +#endif + +#endif -- cgit v1.2.3 From a3c901bfdb2e37f281cc8087d5a01bb35da64b20 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 25 Nov 2018 19:33:39 +0100 Subject: x86/speculation: Rework SMT state change commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream. arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline. To allow finegrained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled. Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Cc: Peter Zijlstra Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Jiri Kosina Cc: Tom Lendacky Cc: Josh Poimboeuf Cc: Andrea Arcangeli Cc: David Woodhouse Cc: Tim Chen Cc: Andi Kleen Cc: Dave Hansen Cc: Casey Schaufler Cc: Asit Mallick Cc: Arjan van de Ven Cc: Jon Masters Cc: Waiman Long Cc: Greg KH Cc: Dave Stewart Cc: Kees Cook Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/sched/smt.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/sched/smt.h b/include/linux/sched/smt.h index 5209c268c6fd..559ac4590593 100644 --- a/include/linux/sched/smt.h +++ b/include/linux/sched/smt.h @@ -15,4 +15,6 @@ static __always_inline bool sched_smt_active(void) static inline bool sched_smt_active(void) { return false; } #endif +void arch_smt_update(void); + #endif -- cgit v1.2.3 From 2d99bc055e458eaaf78e4901e78961546eecf5f4 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 25 Nov 2018 19:33:53 +0100 Subject: x86/speculation: Add prctl() control for indirect branch speculation commit 9137bb27e60e554dab694eafa4cca241fa3a694f upstream. Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of indirect branch speculation via STIBP and IBPB. Invocations: Check indirect branch speculation status with - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0); Enable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0); Disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0); Force disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0); See Documentation/userspace-api/spec_ctrl.rst. Signed-off-by: Tim Chen Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Cc: Peter Zijlstra Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Jiri Kosina Cc: Tom Lendacky Cc: Josh Poimboeuf Cc: Andrea Arcangeli Cc: David Woodhouse Cc: Andi Kleen Cc: Dave Hansen Cc: Casey Schaufler Cc: Asit Mallick Cc: Arjan van de Ven Cc: Jon Masters Cc: Waiman Long Cc: Greg KH Cc: Dave Stewart Cc: Kees Cook Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de [bwh: Backported to 4.9: - Renumber the PFA flags - Drop changes in tools/include/uapi/linux/prctl.h - Adjust filename] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/sched.h | 9 +++++++++ include/uapi/linux/prctl.h | 1 + 2 files changed, 10 insertions(+) (limited to 'include') diff --git a/include/linux/sched.h b/include/linux/sched.h index ebd0afb35d16..1c487a3abd84 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2357,6 +2357,8 @@ static inline void memalloc_noio_restore(unsigned int flags) #define PFA_LMK_WAITING 3 /* Lowmemorykiller is waiting */ #define PFA_SPEC_SSB_DISABLE 4 /* Speculative Store Bypass disabled */ #define PFA_SPEC_SSB_FORCE_DISABLE 5 /* Speculative Store Bypass force disabled*/ +#define PFA_SPEC_IB_DISABLE 6 /* Indirect branch speculation restricted */ +#define PFA_SPEC_IB_FORCE_DISABLE 7 /* Indirect branch speculation permanently restricted */ #define TASK_PFA_TEST(name, func) \ @@ -2390,6 +2392,13 @@ TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ssb_disable) TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) +TASK_PFA_TEST(SPEC_IB_DISABLE, spec_ib_disable) +TASK_PFA_SET(SPEC_IB_DISABLE, spec_ib_disable) +TASK_PFA_CLEAR(SPEC_IB_DISABLE, spec_ib_disable) + +TASK_PFA_TEST(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable) +TASK_PFA_SET(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable) + /* * task->jobctl flags */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 64776b72e1eb..64ec0d62e5f5 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -202,6 +202,7 @@ struct prctl_mm_map { #define PR_SET_SPECULATION_CTRL 53 /* Speculation control variants */ # define PR_SPEC_STORE_BYPASS 0 +# define PR_SPEC_INDIRECT_BRANCH 1 /* Return and control values for PR_SET/GET_SPECULATION_CTRL */ # define PR_SPEC_NOT_AFFECTED 0 # define PR_SPEC_PRCTL (1UL << 0) -- cgit v1.2.3 From ba08d562b066f044e2985ece32b7890f556ee5ed Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 18 Feb 2019 22:51:43 +0100 Subject: x86/speculation/mds: Add sysfs reporting for MDS commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream. Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Borislav Petkov Reviewed-by: Jon Masters Tested-by: Jon Masters [bwh: Backported to 4.9: test x86_hyper instead of using hypervisor_is_type()] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index ae5ac89324df..1f88e86193ae 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -54,6 +54,8 @@ extern ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_mds(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit v1.2.3 From edda9c38930f5088a740952d5181bc1aa443e63c Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 12 Apr 2019 15:39:28 -0500 Subject: cpu/speculation: Add 'mitigations=' cmdline option commit 98af8452945c55652de68536afdde3b520fec429 upstream. Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability. Most users fall into a few basic categories: a) they want all mitigations off; b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or c) they want all reasonable mitigations on, with SMT disabled if vulnerable. Define a set of curated, arch-independent options, each of which is an aggregation of existing options: - mitigations=off: Disable all mitigations. - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable. - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation. Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches. Signed-off-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina (on x86) Reviewed-by: Jiri Kosina Cc: Borislav Petkov Cc: "H . Peter Anvin" Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Jiri Kosina Cc: Waiman Long Cc: Andrea Arcangeli Cc: Jon Masters Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman Cc: Tyler Hicks Cc: Linus Torvalds Cc: Randy Dunlap Cc: Steven Price Cc: Phil Auld Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com [bwh: Backported to 4.9: adjust filename] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 1f88e86193ae..166686209f2c 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -278,4 +278,28 @@ static inline void cpu_smt_check_topology_early(void) { } static inline void cpu_smt_check_topology(void) { } #endif +/* + * These are used for a global "mitigations=" cmdline option for toggling + * optional CPU mitigations. + */ +enum cpu_mitigations { + CPU_MITIGATIONS_OFF, + CPU_MITIGATIONS_AUTO, + CPU_MITIGATIONS_AUTO_NOSMT, +}; + +extern enum cpu_mitigations cpu_mitigations; + +/* mitigations=off */ +static inline bool cpu_mitigations_off(void) +{ + return cpu_mitigations == CPU_MITIGATIONS_OFF; +} + +/* mitigations=auto,nosmt */ +static inline bool cpu_mitigations_auto_nosmt(void) +{ + return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT; +} + #endif /* _LINUX_CPU_H_ */ -- cgit v1.2.3 From 82303dd64addd098f1ec7029bc2c97990ae2bf2a Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Thu, 9 May 2019 19:33:54 -0700 Subject: bpf: convert htab map to hlist_nulls commit 4fe8435909fddc97b81472026aa954e06dd192a5 upstream. when all map elements are pre-allocated one cpu can delete and reuse htab_elem while another cpu is still walking the hlist. In such case the lookup may miss the element. Convert hlist to hlist_nulls to avoid such scenario. When bucket lock is taken there is no need to take such precautions, so only convert map_lookup and map_get_next to nulls. The race window is extremely small and only reproducible with explicit udelay() inside lookup_nulls_elem_raw() Similar to hlist add hlist_nulls_for_each_entry_safe() and hlist_nulls_entry_safe() helpers. Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements") Reported-by: Jonathan Perry Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Chenbo Feng Signed-off-by: Sasha Levin --- include/linux/list_nulls.h | 5 +++++ include/linux/rculist_nulls.h | 14 ++++++++++++++ 2 files changed, 19 insertions(+) (limited to 'include') diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h index b01fe1009084..87ff4f58a2f0 100644 --- a/include/linux/list_nulls.h +++ b/include/linux/list_nulls.h @@ -29,6 +29,11 @@ struct hlist_nulls_node { ((ptr)->first = (struct hlist_nulls_node *) NULLS_MARKER(nulls)) #define hlist_nulls_entry(ptr, type, member) container_of(ptr,type,member) + +#define hlist_nulls_entry_safe(ptr, type, member) \ + ({ typeof(ptr) ____ptr = (ptr); \ + !is_a_nulls(____ptr) ? hlist_nulls_entry(____ptr, type, member) : NULL; \ + }) /** * ptr_is_a_nulls - Test if a ptr is a nulls * @ptr: ptr to be tested diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h index 6224a0ab0b1e..2720b2fbfb86 100644 --- a/include/linux/rculist_nulls.h +++ b/include/linux/rculist_nulls.h @@ -118,5 +118,19 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \ pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos))) +/** + * hlist_nulls_for_each_entry_safe - + * iterate over list of given type safe against removal of list entry + * @tpos: the type * to use as a loop cursor. + * @pos: the &struct hlist_nulls_node to use as a loop cursor. + * @head: the head for your list. + * @member: the name of the hlist_nulls_node within the struct. + */ +#define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \ + for (({barrier();}), \ + pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \ + (!is_a_nulls(pos)) && \ + ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); \ + pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });) #endif #endif -- cgit v1.2.3 From 906b45fd16116e71a5b49d26f18a1cdfc1ded959 Mon Sep 17 00:00:00 2001 From: Jian-Hong Pan Date: Fri, 12 Apr 2019 16:01:53 +0800 Subject: x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T [ Upstream commit 0082517fa4bce073e7cf542633439f26538a14cc ] Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the shutdown process, but then it hangs in BIOS POST with a black screen. The problem is intermittent - at some points it has appeared related to Secure Boot settings or different kernel builds, but ultimately we have not been able to identify the exact conditions that trigger the issue to come and go. Besides, the EFI mode cannot be disabled in the BIOS of this model. However, after extensive testing, we observe that using the EFI reboot method reliably avoids the issue in all cases. So add a boot time quirk to use EFI reboot on such systems. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119 Signed-off-by: Jian-Hong Pan Signed-off-by: Daniel Drake Cc: Ard Biesheuvel Cc: Borislav Petkov Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Cc: linux@endlessm.com Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com [ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ] Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- include/linux/efi.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/efi.h b/include/linux/efi.h index 80b1b8faf503..e6711bf9f0d1 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1433,7 +1433,12 @@ efi_status_t efi_setup_gop(efi_system_table_t *sys_table_arg, struct screen_info *si, efi_guid_t *proto, unsigned long size); -bool efi_runtime_disabled(void); +#ifdef CONFIG_EFI +extern bool efi_runtime_disabled(void); +#else +static inline bool efi_runtime_disabled(void) { return true; } +#endif + extern void efi_call_virt_check_flags(unsigned long flags, const char *call); /* -- cgit v1.2.3 From 946023a4b7ad1113f5f16d8dcfea83a0fa46a6ed Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Wed, 14 Jun 2017 19:30:03 +0900 Subject: ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command commit e11f0f90a626f93899687b1cc909ee37dd6c5809 upstream. Drivers can implement 'struct snd_pcm_ops.ioctl' to handle some requests from ALSA PCM core. These requests are internal purpose in kernel land. Usually common set of operations are used for it. SNDRV_PCM_IOCTL1_INFO is one of the requests. According to code comment, it has been obsoleted in the old days. We can see old releases in ftp.alsa-project.org. The command was firstly introduced in v0.5.0 release as SND_PCM_IOCTL1_INFO, to allow drivers to fill data of 'struct snd_pcm_channel_info' type. In v0.9.0 release, this was obsoleted by the other commands for ioctl(2) such as SNDRV_PCM_IOCTL_CHANNEL_INFO. This commit removes the long-abandoned command, bye. Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai Signed-off-by: Nobuhiro Iwamatsu Signed-off-by: Greg Kroah-Hartman --- include/sound/pcm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/sound/pcm.h b/include/sound/pcm.h index af1fb37c6b26..b1d3cce26ce2 100644 --- a/include/sound/pcm.h +++ b/include/sound/pcm.h @@ -100,7 +100,7 @@ struct snd_pcm_ops { #endif #define SNDRV_PCM_IOCTL1_RESET 0 -#define SNDRV_PCM_IOCTL1_INFO 1 +/* 1 is absent slot. */ #define SNDRV_PCM_IOCTL1_CHANNEL_INFO 2 #define SNDRV_PCM_IOCTL1_GSTATE 3 #define SNDRV_PCM_IOCTL1_FIFO_SIZE 4 -- cgit v1.2.3 From fc7fab70541f70d214448e3f41c9ea2e9854f0f5 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Thu, 16 May 2019 21:30:49 -0400 Subject: net: core: another layer of lists, around PF_MEMALLOC skb handling [ Upstream commit 78ed8cc25986ac5c21762eeddc1e86e94d422e36 ] First example of a layer splitting the list (rather than merely taking individual packets off it). Involves new list.h function, list_cut_before(), like list_cut_position() but cuts on the other side of the given entry. Signed-off-by: Edward Cree Signed-off-by: David S. Miller [sl: cut out non list.h bits, we only want list_cut_before] Signed-off-by: Sasha Levin --- include/linux/list.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'include') diff --git a/include/linux/list.h b/include/linux/list.h index 5809e9a2de5b..6f935018ea05 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -271,6 +271,36 @@ static inline void list_cut_position(struct list_head *list, __list_cut_position(list, head, entry); } +/** + * list_cut_before - cut a list into two, before given entry + * @list: a new list to add all removed entries + * @head: a list with entries + * @entry: an entry within head, could be the head itself + * + * This helper moves the initial part of @head, up to but + * excluding @entry, from @head to @list. You should pass + * in @entry an element you know is on @head. @list should + * be an empty list or a list you do not care about losing + * its data. + * If @entry == @head, all entries on @head are moved to + * @list. + */ +static inline void list_cut_before(struct list_head *list, + struct list_head *head, + struct list_head *entry) +{ + if (head->next == entry) { + INIT_LIST_HEAD(list); + return; + } + list->next = head->next; + list->next->prev = list; + list->prev = entry->prev; + list->prev->next = list; + head->next = entry; + entry->prev = head; +} + static inline void __list_splice(const struct list_head *list, struct list_head *prev, struct list_head *next) -- cgit v1.2.3 From 56c8a5d5ec6f28440cb3291978264ec08f8da2ff Mon Sep 17 00:00:00 2001 From: Steve Twiss Date: Fri, 26 Apr 2019 14:33:35 +0100 Subject: mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L commit 6b4814a9451add06d457e198be418bf6a3e6a990 upstream. Mismatch between what is found in the Datasheets for DA9063 and DA9063L provided by Dialog Semiconductor, and the register names provided in the MFD registers file. The changes are for the OTP (one-time-programming) control registers. The two naming errors are OPT instead of OTP, and COUNT instead of CONT (i.e. control). Cc: Stable Signed-off-by: Steve Twiss Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- include/linux/mfd/da9063/registers.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/mfd/da9063/registers.h b/include/linux/mfd/da9063/registers.h index 5d42859cb441..844fc2973392 100644 --- a/include/linux/mfd/da9063/registers.h +++ b/include/linux/mfd/da9063/registers.h @@ -215,9 +215,9 @@ /* DA9063 Configuration registers */ /* OTP */ -#define DA9063_REG_OPT_COUNT 0x101 -#define DA9063_REG_OPT_ADDR 0x102 -#define DA9063_REG_OPT_DATA 0x103 +#define DA9063_REG_OTP_CONT 0x101 +#define DA9063_REG_OTP_ADDR 0x102 +#define DA9063_REG_OTP_DATA 0x103 /* Customer Trim and Configuration */ #define DA9063_REG_T_OFFSET 0x104 -- cgit v1.2.3 From db4a55c04efea7d7770fde40d8b86b330f79dc70 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Sun, 5 May 2019 18:43:22 +0300 Subject: mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values commit ea611d1cc180fbb56982c83cd5142a2b34881f5c upstream. The FPS_PERIOD_MAX_US definitions are swapped for MAX20024 and MAX77620, fix it. Cc: stable Signed-off-by: Dmitry Osipenko Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- include/linux/mfd/max77620.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mfd/max77620.h b/include/linux/mfd/max77620.h index 3ca0af07fc78..0a68dc8fc25f 100644 --- a/include/linux/mfd/max77620.h +++ b/include/linux/mfd/max77620.h @@ -136,8 +136,8 @@ #define MAX77620_FPS_PERIOD_MIN_US 40 #define MAX20024_FPS_PERIOD_MIN_US 20 -#define MAX77620_FPS_PERIOD_MAX_US 2560 -#define MAX20024_FPS_PERIOD_MAX_US 5120 +#define MAX20024_FPS_PERIOD_MAX_US 2560 +#define MAX77620_FPS_PERIOD_MAX_US 5120 #define MAX77620_REG_FPS_GPIO1 0x54 #define MAX77620_REG_FPS_GPIO2 0x55 -- cgit v1.2.3 From 1cfaba5b49a5e9133b28da6d1a459240c1f61993 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 12 Dec 2017 08:38:30 -0800 Subject: writeback: synchronize sync(2) against cgroup writeback membership switches commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 upstream. sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo Reported-by: Jiufei Xue Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com Acked-by: Jan Kara Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/backing-dev-defs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 4ea779b25a51..34056ec64c7c 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -157,6 +157,7 @@ struct backing_dev_info { struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */ struct rb_root cgwb_congested_tree; /* their congested states */ atomic_t usage_cnt; /* counts both cgwbs and cgwb_contested's */ + struct rw_semaphore wb_switch_rwsem; /* no cgwb switch while syncing */ #else struct bdi_writeback_congested *wb_congested; #endif -- cgit v1.2.3 From 8f358d63bffe3d457a5fb1c485eb5e964d2b6fed Mon Sep 17 00:00:00 2001 From: Phong Tran Date: Tue, 30 Apr 2019 21:56:24 +0700 Subject: of: fix clang -Wunsequenced for be32_to_cpu() commit 440868661f36071886ed360d91de83bd67c73b4f upstream. Now, make the loop explicit to avoid clang warning. ./include/linux/of.h:238:37: warning: multiple unsequenced modifications to 'cell' [-Wunsequenced] r = (r << 32) | be32_to_cpu(*(cell++)); ^~ ./include/linux/byteorder/generic.h:95:21: note: expanded from macro 'be32_to_cpu' ^ ./include/uapi/linux/byteorder/little_endian.h:40:59: note: expanded from macro '__be32_to_cpu' ^ ./include/uapi/linux/swab.h:118:21: note: expanded from macro '__swab32' ___constant_swab32(x) : \ ^ ./include/uapi/linux/swab.h:18:12: note: expanded from macro '___constant_swab32' (((__u32)(x) & (__u32)0x000000ffUL) << 24) | \ ^ Signed-off-by: Phong Tran Reported-by: Nick Desaulniers Link: https://github.com/ClangBuiltLinux/linux/issues/460 Suggested-by: David Laight Reviewed-by: Nick Desaulniers Cc: stable@vger.kernel.org [robh: fix up whitespace] Signed-off-by: Rob Herring Signed-off-by: Greg Kroah-Hartman --- include/linux/of.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/of.h b/include/linux/of.h index aac3f09c5d90..56d83c2a6bbb 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -220,8 +220,8 @@ extern struct device_node *of_find_all_nodes(struct device_node *prev); static inline u64 of_read_number(const __be32 *cell, int size) { u64 r = 0; - while (size--) - r = (r << 32) | be32_to_cpu(*(cell++)); + for (; size--; cell++) + r = (r << 32) | be32_to_cpu(*cell); return r; } -- cgit v1.2.3 From d0a3f25d63855d37d453ca009483aeae08b1c1dc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stefan=20M=C3=A4tje?= Date: Fri, 29 Mar 2019 18:07:35 +0100 Subject: PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 4ec73791a64bab25cabf16a6067ee478692e506d upstream. Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode (conventional PCI on primary side, PCIe on downstream side), the Retrain Link bit needs to be cleared manually to allow the link training to complete successfully. If it is not cleared manually, the link training is continuously restarted and no devices below the PCI-to-PCIe bridge can be accessed. That means drivers for devices below the bridge will be loaded but won't work and may even crash because the driver is only reading 0xffff. See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for details. Devices known as affected so far are: PI7C9X110, PI7C9X111SL, PI7C9X130. Add a new flag, clear_retrain_link, in struct pci_dev. Quirks for affected devices set this bit. Note that pcie_retrain_link() lives in aspm.c because that's currently the only place we use it, but this erratum is not specific to ASPM, and we may retrain links for other reasons in the future. Signed-off-by: Stefan Mätje [bhelgaas: apply regardless of CONFIG_PCIEASPM] Signed-off-by: Bjorn Helgaas Reviewed-by: Andy Shevchenko CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/pci.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/pci.h b/include/linux/pci.h index 534cb43e8635..b9ac0ba81221 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -320,6 +320,8 @@ struct pci_dev { unsigned int hotplug_user_indicators:1; /* SlotCtl indicators controlled exclusively by user sysfs */ + unsigned int clear_retrain_link:1; /* Need to clear Retrain Link + bit manually */ unsigned int d3_delay; /* D3->D0 transition time in ms */ unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */ -- cgit v1.2.3 From 6aab0ad9a4aef44cb596fbffbca4d956ce6fec4d Mon Sep 17 00:00:00 2001 From: Andrea Parri Date: Mon, 20 May 2019 19:23:56 +0200 Subject: bio: fix improper use of smp_mb__before_atomic() commit f381c6a4bd0ae0fde2d6340f1b9bb0f58d915de6 upstream. This barrier only applies to the read-modify-write operations; in particular, it does not apply to the atomic_set() primitive. Replace the barrier with an smp_mb(). Fixes: dac56212e8127 ("bio: skip atomic inc/dec of ->bi_cnt for most use cases") Cc: stable@vger.kernel.org Reported-by: "Paul E. McKenney" Reported-by: Peter Zijlstra Signed-off-by: Andrea Parri Reviewed-by: Ming Lei Cc: Jens Axboe Cc: Ming Lei Cc: linux-block@vger.kernel.org Cc: "Paul E. McKenney" Cc: Peter Zijlstra Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bio.h b/include/linux/bio.h index 97cb48f03dc7..a5ca6f199b88 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -237,7 +237,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count) { if (count != 1) { bio->bi_flags |= (1 << BIO_REFFED); - smp_mb__before_atomic(); + smp_mb(); } atomic_set(&bio->__bi_cnt, count); } -- cgit v1.2.3 From f0539c7018c7f9df819bb45274638d66318ebd48 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Mon, 13 May 2019 17:19:41 -0700 Subject: hugetlb: use same fault hash key for shared and private mappings commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream. hugetlb uses a fault mutex hash table to prevent page faults of the same pages concurrently. The key for shared and private mappings is different. Shared keys off address_space and file index. Private keys off mm and virtual address. Consider a private mappings of a populated hugetlbfs file. A fault will map the page from the file and if needed do a COW to map a writable page. Hugetlbfs hole punch uses the fault mutex to prevent mappings of file pages. It uses the address_space file index key. However, private mappings will use a different key and could race with this code to map the file page. This causes problems (BUG) for the page cache remove code as it expects the page to be unmapped. A sample stack is: page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) kernel BUG at mm/filemap.c:169! ... RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 ... Call Trace: __delete_from_page_cache+0x39/0x220 delete_from_page_cache+0x45/0x70 remove_inode_hugepages+0x13c/0x380 ? __add_to_page_cache_locked+0x162/0x380 hugetlbfs_fallocate+0x403/0x540 ? _cond_resched+0x15/0x30 ? __inode_security_revalidate+0x5d/0x70 ? selinux_file_permission+0x100/0x130 vfs_fallocate+0x13f/0x270 ksys_fallocate+0x3c/0x80 __x64_sys_fallocate+0x1a/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 There seems to be another potential COW issue/race with this approach of different private and shared keys as noted in commit 8382d914ebf7 ("mm, hugetlb: improve page-fault scalability"). Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case. Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5 Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Reviewed-by: Davidlohr Bueso Cc: Joonsoo Kim Cc: "Kirill A . Shutemov" Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/hugetlb.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index b699d59d0f4f..6b8a7b654771 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -92,9 +92,7 @@ void putback_active_hugepage(struct page *page); void free_huge_page(struct page *page); void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address); pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); -- cgit v1.2.3 From a5f3559f30cf16d893df0e552f8c6a543a6c3128 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Wed, 24 Apr 2019 10:52:53 +0200 Subject: smpboot: Place the __percpu annotation correctly [ Upstream commit d4645d30b50d1691c26ff0f8fa4e718b08f8d3bb ] The test robot reported a wrong assignment of a per-CPU variable which it detected by using sparse and sent a report. The assignment itself is correct. The annotation for sparse was wrong and hence the report. The first pointer is a "normal" pointer and points to the per-CPU memory area. That means that the __percpu annotation has to be moved. Move the __percpu annotation to pointer which points to the per-CPU area. This change affects only the sparse tool (and is ignored by the compiler). Reported-by: kbuild test robot Signed-off-by: Sebastian Andrzej Siewior Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: f97f8f06a49fe ("smpboot: Provide infrastructure for percpu hotplug threads") Link: http://lkml.kernel.org/r/20190424085253.12178-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- include/linux/smpboot.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h index 12910cf19869..12a4b09f4d08 100644 --- a/include/linux/smpboot.h +++ b/include/linux/smpboot.h @@ -30,7 +30,7 @@ struct smpboot_thread_data; * @thread_comm: The base name of the thread */ struct smp_hotplug_thread { - struct task_struct __percpu **store; + struct task_struct * __percpu *store; struct list_head list; int (*thread_should_run)(unsigned int cpu); void (*thread_fn)(unsigned int cpu); -- cgit v1.2.3 From 0ce6473c392b014072142dc5cc0eed2375c8a4fd Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Tue, 19 Mar 2019 13:37:55 +0200 Subject: iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion [ Upstream commit df1d80aee963480c5c2938c64ec0ac3e4a0df2e0 ] For devices from the SigmaDelta family we need to keep CS low when doing a conversion, since the device will use the MISO line as a interrupt to indicate that the conversion is complete. This is why the driver locks the SPI bus and when the SPI bus is locked keeps as long as a conversion is going on. The current implementation gets one small detail wrong though. CS is only de-asserted after the SPI bus is unlocked. This means it is possible for a different SPI device on the same bus to send a message which would be wrongfully be addressed to the SigmaDelta device as well. Make sure that the last SPI transfer that is done while holding the SPI bus lock de-asserts the CS signal. Signed-off-by: Lars-Peter Clausen Signed-off-by: Alexandru Ardelean Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin --- include/linux/iio/adc/ad_sigma_delta.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/iio/adc/ad_sigma_delta.h b/include/linux/iio/adc/ad_sigma_delta.h index 6cc48ac55fd2..40b14736c73d 100644 --- a/include/linux/iio/adc/ad_sigma_delta.h +++ b/include/linux/iio/adc/ad_sigma_delta.h @@ -66,6 +66,7 @@ struct ad_sigma_delta { bool irq_dis; bool bus_locked; + bool keep_cs_asserted; uint8_t comm; -- cgit v1.2.3 From 1410277e190084f1de51f8656215d508d9132086 Mon Sep 17 00:00:00 2001 From: Nicolas Saenz Julienne Date: Wed, 27 Mar 2019 11:18:48 +0100 Subject: HID: core: move Usage Page concatenation to Main item [ Upstream commit 58e75155009cc800005629955d3482f36a1e0eec ] As seen on some USB wireless keyboards manufactured by Primax, the HID parser was using some assumptions that are not always true. In this case it's s the fact that, inside the scope of a main item, an Usage Page will always precede an Usage. The spec is not pretty clear as 6.2.2.7 states "Any usage that follows is interpreted as a Usage ID and concatenated with the Usage Page". While 6.2.2.8 states "When the parser encounters a main item it concatenates the last declared Usage Page with a Usage to form a complete usage value." Being somewhat contradictory it was decided to match Window's implementation, which follows 6.2.2.8. In summary, the patch moves the Usage Page concatenation from the local item parsing function to the main item parsing function. Signed-off-by: Nicolas Saenz Julienne Reviewed-by: Terry Junge Signed-off-by: Benjamin Tissoires Signed-off-by: Sasha Levin --- include/linux/hid.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/hid.h b/include/linux/hid.h index fab65b61d6d4..04bdf5477ec5 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -374,6 +374,7 @@ struct hid_global { struct hid_local { unsigned usage[HID_MAX_USAGES]; /* usage array */ + u8 usage_size[HID_MAX_USAGES]; /* usage size array */ unsigned collection_index[HID_MAX_USAGES]; /* collection index array */ unsigned usage_index; unsigned usage_minimum; -- cgit v1.2.3 From f02d577c6f0288f3685c31d8c06b948b09834d23 Mon Sep 17 00:00:00 2001 From: Chris Packham Date: Mon, 20 May 2019 15:45:36 +1200 Subject: tipc: Avoid copying bytes beyond the supplied data TLV_SET is called with a data pointer and a len parameter that tells us how many bytes are pointed to by data. When invoking memcpy() we need to careful to only copy len bytes. Previously we would copy TLV_LENGTH(len) bytes which would copy an extra 4 bytes past the end of the data pointer which newer GCC versions complain about. In file included from test.c:17: In function 'TLV_SET', inlined from 'test' at test.c:186:5: /usr/include/linux/tipc_config.h:317:3: warning: 'memcpy' forming offset [33, 36] is out of the bounds [0, 32] of object 'bearer_name' with type 'char[32]' [-Warray-bounds] memcpy(TLV_DATA(tlv_ptr), data, tlv_len); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test.c: In function 'test': test.c::161:10: note: 'bearer_name' declared here char bearer_name[TIPC_MAX_BEARER_NAME]; ^~~~~~~~~~~ We still want to ensure any padding bytes at the end are initialised, do this with a explicit memset() rather than copy bytes past the end of data. Apply the same logic to TCM_SET. Signed-off-by: Chris Packham Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/tipc_config.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/tipc_config.h b/include/uapi/linux/tipc_config.h index 087b0ef82c07..bbebd258cf07 100644 --- a/include/uapi/linux/tipc_config.h +++ b/include/uapi/linux/tipc_config.h @@ -301,8 +301,10 @@ static inline int TLV_SET(void *tlv, __u16 type, void *data, __u16 len) tlv_ptr = (struct tlv_desc *)tlv; tlv_ptr->tlv_type = htons(type); tlv_ptr->tlv_len = htons(tlv_len); - if (len && data) - memcpy(TLV_DATA(tlv_ptr), data, tlv_len); + if (len && data) { + memcpy(TLV_DATA(tlv_ptr), data, len); + memset(TLV_DATA(tlv_ptr) + len, 0, TLV_SPACE(len) - tlv_len); + } return TLV_SPACE(len); } @@ -399,8 +401,10 @@ static inline int TCM_SET(void *msg, __u16 cmd, __u16 flags, tcm_hdr->tcm_len = htonl(msg_len); tcm_hdr->tcm_type = htons(cmd); tcm_hdr->tcm_flags = htons(flags); - if (data_len && data) + if (data_len && data) { memcpy(TCM_DATA(msg), data, data_len); + memset(TCM_DATA(msg) + data_len, 0, TCM_SPACE(data_len) - msg_len); + } return TCM_SPACE(data_len); } -- cgit v1.2.3 From e186b19bc33c718fa451f60ef95699ca7aac6745 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Tue, 14 May 2019 15:43:27 -0700 Subject: include/linux/bitops.h: sanitize rotate primitives commit ef4d6f6b275c498f8e5626c99dbeefdc5027f843 upstream. The ror32 implementation (word >> shift) | (word << (32 - shift) has undefined behaviour if shift is outside the [1, 31] range. Similarly for the 64 bit variants. Most callers pass a compile-time constant (naturally in that range), but there's an UBSAN report that these may actually be called with a shift count of 0. Instead of special-casing that, we can make them DTRT for all values of shift while also avoiding UB. For some reason, this was already partly done for rol32 (which was well-defined for [0, 31]). gcc 8 recognizes these patterns as rotates, so for example __u32 rol32(__u32 word, unsigned int shift) { return (word << (shift & 31)) | (word >> ((-shift) & 31)); } compiles to 0000000000000020 : 20: 89 f8 mov %edi,%eax 22: 89 f1 mov %esi,%ecx 24: d3 c0 rol %cl,%eax 26: c3 retq Older compilers unfortunately do not do as well, but this only affects the small minority of users that don't pass constants. Due to integer promotions, ro[lr]8 were already well-defined for shifts in [0, 8], and ro[lr]16 were mostly well-defined for shifts in [0, 16] (only mostly - u16 gets promoted to _signed_ int, so if bit 15 is set, word << 16 is undefined). For consistency, update those as well. Link: http://lkml.kernel.org/r/20190410211906.2190-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes Reported-by: Ido Schimmel Tested-by: Ido Schimmel Reviewed-by: Will Deacon Cc: Vadim Pasternak Cc: Andrey Ryabinin Cc: Jacek Anaszewski Cc: Pavel Machek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Matthias Kaehlcke Signed-off-by: Greg Kroah-Hartman --- include/linux/bitops.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index d4b167fc9ecb..76ad8a957ffa 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -58,7 +58,7 @@ static __always_inline unsigned long hweight_long(unsigned long w) */ static inline __u64 rol64(__u64 word, unsigned int shift) { - return (word << shift) | (word >> (64 - shift)); + return (word << (shift & 63)) | (word >> ((-shift) & 63)); } /** @@ -68,7 +68,7 @@ static inline __u64 rol64(__u64 word, unsigned int shift) */ static inline __u64 ror64(__u64 word, unsigned int shift) { - return (word >> shift) | (word << (64 - shift)); + return (word >> (shift & 63)) | (word << ((-shift) & 63)); } /** @@ -78,7 +78,7 @@ static inline __u64 ror64(__u64 word, unsigned int shift) */ static inline __u32 rol32(__u32 word, unsigned int shift) { - return (word << shift) | (word >> ((-shift) & 31)); + return (word << (shift & 31)) | (word >> ((-shift) & 31)); } /** @@ -88,7 +88,7 @@ static inline __u32 rol32(__u32 word, unsigned int shift) */ static inline __u32 ror32(__u32 word, unsigned int shift) { - return (word >> shift) | (word << (32 - shift)); + return (word >> (shift & 31)) | (word << ((-shift) & 31)); } /** @@ -98,7 +98,7 @@ static inline __u32 ror32(__u32 word, unsigned int shift) */ static inline __u16 rol16(__u16 word, unsigned int shift) { - return (word << shift) | (word >> (16 - shift)); + return (word << (shift & 15)) | (word >> ((-shift) & 15)); } /** @@ -108,7 +108,7 @@ static inline __u16 rol16(__u16 word, unsigned int shift) */ static inline __u16 ror16(__u16 word, unsigned int shift) { - return (word >> shift) | (word << (16 - shift)); + return (word >> (shift & 15)) | (word << ((-shift) & 15)); } /** @@ -118,7 +118,7 @@ static inline __u16 ror16(__u16 word, unsigned int shift) */ static inline __u8 rol8(__u8 word, unsigned int shift) { - return (word << shift) | (word >> (8 - shift)); + return (word << (shift & 7)) | (word >> ((-shift) & 7)); } /** @@ -128,7 +128,7 @@ static inline __u8 rol8(__u8 word, unsigned int shift) */ static inline __u8 ror8(__u8 word, unsigned int shift) { - return (word >> shift) | (word << (8 - shift)); + return (word >> (shift & 7)) | (word << ((-shift) & 7)); } /** -- cgit v1.2.3 From 0b45276928b50e65232016e419e6182eac85d8b2 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Fri, 31 May 2019 22:30:26 -0700 Subject: memcg: make it work on sparse non-0-node systems commit 3e8589963773a5c23e2f1fe4bcad0e9a90b7f471 upstream. We have a single node system with node 0 disabled: Scanning NUMA topology in Northbridge 24 Number of physical nodes 2 Skipping disabled node 0 Node 1 MemBase 0000000000000000 Limit 00000000fbff0000 NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff] This causes crashes in memcg when system boots: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] ... RIP: 0010:list_lru_add+0x94/0x170 ... Call Trace: d_lru_add+0x44/0x50 dput.part.34+0xfc/0x110 __fput+0x108/0x230 task_work_run+0x9f/0xc0 exit_to_usermode_loop+0xf5/0x100 It is reproducible as far as 4.12. I did not try older kernels. You have to have a new enough systemd, e.g. 241 (the reason is unknown -- was not investigated). Cannot be reproduced with systemd 234. The system crashes because the size of lru array is never updated in memcg_update_all_list_lrus and the reads are past the zero-sized array, causing dereferences of random memory. The root cause are list_lru_memcg_aware checks in the list_lru code. The test in list_lru_memcg_aware is broken: it assumes node 0 is always present, but it is not true on some systems as can be seen above. So fix this by avoiding checks on node 0. Remember the memcg-awareness by a bool flag in struct list_lru. Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists") Signed-off-by: Jiri Slaby Acked-by: Michal Hocko Suggested-by: Vladimir Davydov Acked-by: Vladimir Davydov Reviewed-by: Shakeel Butt Cc: Johannes Weiner Cc: Raghavendra K T Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/list_lru.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index fa7fd03cb5f9..f2e966937e39 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -51,6 +51,7 @@ struct list_lru { struct list_lru_node *node; #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB) struct list_head list; + bool memcg_aware; #endif }; -- cgit v1.2.3 From 9557090582a33801349f0a0920a55d134a27e740 Mon Sep 17 00:00:00 2001 From: Matthew Wilcox Date: Fri, 5 Apr 2019 14:02:10 -0700 Subject: fs: prevent page refcount overflow in pipe_buf_get commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn Signed-off-by: Matthew Wilcox Signed-off-by: Linus Torvalds [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/pipe_fs_i.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 4f7129389855..c6ae0d5e9f0f 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -107,18 +107,20 @@ struct pipe_buf_operations { /* * Get a reference to the pipe buffer. */ - void (*get)(struct pipe_inode_info *, struct pipe_buffer *); + bool (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; /** * pipe_buf_get - get a reference to a pipe_buffer * @pipe: the pipe that the buffer belongs to * @buf: the buffer to get a reference to + * + * Return: %true if the reference was successfully obtained. */ -static inline void pipe_buf_get(struct pipe_inode_info *pipe, +static inline __must_check bool pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { - buf->ops->get(pipe, buf); + return buf->ops->get(pipe, buf); } /** @@ -178,7 +180,7 @@ struct pipe_inode_info *alloc_pipe_info(void); void free_pipe_info(struct pipe_inode_info *); /* Generic pipe buffer ops functions */ -void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); +bool generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *); int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *); void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *); -- cgit v1.2.3 From 96019c69145840a498e4e988f56a7524dc0c59b7 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 11 Apr 2019 10:06:20 -0700 Subject: mm: make page ref count overflow check tighter and more explicit commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream. We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox Cc: Jann Horn Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mm.h b/include/linux/mm.h index e3c8d40a18b5..478466081265 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -763,6 +763,10 @@ static inline bool is_zone_device_page(const struct page *page) } #endif +/* 127: arbitrary random number, small enough to assemble well */ +#define page_ref_zero_or_close_to_overflow(page) \ + ((unsigned int) page_ref_count(page) + 127u <= 127u) + static inline void get_page(struct page *page) { page = compound_head(page); @@ -770,7 +774,7 @@ static inline void get_page(struct page *page) * Getting a normal page or the head of a compound page * requires to already have an elevated page->_refcount. */ - VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page); + VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page); page_ref_inc(page); if (unlikely(is_zone_device_page(page))) -- cgit v1.2.3 From 8fca3c3646831e9fe9da08d8c9234a04b6848e06 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 4 Apr 2017 17:09:08 +0100 Subject: efi/libstub: Unify command line param parsing commit 60f38de7a8d4e816100ceafd1b382df52527bd50 upstream. Merge the parsing of the command line carried out in arm-stub.c with the handling in efi_parse_options(). Note that this also fixes the missing handling of CONFIG_CMDLINE_FORCE=y, in which case the builtin command line should supersede the one passed by the firmware. Signed-off-by: Ard Biesheuvel Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: bhe@redhat.com Cc: bhsharma@redhat.com Cc: bp@alien8.de Cc: eugene@hp.com Cc: evgeny.kalugin@intel.com Cc: jhugo@codeaurora.org Cc: leif.lindholm@linaro.org Cc: linux-efi@vger.kernel.org Cc: mark.rutland@arm.com Cc: roy.franz@cavium.com Cc: rruigrok@codeaurora.org Link: http://lkml.kernel.org/r/20170404160910.28115-1-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar [ardb: fix up merge conflicts with 4.9.180] Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- include/linux/efi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/efi.h b/include/linux/efi.h index e6711bf9f0d1..02c4f16685b6 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1427,7 +1427,7 @@ efi_status_t handle_cmdline_files(efi_system_table_t *sys_table_arg, unsigned long *load_addr, unsigned long *load_size); -efi_status_t efi_parse_options(char *cmdline); +efi_status_t efi_parse_options(char const *cmdline); efi_status_t efi_setup_gop(efi_system_table_t *sys_table_arg, struct screen_info *si, efi_guid_t *proto, -- cgit v1.2.3 From 0b1a2360815a5c60144ac88c9c3aa6efc16d3e3d Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 3 Jun 2019 13:26:20 -0700 Subject: rcu: locking and unlocking need to always be at least barriers commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream. Herbert Xu pointed out that commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") was incorrect in making the preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT. If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is a no-op, but still is a compiler barrier. And RCU locking still _needs_ that compiler barrier. It is simply fundamentally not true that RCU locking would be a complete no-op: we still need to guarantee (for example) that things that can trap and cause preemption cannot migrate into the RCU locked region. The way we do that is by making it a barrier. See for example commit 386afc91144b ("spinlocks and preemption points need to be at least compiler barriers") from back in 2013 that had similar issues with spinlocks that become no-ops on UP: they must still constrain the compiler from moving other operations into the critical region. Now, it is true that a lot of RCU operations already use READ_ONCE() and WRITE_ONCE() (which in practice likely would never be re-ordered wrt anything remotely interesting), but it is also true that that is not globally the case, and that it's not even necessarily always possible (ie bitfields etc). Reported-by: Herbert Xu Fixes: bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") Cc: stable@kernel.org Cc: Boqun Feng Cc: Paul E. McKenney Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/rcupdate.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 01f71e1d2e94..aa2935779e43 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -306,14 +306,12 @@ void synchronize_rcu(void); static inline void __rcu_read_lock(void) { - if (IS_ENABLED(CONFIG_PREEMPT_COUNT)) - preempt_disable(); + preempt_disable(); } static inline void __rcu_read_unlock(void) { - if (IS_ENABLED(CONFIG_PREEMPT_COUNT)) - preempt_enable(); + preempt_enable(); } static inline void synchronize_rcu(void) -- cgit v1.2.3 From 5bdc536ce6c468b50d9f918cd7b0d4cb3d754a19 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Thu, 30 May 2019 00:09:39 +0200 Subject: x86/power: Fix 'nosmt' vs hibernation triple fault during resume commit ec527c318036a65a083ef68d8ba95789d2212246 upstream. As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpudile after going through the online-offline cycle at least once. This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid. That results in triple fault shortly after cr3 is switched, and machine reboots. Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetricaly, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait. Cc: 4.19+ # v4.19+ Debugged-by: Thomas Gleixner Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina Acked-by: Pavel Machek Reviewed-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 166686209f2c..b27c9b2e683f 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -271,11 +271,15 @@ extern enum cpuhp_smt_control cpu_smt_control; extern void cpu_smt_disable(bool force); extern void cpu_smt_check_topology_early(void); extern void cpu_smt_check_topology(void); +extern int cpuhp_smt_enable(void); +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); #else # define cpu_smt_control (CPU_SMT_ENABLED) static inline void cpu_smt_disable(bool force) { } static inline void cpu_smt_check_topology_early(void) { } static inline void cpu_smt_check_topology(void) { } +static inline int cpuhp_smt_enable(void) { return 0; } +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } #endif /* -- cgit v1.2.3 From 8ad6a539db436d682c423fc7188a759400b238b4 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Fri, 1 Mar 2019 14:03:47 +0000 Subject: drm/i915: Fix I915_EXEC_RING_MASK commit d90c06d57027203f73021bb7ddb30b800d65c636 upstream. This was supposed to be a mask of all known rings, but it is being used by execbuffer to filter out invalid rings, and so is instead mapping high unused values onto valid rings. Instead of a mask of all known rings, we need it to be the mask of all possible rings. Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring") Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: # v4.6+ Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman --- include/uapi/drm/i915_drm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 03725fe89859..02137b508666 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -756,7 +756,7 @@ struct drm_i915_gem_execbuffer2 { __u32 num_cliprects; /** This is a struct drm_clip_rect *cliprects */ __u64 cliprects_ptr; -#define I915_EXEC_RING_MASK (7<<0) +#define I915_EXEC_RING_MASK (0x3f) #define I915_EXEC_DEFAULT (0<<0) #define I915_EXEC_RENDER (1<<0) #define I915_EXEC_BSD (2<<0) -- cgit v1.2.3 From 9c829b6e3fe2dfe25fccc6dc0477cf86661feeac Mon Sep 17 00:00:00 2001 From: Kirill Smelkov Date: Tue, 26 Mar 2019 22:20:43 +0000 Subject: fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream. Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added locking for file.f_pos access and in particular made concurrent read and write not possible - now both those functions take f_pos lock for the whole run, and so if e.g. a read is blocked waiting for data, write will deadlock waiting for that read to complete. This caused regression for stream-like files where previously read and write could run simultaneously, but after that patch could not do so anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") which fixes such regression for particular case of /proc/xen/xenbus. The patch that added f_pos lock in 2014 did so to guarantee POSIX thread safety for read/write/lseek and added the locking to file descriptors of all regular files. In 2014 that thread-safety problem was not new as it was already discussed earlier in 2006. However even though 2006'th version of Linus's patch was adding f_pos locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding the stream-like objects like pipes and sockets)", the 2014 version - the one that actually made it into the tree as 9c225f2655e3 - is doing so irregardless of whether a file is seekable or not. See https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/ https://lwn.net/Articles/180387 https://lwn.net/Articles/180396 for historic context. The reason that it did so is, probably, that there are many files that are marked non-seekable, but e.g. their read implementation actually depends on knowing current position to correctly handle the read. Some examples: kernel/power/user.c snapshot_read fs/debugfs/file.c u32_array_read fs/fuse/control.c fuse_conn_waiting_read + ... drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read arch/s390/hypfs/inode.c hypfs_read_iter ... Despite that, many nonseekable_open users implement read and write with pure stream semantics - they don't depend on passed ppos at all. And for those cases where read could wait for something inside, it creates a situation similar to xenbus - the write could be never made to go until read is done, and read is waiting for some, potentially external, event, for potentially unbounded time -> deadlock. Besides xenbus, there are 14 such places in the kernel that I've found with semantic patch (see below): drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write() drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write() drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write() drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write() net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write() drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write() drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write() drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write() net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write() drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write() drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write() drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write() drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write() drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write() In addition to the cases above another regression caused by f_pos locking is that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can no longer implement bidirectional stream-like files - for the same reason as above e.g. read can deadlock write locking on file.f_pos in the kernel. FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse: implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and write routines not depending on current position at all, and with both read and write being potentially blocking operations: See https://github.com/libfuse/osspd https://lwn.net/Articles/308445 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510 Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as "somewhat pipe-like files ..." with read handler not using offset. However that test implements only read without write and cannot exercise the deadlock scenario: https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216 I've actually hit the read vs write deadlock for real while implementing my FUSE filesystem where there is /head/watch file, for which open creates separate bidirectional socket-like stream in between filesystem and its user with both read and write being later performed simultaneously. And there it is semantically not easy to split the stream into two separate read-only and write-only channels: https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169 Let's fix this regression. The plan is: 1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would break many in-kernel nonseekable_open users which actually use ppos in read/write handlers. 2. Add stream_open() to kernel to open stream-like non-seekable file descriptors. Read and write on such file descriptors would never use nor change ppos. And with that property on stream-like files read and write will be running without taking f_pos lock - i.e. read and write could be running simultaneously. 3. With semantic patch search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access. 4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via steam_open if that bit is present in filesystem open reply. It was tempting to change fs/fuse/ open handler to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. 5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from v3.14+ (the kernel where 9c225f2655 first appeared). This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock on all kernel versions. This should work because fs/fuse/ ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. This patch adds stream_open, converts /proc/xen/xenbus to it and adds semantic patch to automatically locate in-kernel places that are either required to be converted due to read vs write deadlock, or that are just safe to be converted because read and write do not use ppos and there are no other funky methods in file_operations. Regarding semantic patch I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Cc: Michael Kerrisk Cc: Yongzhi Pan Cc: Jonathan Corbet Cc: David Vrabel Cc: Juergen Gross Cc: Miklos Szeredi Cc: Tejun Heo Cc: Kirill Tkhai Cc: Arnd Bergmann Cc: Christoph Hellwig Cc: Greg Kroah-Hartman Cc: Julia Lawall Cc: Nikolaus Rath Cc: Han-Wen Nienhuys Signed-off-by: Kirill Smelkov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/fs.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/fs.h b/include/linux/fs.h index bcad2b963296..5244df520bed 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -143,6 +143,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* Has write method(s) */ #define FMODE_CAN_WRITE ((__force fmode_t)0x40000) +/* File is stream-like */ +#define FMODE_STREAM ((__force fmode_t)0x200000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) @@ -2843,6 +2846,7 @@ extern loff_t no_seek_end_llseek_size(struct file *, loff_t, int, loff_t); extern loff_t no_seek_end_llseek(struct file *, loff_t, int); extern int generic_file_open(struct inode * inode, struct file * filp); extern int nonseekable_open(struct inode * inode, struct file * filp); +extern int stream_open(struct inode * inode, struct file * filp); #ifdef CONFIG_BLOCK typedef void (dio_submit_t)(struct bio *bio, struct inode *inode, -- cgit v1.2.3 From cfd8d2e79524aefd838152bbccc49374fdf46f52 Mon Sep 17 00:00:00 2001 From: Kirill Smelkov Date: Wed, 24 Apr 2019 07:13:57 +0000 Subject: fuse: Add FOPEN_STREAM to use stream_open() commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream. Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/fuse.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 42fa977e3b14..8ca749a109d5 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -215,10 +215,12 @@ struct fuse_file_lock { * FOPEN_DIRECT_IO: bypass page cache for this open file * FOPEN_KEEP_CACHE: don't invalidate the data cache on open * FOPEN_NONSEEKABLE: the file is not seekable + * FOPEN_STREAM: the file is stream-like (no file position at all) */ #define FOPEN_DIRECT_IO (1 << 0) #define FOPEN_KEEP_CACHE (1 << 1) #define FOPEN_NONSEEKABLE (1 << 2) +#define FOPEN_STREAM (1 << 4) /** * INIT request/reply flags -- cgit v1.2.3 From 8474fc0335aeba513243f921988608e5d3108357 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Sun, 5 May 2019 11:16:20 -0700 Subject: ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 upstream. Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is disabled. Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit") Reported-by: kbuild test robot Signed-off-by: David Ahern Signed-off-by: David S. Miller Cc: Nobuhiro Iwamatsu Signed-off-by: Greg Kroah-Hartman --- include/net/arp.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/net/arp.h b/include/net/arp.h index 1b3f86981757..92d2f7d7d1cb 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -17,6 +17,7 @@ static inline u32 arp_hashfn(const void *pkey, const struct net_device *dev, u32 return val * hash_rnd[0]; } +#ifdef CONFIG_INET static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key) { if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) @@ -24,6 +25,13 @@ static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev); } +#else +static inline +struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key) +{ + return NULL; +} +#endif static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32 key) { -- cgit v1.2.3 From cc1b58ccb78e0de51bcec1f2914d9296260668bd Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 17:31:03 -0700 Subject: tcp: limit payload size of sacked skbs commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream. Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() : BUG_ON(tcp_skb_pcount(skb) < pcount); This can happen if the remote peer has advertized the smallest MSS that linux TCP accepts : 48 An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC. This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs can overflow. Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity. CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs Backport notes, provided by Joao Martins v4.15 or since commit 737ff314563 ("tcp: use sequence distance to detect reordering") had switched from the packet-based FACK tracking and switched to sequence-based. v4.14 and older still have the old logic and hence on tcp_skb_shift_data() needs to retain its original logic and have @fack_count in sync. In other words, we keep the increment of pcount with tcp_skb_pcount(skb) to later used that to update fack_count. To make it more explicit we track the new skb that gets incremented to pcount in @next_pcount, and we get to avoid the constant invocation of tcp_skb_pcount(skb) all together. Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet Reported-by: Jonathan Looney Acked-by: Neal Cardwell Reviewed-by: Tyler Hicks Cc: Yuchung Cheng Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/tcp.h | 3 +++ include/net/tcp.h | 2 ++ 2 files changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index d0c3615f9050..7f517458c64f 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -433,4 +433,7 @@ static inline void tcp_saved_syn_free(struct tcp_sock *tp) tp->saved_syn = NULL; } +int tcp_skb_shift(struct sk_buff *to, struct sk_buff *from, int pcount, + int shiftlen); + #endif /* _LINUX_TCP_H */ diff --git a/include/net/tcp.h b/include/net/tcp.h index fed2a78fb8cb..d7047de952f0 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -53,6 +53,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_HEADER (128 + MAX_HEADER) #define MAX_TCP_OPTION_SPACE 40 +#define TCP_MIN_SND_MSS 48 +#define TCP_MIN_GSO_SIZE (TCP_MIN_SND_MSS - MAX_TCP_OPTION_SPACE) /* * Never offer a window over 32767 without using window scaling. Some -- cgit v1.2.3 From e358f4af19db46ca25cc9a8a78412b09ba98859d Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 17:40:56 -0700 Subject: tcp: tcp_fragment() should apply sane memory limits commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream. Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflow 32bit counters. TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non malicious splitting of retransmit queue. A new SNMP counter is added to monitor how many times TCP did not allow to split an skb if the allowance was exceeded. Note that this counter might increase in the case applications use SO_SNDBUF socket option to lower sk_sndbuf. CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space Signed-off-by: Eric Dumazet Reported-by: Jonathan Looney Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Reviewed-by: Tyler Hicks Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/snmp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h index 3442a26d36d9..56e3460d1f9f 100644 --- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -282,6 +282,7 @@ enum LINUX_MIB_TCPKEEPALIVE, /* TCPKeepAlive */ LINUX_MIB_TCPMTUPFAIL, /* TCPMTUPFail */ LINUX_MIB_TCPMTUPSUCCESS, /* TCPMTUPSuccess */ + LINUX_MIB_TCPWQUEUETOOBIG, /* TCPWqueueTooBig */ __LINUX_MIB_MAX }; -- cgit v1.2.3 From 8e39cbc03dafa3731d22533f869bf326c0e6e6f8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2019 17:44:24 -0700 Subject: tcp: add tcp_min_snd_mss sysctl commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream. Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet Suggested-by: Jonathan Looney Acked-by: Neal Cardwell Cc: Yuchung Cheng Cc: Tyler Hicks Cc: Bruce Curtis Cc: Jonathan Lemon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/netns/ipv4.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 7adf4386ac8f..bf619a67ec03 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -94,6 +94,7 @@ struct netns_ipv4 { #endif int sysctl_tcp_mtu_probing; int sysctl_tcp_base_mss; + int sysctl_tcp_min_snd_mss; int sysctl_tcp_probe_threshold; u32 sysctl_tcp_probe_interval; -- cgit v1.2.3 From d7650c74b6ef62186cf187e26ef77bcecb2d7a07 Mon Sep 17 00:00:00 2001 From: Phong Hoang Date: Tue, 19 Mar 2019 19:40:08 +0900 Subject: pwm: Fix deadlock warning when removing PWM device MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 347ab9480313737c0f1aaa08e8f2e1a791235535 ] This patch fixes deadlock warning if removing PWM device when CONFIG_PROVE_LOCKING is enabled. This issue can be reproceduced by the following steps on the R-Car H3 Salvator-X board if the backlight is disabled: # cd /sys/class/pwm/pwmchip0 # echo 0 > export # ls device export npwm power pwm0 subsystem uevent unexport # cd device/driver # ls bind e6e31000.pwm uevent unbind # echo e6e31000.pwm > unbind [ 87.659974] ====================================================== [ 87.666149] WARNING: possible circular locking dependency detected [ 87.672327] 5.0.0 #7 Not tainted [ 87.675549] ------------------------------------------------------ [ 87.681723] bash/2986 is trying to acquire lock: [ 87.686337] 000000005ea0e178 (kn->count#58){++++}, at: kernfs_remove_by_name_ns+0x50/0xa0 [ 87.694528] [ 87.694528] but task is already holding lock: [ 87.700353] 000000006313b17c (pwm_lock){+.+.}, at: pwmchip_remove+0x28/0x13c [ 87.707405] [ 87.707405] which lock already depends on the new lock. [ 87.707405] [ 87.715574] [ 87.715574] the existing dependency chain (in reverse order) is: [ 87.723048] [ 87.723048] -> #1 (pwm_lock){+.+.}: [ 87.728017] __mutex_lock+0x70/0x7e4 [ 87.732108] mutex_lock_nested+0x1c/0x24 [ 87.736547] pwm_request_from_chip.part.6+0x34/0x74 [ 87.741940] pwm_request_from_chip+0x20/0x40 [ 87.746725] export_store+0x6c/0x1f4 [ 87.750820] dev_attr_store+0x18/0x28 [ 87.754998] sysfs_kf_write+0x54/0x64 [ 87.759175] kernfs_fop_write+0xe4/0x1e8 [ 87.763615] __vfs_write+0x40/0x184 [ 87.767619] vfs_write+0xa8/0x19c [ 87.771448] ksys_write+0x58/0xbc [ 87.775278] __arm64_sys_write+0x18/0x20 [ 87.779721] el0_svc_common+0xd0/0x124 [ 87.783986] el0_svc_compat_handler+0x1c/0x24 [ 87.788858] el0_svc_compat+0x8/0x18 [ 87.792947] [ 87.792947] -> #0 (kn->count#58){++++}: [ 87.798260] lock_acquire+0xc4/0x22c [ 87.802353] __kernfs_remove+0x258/0x2c4 [ 87.806790] kernfs_remove_by_name_ns+0x50/0xa0 [ 87.811836] remove_files.isra.1+0x38/0x78 [ 87.816447] sysfs_remove_group+0x48/0x98 [ 87.820971] sysfs_remove_groups+0x34/0x4c [ 87.825583] device_remove_attrs+0x6c/0x7c [ 87.830197] device_del+0x11c/0x33c [ 87.834201] device_unregister+0x14/0x2c [ 87.838638] pwmchip_sysfs_unexport+0x40/0x4c [ 87.843509] pwmchip_remove+0xf4/0x13c [ 87.847773] rcar_pwm_remove+0x28/0x34 [ 87.852039] platform_drv_remove+0x24/0x64 [ 87.856651] device_release_driver_internal+0x18c/0x21c [ 87.862391] device_release_driver+0x14/0x1c [ 87.867175] unbind_store+0xe0/0x124 [ 87.871265] drv_attr_store+0x20/0x30 [ 87.875442] sysfs_kf_write+0x54/0x64 [ 87.879618] kernfs_fop_write+0xe4/0x1e8 [ 87.884055] __vfs_write+0x40/0x184 [ 87.888057] vfs_write+0xa8/0x19c [ 87.891887] ksys_write+0x58/0xbc [ 87.895716] __arm64_sys_write+0x18/0x20 [ 87.900154] el0_svc_common+0xd0/0x124 [ 87.904417] el0_svc_compat_handler+0x1c/0x24 [ 87.909289] el0_svc_compat+0x8/0x18 [ 87.913378] [ 87.913378] other info that might help us debug this: [ 87.913378] [ 87.921374] Possible unsafe locking scenario: [ 87.921374] [ 87.927286] CPU0 CPU1 [ 87.931808] ---- ---- [ 87.936331] lock(pwm_lock); [ 87.939293] lock(kn->count#58); [ 87.945120] lock(pwm_lock); [ 87.950599] lock(kn->count#58); [ 87.953908] [ 87.953908] *** DEADLOCK *** [ 87.953908] [ 87.959821] 4 locks held by bash/2986: [ 87.963563] #0: 00000000ace7bc30 (sb_writers#6){.+.+}, at: vfs_write+0x188/0x19c [ 87.971044] #1: 00000000287991b2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xb4/0x1e8 [ 87.978872] #2: 00000000f739d016 (&dev->mutex){....}, at: device_release_driver_internal+0x40/0x21c [ 87.988001] #3: 000000006313b17c (pwm_lock){+.+.}, at: pwmchip_remove+0x28/0x13c [ 87.995481] [ 87.995481] stack backtrace: [ 87.999836] CPU: 0 PID: 2986 Comm: bash Not tainted 5.0.0 #7 [ 88.005489] Hardware name: Renesas Salvator-X board based on r8a7795 ES1.x (DT) [ 88.012791] Call trace: [ 88.015235] dump_backtrace+0x0/0x190 [ 88.018891] show_stack+0x14/0x1c [ 88.022204] dump_stack+0xb0/0xec [ 88.025514] print_circular_bug.isra.32+0x1d0/0x2e0 [ 88.030385] __lock_acquire+0x1318/0x1864 [ 88.034388] lock_acquire+0xc4/0x22c [ 88.037958] __kernfs_remove+0x258/0x2c4 [ 88.041874] kernfs_remove_by_name_ns+0x50/0xa0 [ 88.046398] remove_files.isra.1+0x38/0x78 [ 88.050487] sysfs_remove_group+0x48/0x98 [ 88.054490] sysfs_remove_groups+0x34/0x4c [ 88.058580] device_remove_attrs+0x6c/0x7c [ 88.062671] device_del+0x11c/0x33c [ 88.066154] device_unregister+0x14/0x2c [ 88.070070] pwmchip_sysfs_unexport+0x40/0x4c [ 88.074421] pwmchip_remove+0xf4/0x13c [ 88.078163] rcar_pwm_remove+0x28/0x34 [ 88.081906] platform_drv_remove+0x24/0x64 [ 88.085996] device_release_driver_internal+0x18c/0x21c [ 88.091215] device_release_driver+0x14/0x1c [ 88.095478] unbind_store+0xe0/0x124 [ 88.099048] drv_attr_store+0x20/0x30 [ 88.102704] sysfs_kf_write+0x54/0x64 [ 88.106359] kernfs_fop_write+0xe4/0x1e8 [ 88.110275] __vfs_write+0x40/0x184 [ 88.113757] vfs_write+0xa8/0x19c [ 88.117065] ksys_write+0x58/0xbc [ 88.120374] __arm64_sys_write+0x18/0x20 [ 88.124291] el0_svc_common+0xd0/0x124 [ 88.128034] el0_svc_compat_handler+0x1c/0x24 [ 88.132384] el0_svc_compat+0x8/0x18 The sysfs unexport in pwmchip_remove() is completely asymmetric to what we do in pwmchip_add_with_polarity() and commit 0733424c9ba9 ("pwm: Unexport children before chip removal") is a strong indication that this was wrong to begin with. We should just move pwmchip_sysfs_unexport() where it belongs, which is right after pwmchip_sysfs_unexport_children(). In that case, we do not need separate functions anymore either. We also really want to remove sysfs irrespective of whether or not the chip will be removed as a result of pwmchip_remove(). We can only assume that the driver will be gone after that, so we shouldn't leave any dangling sysfs files around. This warning disappears if we move pwmchip_sysfs_unexport() to the top of pwmchip_remove(), pwmchip_sysfs_unexport_children(). That way it is also outside of the pwm_lock section, which indeed doesn't seem to be needed. Moving the pwmchip_sysfs_export() call outside of that section also seems fine and it'd be perfectly symmetric with pwmchip_remove() again. So, this patch fixes them. Signed-off-by: Phong Hoang [shimoda: revise the commit log and code] Fixes: 76abbdde2d95 ("pwm: Add sysfs interface") Fixes: 0733424c9ba9 ("pwm: Unexport children before chip removal") Signed-off-by: Yoshihiro Shimoda Tested-by: Hoan Nguyen An Reviewed-by: Geert Uytterhoeven Reviewed-by: Simon Horman Reviewed-by: Uwe Kleine-König Signed-off-by: Thierry Reding Signed-off-by: Sasha Levin --- include/linux/pwm.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 2c6c5114c089..f1bbae014889 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -641,7 +641,6 @@ static inline void pwm_remove_table(struct pwm_lookup *table, size_t num) #ifdef CONFIG_PWM_SYSFS void pwmchip_sysfs_export(struct pwm_chip *chip); void pwmchip_sysfs_unexport(struct pwm_chip *chip); -void pwmchip_sysfs_unexport_children(struct pwm_chip *chip); #else static inline void pwmchip_sysfs_export(struct pwm_chip *chip) { @@ -650,10 +649,6 @@ static inline void pwmchip_sysfs_export(struct pwm_chip *chip) static inline void pwmchip_sysfs_unexport(struct pwm_chip *chip) { } - -static inline void pwmchip_sysfs_unexport_children(struct pwm_chip *chip) -{ -} #endif /* CONFIG_PWM_SYSFS */ #endif /* __LINUX_PWM_H */ -- cgit v1.2.3 From 5e0c41ca896aa2487112742a76142ab84e85dc9d Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 13 Jun 2019 09:28:42 +0200 Subject: Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections" This reverts commit 745f5c5f2ac14ac1cbb7fe3cbdc893c9d1af1356 which is commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream. Lots of people have reported issues with this patch, and as there does not seem to be a fix going into Linus's kernel tree any time soon, revert the commit in the stable trees so as to get people's machines working properly again. Reported-by: Vasily Khoruzhick Reported-by: Hans de Goede Cc: Jeremy Cline Cc: Marcel Holtmann Cc: Johan Hedberg Signed-off-by: Greg Kroah-Hartman --- include/net/bluetooth/hci_core.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 57a7dba49d29..4931787193c3 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -176,9 +176,6 @@ struct adv_info { #define HCI_MAX_SHORT_NAME_LENGTH 10 -/* Min encryption key size to match with SMP */ -#define HCI_MIN_ENC_KEY_SIZE 7 - /* Default LE RPA expiry time, 15 minutes */ #define HCI_DEFAULT_RPA_TIMEOUT (15 * 60) -- cgit v1.2.3 From df260f7a037164aa1a617bb9728a7a6c44117e9e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 29 May 2019 13:46:25 -0700 Subject: cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css() commit 18fa84a2db0e15b02baa5d94bdb5bd509175d2f6 upstream. A PF_EXITING task can stay associated with an offline css. If such task calls task_get_css(), it can get stuck indefinitely. This can be triggered by BSD process accounting which writes to a file with PF_EXITING set when racing against memcg disable as in the backtrace at the end. After this change, task_get_css() may return a css which was already offline when the function was called. None of the existing users are affected by this change. INFO: rcu_sched self-detected stall on CPU INFO: rcu_sched detected stalls on CPUs/tasks: ... NMI backtrace for cpu 0 ... Call Trace: dump_stack+0x46/0x68 nmi_cpu_backtrace.cold.2+0x13/0x57 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x9e/0xce rcu_check_callbacks.cold.74+0x2af/0x433 update_process_times+0x28/0x60 tick_sched_timer+0x34/0x70 __hrtimer_run_queues+0xee/0x250 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x56/0x110 apic_timer_interrupt+0xf/0x20 RIP: 0010:balance_dirty_pages_ratelimited+0x28f/0x3d0 ... btrfs_file_write_iter+0x31b/0x563 __vfs_write+0xfa/0x140 __kernel_write+0x4f/0x100 do_acct_process+0x495/0x580 acct_process+0xb9/0xdb do_exit+0x748/0xa00 do_group_exit+0x3a/0xa0 get_signal+0x254/0x560 do_signal+0x23/0x5c0 exit_to_usermode_loop+0x5d/0xa0 prepare_exit_to_usermode+0x53/0x80 retint_user+0x8/0x8 Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org # v4.2+ Fixes: ec438699a9ae ("cgroup, block: implement task_get_css() and use it in bio_associate_current()") Signed-off-by: Greg Kroah-Hartman --- include/linux/cgroup.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 7620a8bc0493..8be03520995c 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -462,7 +462,7 @@ static inline struct cgroup_subsys_state *task_css(struct task_struct *task, * * Find the css for the (@task, @subsys_id) combination, increment a * reference on and return it. This function is guaranteed to return a - * valid css. + * valid css. The returned css may already have been offlined. */ static inline struct cgroup_subsys_state * task_get_css(struct task_struct *task, int subsys_id) @@ -472,7 +472,13 @@ task_get_css(struct task_struct *task, int subsys_id) rcu_read_lock(); while (true) { css = task_css(task, subsys_id); - if (likely(css_tryget_online(css))) + /* + * Can't use css_tryget_online() here. A task which has + * PF_EXITING set may stay associated with an offline css. + * If such task calls this function, css_tryget_online() + * will keep failing. + */ + if (likely(css_tryget(css))) break; cpu_relax(); } -- cgit v1.2.3 From 6290d9d3192e7a973d59fd22fb72028a34dab372 Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Wed, 24 Apr 2019 22:19:17 +0200 Subject: Bluetooth: Align minimum encryption key size for LE and BR/EDR connections commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream. The minimum encryption key size for LE connections is 56 bits and to align LE with BR/EDR, enforce 56 bits of minimum encryption key size for BR/EDR connections as well. Signed-off-by: Marcel Holtmann Signed-off-by: Johan Hedberg Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- include/net/bluetooth/hci_core.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 4931787193c3..57a7dba49d29 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -176,6 +176,9 @@ struct adv_info { #define HCI_MAX_SHORT_NAME_LENGTH 10 +/* Min encryption key size to match with SMP */ +#define HCI_MIN_ENC_KEY_SIZE 7 + /* Default LE RPA expiry time, 15 minutes */ #define HCI_DEFAULT_RPA_TIMEOUT (15 * 60) -- cgit v1.2.3 From 074d0aaec0c61ab19099a1d31d08c7552ed97a16 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 21 Feb 2018 14:45:54 -0800 Subject: bug.h: work around GCC PR82365 in BUG() [ Upstream commit 173a3efd3edb2ef6ef07471397c5f542a360e9c1 ] Looking at functions with large stack frames across all architectures led me discovering that BUG() suffers from the same problem as fortify_panic(), which I've added a workaround for already. In short, variables that go out of scope by calling a noreturn function or __builtin_unreachable() keep using stack space in functions afterwards. A workaround that was identified is to insert an empty assembler statement just before calling the function that doesn't return. I'm adding a macro "barrier_before_unreachable()" to document this, and insert calls to that in all instances of BUG() that currently suffer from this problem. The files that saw the largest change from this had these frame sizes before, and much less with my patch: fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=] drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=] In case of ARC and CRIS, it turns out that the BUG() implementation actually does return (or at least the compiler thinks it does), resulting in lots of warnings about uninitialized variable use and leaving noreturn functions, such as: block/cfq-iosched.c: In function 'cfq_async_queue_prio': block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type] include/linux/dmaengine.h: In function 'dma_maxpq': include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type] This makes them call __builtin_trap() instead, which should normally dump the stack and kill the current process, like some of the other architectures already do. I tried adding barrier_before_unreachable() to panic() and fortify_panic() as well, but that had very little effect, so I'm not submitting that patch. Vineet said: : For ARC, it is double win. : : 1. Fixes 3 -Wreturn-type warnings : : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of : non-void function [-Wreturn-type] : : 2. bloat-o-meter reports code size improvements as gcc elides the : generated code for stack return. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de Signed-off-by: Arnd Bergmann Acked-by: Vineet Gupta [arch/arc] Tested-by: Vineet Gupta [arch/arc] Cc: Mikael Starvik Cc: Jesper Nilsson Cc: Tony Luck Cc: Fenghua Yu Cc: Geert Uytterhoeven Cc: "David S. Miller" Cc: Christopher Li Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Kees Cook Cc: Ingo Molnar Cc: Josh Poimboeuf Cc: Will Deacon Cc: "Steven Rostedt (VMware)" Cc: Mark Rutland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin [removed cris chunks - gregkh] Signed-off-by: Greg Kroah-Hartman --- include/asm-generic/bug.h | 1 + include/linux/compiler-gcc.h | 15 ++++++++++++++- include/linux/compiler.h | 5 +++++ 3 files changed, 20 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h index 6f96247226a4..89f079d6b41b 100644 --- a/include/asm-generic/bug.h +++ b/include/asm-generic/bug.h @@ -47,6 +47,7 @@ struct bug_entry { #ifndef HAVE_ARCH_BUG #define BUG() do { \ printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \ + barrier_before_unreachable(); \ panic("BUG!"); \ } while (0) #endif diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 8e9b0cb8db41..61650c1830d4 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -233,6 +233,15 @@ #define annotate_unreachable() #endif +/* + * calling noreturn functions, __builtin_unreachable() and __builtin_trap() + * confuse the stack allocation in gcc, leading to overly large stack + * frames, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 + * + * Adding an empty inline assembly before it works around the problem + */ +#define barrier_before_unreachable() asm volatile("") + /* * Mark a position in code as unreachable. This can be used to * suppress control flow warnings after asm blocks that transfer @@ -243,7 +252,11 @@ * unreleased. Really, we need to have autoconf for the kernel. */ #define unreachable() \ - do { annotate_unreachable(); __builtin_unreachable(); } while (0) + do { \ + annotate_unreachable(); \ + barrier_before_unreachable(); \ + __builtin_unreachable(); \ + } while (0) /* Mark a function definition as prohibited from being cloned. */ #define __noclone __attribute__((__noclone__, __optimize__("no-tracer"))) diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 4f3dfabb680f..80a5bc623c47 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -177,6 +177,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); # define barrier_data(ptr) barrier() #endif +/* workaround for GCC PR82365 if needed */ +#ifndef barrier_before_unreachable +# define barrier_before_unreachable() do { } while (0) +#endif + /* Unreachable code */ #ifndef unreachable # define unreachable() do { } while (1) -- cgit v1.2.3 From 3b2f8a66ae6034e9f32d6baeae517ad3400161a3 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Mon, 17 Jun 2019 21:34:14 +0800 Subject: ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL [ Upstream commit 6f6a8622057c92408930c31698394fae1557b188 ] A similar fix to Patch "ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL" is also needed by ip6_tunnel. Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/ip6_tunnel.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h index 1b1cf33cbfb0..2b6abd046087 100644 --- a/include/net/ip6_tunnel.h +++ b/include/net/ip6_tunnel.h @@ -149,9 +149,12 @@ static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb, memset(skb->cb, 0, sizeof(struct inet6_skb_parm)); pkt_len = skb->len - skb_inner_network_offset(skb); err = ip6_local_out(dev_net(skb_dst(skb)->dev), sk, skb); - if (unlikely(net_xmit_eval(err))) - pkt_len = -1; - iptunnel_xmit_stats(dev, pkt_len); + + if (dev) { + if (unlikely(net_xmit_eval(err))) + pkt_len = -1; + iptunnel_xmit_stats(dev, pkt_len); + } } #endif #endif -- cgit v1.2.3 From 6f56992a4e66b699f072459e3333946078f6cc83 Mon Sep 17 00:00:00 2001 From: Vishnu DASA Date: Fri, 24 May 2019 15:13:10 +0000 Subject: VMCI: Fix integer overflow in VMCI handle arrays commit 1c2eb5b2853c9f513690ba6b71072d8eb65da16a upstream. The VMCI handle array has an integer overflow in vmci_handle_arr_append_entry when it tries to expand the array. This can be triggered from a guest, since the doorbell link hypercall doesn't impose a limit on the number of doorbell handles that a VM can create in the hypervisor, and these handles are stored in a handle array. In this change, we introduce a mandatory max capacity for handle arrays/lists to avoid excessive memory usage. Signed-off-by: Vishnu Dasa Reviewed-by: Adit Ranadive Reviewed-by: Jorgen Hansen Cc: stable Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 1bd31a38c51e..ce13d4677f1d 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -75,9 +75,18 @@ enum { /* * A single VMCI device has an upper limit of 128MB on the amount of - * memory that can be used for queue pairs. + * memory that can be used for queue pairs. Since each queue pair + * consists of at least two pages, the memory limit also dictates the + * number of queue pairs a guest can create. */ #define VMCI_MAX_GUEST_QP_MEMORY (128 * 1024 * 1024) +#define VMCI_MAX_GUEST_QP_COUNT (VMCI_MAX_GUEST_QP_MEMORY / PAGE_SIZE / 2) + +/* + * There can be at most PAGE_SIZE doorbells since there is one doorbell + * per byte in the doorbell bitmap page. + */ +#define VMCI_MAX_GUEST_DOORBELL_COUNT PAGE_SIZE /* * Queues with pre-mapped data pages must be small, so that we don't pin -- cgit v1.2.3 From 4bc014488921c1ba0a47d9d55772cbed0489ca20 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 11 Jul 2019 20:52:18 -0700 Subject: nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header commit c32cc30c0544f13982ee0185d55f4910319b1a79 upstream. cpu_to_le32/le32_to_cpu is defined in include/linux/byteorder/generic.h, which is not exported to user-space. UAPI headers must use the ones prefixed with double-underscore. Detected by compile-testing exported headers: include/linux/nilfs2_ondisk.h: In function `nilfs_checkpoint_set_snapshot': include/linux/nilfs2_ondisk.h:536:17: error: implicit declaration of function `cpu_to_le32' [-Werror=implicit-function-declaration] cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ ^ include/linux/nilfs2_ondisk.h:552:1: note: in expansion of macro `NILFS_CHECKPOINT_FNS' NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot) ^~~~~~~~~~~~~~~~~~~~ include/linux/nilfs2_ondisk.h:536:29: error: implicit declaration of function `le32_to_cpu' [-Werror=implicit-function-declaration] cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ ^ include/linux/nilfs2_ondisk.h:552:1: note: in expansion of macro `NILFS_CHECKPOINT_FNS' NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot) ^~~~~~~~~~~~~~~~~~~~ include/linux/nilfs2_ondisk.h: In function `nilfs_segment_usage_set_clean': include/linux/nilfs2_ondisk.h:622:19: error: implicit declaration of function `cpu_to_le64' [-Werror=implicit-function-declaration] su->su_lastmod = cpu_to_le64(0); ^~~~~~~~~~~ Link: http://lkml.kernel.org/r/20190605053006.14332-1-yamada.masahiro@socionext.com Fixes: e63e88bc53ba ("nilfs2: move ioctl interface and disk layout to uapi separately") Signed-off-by: Masahiro Yamada Acked-by: Ryusuke Konishi Cc: Arnd Bergmann Cc: Greg KH Cc: Joe Perches Cc: [4.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/nilfs2_ondisk.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/nilfs2_ondisk.h b/include/uapi/linux/nilfs2_ondisk.h index 2a8a3addb675..f9b6b5be7ddf 100644 --- a/include/uapi/linux/nilfs2_ondisk.h +++ b/include/uapi/linux/nilfs2_ondisk.h @@ -28,7 +28,7 @@ #include #include - +#include #define NILFS_INODE_BMAP_SIZE 7 @@ -532,19 +532,19 @@ enum { static inline void \ nilfs_checkpoint_set_##name(struct nilfs_checkpoint *cp) \ { \ - cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ - (1UL << NILFS_CHECKPOINT_##flag)); \ + cp->cp_flags = __cpu_to_le32(__le32_to_cpu(cp->cp_flags) | \ + (1UL << NILFS_CHECKPOINT_##flag)); \ } \ static inline void \ nilfs_checkpoint_clear_##name(struct nilfs_checkpoint *cp) \ { \ - cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) & \ + cp->cp_flags = __cpu_to_le32(__le32_to_cpu(cp->cp_flags) & \ ~(1UL << NILFS_CHECKPOINT_##flag)); \ } \ static inline int \ nilfs_checkpoint_##name(const struct nilfs_checkpoint *cp) \ { \ - return !!(le32_to_cpu(cp->cp_flags) & \ + return !!(__le32_to_cpu(cp->cp_flags) & \ (1UL << NILFS_CHECKPOINT_##flag)); \ } @@ -594,20 +594,20 @@ enum { static inline void \ nilfs_segment_usage_set_##name(struct nilfs_segment_usage *su) \ { \ - su->su_flags = cpu_to_le32(le32_to_cpu(su->su_flags) | \ + su->su_flags = __cpu_to_le32(__le32_to_cpu(su->su_flags) | \ (1UL << NILFS_SEGMENT_USAGE_##flag));\ } \ static inline void \ nilfs_segment_usage_clear_##name(struct nilfs_segment_usage *su) \ { \ su->su_flags = \ - cpu_to_le32(le32_to_cpu(su->su_flags) & \ + __cpu_to_le32(__le32_to_cpu(su->su_flags) & \ ~(1UL << NILFS_SEGMENT_USAGE_##flag)); \ } \ static inline int \ nilfs_segment_usage_##name(const struct nilfs_segment_usage *su) \ { \ - return !!(le32_to_cpu(su->su_flags) & \ + return !!(__le32_to_cpu(su->su_flags) & \ (1UL << NILFS_SEGMENT_USAGE_##flag)); \ } @@ -618,15 +618,15 @@ NILFS_SEGMENT_USAGE_FNS(ERROR, error) static inline void nilfs_segment_usage_set_clean(struct nilfs_segment_usage *su) { - su->su_lastmod = cpu_to_le64(0); - su->su_nblocks = cpu_to_le32(0); - su->su_flags = cpu_to_le32(0); + su->su_lastmod = __cpu_to_le64(0); + su->su_nblocks = __cpu_to_le32(0); + su->su_flags = __cpu_to_le32(0); } static inline int nilfs_segment_usage_clean(const struct nilfs_segment_usage *su) { - return !le32_to_cpu(su->su_flags); + return !__le32_to_cpu(su->su_flags); } /** -- cgit v1.2.3 From 8151383a170ae82a3c795027cf79933cd6a1edd9 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Tue, 21 May 2019 16:48:43 -0400 Subject: rcu: Force inlining of rcu_read_lock() [ Upstream commit 6da9f775175e516fc7229ceaa9b54f8f56aa7924 ] When debugging options are turned on, the rcu_read_lock() function might not be inlined. This results in lockdep's print_lock() function printing "rcu_read_lock+0x0/0x70" instead of rcu_read_lock()'s caller. For example: [ 10.579995] ============================= [ 10.584033] WARNING: suspicious RCU usage [ 10.588074] 4.18.0.memcg_v2+ #1 Not tainted [ 10.593162] ----------------------------- [ 10.597203] include/linux/rcupdate.h:281 Illegal context switch in RCU read-side critical section! [ 10.606220] [ 10.606220] other info that might help us debug this: [ 10.606220] [ 10.614280] [ 10.614280] rcu_scheduler_active = 2, debug_locks = 1 [ 10.620853] 3 locks held by systemd/1: [ 10.624632] #0: (____ptrval____) (&type->i_mutex_dir_key#5){.+.+}, at: lookup_slow+0x42/0x70 [ 10.633232] #1: (____ptrval____) (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70 [ 10.640954] #2: (____ptrval____) (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70 These "rcu_read_lock+0x0/0x70" strings are not providing any useful information. This commit therefore forces inlining of the rcu_read_lock() function so that rcu_read_lock()'s caller is instead shown. Signed-off-by: Waiman Long Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- include/linux/rcupdate.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index aa2935779e43..96037ba940ee 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -866,7 +866,7 @@ static inline void rcu_preempt_sleep_check(void) * read-side critical sections may be preempted and they may also block, but * only when acquiring spinlocks that are subject to priority inheritance. */ -static inline void rcu_read_lock(void) +static __always_inline void rcu_read_lock(void) { __rcu_read_lock(); __acquire(RCU); -- cgit v1.2.3 From df5b05868d66a58d184e2ce8b87abe86918a5fd8 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Thu, 30 May 2019 12:50:43 +0200 Subject: clocksource/drivers/exynos_mct: Increase priority over ARM arch timer [ Upstream commit 6282edb72bed5324352522d732080d4c1b9dfed6 ] Exynos SoCs based on CA7/CA15 have 2 timer interfaces: custom Exynos MCT (Multi Core Timer) and standard ARM Architected Timers. There are use cases, where both timer interfaces are used simultanously. One of such examples is using Exynos MCT for the main system timer and ARM Architected Timers for the KVM and virtualized guests (KVM requires arch timers). Exynos Multi-Core Timer driver (exynos_mct) must be however started before ARM Architected Timers (arch_timer), because they both share some common hardware blocks (global system counter) and turning on MCT is needed to get ARM Architected Timer working properly. To ensure selecting Exynos MCT as the main system timer, increase MCT timer rating. To ensure proper starting order of both timers during suspend/resume cycle, increase MCT hotplug priority over ARM Archictected Timers. Signed-off-by: Marek Szyprowski Reviewed-by: Krzysztof Kozlowski Reviewed-by: Chanwoo Choi Signed-off-by: Daniel Lezcano Signed-off-by: Sasha Levin --- include/linux/cpuhotplug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index c9447a689522..1ab0273560ae 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -77,10 +77,10 @@ enum cpuhp_state { CPUHP_AP_PERF_ARM_HW_BREAKPOINT_STARTING, CPUHP_AP_PERF_ARM_STARTING, CPUHP_AP_ARM_L2X0_STARTING, + CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, CPUHP_AP_ARM_ARCH_TIMER_STARTING, CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, CPUHP_AP_JCORE_TIMER_STARTING, - CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, CPUHP_AP_ARM_TWD_STARTING, CPUHP_AP_METAG_TIMER_STARTING, CPUHP_AP_QCOM_TIMER_STARTING, -- cgit v1.2.3 From 229b670e66689326c97639b4767c12c33462dac1 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 1 Feb 2018 21:00:48 +0300 Subject: compiler.h, kasan: Avoid duplicating __read_once_size_nocheck() [ Upstream commit bdb5ac801af3d81d36732c2f640d6a1d3df83826 ] Instead of having two identical __read_once_size_nocheck() functions with different attributes, consolidate all the difference in new macro __no_kasan_or_inline and use it. No functional changes. Signed-off-by: Andrey Ryabinin Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/compiler.h | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 80a5bc623c47..ced454c03819 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -250,23 +250,21 @@ void __read_once_size(const volatile void *p, void *res, int size) #ifdef CONFIG_KASAN /* - * This function is not 'inline' because __no_sanitize_address confilcts + * We can't declare function 'inline' because __no_sanitize_address confilcts * with inlining. Attempt to inline it may cause a build failure. * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 * '__maybe_unused' allows us to avoid defined-but-not-used warnings. */ -static __no_sanitize_address __maybe_unused -void __read_once_size_nocheck(const volatile void *p, void *res, int size) -{ - __READ_ONCE_SIZE; -} +# define __no_kasan_or_inline __no_sanitize_address __maybe_unused #else -static __always_inline +# define __no_kasan_or_inline __always_inline +#endif + +static __no_kasan_or_inline void __read_once_size_nocheck(const volatile void *p, void *res, int size) { __READ_ONCE_SIZE; } -#endif static __always_inline void __write_once_size(volatile void *p, void *res, int size) { -- cgit v1.2.3 From 4b5d4bdfd1ea2c2946f55ba309e17440b4ddada2 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 1 Feb 2018 21:00:49 +0300 Subject: compiler.h: Add read_word_at_a_time() function. [ Upstream commit 7f1e541fc8d57a143dd5df1d0a1276046e08c083 ] Sometimes we know that it's safe to do potentially out-of-bounds access because we know it won't cross a page boundary. Still, KASAN will report this as a bug. Add read_word_at_a_time() function which is supposed to be used in such cases. In read_word_at_a_time() KASAN performs relaxed check - only the first byte of access is validated. Signed-off-by: Andrey Ryabinin Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/compiler.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index ced454c03819..3050de0dac96 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -302,6 +302,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s * with an explicit memory barrier or atomic instruction that provides the * required ordering. */ +#include #define __READ_ONCE(x, check) \ ({ \ @@ -320,6 +321,13 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s */ #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0) +static __no_kasan_or_inline +unsigned long read_word_at_a_time(const void *addr) +{ + kasan_check_read(addr, 1); + return *(unsigned long *)addr; +} + #define WRITE_ONCE(x, val) \ ({ \ union { typeof(x) __val; char __c[1]; } __u = \ -- cgit v1.2.3 From 50810015e027476591c275b8b8a9a433fc577c72 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 11 Jul 2019 09:54:40 -0700 Subject: access: avoid the RCU grace period for the temporary subjective credentials commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream. It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves a RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of seconday damage: instead of having a nice alloc/free patterns that hits in hot per-CPU slab caches, you have all those delayed free's, and on big machines with hundreds of cores, the RCU overhead can end up being enormous. But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead. So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage. Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to. It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem. Acked-by: Peter Zijlstra (Intel) Acked-by: Eric Dumazet Acked-by: Paul E. McKenney Cc: Oleg Nesterov Cc: Jan Glauber Cc: Jiri Kosina Cc: Jayachandran Chandrasekharan Nair Cc: Greg KH Cc: Kees Cook Cc: David Howells Cc: Miklos Szeredi Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/cred.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/cred.h b/include/linux/cred.h index cf1a5d0c4eb4..4f614042214b 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -144,7 +144,11 @@ struct cred { struct user_struct *user; /* real user ID subscription */ struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */ struct group_info *group_info; /* supplementary groups for euid/fsgid */ - struct rcu_head rcu; /* RCU deletion hook */ + /* RCU deletion */ + union { + int non_rcu; /* Can we skip RCU deletion? */ + struct rcu_head rcu; /* RCU deletion hook */ + }; }; extern void __put_cred(struct cred *); @@ -242,6 +246,7 @@ static inline const struct cred *get_cred(const struct cred *cred) { struct cred *nonconst_cred = (struct cred *) cred; validate_creds(cred); + nonconst_cred->non_rcu = 0; return get_new_cred(nonconst_cred); } -- cgit v1.2.3 From 704533394e488a109fe46ab3693315376c3824d5 Mon Sep 17 00:00:00 2001 From: Soheil Hassas Yeganeh Date: Mon, 29 Jul 2019 21:21:08 +0800 Subject: tcp: reset sk_send_head in tcp_write_queue_purge [ Upstream commit dbbf2d1e4077bab0c65ece2765d3fc69cf7d610f ] tcp_write_queue_purge clears all the SKBs in the write queue but does not reset the sk_send_head. As a result, we can have a NULL pointer dereference anywhere that we use tcp_send_head instead of the tcp_write_queue_tail. For example, after a27fd7a8ed38 (tcp: purge write queue upon RST), we can purge the write queue on RST. Prior to 75c119afe14f (tcp: implement rb-tree based retransmit queue), tcp_push will only check tcp_send_head and then accesses tcp_write_queue_tail to send the actual SKB. As a result, it will dereference a NULL pointer. This has been reported twice for 4.14 where we don't have 75c119afe14f: By Timofey Titovets: [ 422.081094] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 [ 422.081254] IP: tcp_push+0x42/0x110 [ 422.081314] PGD 0 P4D 0 [ 422.081364] Oops: 0002 [#1] SMP PTI By Yongjian Xu: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: tcp_push+0x48/0x120 PGD 80000007ff77b067 P4D 80000007ff77b067 PUD 7fd989067 PMD 0 Oops: 0002 [#18] SMP PTI Modules linked in: tcp_diag inet_diag tcp_bbr sch_fq iTCO_wdt iTCO_vendor_support pcspkr ixgbe mdio i2c_i801 lpc_ich joydev input_leds shpchp e1000e igb dca ptp pps_core hwmon mei_me mei ipmi_si ipmi_msghandler sg ses scsi_transport_sas enclosure ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas wmi ast ttm dm_mirror dm_region_hash dm_log dm_mod dax CPU: 6 PID: 14156 Comm: [ET_NET 6] Tainted: G D 4.14.26-1.el6.x86_64 #1 Hardware name: LENOVO ThinkServer RD440 /ThinkServer RD440, BIOS A0TS80A 09/22/2014 task: ffff8807d78d8140 task.stack: ffffc9000e944000 RIP: 0010:tcp_push+0x48/0x120 RSP: 0018:ffffc9000e947a88 EFLAGS: 00010246 RAX: 00000000000005b4 RBX: ffff880f7cce9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8807d00f5000 RBP: ffffc9000e947aa8 R08: 0000000000001c84 R09: 0000000000000000 R10: ffff8807d00f5158 R11: 0000000000000000 R12: ffff8807d00f5000 R13: 0000000000000020 R14: 00000000000256d4 R15: 0000000000000000 FS: 00007f5916de9700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000007f8226004 CR4: 00000000001606e0 Call Trace: tcp_sendmsg_locked+0x33d/0xe50 tcp_sendmsg+0x37/0x60 inet_sendmsg+0x39/0xc0 sock_sendmsg+0x49/0x60 sock_write_iter+0xb6/0x100 do_iter_readv_writev+0xec/0x130 ? rw_verify_area+0x49/0xb0 do_iter_write+0x97/0xd0 vfs_writev+0x7e/0xe0 ? __wake_up_common_lock+0x80/0xa0 ? __fget_light+0x2c/0x70 ? __do_page_fault+0x1e7/0x530 do_writev+0x60/0xf0 ? inet_shutdown+0xac/0x110 SyS_writev+0x10/0x20 do_syscall_64+0x6f/0x140 ? prepare_exit_to_usermode+0x8b/0xa0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x3135ce0c57 RSP: 002b:00007f5916de4b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000003135ce0c57 RDX: 0000000000000002 RSI: 00007f5916de4b90 RDI: 000000000000606f RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f5916de8c38 R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000464cc R13: 00007f5916de8c30 R14: 00007f58d8bef080 R15: 0000000000000002 Code: 48 8b 97 60 01 00 00 4c 8d 97 58 01 00 00 41 b9 00 00 00 00 41 89 f3 4c 39 d2 49 0f 44 d1 41 81 e3 00 80 00 00 0f 85 b0 00 00 00 <80> 4a 38 08 44 8b 8f 74 06 00 00 44 89 8f 7c 06 00 00 83 e6 01 RIP: tcp_push+0x48/0x120 RSP: ffffc9000e947a88 CR2: 0000000000000038 ---[ end trace 8d545c2e93515549 ]--- There is other scenario which found in stable 4.4: Allocated: [] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218 [] alloc_skb_fclone include/linux/skbuff.h:856 [inline] [] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833 [] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178 [] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755 Freed: [] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676 [] sk_wmem_free_skb include/net/sock.h:1447 [inline] [] tcp_write_queue_purge include/net/tcp.h:1460 [inline] [] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline] [] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261 [] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246 BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline] BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline] BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056 [] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408 [] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427 [] tcp_skb_pcount include/net/tcp.h:796 [inline] [] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline] [] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056 [] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307 stable 4.4 and stable 4.9 don't have the commit abb4a8b870b5 ("tcp: purge write queue upon RST") which is referred in dbbf2d1e4077, in tcp_connect_init, it calls tcp_write_queue_purge, and does not reset sk_send_head, then UAF. stable 4.14 have the commit abb4a8b870b5 ("tcp: purge write queue upon RST"), in tcp_reset, it calls tcp_write_queue_purge(sk), and does not reset sk_send_head, then UAF. So this patch can be used to fix stable 4.4 and 4.9. Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST) Reported-by: Timofey Titovets Reported-by: Yongjian Xu Signed-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh Tested-by: Yongjian Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Mao Wenan Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index d7047de952f0..1eda31f7f013 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1512,6 +1512,11 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb, void tcp_fastopen_init_key_once(bool publish); #define TCP_FASTOPEN_KEY_LENGTH 16 +static inline void tcp_init_send_head(struct sock *sk) +{ + sk->sk_send_head = NULL; +} + /* Fastopen key context */ struct tcp_fastopen_context { struct crypto_cipher *tfm; @@ -1528,6 +1533,7 @@ static inline void tcp_write_queue_purge(struct sock *sk) sk_wmem_free_skb(sk, skb); sk_mem_reclaim(sk); tcp_clear_all_retrans_hints(tcp_sk(sk)); + tcp_init_send_head(sk); inet_csk(sk)->icsk_backoff = 0; } @@ -1589,11 +1595,6 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli tcp_sk(sk)->highest_sack = NULL; } -static inline void tcp_init_send_head(struct sock *sk) -{ - sk->sk_send_head = NULL; -} - static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) { __skb_queue_tail(&sk->sk_write_queue, skb); -- cgit v1.2.3 From 837ffc9723f04aeb5bf252ef926c16aea1f5a0ee Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 16 Jul 2019 17:20:45 +0200 Subject: sched/fair: Don't free p->numa_faults with concurrent readers commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream. When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Petr Mladek Cc: Sergey Senozhatsky Cc: Thomas Gleixner Cc: Will Deacon Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- include/linux/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/sched.h b/include/linux/sched.h index 1c487a3abd84..275511b60978 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2044,7 +2044,7 @@ static inline bool in_vfork(struct task_struct *tsk) extern void task_numa_fault(int last_node, int node, int pages, int flags); extern pid_t task_numa_group_id(struct task_struct *p); extern void set_numabalancing_state(bool enabled); -extern void task_numa_free(struct task_struct *p); +extern void task_numa_free(struct task_struct *p, bool final); extern bool should_numa_migrate_memory(struct task_struct *p, struct page *page, int src_nid, int dst_cpu); #else @@ -2059,7 +2059,7 @@ static inline pid_t task_numa_group_id(struct task_struct *p) static inline void set_numabalancing_state(bool enabled) { } -static inline void task_numa_free(struct task_struct *p) +static inline void task_numa_free(struct task_struct *p, bool final) { } static inline bool should_numa_migrate_memory(struct task_struct *p, -- cgit v1.2.3 From 0153dbcbd4c842c38955b7d80a4708bb012052a7 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 12 Jul 2019 11:01:21 +0200 Subject: ACPI: fix false-positive -Wuninitialized warning [ Upstream commit dfd6f9ad36368b8dbd5f5a2b2f0a4705ae69a323 ] clang gets confused by an uninitialized variable in what looks to it like a never executed code path: arch/x86/kernel/acpi/boot.c:618:13: error: variable 'polarity' is uninitialized when used here [-Werror,-Wuninitialized] polarity = polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH; ^~~~~~~~ arch/x86/kernel/acpi/boot.c:606:32: note: initialize the variable 'polarity' to silence this warning int rc, irq, trigger, polarity; ^ = 0 arch/x86/kernel/acpi/boot.c:617:12: error: variable 'trigger' is uninitialized when used here [-Werror,-Wuninitialized] trigger = trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE; ^~~~~~~ arch/x86/kernel/acpi/boot.c:606:22: note: initialize the variable 'trigger' to silence this warning int rc, irq, trigger, polarity; ^ = 0 This is unfortunately a design decision in clang and won't be fixed. Changing the acpi_get_override_irq() macro to an inline function reliably avoids the issue. Signed-off-by: Arnd Bergmann Reviewed-by: Andy Shevchenko Reviewed-by: Nathan Chancellor Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- include/linux/acpi.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index ca2b4c4aec42..719eb97217a3 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -309,7 +309,10 @@ void acpi_set_irq_model(enum acpi_irq_model_id model, #ifdef CONFIG_X86_IO_APIC extern int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity); #else -#define acpi_get_override_irq(gsi, trigger, polarity) (-1) +static inline int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity) +{ + return -1; +} #endif /* * This function undoes the effect of one call to acpi_register_gsi(). -- cgit v1.2.3 From 90320a506cbe57a339c59fce30eea003f28f0354 Mon Sep 17 00:00:00 2001 From: Sam Protsenko Date: Tue, 16 Jul 2019 16:28:20 -0700 Subject: coda: fix build using bare-metal toolchain [ Upstream commit b2a57e334086602be56b74958d9f29b955cd157f ] The kernel is self-contained project and can be built with bare-metal toolchain. But bare-metal toolchain doesn't define __linux__. Because of this u_quad_t type is not defined when using bare-metal toolchain and codafs build fails. This patch fixes it by defining u_quad_t type unconditionally. Link: http://lkml.kernel.org/r/3cbb40b0a57b6f9923a9d67b53473c0b691a3eaa.1558117389.git.jaharkes@cs.cmu.edu Signed-off-by: Sam Protsenko Signed-off-by: Jan Harkes Cc: Arnd Bergmann Cc: Colin Ian King Cc: Dan Carpenter Cc: David Howells Cc: Fabian Frederick Cc: Mikko Rapeli Cc: Yann Droneaud Cc: Zhouyang Jia Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/coda.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/coda.h b/include/linux/coda.h index d30209b9cef8..0ca0c83fdb1c 100644 --- a/include/linux/coda.h +++ b/include/linux/coda.h @@ -58,8 +58,7 @@ Mellon the rights to redistribute these changes without encumbrance. #ifndef _CODA_HEADER_ #define _CODA_HEADER_ -#if defined(__linux__) typedef unsigned long long u_quad_t; -#endif + #include #endif -- cgit v1.2.3 From 317fc4dd5212208734d99ae87bdbb84aa0a7bcec Mon Sep 17 00:00:00 2001 From: Mikko Rapeli Date: Tue, 16 Jul 2019 16:28:10 -0700 Subject: uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers [ Upstream commit f90fb3c7e2c13ae829db2274b88b845a75038b8a ] Only users of upc_req in kernel side fs/coda/psdev.c and fs/coda/upcall.c already include linux/coda_psdev.h. Suggested by Jan Harkes in https://lore.kernel.org/lkml/20150531111913.GA23377@cs.cmu.edu/ Fixes these include/uapi/linux/coda_psdev.h compilation errors in userspace: linux/coda_psdev.h:12:19: error: field `uc_chain' has incomplete type struct list_head uc_chain; ^ linux/coda_psdev.h:13:2: error: unknown type name `caddr_t' caddr_t uc_data; ^ linux/coda_psdev.h:14:2: error: unknown type name `u_short' u_short uc_flags; ^ linux/coda_psdev.h:15:2: error: unknown type name `u_short' u_short uc_inSize; /* Size is at most 5000 bytes */ ^ linux/coda_psdev.h:16:2: error: unknown type name `u_short' u_short uc_outSize; ^ linux/coda_psdev.h:17:2: error: unknown type name `u_short' u_short uc_opcode; /* copied from data to save lookup */ ^ linux/coda_psdev.h:19:2: error: unknown type name `wait_queue_head_t' wait_queue_head_t uc_sleep; /* process' wait queue */ ^ Link: http://lkml.kernel.org/r/9f99f5ce6a0563d5266e6cf7aa9585aac2cae971.1558117389.git.jaharkes@cs.cmu.edu Signed-off-by: Mikko Rapeli Signed-off-by: Jan Harkes Cc: Arnd Bergmann Cc: Colin Ian King Cc: Dan Carpenter Cc: David Howells Cc: Fabian Frederick Cc: Sam Protsenko Cc: Yann Droneaud Cc: Zhouyang Jia Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/coda_psdev.h | 11 +++++++++++ include/uapi/linux/coda_psdev.h | 13 ------------- 2 files changed, 11 insertions(+), 13 deletions(-) (limited to 'include') diff --git a/include/linux/coda_psdev.h b/include/linux/coda_psdev.h index 5b8721efa948..fe1466daf291 100644 --- a/include/linux/coda_psdev.h +++ b/include/linux/coda_psdev.h @@ -19,6 +19,17 @@ struct venus_comm { struct mutex vc_mutex; }; +/* messages between coda filesystem in kernel and Venus */ +struct upc_req { + struct list_head uc_chain; + caddr_t uc_data; + u_short uc_flags; + u_short uc_inSize; /* Size is at most 5000 bytes */ + u_short uc_outSize; + u_short uc_opcode; /* copied from data to save lookup */ + int uc_unique; + wait_queue_head_t uc_sleep; /* process' wait queue */ +}; static inline struct venus_comm *coda_vcp(struct super_block *sb) { diff --git a/include/uapi/linux/coda_psdev.h b/include/uapi/linux/coda_psdev.h index 79d05981fc4b..e2c44d2f7d5b 100644 --- a/include/uapi/linux/coda_psdev.h +++ b/include/uapi/linux/coda_psdev.h @@ -6,19 +6,6 @@ #define CODA_PSDEV_MAJOR 67 #define MAX_CODADEVS 5 /* how many do we allow */ - -/* messages between coda filesystem in kernel and Venus */ -struct upc_req { - struct list_head uc_chain; - caddr_t uc_data; - u_short uc_flags; - u_short uc_inSize; /* Size is at most 5000 bytes */ - u_short uc_outSize; - u_short uc_opcode; /* copied from data to save lookup */ - int uc_unique; - wait_queue_head_t uc_sleep; /* process' wait queue */ -}; - #define CODA_REQ_ASYNC 0x1 #define CODA_REQ_READ 0x2 #define CODA_REQ_WRITE 0x4 -- cgit v1.2.3 From 16903f1a5ba7707c051edfdfa457620bba45e2c9 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Thu, 18 Apr 2019 17:50:52 -0700 Subject: coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream. The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli Reported-by: Jann Horn Suggested-by: Oleg Nesterov Acked-by: Peter Xu Reviewed-by: Mike Rapoport Reviewed-by: Oleg Nesterov Reviewed-by: Jann Horn Acked-by: Jason Gunthorpe Acked-by: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [akaher@vmware.com: stable 4.9 backport - handle binder_update_page_range - mhocko@suse.com] Signed-off-by: Ajay Kaher Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include') diff --git a/include/linux/mm.h b/include/linux/mm.h index 478466081265..9b36cc57329e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1192,6 +1192,26 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *); void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma, unsigned long start, unsigned long end); +/* + * This has to be called after a get_task_mm()/mmget_not_zero() + * followed by taking the mmap_sem for writing before modifying the + * vmas or anything the coredump pretends not to change from under it. + * + * NOTE: find_extend_vma() called from GUP context is the only place + * that can modify the "mm" (notably the vm_start/end) under mmap_sem + * for reading and outside the context of the process, so it is also + * the only case that holds the mmap_sem for reading that must call + * this function. Generally if the mmap_sem is hold for reading + * there's no need of this check after get_task_mm()/mmget_not_zero(). + * + * This function can be obsoleted and the check can be removed, after + * the coredump code will hold the mmap_sem for writing before + * invoking the ->core_dump methods. + */ +static inline bool mmget_still_valid(struct mm_struct *mm) +{ + return likely(!mm->core_state); +} /** * mm_walk - callbacks for walk_page_range -- cgit v1.2.3 From d4fc64c92713f6962672439739d0f0dc4117cb5b Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Thu, 13 Jun 2019 15:56:11 -0700 Subject: coredump: fix race condition between collapse_huge_page() and core dumping commit 59ea6d06cfa9247b586a695c21f94afa7183af74 upstream. When fixing the race conditions between the coredump and the mmap_sem holders outside the context of the process, we focused on mmget_not_zero()/get_task_mm() callers in 04f5866e41fb70 ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping"), but those aren't the only cases where the mmap_sem can be taken outside of the context of the process as Michal Hocko noticed while backporting that commit to older -stable kernels. If mmgrab() is called in the context of the process, but then the mm_count reference is transferred outside the context of the process, that can also be a problem if the mmap_sem has to be taken for writing through that mm_count reference. khugepaged registration calls mmgrab() in the context of the process, but the mmap_sem for writing is taken later in the context of the khugepaged kernel thread. collapse_huge_page() after taking the mmap_sem for writing doesn't modify any vma, so it's not obvious that it could cause a problem to the coredump, but it happens to modify the pmd in a way that breaks an invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page() needs the mmap_sem for writing just to block concurrent page faults that call pmd_trans_huge_lock(). Specifically the invariant that "!pmd_trans_huge()" cannot become a "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs. The coredump will call __get_user_pages() without mmap_sem for reading, which eventually can invoke a lockless page fault which will need a functional pmd_trans_huge_lock(). So collapse_huge_page() needs to use mmget_still_valid() to check it's not running concurrently with the coredump... as long as the coredump can invoke page faults without holding the mmap_sem for reading. This has "Fixes: khugepaged" to facilitate backporting, but in my view it's more a bug in the coredump code that will eventually have to be rewritten to stop invoking page faults without the mmap_sem for reading. So the long term plan is still to drop all mmget_still_valid(). Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com Fixes: ba76149f47d8 ("thp: khugepaged") Signed-off-by: Andrea Arcangeli Reported-by: Michal Hocko Acked-by: Michal Hocko Acked-by: Kirill A. Shutemov Cc: Oleg Nesterov Cc: Jann Horn Cc: Hugh Dickins Cc: Mike Rapoport Cc: Mike Kravetz Cc: Peter Xu Cc: Jason Gunthorpe Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [Ajay: Just adjusted to apply on v4.9] Signed-off-by: Ajay Kaher Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/mm.h b/include/linux/mm.h index 9b36cc57329e..ade072a6fd24 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1197,6 +1197,10 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma, * followed by taking the mmap_sem for writing before modifying the * vmas or anything the coredump pretends not to change from under it. * + * It also has to be called when mmgrab() is used in the context of + * the process, but then the mm_count refcount is transferred outside + * the context of the process to run down_write() on that pinned mm. + * * NOTE: find_extend_vma() called from GUP context is the only place * that can modify the "mm" (notably the vm_start/end) under mmap_sem * for reading and outside the context of the process, so it is also -- cgit v1.2.3 From fe5844365ec6c4d61838f289926f4d55da94d2fb Mon Sep 17 00:00:00 2001 From: Miguel Ojeda Date: Fri, 2 Aug 2019 12:37:56 +0200 Subject: Backport minimal compiler_attributes.h to support GCC 9 This adds support for __copy to v4.9.y so that we can use it in init/exit_module to avoid -Werror=missing-attributes errors on GCC 9. Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/ Cc: Suggested-by: Rolf Eike Beer Signed-off-by: Miguel Ojeda Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 3050de0dac96..0020ee1cab37 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -54,6 +54,22 @@ extern void __chk_io_ptr(const volatile void __iomem *); #ifdef __KERNEL__ +/* + * Minimal backport of compiler_attributes.h to add support for __copy + * to v4.9.y so that we can use it in init/exit_module to avoid + * -Werror=missing-attributes errors on GCC 9. + */ +#ifndef __has_attribute +# define __has_attribute(x) __GCC4_has_attribute_##x +# define __GCC4_has_attribute___copy__ 0 +#endif + +#if __has_attribute(__copy__) +# define __copy(symbol) __attribute__((__copy__(symbol))) +#else +# define __copy(symbol) +#endif + #ifdef __GNUC__ #include #endif -- cgit v1.2.3 From 2c34c215c1028f971af7cf8189c4f6237afdb90b Mon Sep 17 00:00:00 2001 From: Miguel Ojeda Date: Fri, 2 Aug 2019 12:37:57 +0200 Subject: include/linux/module.h: copy __init/__exit attrs to init/cleanup_module commit a6e60d84989fa0e91db7f236eda40453b0e44afa upstream. The upcoming GCC 9 release extends the -Wmissing-attributes warnings (enabled by -Wall) to C and aliases: it warns when particular function attributes are missing in the aliases but not in their target. In particular, it triggers for all the init/cleanup_module aliases in the kernel (defined by the module_init/exit macros), ending up being very noisy. These aliases point to the __init/__exit functions of a module, which are defined as __cold (among other attributes). However, the aliases themselves do not have the __cold attribute. Since the compiler behaves differently when compiling a __cold function as well as when compiling paths leading to calls to __cold functions, the warning is trying to point out the possibly-forgotten attribute in the alias. In order to keep the warning enabled, we decided to silence this case. Ideally, we would mark the aliases directly as __init/__exit. However, there are currently around 132 modules in the kernel which are missing __init/__exit in their init/cleanup functions (either because they are missing, or for other reasons, e.g. the functions being called from somewhere else); and a section mismatch is a hard error. A conservative alternative was to mark the aliases as __cold only. However, since we would like to eventually enforce __init/__exit to be always marked, we chose to use the new __copy function attribute (introduced by GCC 9 as well to deal with this). With it, we copy the attributes used by the target functions into the aliases. This way, functions that were not marked as __init/__exit won't have their aliases marked either, and therefore there won't be a section mismatch. Note that the warning would go away marking either the extern declaration, the definition, or both. However, we only mark the definition of the alias, since we do not want callers (which only see the declaration) to be compiled as if the function was __cold (and therefore the paths leading to those calls would be assumed to be unlikely). Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/ Cc: Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/ Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/ Suggested-by: Martin Sebor Acked-by: Jessica Yu Signed-off-by: Miguel Ojeda Signed-off-by: Greg Kroah-Hartman --- include/linux/module.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/module.h b/include/linux/module.h index fd9e121c7b3f..99f330ae13da 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -129,13 +129,13 @@ extern void cleanup_module(void); #define module_init(initfn) \ static inline initcall_t __maybe_unused __inittest(void) \ { return initfn; } \ - int init_module(void) __attribute__((alias(#initfn))); + int init_module(void) __copy(initfn) __attribute__((alias(#initfn))); /* This is only required if you want to be unloadable. */ #define module_exit(exitfn) \ static inline exitcall_t __maybe_unused __exittest(void) \ { return exitfn; } \ - void cleanup_module(void) __attribute__((alias(#exitfn))); + void cleanup_module(void) __copy(exitfn) __attribute__((alias(#exitfn))); #endif -- cgit v1.2.3 From 792f95e79a75ea8195236631bb59dd51389d87ce Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Wed, 24 Jul 2019 11:00:55 +0200 Subject: scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure commit 023358b136d490ca91735ac6490db3741af5a8bd upstream. Gcc-9 complains for a memset across pointer boundaries, which happens as the code tries to allocate a flexible array on the stack. Turns out we cannot do this without relying on gcc-isms, so with this patch we'll embed the fc_rport_priv structure into fcoe_rport, can use the normal 'container_of' outcast, and will only have to do a memset over one structure. Signed-off-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- include/scsi/libfcoe.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/scsi/libfcoe.h b/include/scsi/libfcoe.h index 722d3264d3bf..6be92eede5c0 100644 --- a/include/scsi/libfcoe.h +++ b/include/scsi/libfcoe.h @@ -241,6 +241,7 @@ struct fcoe_fcf { * @vn_mac: VN_Node assigned MAC address for data */ struct fcoe_rport { + struct fc_rport_priv rdata; unsigned long time; u16 fcoe_len; u16 flags; -- cgit v1.2.3 From 06fac08201a7166b2defa85635db40c4a2483c7d Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 6 Aug 2019 17:09:14 +0200 Subject: tcp: be more careful in tcp_fragment() [ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ] Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit. It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15 Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like : static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet Reported-by: Andrew Prout Tested-by: Andrew Prout Tested-by: Jonathan Lemon Tested-by: Michal Kubecek Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Acked-by: Christoph Paasch Cc: Jonathan Looney Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/tcp.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 1eda31f7f013..a474213ca015 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1595,6 +1595,23 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli tcp_sk(sk)->highest_sack = NULL; } +static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk) +{ + struct sk_buff *skb = tcp_write_queue_head(sk); + + if (skb == tcp_send_head(sk)) + skb = NULL; + + return skb; +} + +static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) +{ + struct sk_buff *skb = tcp_send_head(sk); + + return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); +} + static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) { __skb_queue_tail(&sk->sk_write_queue, skb); -- cgit v1.2.3 From 22395a3e46d611644fdd7b25c521a040dc2e8b51 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Fri, 19 May 2017 11:33:16 +0200 Subject: libceph: use kbasename() and kill ceph_file_part() commit 6f4dbd149d2a151b89d1a5bbf7530ee5546c7908 upstream. Signed-off-by: Ilya Dryomov Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman --- include/linux/ceph/ceph_debug.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/ceph/ceph_debug.h b/include/linux/ceph/ceph_debug.h index aa2e19182d99..51c5bd64bd00 100644 --- a/include/linux/ceph/ceph_debug.h +++ b/include/linux/ceph/ceph_debug.h @@ -3,6 +3,8 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include + #ifdef CONFIG_CEPH_LIB_PRETTYDEBUG /* @@ -12,12 +14,10 @@ */ # if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG) -extern const char *ceph_file_part(const char *s, int len); # define dout(fmt, ...) \ pr_debug("%.*s %12.12s:%-4d : " fmt, \ 8 - (int)sizeof(KBUILD_MODNAME), " ", \ - ceph_file_part(__FILE__, sizeof(__FILE__)), \ - __LINE__, ##__VA_ARGS__) + kbasename(__FILE__), __LINE__, ##__VA_ARGS__) # else /* faux printk call just to see any compiler warnings. */ # define dout(fmt, ...) do { \ -- cgit v1.2.3 From 00a8794f636b83eec31eee0d4ab27923b48506d8 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 30 Jul 2019 21:25:20 +0200 Subject: compat_ioctl: pppoe: fix PPPOEIOCSFWD handling [ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ] Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by: Guillaume Nault Signed-off-by: Arnd Bergmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/if_pppox.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h index ba7a9b0c7c57..24e9b360da65 100644 --- a/include/linux/if_pppox.h +++ b/include/linux/if_pppox.h @@ -84,6 +84,9 @@ extern int register_pppox_proto(int proto_num, const struct pppox_proto *pp); extern void unregister_pppox_proto(int proto_num); extern void pppox_unbind_sock(struct sock *sk);/* delete ppp-channel binding */ extern int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); +extern int pppox_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); + +#define PPPOEIOCSFWD32 _IOW(0xB1 ,0, compat_size_t) /* PPPoX socket states */ enum { -- cgit v1.2.3 From daa60696bdf4e59fcb304b3601b1e3f8cdb4f412 Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Mon, 22 Jul 2019 10:24:33 +0100 Subject: ALSA: compress: Fix regression on compressed capture streams [ Upstream commit 4475f8c4ab7b248991a60d9c02808dbb813d6be8 ] A previous fix to the stop handling on compressed capture streams causes some knock on issues. The previous fix updated snd_compr_drain_notify to set the state back to PREPARED for capture streams. This causes some issues however as the handling for snd_compr_poll differs between the two states and some user-space applications were relying on the poll failing after the stream had been stopped. To correct this regression whilst still fixing the original problem the patch was addressing, update the capture handling to skip the PREPARED state rather than skipping the SETUP state as it has done until now. Fixes: 4f2ab5e1d13d ("ALSA: compress: Fix stop handling on compressed capture streams") Signed-off-by: Charles Keepax Acked-by: Vinod Koul Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- include/sound/compress_driver.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include') diff --git a/include/sound/compress_driver.h b/include/sound/compress_driver.h index 96bc5acdade3..49482080311a 100644 --- a/include/sound/compress_driver.h +++ b/include/sound/compress_driver.h @@ -185,10 +185,7 @@ static inline void snd_compr_drain_notify(struct snd_compr_stream *stream) if (snd_BUG_ON(!stream)) return; - if (stream->direction == SND_COMPRESS_PLAYBACK) - stream->runtime->state = SNDRV_PCM_STATE_SETUP; - else - stream->runtime->state = SNDRV_PCM_STATE_PREPARED; + stream->runtime->state = SNDRV_PCM_STATE_SETUP; wake_up(&stream->runtime->sleep); } -- cgit v1.2.3 From c98446e1bab6253ddce7144cc2a91c400a323839 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Sat, 17 Aug 2019 00:00:08 +0100 Subject: bpf: add bpf_jit_limit knob to restrict unpriv allocations commit ede95a63b5e84ddeea6b0c473b36ab8bfd8c6ce3 upstream. Rick reported that the BPF JIT could potentially fill the entire module space with BPF programs from unprivileged users which would prevent later attempts to load normal kernel modules or privileged BPF programs, for example. If JIT was enabled but unsuccessful to generate the image, then before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") we would always fall back to the BPF interpreter. Nowadays in the case where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort with a failure since the BPF interpreter was compiled out. Add a global limit and enforce it for unprivileged users such that in case of BPF interpreter compiled out we fail once the limit has been reached or we fall back to BPF interpreter earlier w/o using module mem if latter was compiled in. In a next step, fair share among unprivileged users can be resolved in particular for the case where we would fail hard once limit is reached. Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Co-Developed-by: Rick Edgecombe Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Cc: Eric Dumazet Cc: Jann Horn Cc: Kees Cook Cc: LKML Signed-off-by: Alexei Starovoitov [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/filter.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index 1f09c521adfe..689bba102007 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -599,6 +599,7 @@ void bpf_warn_invalid_xdp_action(u32 act); #ifdef CONFIG_BPF_JIT extern int bpf_jit_enable; extern int bpf_jit_harden; +extern int bpf_jit_limit; typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size); -- cgit v1.2.3 From 53e054b3cd1bd3dde2212c3e279ab4a3eefac6bb Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sat, 17 Aug 2019 00:01:12 +0100 Subject: siphash: add cryptographically secure PRF commit 2c956a60778cbb6a27e0c7a8a52a91378c90e1d1 upstream. SipHash is a 64-bit keyed hash function that is actually a cryptographically secure PRF, like HMAC. Except SipHash is super fast, and is meant to be used as a hashtable keyed lookup function, or as a general PRF for short input use cases, such as sequence numbers or RNG chaining. For the first usage: There are a variety of attacks known as "hashtable poisoning" in which an attacker forms some data such that the hash of that data will be the same, and then preceeds to fill up all entries of a hashbucket. This is a realistic and well-known denial-of-service vector. Currently hashtables use jhash, which is fast but not secure, and some kind of rotating key scheme (or none at all, which isn't good). SipHash is meant as a replacement for jhash in these cases. There are a modicum of places in the kernel that are vulnerable to hashtable poisoning attacks, either via userspace vectors or network vectors, and there's not a reliable mechanism inside the kernel at the moment to fix it. The first step toward fixing these issues is actually getting a secure primitive into the kernel for developers to use. Then we can, bit by bit, port things over to it as deemed appropriate. While SipHash is extremely fast for a cryptographically secure function, it is likely a bit slower than the insecure jhash, and so replacements will be evaluated on a case-by-case basis based on whether or not the difference in speed is negligible and whether or not the current jhash usage poses a real security risk. For the second usage: A few places in the kernel are using MD5 or SHA1 for creating secure sequence numbers, syn cookies, port numbers, or fast random numbers. SipHash is a faster and more fitting, and more secure replacement for MD5 in those situations. Replacing MD5 and SHA1 with SipHash for these uses is obvious and straight-forward, and so is submitted along with this patch series. There shouldn't be much of a debate over its efficacy. Dozens of languages are already using this internally for their hash tables and PRFs. Some of the BSDs already use this in their kernels. SipHash is a widely known high-speed solution to a widely known set of problems, and it's time we catch-up. Signed-off-by: Jason A. Donenfeld Reviewed-by: Jean-Philippe Aumasson Cc: Linus Torvalds Cc: Eric Biggers Cc: David Laight Cc: Eric Dumazet Signed-off-by: David S. Miller [bwh: Backported to 4.9 as dependency of commits df453700e8d8 "inet: switch IP ID generator to siphash" and 3c79107631db "netfilter: ctnetlink: don't use conntrack/expect object addresses as id"] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/siphash.h | 85 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) create mode 100644 include/linux/siphash.h (limited to 'include') diff --git a/include/linux/siphash.h b/include/linux/siphash.h new file mode 100644 index 000000000000..feeb29cd113e --- /dev/null +++ b/include/linux/siphash.h @@ -0,0 +1,85 @@ +/* Copyright (C) 2016 Jason A. Donenfeld . All Rights Reserved. + * + * This file is provided under a dual BSD/GPLv2 license. + * + * SipHash: a fast short-input PRF + * https://131002.net/siphash/ + * + * This implementation is specifically for SipHash2-4. + */ + +#ifndef _LINUX_SIPHASH_H +#define _LINUX_SIPHASH_H + +#include +#include + +#define SIPHASH_ALIGNMENT __alignof__(u64) +typedef struct { + u64 key[2]; +} siphash_key_t; + +u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key); +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key); +#endif + +u64 siphash_1u64(const u64 a, const siphash_key_t *key); +u64 siphash_2u64(const u64 a, const u64 b, const siphash_key_t *key); +u64 siphash_3u64(const u64 a, const u64 b, const u64 c, + const siphash_key_t *key); +u64 siphash_4u64(const u64 a, const u64 b, const u64 c, const u64 d, + const siphash_key_t *key); +u64 siphash_1u32(const u32 a, const siphash_key_t *key); +u64 siphash_3u32(const u32 a, const u32 b, const u32 c, + const siphash_key_t *key); + +static inline u64 siphash_2u32(const u32 a, const u32 b, + const siphash_key_t *key) +{ + return siphash_1u64((u64)b << 32 | a, key); +} +static inline u64 siphash_4u32(const u32 a, const u32 b, const u32 c, + const u32 d, const siphash_key_t *key) +{ + return siphash_2u64((u64)b << 32 | a, (u64)d << 32 | c, key); +} + + +static inline u64 ___siphash_aligned(const __le64 *data, size_t len, + const siphash_key_t *key) +{ + if (__builtin_constant_p(len) && len == 4) + return siphash_1u32(le32_to_cpup((const __le32 *)data), key); + if (__builtin_constant_p(len) && len == 8) + return siphash_1u64(le64_to_cpu(data[0]), key); + if (__builtin_constant_p(len) && len == 16) + return siphash_2u64(le64_to_cpu(data[0]), le64_to_cpu(data[1]), + key); + if (__builtin_constant_p(len) && len == 24) + return siphash_3u64(le64_to_cpu(data[0]), le64_to_cpu(data[1]), + le64_to_cpu(data[2]), key); + if (__builtin_constant_p(len) && len == 32) + return siphash_4u64(le64_to_cpu(data[0]), le64_to_cpu(data[1]), + le64_to_cpu(data[2]), le64_to_cpu(data[3]), + key); + return __siphash_aligned(data, len, key); +} + +/** + * siphash - compute 64-bit siphash PRF value + * @data: buffer to hash + * @size: size of @data + * @key: the siphash key + */ +static inline u64 siphash(const void *data, size_t len, + const siphash_key_t *key) +{ +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT)) + return __siphash_unaligned(data, len, key); +#endif + return ___siphash_aligned(data, len, key); +} + +#endif /* _LINUX_SIPHASH_H */ -- cgit v1.2.3 From 175a407ce432088d827b822b8a47afd8360a8dbe Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sat, 17 Aug 2019 00:01:19 +0100 Subject: siphash: implement HalfSipHash1-3 for hash tables commit 1ae2324f732c9c4e2fa4ebd885fa1001b70d52e1 upstream. HalfSipHash, or hsiphash, is a shortened version of SipHash, which generates 32-bit outputs using a weaker 64-bit key. It has *much* lower security margins, and shouldn't be used for anything too sensitive, but it could be used as a hashtable key function replacement, if the output is never exposed, and if the security requirement is not too high. The goal is to make this something that performance-critical jhash users would be willing to use. On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias SipHash1-3 to HalfSipHash1-3 on those systems. 64-bit x86_64: [ 0.509409] test_siphash: SipHash2-4 cycles: 4049181 [ 0.510650] test_siphash: SipHash1-3 cycles: 2512884 [ 0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920 [ 0.512904] test_siphash: JenkinsHash cycles: 978267 So, we map hsiphash() -> SipHash1-3 32-bit x86: [ 0.509868] test_siphash: SipHash2-4 cycles: 14812892 [ 0.513601] test_siphash: SipHash1-3 cycles: 9510710 [ 0.515263] test_siphash: HalfSipHash1-3 cycles: 3856157 [ 0.515952] test_siphash: JenkinsHash cycles: 1148567 So, we map hsiphash() -> HalfSipHash1-3 hsiphash() is roughly 3 times slower than jhash(), but comes with a considerable security improvement. Signed-off-by: Jason A. Donenfeld Reviewed-by: Jean-Philippe Aumasson Signed-off-by: David S. Miller [bwh: Backported to 4.9 to avoid regression for WireGuard with only half the siphash API present] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/siphash.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/siphash.h b/include/linux/siphash.h index feeb29cd113e..fa7a6b9cedbf 100644 --- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -5,7 +5,9 @@ * SipHash: a fast short-input PRF * https://131002.net/siphash/ * - * This implementation is specifically for SipHash2-4. + * This implementation is specifically for SipHash2-4 for a secure PRF + * and HalfSipHash1-3/SipHash1-3 for an insecure PRF only suitable for + * hashtables. */ #ifndef _LINUX_SIPHASH_H @@ -82,4 +84,57 @@ static inline u64 siphash(const void *data, size_t len, return ___siphash_aligned(data, len, key); } +#define HSIPHASH_ALIGNMENT __alignof__(unsigned long) +typedef struct { + unsigned long key[2]; +} hsiphash_key_t; + +u32 __hsiphash_aligned(const void *data, size_t len, + const hsiphash_key_t *key); +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u32 __hsiphash_unaligned(const void *data, size_t len, + const hsiphash_key_t *key); +#endif + +u32 hsiphash_1u32(const u32 a, const hsiphash_key_t *key); +u32 hsiphash_2u32(const u32 a, const u32 b, const hsiphash_key_t *key); +u32 hsiphash_3u32(const u32 a, const u32 b, const u32 c, + const hsiphash_key_t *key); +u32 hsiphash_4u32(const u32 a, const u32 b, const u32 c, const u32 d, + const hsiphash_key_t *key); + +static inline u32 ___hsiphash_aligned(const __le32 *data, size_t len, + const hsiphash_key_t *key) +{ + if (__builtin_constant_p(len) && len == 4) + return hsiphash_1u32(le32_to_cpu(data[0]), key); + if (__builtin_constant_p(len) && len == 8) + return hsiphash_2u32(le32_to_cpu(data[0]), le32_to_cpu(data[1]), + key); + if (__builtin_constant_p(len) && len == 12) + return hsiphash_3u32(le32_to_cpu(data[0]), le32_to_cpu(data[1]), + le32_to_cpu(data[2]), key); + if (__builtin_constant_p(len) && len == 16) + return hsiphash_4u32(le32_to_cpu(data[0]), le32_to_cpu(data[1]), + le32_to_cpu(data[2]), le32_to_cpu(data[3]), + key); + return __hsiphash_aligned(data, len, key); +} + +/** + * hsiphash - compute 32-bit hsiphash PRF value + * @data: buffer to hash + * @size: size of @data + * @key: the hsiphash key + */ +static inline u32 hsiphash(const void *data, size_t len, + const hsiphash_key_t *key) +{ +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + if (!IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT)) + return __hsiphash_unaligned(data, len, key); +#endif + return ___hsiphash_aligned(data, len, key); +} + #endif /* _LINUX_SIPHASH_H */ -- cgit v1.2.3 From b97a2f3d58f439d11ececb2faa21dac775d63c5c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 17 Aug 2019 00:01:27 +0100 Subject: inet: switch IP ID generator to siphash commit df453700e8d81b1bdafdf684365ee2b9431fb702 upstream. According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet Reported-by: Amit Klein Reported-by: Benny Pinkas Signed-off-by: David S. Miller [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/siphash.h | 5 +++++ include/net/netns/ipv4.h | 2 ++ 2 files changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/siphash.h b/include/linux/siphash.h index fa7a6b9cedbf..bf21591a9e5e 100644 --- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -21,6 +21,11 @@ typedef struct { u64 key[2]; } siphash_key_t; +static inline bool siphash_key_is_zero(const siphash_key_t *key) +{ + return !(key->key[0] | key->key[1]); +} + u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key); #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key); diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index bf619a67ec03..af1c5e7c7e94 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -8,6 +8,7 @@ #include #include #include +#include struct tcpm_hash_bucket; struct ctl_table_header; @@ -137,5 +138,6 @@ struct netns_ipv4 { int sysctl_fib_multipath_use_neigh; #endif atomic_t rt_genid; + siphash_key_t ip_id_key; }; #endif -- cgit v1.2.3 From 1922476beeeea46bebbe577215078736dd4231dc Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Sat, 17 Aug 2019 00:01:34 +0100 Subject: netfilter: ctnetlink: don't use conntrack/expect object addresses as id commit 3c79107631db1f7fd32cf3f7368e4672004a3010 upstream. else, we leak the addresses to userspace via ctnetlink events and dumps. Compute an ID on demand based on the immutable parts of nf_conn struct. Another advantage compared to using an address is that there is no immediate re-use of the same ID in case the conntrack entry is freed and reallocated again immediately. Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID") Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/net/netfilter/nf_conntrack.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 9ae819e27940..b57a9f37c297 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -336,6 +336,8 @@ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, gfp_t flags); void nf_ct_tmpl_free(struct nf_conn *tmpl); +u32 nf_ct_get_id(const struct nf_conn *ct); + #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_ADD_ATOMIC(net, count, v) this_cpu_add((net)->ct.stat->count, (v)) -- cgit v1.2.3 From 912420e5252c6867d3c3c3ad7a69ca59e5fc48c8 Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Fri, 2 Aug 2019 21:49:19 -0700 Subject: asm-generic: fix -Wtype-limits compiler warnings [ Upstream commit cbedfe11347fe418621bd188d58a206beb676218 ] Commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a compilation warning because "rx_frag_size" is an "ushort" while PAGE_SHIFT here is 16. The commit changed the get_order() to be a multi-line macro where compilers insist to check all statements in the macro even when __builtin_constant_p(rx_frag_size) will return false as "rx_frag_size" is a module parameter. In file included from ./arch/powerpc/include/asm/page_64.h:107, from ./arch/powerpc/include/asm/page.h:242, from ./arch/powerpc/include/asm/mmu.h:132, from ./arch/powerpc/include/asm/lppaca.h:47, from ./arch/powerpc/include/asm/paca.h:17, from ./arch/powerpc/include/asm/current.h:13, from ./include/linux/thread_info.h:21, from ./arch/powerpc/include/asm/processor.h:39, from ./include/linux/prefetch.h:15, from drivers/net/ethernet/emulex/benet/be_main.c:14: drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create': ./include/asm-generic/getorder.h:54:9: warning: comparison is always true due to limited range of data type [-Wtype-limits] (((n) < (1UL << PAGE_SHIFT)) ? 0 : \ ^ drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion of macro 'get_order' adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE; ^~~~~~~~~ Fix it by moving all of this multi-line macro into a proper function, and killing __get_order() off. [akpm@linux-foundation.org: remove __get_order() altogether] [cai@lca.pw: v2] Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw Fixes: d66acc39c7ce ("bitops: Optimise get_order()") Signed-off-by: Qian Cai Reviewed-by: Nathan Chancellor Cc: David S. Miller Cc: Arnd Bergmann Cc: David Howells Cc: Jakub Jelinek Cc: Nick Desaulniers Cc: Bill Wendling Cc: James Y Knight Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/asm-generic/getorder.h | 50 +++++++++++++++++------------------------- 1 file changed, 20 insertions(+), 30 deletions(-) (limited to 'include') diff --git a/include/asm-generic/getorder.h b/include/asm-generic/getorder.h index 65e4468ac53d..52fbf236a90e 100644 --- a/include/asm-generic/getorder.h +++ b/include/asm-generic/getorder.h @@ -6,24 +6,6 @@ #include #include -/* - * Runtime evaluation of get_order() - */ -static inline __attribute_const__ -int __get_order(unsigned long size) -{ - int order; - - size--; - size >>= PAGE_SHIFT; -#if BITS_PER_LONG == 32 - order = fls(size); -#else - order = fls64(size); -#endif - return order; -} - /** * get_order - Determine the allocation order of a memory size * @size: The size for which to get the order @@ -42,19 +24,27 @@ int __get_order(unsigned long size) * to hold an object of the specified size. * * The result is undefined if the size is 0. - * - * This function may be used to initialise variables with compile time - * evaluations of constants. */ -#define get_order(n) \ -( \ - __builtin_constant_p(n) ? ( \ - ((n) == 0UL) ? BITS_PER_LONG - PAGE_SHIFT : \ - (((n) < (1UL << PAGE_SHIFT)) ? 0 : \ - ilog2((n) - 1) - PAGE_SHIFT + 1) \ - ) : \ - __get_order(n) \ -) +static inline __attribute_const__ int get_order(unsigned long size) +{ + if (__builtin_constant_p(size)) { + if (!size) + return BITS_PER_LONG - PAGE_SHIFT; + + if (size < (1UL << PAGE_SHIFT)) + return 0; + + return ilog2((size) - 1) - PAGE_SHIFT + 1; + } + + size--; + size >>= PAGE_SHIFT; +#if BITS_PER_LONG == 32 + return fls(size); +#else + return fls64(size); +#endif +} #endif /* __ASSEMBLY__ */ -- cgit v1.2.3 From 6c1dc8f96b54ad9e63ef3becac73750a588abe6e Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 11 Dec 2018 12:14:12 +0100 Subject: bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K [ Upstream commit fdadd04931c2d7cd294dc5b2b342863f94be53a3 ] Michael and Sandipan report: Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF JIT allocations. At compile time it defaults to PAGE_SIZE * 40000, and is adjusted again at init time if MODULES_VADDR is defined. For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with the compile-time default at boot-time, which is 0x9c400000 when using 64K page size. This overflows the signed 32-bit bpf_jit_limit value: root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit -1673527296 and can cause various unexpected failures throughout the network stack. In one case `strace dhclient eth0` reported: setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) and similar failures can be seen with tools like tcpdump. This doesn't always reproduce however, and I'm not sure why. The more consistent failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host would time out on systemd/netplan configuring a virtio-net NIC with no noticeable errors in the logs. Given this and also given that in near future some architectures like arm64 will have a custom area for BPF JIT image allocations we should get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For 4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec() so therefore add another overridable bpf_jit_alloc_exec_limit() helper function which returns the possible size of the memory area for deriving the default heuristic in bpf_jit_charge_init(). Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default JIT memory provider, and therefore in case archs implement their custom module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}. Additionally, for archs supporting large page sizes, we should change the sysctl to be handled as long to not run into sysctl restrictions in future. Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations") Reported-by: Sandipan Das Reported-by: Michael Roth Signed-off-by: Daniel Borkmann Tested-by: Michael Roth Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin --- include/linux/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index 689bba102007..0837d904405a 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -599,7 +599,7 @@ void bpf_warn_invalid_xdp_action(u32 act); #ifdef CONFIG_BPF_JIT extern int bpf_jit_enable; extern int bpf_jit_harden; -extern int bpf_jit_limit; +extern long bpf_jit_limit; typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size); -- cgit v1.2.3 From 669a50559acf0f6c47a3f61d627c81758358e29f Mon Sep 17 00:00:00 2001 From: Tim Froidcoeur Date: Sat, 24 Aug 2019 08:03:51 +0200 Subject: tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") triggers following stack trace: [25244.848046] kernel BUG at ./include/linux/skbuff.h:1406! [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc [25244.888167] Call Trace: [25244.889182] [25244.890001] tcp_fragment+0x9c/0x2cf [25244.891295] tcp_write_xmit+0x68f/0x988 [25244.892732] __tcp_push_pending_frames+0x3b/0xa0 [25244.894347] tcp_data_snd_check+0x2a/0xc8 [25244.895775] tcp_rcv_established+0x2a8/0x30d [25244.897282] tcp_v4_do_rcv+0xb2/0x158 [25244.898666] tcp_v4_rcv+0x692/0x956 [25244.899959] ip_local_deliver_finish+0xeb/0x169 [25244.901547] __netif_receive_skb_core+0x51c/0x582 [25244.903193] ? inet_gro_receive+0x239/0x247 [25244.904756] netif_receive_skb_internal+0xab/0xc6 [25244.906395] napi_gro_receive+0x8a/0xc0 [25244.907760] receive_buf+0x9a1/0x9cd [25244.909160] ? load_balance+0x17a/0x7b7 [25244.910536] ? vring_unmap_one+0x18/0x61 [25244.911932] ? detach_buf+0x60/0xfa [25244.913234] virtnet_poll+0x128/0x1e1 [25244.914607] net_rx_action+0x12a/0x2b1 [25244.915953] __do_softirq+0x11c/0x26b [25244.917269] ? handle_irq_event+0x44/0x56 [25244.918695] irq_exit+0x61/0xa0 [25244.919947] do_IRQ+0x9d/0xbb [25244.921065] common_interrupt+0x85/0x85 [25244.922479] tcp_rtx_queue_tail() (called by tcp_fragment()) can call tcp_write_queue_prev() on the first packet in the queue, which will trigger the BUG in tcp_write_queue_prev(), because there is no previous packet. This happens when the retransmit queue is empty, for example in case of a zero window. Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") was not a simple cherry-pick of the original one from master (b617158dc096) because there is a specific TCP rtx queue only since v4.15. For more details, please see the commit message of b617158dc096 ("tcp: be more careful in tcp_fragment()"). The BUG() is hit due to the specific code added to versions older than v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406), just before the BUG_ON() somehow suggests to add a check before using it, what Tim did. In master, this code path causing the issue will not be taken because the implementation of tcp_rtx_queue_tail() is different: tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() → skb_queue_prev() → BUG_ON() Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") Signed-off-by: Tim Froidcoeur Signed-off-by: Matthieu Baerts Reviewed-by: Christoph Paasch Signed-off-by: Sasha Levin --- include/net/tcp.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index a474213ca015..23814d997e86 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1609,6 +1609,10 @@ static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); + /* empty retransmit queue, for example due to zero window */ + if (skb == tcp_write_queue_head(sk)) + return NULL; + return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } -- cgit v1.2.3 From f45f30e84559050d1d132b636bbcfe0be5ca0f65 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 31 Jul 2019 20:38:14 +0800 Subject: gpio: Fix build error of function redefinition [ Upstream commit 68e03b85474a51ec1921b4d13204782594ef7223 ] when do randbuilding, I got this error: In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: ./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: ./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ Reported-by: Hulk Robot Fixes: 964cb341882f ("gpio: move pincontrol calls to ") Signed-off-by: YueHaibing Link: https://lore.kernel.org/r/20190731123814.46624-1-yuehaibing@huawei.com Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- include/linux/gpio.h | 24 ------------------------ 1 file changed, 24 deletions(-) (limited to 'include') diff --git a/include/linux/gpio.h b/include/linux/gpio.h index d12b5d566e4b..11555bd821b7 100644 --- a/include/linux/gpio.h +++ b/include/linux/gpio.h @@ -229,30 +229,6 @@ static inline int irq_to_gpio(unsigned irq) return -EINVAL; } -static inline int -gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, - unsigned int gpio_offset, unsigned int pin_offset, - unsigned int npins) -{ - WARN_ON(1); - return -EINVAL; -} - -static inline int -gpiochip_add_pingroup_range(struct gpio_chip *chip, - struct pinctrl_dev *pctldev, - unsigned int gpio_offset, const char *pin_group) -{ - WARN_ON(1); - return -EINVAL; -} - -static inline void -gpiochip_remove_pin_ranges(struct gpio_chip *chip) -{ - WARN_ON(1); -} - static inline int devm_gpio_request(struct device *dev, unsigned gpio, const char *label) { -- cgit v1.2.3 From 05700fe421866f60f63fbb3a02852425a9bd9447 Mon Sep 17 00:00:00 2001 From: Luis Henriques Date: Fri, 19 Jul 2019 15:32:19 +0100 Subject: libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer [ Upstream commit 5c498950f730aa17c5f8a2cdcb903524e4002ed2 ] Signed-off-by: Luis Henriques Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov Signed-off-by: Sasha Levin --- include/linux/ceph/buffer.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/ceph/buffer.h b/include/linux/ceph/buffer.h index 07ca15e76100..dada47a4360f 100644 --- a/include/linux/ceph/buffer.h +++ b/include/linux/ceph/buffer.h @@ -29,7 +29,8 @@ static inline struct ceph_buffer *ceph_buffer_get(struct ceph_buffer *b) static inline void ceph_buffer_put(struct ceph_buffer *b) { - kref_put(&b->kref, ceph_buffer_release); + if (b) + kref_put(&b->kref, ceph_buffer_release); } extern int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end); -- cgit v1.2.3 From 88d08316dabb6462c32d0adb62c4dade71405d92 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Fri, 22 Mar 2019 16:26:19 -0700 Subject: xfrm: clean up xfrm protocol checks commit dbb2483b2a46fbaf833cfb5deb5ed9cace9c7399 upstream. In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()") I introduced a check for xfrm protocol, but according to Herbert IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so it should be removed from validate_tmpl(). And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific protocols, this is why xfrm_state_flush() could still miss IPPROTO_ROUTING, which leads that those entries are left in net->xfrm.state_all before exit net. Fix this by replacing IPSEC_PROTO_ANY with zero. This patch also extracts the check from validate_tmpl() to xfrm_id_proto_valid() and uses it in parse_ipsecrequest(). With this, no other protocols should be added into xfrm. Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()") Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com Cc: Steffen Klassert Cc: Herbert Xu Signed-off-by: Cong Wang Acked-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Zubin Mithra Signed-off-by: Greg Kroah-Hartman --- include/net/xfrm.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include') diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 835c30e491c8..9e2f260cbb51 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -1297,6 +1297,23 @@ static inline int xfrm_state_kern(const struct xfrm_state *x) return atomic_read(&x->tunnel_users); } +static inline bool xfrm_id_proto_valid(u8 proto) +{ + switch (proto) { + case IPPROTO_AH: + case IPPROTO_ESP: + case IPPROTO_COMP: +#if IS_ENABLED(CONFIG_IPV6) + case IPPROTO_ROUTING: + case IPPROTO_DSTOPTS: +#endif + return true; + default: + return false; + } +} + +/* IPSEC_PROTO_ANY only matches 3 IPsec protocols, 0 could match all. */ static inline int xfrm_id_proto_match(u8 proto, u8 userproto) { return (!userproto || proto == userproto || -- cgit v1.2.3 From f99a93352471791351d67b4f56a7ac891ec1b6aa Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 3 May 2019 08:24:44 -0700 Subject: ip6: fix skb leak in ip6frag_expire_frag_queue() commit 47d3d7fdb10a21c223036b58bd70ffdc24a472c4 upstream. Since ip6frag_expire_frag_queue() now pulls the head skb from frag queue, we should no longer use skb_get(), since this leads to an skb leak. Stefan Bader initially reported a problem in 4.4.stable [1] caused by the skb_get(), so this patch should also fix this issue. 296583.091021] kernel BUG at /build/linux-6VmqmP/linux-4.4.0/net/core/skbuff.c:1207! [296583.091734] Call Trace: [296583.091749] [] __pskb_pull_tail+0x50/0x350 [296583.091764] [] _decode_session6+0x26a/0x400 [296583.091779] [] __xfrm_decode_session+0x39/0x50 [296583.091795] [] icmpv6_route_lookup+0xf0/0x1c0 [296583.091809] [] icmp6_send+0x5e1/0x940 [296583.091823] [] ? __netif_receive_skb+0x18/0x60 [296583.091838] [] ? netif_receive_skb_internal+0x32/0xa0 [296583.091858] [] ? ixgbe_clean_rx_irq+0x594/0xac0 [ixgbe] [296583.091876] [] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6] [296583.091893] [] icmpv6_send+0x21/0x30 [296583.091906] [] ip6_expire_frag_queue+0xe0/0x120 [296583.091921] [] nf_ct_frag6_expire+0x1f/0x30 [nf_defrag_ipv6] [296583.091938] [] call_timer_fn+0x37/0x140 [296583.091951] [] ? nf_ct_net_exit+0x50/0x50 [nf_defrag_ipv6] [296583.091968] [] run_timer_softirq+0x234/0x330 [296583.091982] [] __do_softirq+0x109/0x2b0 Fixes: d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6 defrag") Signed-off-by: Eric Dumazet Reported-by: Stefan Bader Cc: Peter Oskolkov Cc: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Baolin Wang Signed-off-by: Greg Kroah-Hartman --- include/net/ipv6_frag.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/ipv6_frag.h b/include/net/ipv6_frag.h index 28aa9b30aece..1f77fb4dc79d 100644 --- a/include/net/ipv6_frag.h +++ b/include/net/ipv6_frag.h @@ -94,7 +94,6 @@ ip6frag_expire_frag_queue(struct net *net, struct frag_queue *fq) goto out; head->dev = dev; - skb_get(head); spin_unlock(&fq->q.lock); icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); -- cgit v1.2.3 From 4a7e12395a5e5cfa9967de2714a43d223eec2f51 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Thu, 5 Sep 2019 19:36:37 -0700 Subject: isdn/capi: check message length in capi_write() [ Upstream commit fe163e534e5eecdfd7b5920b0dfd24c458ee85d6 ] syzbot reported: BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313 capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 do_loop_readv_writev fs/read_write.c:703 [inline] do_iter_write+0x83e/0xd80 fs/read_write.c:961 vfs_writev fs/read_write.c:1004 [inline] do_writev+0x397/0x840 fs/read_write.c:1039 __do_sys_writev fs/read_write.c:1112 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 [...] The problem is that capi_write() is reading past the end of the message. Fix it by checking the message's length in the needed places. Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com Signed-off-by: Eric Biggers Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/isdn/capicmd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/isdn/capicmd.h b/include/uapi/linux/isdn/capicmd.h index b58635f722da..ae1e1fba2e13 100644 --- a/include/uapi/linux/isdn/capicmd.h +++ b/include/uapi/linux/isdn/capicmd.h @@ -15,6 +15,7 @@ #define CAPI_MSG_BASELEN 8 #define CAPI_DATA_B3_REQ_LEN (CAPI_MSG_BASELEN+4+4+2+2+2) #define CAPI_DATA_B3_RESP_LEN (CAPI_MSG_BASELEN+4+2) +#define CAPI_DISCONNECT_B3_RESP_LEN (CAPI_MSG_BASELEN+4) /*----- CAPI commands -----*/ #define CAPI_ALERT 0x01 -- cgit v1.2.3 From 787f774bc72cec8d7d8c024b4e130d4e5535d6d8 Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Mon, 27 Aug 2018 08:35:55 +0300 Subject: IB/core: Add an unbound WQ type to the new CQ API commit f794809a7259dfaa3d47d90ef5a86007cf48b1ce upstream. The upstream kernel commit cited below modified the workqueue in the new CQ API to be bound to a specific CPU (instead of being unbound). This caused ALL users of the new CQ API to use the same bound WQ. Specifically, MAD handling was severely delayed when the CPU bound to the WQ was busy handling (higher priority) interrupts. This caused a delay in the MAD "heartbeat" response handling, which resulted in ports being incorrectly classified as "down". To fix this, add a new "unbound" WQ type to the new CQ API, so that users have the option to choose either a bound WQ or an unbound WQ. For MADs, choose the new "unbound" WQ. Fixes: b7363e67b23e ("IB/device: Convert ib-comp-wq to be CPU-bound") Signed-off-by: Jack Morgenstein Signed-off-by: Leon Romanovsky Reviewed-by: Sagi Grimberg Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- include/rdma/ib_verbs.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index a42535f252b5..0638b8aeba34 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -63,6 +63,7 @@ extern struct workqueue_struct *ib_wq; extern struct workqueue_struct *ib_comp_wq; +extern struct workqueue_struct *ib_comp_unbound_wq; union ib_gid { u8 raw[16]; @@ -1415,9 +1416,10 @@ struct ib_ah { typedef void (*ib_comp_handler)(struct ib_cq *cq, void *cq_context); enum ib_poll_context { - IB_POLL_DIRECT, /* caller context, no hw completions */ - IB_POLL_SOFTIRQ, /* poll from softirq context */ - IB_POLL_WORKQUEUE, /* poll from workqueue */ + IB_POLL_DIRECT, /* caller context, no hw completions */ + IB_POLL_SOFTIRQ, /* poll from softirq context */ + IB_POLL_WORKQUEUE, /* poll from workqueue */ + IB_POLL_UNBOUND_WORKQUEUE, /* poll from unbound workqueue */ }; struct ib_cq { @@ -1434,6 +1436,7 @@ struct ib_cq { struct irq_poll iop; struct work_struct work; }; + struct workqueue_struct *comp_wq; }; struct ib_srq { -- cgit v1.2.3 From d641e70f40f70e676093ecd8687c53393c463d97 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 3 Sep 2019 20:08:21 +0900 Subject: kprobes: Prohibit probing on BUG() and WARN() address [ Upstream commit e336b4027775cb458dc713745e526fa1a1996b2a ] Since BUG() and WARN() may use a trap (e.g. UD2 on x86) to get the address where the BUG() has occurred, kprobes can not do single-step out-of-line that instruction. So prohibit probing on such address. Without this fix, if someone put a kprobe on WARN(), the kernel will crash with invalid opcode error instead of outputing warning message, because kernel can not find correct bug address. Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt (VMware) Acked-by: Naveen N. Rao Cc: Anil S Keshavamurthy Cc: David S . Miller Cc: Linus Torvalds Cc: Naveen N . Rao Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/156750890133.19112.3393666300746167111.stgit@devnote2 Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- include/linux/bug.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/bug.h b/include/linux/bug.h index 292d6a10b0c2..0faae96302bd 100644 --- a/include/linux/bug.h +++ b/include/linux/bug.h @@ -114,6 +114,11 @@ int is_valid_bugaddr(unsigned long addr); #else /* !CONFIG_GENERIC_BUG */ +static inline void *find_bug(unsigned long bugaddr) +{ + return NULL; +} + static inline enum bug_trap_type report_bug(unsigned long bug_addr, struct pt_regs *regs) { -- cgit v1.2.3 From 1fcdaaaa2f0e8829512c9ba2d6cf53c59c603f8c Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 11 Sep 2019 17:36:50 +0800 Subject: quota: fix wrong condition in is_quota_modification() commit 6565c182094f69e4ffdece337d395eb7ec760efc upstream. Quoted from commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <= isize") " At LSF we decided that if we truncate up from isize we shouldn't trim fallocated blocks that were fallocated with KEEP_SIZE and are past the new i_size. This patch fixes ext4 to do this. " And generic/092 of fstest have covered this case for long time, however is_quota_modification() didn't adjust based on that rule, so that in below condition, we will lose to quota block change: - fallocate blocks beyond EOF - remount - truncate(file_path, file_size) Fix it. Link: https://lore.kernel.org/r/20190911093650.35329-1-yuchao0@huawei.com Fixes: 3da40c7b0898 ("ext4: only call ext4_truncate when size <= isize") CC: stable@vger.kernel.org Signed-off-by: Chao Yu Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- include/linux/quotaops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index f00fa86ac966..87733344768c 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -21,7 +21,7 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb) /* i_mutex must being held */ static inline bool is_quota_modification(struct inode *inode, struct iattr *ia) { - return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) || + return (ia->ia_valid & ATTR_SIZE) || (ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) || (ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid)); } -- cgit v1.2.3 From 9cb93be149c5b3478434fdfae24b5d7118d5331d Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 1 Aug 2019 15:38:14 -0700 Subject: scsi: core: Reduce memory required for SCSI logging [ Upstream commit dccc96abfb21dc19d69e707c38c8ba439bba7160 ] The data structure used for log messages is so large that it can cause a boot failure. Since allocations from that data structure can fail anyway, use kmalloc() / kfree() instead of that data structure. See also https://bugzilla.kernel.org/show_bug.cgi?id=204119. See also commit ded85c193a39 ("scsi: Implement per-cpu logging buffer") # v4.0. Reported-by: Jan Palus Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: Ming Lei Cc: Jan Palus Signed-off-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- include/scsi/scsi_dbg.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/scsi/scsi_dbg.h b/include/scsi/scsi_dbg.h index 56710e03101c..1fcf14aee28a 100644 --- a/include/scsi/scsi_dbg.h +++ b/include/scsi/scsi_dbg.h @@ -5,8 +5,6 @@ struct scsi_cmnd; struct scsi_device; struct scsi_sense_hdr; -#define SCSI_LOG_BUFSIZE 128 - extern void scsi_print_command(struct scsi_cmnd *); extern size_t __scsi_format_command(char *, size_t, const unsigned char *, size_t); -- cgit v1.2.3 From 154129a3ac9b1216ad30ccfe8d9b78146752db4c Mon Sep 17 00:00:00 2001 From: Oleksandr Suvorov Date: Fri, 19 Jul 2019 10:05:30 +0000 Subject: ASoC: Define a set of DAPM pre/post-up events commit cfc8f568aada98f9608a0a62511ca18d647613e2 upstream. Prepare to use SND_SOC_DAPM_PRE_POST_PMU definition to reduce coming code size and make it more readable. Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Suvorov Reviewed-by: Marcel Ziswiler Reviewed-by: Igor Opaniuk Reviewed-by: Fabio Estevam Link: https://lore.kernel.org/r/20190719100524.23300-2-oleksandr.suvorov@toradex.com Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- include/sound/soc-dapm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/sound/soc-dapm.h b/include/sound/soc-dapm.h index f60d755f7ac6..ab9fa975f880 100644 --- a/include/sound/soc-dapm.h +++ b/include/sound/soc-dapm.h @@ -339,6 +339,8 @@ struct device; #define SND_SOC_DAPM_WILL_PMD 0x80 /* called at start of sequence */ #define SND_SOC_DAPM_PRE_POST_PMD \ (SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMD) +#define SND_SOC_DAPM_PRE_POST_PMU \ + (SND_SOC_DAPM_PRE_PMU | SND_SOC_DAPM_POST_PMU) /* convenience event type detection */ #define SND_SOC_DAPM_EVENT_ON(e) \ -- cgit v1.2.3 From c48986ff2476fddad67b47467e308a1d7c17c988 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 7 Feb 2019 21:44:41 +0100 Subject: cfg80211: add and use strongly typed element iteration macros commit 0f3b07f027f87a38ebe5c436490095df762819be upstream. Rather than always iterating elements from frames with pure u8 pointers, add a type "struct element" that encapsulates the id/datalen/data format of them. Then, add the element iteration macros * for_each_element * for_each_element_id * for_each_element_extid which take, as their first 'argument', such a structure and iterate through a given u8 array interpreting it as elements. While at it and since we'll need it, also add * for_each_subelement * for_each_subelement_id * for_each_subelement_extid which instead of taking data/length just take an outer element and use its data/datalen. Also add for_each_element_completed() to determine if any of the loops above completed, i.e. it was able to parse all of the elements successfully and no data remained. Use for_each_element_id() in cfg80211_find_ie_match() as the first user of this. Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman --- include/linux/ieee80211.h | 53 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) (limited to 'include') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index a80516fd65c8..2e306cbb834e 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2630,4 +2630,57 @@ static inline bool ieee80211_action_contains_tpc(struct sk_buff *skb) return true; } +struct element { + u8 id; + u8 datalen; + u8 data[]; +}; + +/* element iteration helpers */ +#define for_each_element(element, _data, _datalen) \ + for (element = (void *)(_data); \ + (u8 *)(_data) + (_datalen) - (u8 *)element >= \ + sizeof(*element) && \ + (u8 *)(_data) + (_datalen) - (u8 *)element >= \ + sizeof(*element) + element->datalen; \ + element = (void *)(element->data + element->datalen)) + +#define for_each_element_id(element, _id, data, datalen) \ + for_each_element(element, data, datalen) \ + if (element->id == (_id)) + +#define for_each_element_extid(element, extid, data, datalen) \ + for_each_element(element, data, datalen) \ + if (element->id == WLAN_EID_EXTENSION && \ + element->datalen > 0 && \ + element->data[0] == (extid)) + +#define for_each_subelement(sub, element) \ + for_each_element(sub, (element)->data, (element)->datalen) + +#define for_each_subelement_id(sub, id, element) \ + for_each_element_id(sub, id, (element)->data, (element)->datalen) + +#define for_each_subelement_extid(sub, extid, element) \ + for_each_element_extid(sub, extid, (element)->data, (element)->datalen) + +/** + * for_each_element_completed - determine if element parsing consumed all data + * @element: element pointer after for_each_element() or friends + * @data: same data pointer as passed to for_each_element() or friends + * @datalen: same data length as passed to for_each_element() or friends + * + * This function returns %true if all the data was parsed or considered + * while walking the elements. Only use this if your for_each_element() + * loop cannot be broken out of, otherwise it always returns %false. + * + * If some data was malformed, this returns %false since the last parsed + * element will not fill the whole remaining data. + */ +static inline bool for_each_element_completed(const struct element *element, + const void *data, size_t datalen) +{ + return (u8 *)element == (u8 *)data + datalen; +} + #endif /* LINUX_IEEE80211_H */ -- cgit v1.2.3 From fbc012a110068da1f2912835e094308aeaf7af9a Mon Sep 17 00:00:00 2001 From: Jouni Malinen Date: Mon, 11 Feb 2019 16:29:04 +0200 Subject: cfg80211: Use const more consistently in for_each_element macros commit 7388afe09143210f555bdd6c75035e9acc1fab96 upstream. Enforce the first argument to be a correct type of a pointer to struct element and avoid unnecessary typecasts from const to non-const pointers (the change in validate_ie_attr() is needed to make this part work). In addition, avoid signed/unsigned comparison within for_each_element() and mark struct element packed just in case. Signed-off-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman --- include/linux/ieee80211.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 2e306cbb834e..d3417c18ee2f 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2634,16 +2634,16 @@ struct element { u8 id; u8 datalen; u8 data[]; -}; +} __packed; /* element iteration helpers */ -#define for_each_element(element, _data, _datalen) \ - for (element = (void *)(_data); \ - (u8 *)(_data) + (_datalen) - (u8 *)element >= \ - sizeof(*element) && \ - (u8 *)(_data) + (_datalen) - (u8 *)element >= \ - sizeof(*element) + element->datalen; \ - element = (void *)(element->data + element->datalen)) +#define for_each_element(_elem, _data, _datalen) \ + for (_elem = (const struct element *)(_data); \ + (const u8 *)(_data) + (_datalen) - (const u8 *)_elem >= \ + (int)sizeof(*_elem) && \ + (const u8 *)(_data) + (_datalen) - (const u8 *)_elem >= \ + (int)sizeof(*_elem) + _elem->datalen; \ + _elem = (const struct element *)(_elem->data + _elem->datalen)) #define for_each_element_id(element, _id, data, datalen) \ for_each_element(element, data, datalen) \ @@ -2680,7 +2680,7 @@ struct element { static inline bool for_each_element_completed(const struct element *element, const void *data, size_t datalen) { - return (u8 *)element == (u8 *)data + datalen; + return (const u8 *)element == (const u8 *)data + datalen; } #endif /* LINUX_IEEE80211_H */ -- cgit v1.2.3 From ff33916b3b5b91b2324403965094ba6583d7660c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sun, 6 Oct 2019 14:24:25 -0700 Subject: llc: fix sk_buff leak in llc_conn_service() commit b74555de21acd791f12c4a1aeaf653dd7ac21133 upstream. syzbot reported: BUG: memory leak unreferenced object 0xffff88811eb3de00 (size 224): comm "syz-executor559", pid 7315, jiffies 4294943019 (age 10.300s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 a0 38 24 81 88 ff ff 00 c0 f2 15 81 88 ff ff ..8$............ backtrace: [<000000008d1c66a1>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<000000008d1c66a1>] slab_post_alloc_hook mm/slab.h:439 [inline] [<000000008d1c66a1>] slab_alloc_node mm/slab.c:3269 [inline] [<000000008d1c66a1>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579 [<00000000447d9496>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198 [<000000000cdbf82f>] alloc_skb include/linux/skbuff.h:1058 [inline] [<000000000cdbf82f>] llc_alloc_frame+0x66/0x110 net/llc/llc_sap.c:54 [<000000002418b52e>] llc_conn_ac_send_sabme_cmd_p_set_x+0x2f/0x140 net/llc/llc_c_ac.c:777 [<000000001372ae17>] llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline] [<000000001372ae17>] llc_conn_service net/llc/llc_conn.c:400 [inline] [<000000001372ae17>] llc_conn_state_process+0x1ac/0x640 net/llc/llc_conn.c:75 [<00000000f27e53c1>] llc_establish_connection+0x110/0x170 net/llc/llc_if.c:109 [<00000000291b2ca0>] llc_ui_connect+0x10e/0x370 net/llc/af_llc.c:477 [<000000000f9c740b>] __sys_connect+0x11d/0x170 net/socket.c:1840 [...] The bug is that most callers of llc_conn_send_pdu() assume it consumes a reference to the skb, when actually due to commit b85ab56c3f81 ("llc: properly handle dev_queue_xmit() return value") it doesn't. Revert most of that commit, and instead make the few places that need llc_conn_send_pdu() to *not* consume a reference call skb_get() before. Fixes: b85ab56c3f81 ("llc: properly handle dev_queue_xmit() return value") Reported-by: syzbot+6b825a6494a04cc0e3f7@syzkaller.appspotmail.com Signed-off-by: Eric Biggers Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- include/net/llc_conn.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h index df528a623548..ea985aa7a6c5 100644 --- a/include/net/llc_conn.h +++ b/include/net/llc_conn.h @@ -104,7 +104,7 @@ void llc_sk_reset(struct sock *sk); /* Access to a connection */ int llc_conn_state_process(struct sock *sk, struct sk_buff *skb); -int llc_conn_send_pdu(struct sock *sk, struct sk_buff *skb); +void llc_conn_send_pdu(struct sock *sk, struct sk_buff *skb); void llc_conn_rtn_pdu(struct sock *sk, struct sk_buff *skb); void llc_conn_resend_i_pdu_as_cmd(struct sock *sk, u8 nr, u8 first_p_bit); void llc_conn_resend_i_pdu_as_rsp(struct sock *sk, u8 nr, u8 first_f_bit); -- cgit v1.2.3 From 1d41d2fe9ea977531cbc9d9d0f48dbffaa6e3134 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 24 Sep 2019 13:11:26 -0700 Subject: sch_netem: fix rcu splat in netem_enqueue() commit 159d2c7d8106177bd9a986fd005a311fe0d11285 upstream. qdisc_root() use from netem_enqueue() triggers a lockdep warning. __dev_queue_xmit() uses rcu_read_lock_bh() which is not equivalent to rcu_read_lock() + local_bh_disable_bh as far as lockdep is concerned. WARNING: suspicious RCU usage 5.3.0-rc7+ #0 Not tainted ----------------------------- include/net/sch_generic.h:492 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by syz-executor427/8855: #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: lwtunnel_xmit_redirect include/net/lwtunnel.h:92 [inline] #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x2dc/0x2570 net/ipv4/ip_output.c:214 #1: 00000000b5525c01 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x20a/0x3650 net/core/dev.c:3804 #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline] #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_xmit_skb net/core/dev.c:3502 [inline] #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_queue_xmit+0x14b8/0x3650 net/core/dev.c:3838 stack backtrace: CPU: 0 PID: 8855 Comm: syz-executor427 Not tainted 5.3.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5357 qdisc_root include/net/sch_generic.h:492 [inline] netem_enqueue+0x1cfb/0x2d80 net/sched/sch_netem.c:479 __dev_xmit_skb net/core/dev.c:3527 [inline] __dev_queue_xmit+0x15d2/0x3650 net/core/dev.c:3838 dev_queue_xmit+0x18/0x20 net/core/dev.c:3902 neigh_hh_output include/net/neighbour.h:500 [inline] neigh_output include/net/neighbour.h:509 [inline] ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:308 [inline] __ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290 ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip_mc_output+0x292/0xf40 net/ipv4/ip_output.c:417 dst_output include/net/dst.h:436 [inline] ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125 ip_send_skb+0x42/0xf0 net/ipv4/ip_output.c:1555 udp_send_skb.isra.0+0x6b2/0x1160 net/ipv4/udp.c:887 udp_sendmsg+0x1e96/0x2820 net/ipv4/udp.c:1174 inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/sch_generic.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 538f3c4458b0..5d5a137b9067 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -276,6 +276,11 @@ static inline struct Qdisc *qdisc_root(const struct Qdisc *qdisc) return q; } +static inline struct Qdisc *qdisc_root_bh(const struct Qdisc *qdisc) +{ + return rcu_dereference_bh(qdisc->dev_queue->qdisc); +} + static inline struct Qdisc *qdisc_root_sleeping(const struct Qdisc *qdisc) { return qdisc->dev_queue->qdisc_sleeping; -- cgit v1.2.3 From f8b141077a9a8fd2a7f6bae447a710a6d224b44e Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 20 May 2018 16:39:10 +0800 Subject: sctp: fix the issue that flags are ignored when using kernel_connect commit 644fbdeacf1d3edd366e44b8ba214de9d1dd66a9 upstream. Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags param can't be passed into its proto .connect where this flags is really needed. sctp works around it by getting flags from socket file in __sctp_connect. It works for connecting from userspace, as inherently the user sock has socket file and it passes f_flags as the flags param into the proto_ops .connect. However, the sock created by sock_create_kern doesn't have a socket file, and it passes the flags (like O_NONBLOCK) by using the flags param in kernel_connect, which calls proto_ops .connect later. So to fix it, this patch defines a new proto_ops .connect for sctp, sctp_inet_connect, which calls __sctp_connect() directly with this flags param. After this, the sctp's proto .connect can be removed. Note that sctp_inet_connect doesn't need to do some checks that are not needed for sctp, which makes thing better than with inet_dgram_connect. Suggested-by: Marcelo Ricardo Leitner Signed-off-by: Xin Long Acked-by: Neil Horman Acked-by: Marcelo Ricardo Leitner Reviewed-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/sctp/sctp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 579ded2c6ef1..32db5208854d 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -103,6 +103,8 @@ void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int); /* * sctp/socket.c */ +int sctp_inet_connect(struct socket *sock, struct sockaddr *uaddr, + int addr_len, int flags); int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb); int sctp_inet_listen(struct socket *sock, int backlog); void sctp_write_space(struct sock *sk); -- cgit v1.2.3 From cdf8ae78f3441151ad3a1722e71b9737095a3811 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 5 Nov 2017 10:07:43 +0100 Subject: ALSA: timer: Limit max instances per timer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9b7d869ee5a77ed4a462372bb89af622e705bfb8 ] Currently we allow unlimited number of timer instances, and it may bring the system hogging way too much CPU when too many timer instances are opened and processed concurrently. This may end up with a soft-lockup report as triggered by syzkaller, especially when hrtimer backend is deployed. Since such insane number of instances aren't demanded by the normal use case of ALSA sequencer and it merely opens a risk only for abuse, this patch introduces the upper limit for the number of instances per timer backend. As default, it's set to 1000, but for the fine-grained timer like hrtimer, it's set to 100. Reported-by: syzbot Tested-by: Jérôme Glisse Cc: Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- include/sound/timer.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/sound/timer.h b/include/sound/timer.h index c4d76ff056c6..7ae226ab6990 100644 --- a/include/sound/timer.h +++ b/include/sound/timer.h @@ -90,6 +90,8 @@ struct snd_timer { struct list_head ack_list_head; struct list_head sack_list_head; /* slow ack list head */ struct tasklet_struct task_queue; + int max_instances; /* upper limit of timer instances */ + int num_instances; /* current number of timer instances */ }; struct snd_timer_instance { -- cgit v1.2.3 From 37d6ef4556a0f2e31de50b01a948c476773ff8ac Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 24 Oct 2019 13:50:27 -0700 Subject: net: fix sk_page_frag() recursion from memory reclaim [ Upstream commit 20eb4f29b60286e0d6dc01d9c260b4bd383c58fb ] sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the socket allocation mask allows blocking - if the allocation may block, it obviously owns the task's context and ergo exclusively owns current->task_frag. Unfortunately, this misses recursion through memory reclaim path. Please take a look at the following backtrace. [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10 ... tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 sock_xmit.isra.24+0xa1/0x170 [nbd] nbd_send_cmd+0x1d2/0x690 [nbd] nbd_queue_rq+0x1b5/0x3b0 [nbd] __blk_mq_try_issue_directly+0x108/0x1b0 blk_mq_request_issue_directly+0xbd/0xe0 blk_mq_try_issue_list_directly+0x41/0xb0 blk_mq_sched_insert_requests+0xa2/0xe0 blk_mq_flush_plug_list+0x205/0x2a0 blk_flush_plug_list+0xc3/0xf0 [1] blk_finish_plug+0x21/0x2e _xfs_buf_ioapply+0x313/0x460 __xfs_buf_submit+0x67/0x220 xfs_buf_read_map+0x113/0x1a0 xfs_trans_read_buf_map+0xbf/0x330 xfs_btree_read_buf_block.constprop.42+0x95/0xd0 xfs_btree_lookup_get_block+0x95/0x170 xfs_btree_lookup+0xcc/0x470 xfs_bmap_del_extent_real+0x254/0x9a0 __xfs_bunmapi+0x45c/0xab0 xfs_bunmapi+0x15/0x30 xfs_itruncate_extents_flags+0xca/0x250 xfs_free_eofblocks+0x181/0x1e0 xfs_fs_destroy_inode+0xa8/0x1b0 destroy_inode+0x38/0x70 dispose_list+0x35/0x50 prune_icache_sb+0x52/0x70 super_cache_scan+0x120/0x1a0 do_shrink_slab+0x120/0x290 shrink_slab+0x216/0x2b0 shrink_node+0x1b6/0x4a0 do_try_to_free_pages+0xc6/0x370 try_to_free_mem_cgroup_pages+0xe3/0x1e0 try_charge+0x29e/0x790 mem_cgroup_charge_skmem+0x6a/0x100 __sk_mem_raise_allocated+0x18e/0x390 __sk_mem_schedule+0x2a/0x40 [0] tcp_sendmsg_locked+0x8eb/0xe10 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x26d/0x2b0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In [0], tcp_send_msg_locked() was using current->page_frag when it called sk_wmem_schedule(). It already calculated how many bytes can be fit into current->page_frag. Due to memory pressure, sk_wmem_schedule() called into memory reclaim path which called into xfs and then IO issue path. Because the filesystem in question is backed by nbd, the control goes back into the tcp layer - back into tcp_sendmsg_locked(). nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes sense - it's in the process of freeing memory and wants to be able to, e.g., drop clean pages to make forward progress. However, this confused sk_page_frag() called from [2]. Because it only tests whether the allocation allows blocking which it does, it now thinks current->page_frag can be used again although it already was being used in [0]. After [2] used current->page_frag, the offset would be increased by the used amount. When the control returns to [0], current->page_frag's offset is increased and the previously calculated number of bytes now may overrun the end of allocated memory leading to silent memory corruptions. Fix it by adding gfpflags_normal_context() which tests sleepable && !reclaim and use it to determine whether to use current->task_frag. v2: Eric didn't like gfp flags being tested twice. Introduce a new helper gfpflags_normal_context() and combine the two tests. Signed-off-by: Tejun Heo Cc: Josef Bacik Cc: Eric Dumazet Cc: stable@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/gfp.h | 23 +++++++++++++++++++++++ include/net/sock.h | 11 ++++++++--- 2 files changed, 31 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f8041f9de31e..d11f56bc9c7e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -284,6 +284,29 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags) return !!(gfp_flags & __GFP_DIRECT_RECLAIM); } +/** + * gfpflags_normal_context - is gfp_flags a normal sleepable context? + * @gfp_flags: gfp_flags to test + * + * Test whether @gfp_flags indicates that the allocation is from the + * %current context and allowed to sleep. + * + * An allocation being allowed to block doesn't mean it owns the %current + * context. When direct reclaim path tries to allocate memory, the + * allocation context is nested inside whatever %current was doing at the + * time of the original allocation. The nested allocation may be allowed + * to block but modifying anything %current owns can corrupt the outer + * context's expectations. + * + * %true result from this function indicates that the allocation context + * can sleep and use anything that's associated with %current. + */ +static inline bool gfpflags_normal_context(const gfp_t gfp_flags) +{ + return (gfp_flags & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC)) == + __GFP_DIRECT_RECLAIM; +} + #ifdef CONFIG_HIGHMEM #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM #else diff --git a/include/net/sock.h b/include/net/sock.h index 116308632fae..469c012a6d01 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2045,12 +2045,17 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp, * sk_page_frag - return an appropriate page_frag * @sk: socket * - * If socket allocation mode allows current thread to sleep, it means its - * safe to use the per task page_frag instead of the per socket one. + * Use the per task page_frag instead of the per socket one for + * optimization when we know that we're in the normal context and owns + * everything that's associated with %current. + * + * gfpflags_allow_blocking() isn't enough here as direct reclaim may nest + * inside other socket operations and end up recursing into sk_page_frag() + * while it's already in use. */ static inline struct page_frag *sk_page_frag(struct sock *sk) { - if (gfpflags_allow_blocking(sk->sk_allocation)) + if (gfpflags_normal_context(sk->sk_allocation)) return ¤t->task_frag; return &sk->sk_frag; -- cgit v1.2.3 From 1f94465d13ace2d4610c4eb2b362454ce2a9d87c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 22 Oct 2019 07:57:46 -0700 Subject: net/flow_dissector: switch to siphash commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 upstream. UDP IPv6 packets auto flowlabels are using a 32bit secret (static u32 hashrnd in net/core/flow_dissector.c) and apply jhash() over fields known by the receivers. Attackers can easily infer the 32bit secret and use this information to identify a device and/or user, since this 32bit secret is only set at boot time. Really, using jhash() to generate cookies sent on the wire is a serious security concern. Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be a dead end. Trying to periodically change the secret (like in sch_sfq.c) could change paths taken in the network for long lived flows. Let's switch to siphash, as we did in commit df453700e8d8 ("inet: switch IP ID generator to siphash") Using a cryptographically strong pseudo random function will solve this privacy issue and more generally remove other weak points in the stack. Packet schedulers using skb_get_hash_perturb() benefit from this change. Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default") Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels") Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit") Signed-off-by: Eric Dumazet Reported-by: Jonathan Berger Reported-by: Amit Klein Reported-by: Benny Pinkas Cc: Tom Herbert Signed-off-by: David S. Miller Signed-off-by: Mahesh Bandewar Signed-off-by: Greg Kroah-Hartman --- include/linux/skbuff.h | 3 ++- include/net/flow_dissector.h | 3 ++- include/net/fq.h | 2 +- include/net/fq_impl.h | 4 ++-- 4 files changed, 7 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f8761774a94f..e37112ac332f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1178,7 +1178,8 @@ static inline __u32 skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 return skb->hash; } -__u32 skb_get_hash_perturb(const struct sk_buff *skb, u32 perturb); +__u32 skb_get_hash_perturb(const struct sk_buff *skb, + const siphash_key_t *perturb); static inline __u32 skb_get_hash_raw(const struct sk_buff *skb) { diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index d9534927d93b..1505cf7a4aaf 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -3,6 +3,7 @@ #include #include +#include #include /** @@ -151,7 +152,7 @@ struct flow_dissector { struct flow_keys { struct flow_dissector_key_control control; #define FLOW_KEYS_HASH_START_FIELD basic - struct flow_dissector_key_basic basic; + struct flow_dissector_key_basic basic __aligned(SIPHASH_ALIGNMENT); struct flow_dissector_key_tags tags; struct flow_dissector_key_vlan vlan; struct flow_dissector_key_keyid keyid; diff --git a/include/net/fq.h b/include/net/fq.h index 6d8521a30c5c..2c7687902789 100644 --- a/include/net/fq.h +++ b/include/net/fq.h @@ -70,7 +70,7 @@ struct fq { struct list_head backlogs; spinlock_t lock; u32 flows_cnt; - u32 perturbation; + siphash_key_t perturbation; u32 limit; u32 memory_limit; u32 memory_usage; diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h index 4e6131cd3f43..45a0d9a006a0 100644 --- a/include/net/fq_impl.h +++ b/include/net/fq_impl.h @@ -105,7 +105,7 @@ static struct fq_flow *fq_flow_classify(struct fq *fq, lockdep_assert_held(&fq->lock); - hash = skb_get_hash_perturb(skb, fq->perturbation); + hash = skb_get_hash_perturb(skb, &fq->perturbation); idx = reciprocal_scale(hash, fq->flows_cnt); flow = &fq->flows[idx]; @@ -252,7 +252,7 @@ static int fq_init(struct fq *fq, int flows_cnt) INIT_LIST_HEAD(&fq->backlogs); spin_lock_init(&fq->lock); fq->flows_cnt = max_t(u32, flows_cnt, 1); - fq->perturbation = prandom_u32(); + get_random_bytes(&fq->perturbation, sizeof(fq->perturbation)); fq->quantum = 300; fq->limit = 8192; fq->memory_limit = 16 << 20; /* 16 MBytes */ -- cgit v1.2.3 From a723c971b1f7fb96f2553e020565d19dc2ef54e0 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 7 Nov 2019 20:08:19 -0800 Subject: net: fix data-race in neigh_event_send() [ Upstream commit 1b53d64435d56902fc234ff2507142d971a09687 ] KCSAN reported the following data-race [1] The fix will also prevent the compiler from optimizing out the condition. [1] BUG: KCSAN: data-race in neigh_resolve_output / neigh_resolve_output write to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 1: neigh_event_send include/net/neighbour.h:443 [inline] neigh_resolve_output+0x78/0x480 net/core/neighbour.c:1474 neigh_output include/net/neighbour.h:511 [inline] ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:308 [inline] __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290 ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip_output+0xdf/0x210 net/ipv4/ip_output.c:432 dst_output include/net/dst.h:436 [inline] ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125 __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532 ip_queue_xmit+0x45/0x60 include/net/ip.h:237 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169 tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline] __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976 tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999 tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:618 read to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 0: neigh_event_send include/net/neighbour.h:442 [inline] neigh_resolve_output+0x57/0x480 net/core/neighbour.c:1474 neigh_output include/net/neighbour.h:511 [inline] ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:308 [inline] __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290 ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip_output+0xdf/0x210 net/ipv4/ip_output.c:432 dst_output include/net/dst.h:436 [inline] ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125 __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532 ip_queue_xmit+0x45/0x60 include/net/ip.h:237 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169 tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline] __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976 tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999 tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/neighbour.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/neighbour.h b/include/net/neighbour.h index f6017ddc4ded..1c0d07376125 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -425,8 +425,8 @@ static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb) { unsigned long now = jiffies; - if (neigh->used != now) - neigh->used = now; + if (READ_ONCE(neigh->used) != now) + WRITE_ONCE(neigh->used, now); if (!(neigh->nud_state&(NUD_CONNECTED|NUD_DELAY|NUD_PROBE))) return __neigh_event_send(neigh, skb); return 0; -- cgit v1.2.3 From 967fcde64f209b8f87ee320ffe0c9974ed1fc68d Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Tue, 5 Nov 2019 21:16:30 -0800 Subject: mm: thp: handle page cache THP correctly in PageTransCompoundMap commit 169226f7e0d275c1879551f37484ef6683579a5c upstream. We have a usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit 127393fbe597 ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But the _mapcount < 0 check doesn't work for page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. [yang.shi@linux.alibaba.com: v4] Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Yang Shi Reported-by: Gang Deng Tested-by: Gang Deng Suggested-by: Hugh Dickins Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Matthew Wilcox Cc: [4.8+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 5 ----- include/linux/mm_types.h | 5 +++++ include/linux/page-flags.h | 20 ++++++++++++++++++-- 3 files changed, 23 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/linux/mm.h b/include/linux/mm.h index ade072a6fd24..ca6f213fa4f0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -504,11 +504,6 @@ static inline int is_vmalloc_or_module_addr(const void *x) extern void kvfree(const void *addr); -static inline atomic_t *compound_mapcount_ptr(struct page *page) -{ - return &page[1].compound_mapcount; -} - static inline int compound_mapcount(struct page *page) { VM_BUG_ON_PAGE(!PageCompound(page), page); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8d6decd50220..21b18b755c0e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -262,6 +262,11 @@ struct page_frag_cache { typedef unsigned long vm_flags_t; +static inline atomic_t *compound_mapcount_ptr(struct page *page) +{ + return &page[1].compound_mapcount; +} + /* * A region containing a mapping of a non-memory backed file under NOMMU * conditions. These are held in a global tree and are pinned by the VMAs that diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 74e4dda91238..896e4199623a 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -545,12 +545,28 @@ static inline int PageTransCompound(struct page *page) * * Unlike PageTransCompound, this is safe to be called only while * split_huge_pmd() cannot run from under us, like if protected by the - * MMU notifier, otherwise it may result in page->_mapcount < 0 false + * MMU notifier, otherwise it may result in page->_mapcount check false * positives. + * + * We have to treat page cache THP differently since every subpage of it + * would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE + * mapped in the current process so comparing subpage's _mapcount to + * compound_mapcount to filter out PTE mapped case. */ static inline int PageTransCompoundMap(struct page *page) { - return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0; + struct page *head; + + if (!PageTransCompound(page)) + return 0; + + if (PageAnon(page)) + return atomic_read(&page->_mapcount) < 0; + + head = compound_head(page); + /* File THP is PMD mapped and not PTE mapped */ + return atomic_read(&page->_mapcount) == + atomic_read(compound_mapcount_ptr(head)); } /* -- cgit v1.2.3 From f823bf0fd93818e0a3feac87426e292ee7f4c5c4 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Thu, 31 Oct 2019 11:06:24 +0100 Subject: netfilter: nf_tables: Align nft_expr private data to 64-bit commit 250367c59e6ba0d79d702a059712d66edacd4a1a upstream. Invoking the following commands on a 32-bit architecture with strict alignment requirements (such as an ARMv7-based Raspberry Pi) results in an alignment exception: # nft add table ip test-ip4 # nft add chain ip test-ip4 output { type filter hook output priority 0; } # nft add rule ip test-ip4 output quota 1025 bytes Alignment trap: not handling instruction e1b26f9f at [<7f4473f8>] Unhandled fault: alignment exception (0x001) at 0xb832e824 Internal error: : 1 [#1] PREEMPT SMP ARM Hardware name: BCM2835 [<7f4473fc>] (nft_quota_do_init [nft_quota]) [<7f447448>] (nft_quota_init [nft_quota]) [<7f4260d0>] (nf_tables_newrule [nf_tables]) [<7f4168dc>] (nfnetlink_rcv_batch [nfnetlink]) [<7f416bd0>] (nfnetlink_rcv [nfnetlink]) [<8078b334>] (netlink_unicast) [<8078b664>] (netlink_sendmsg) [<8071b47c>] (sock_sendmsg) [<8071bd18>] (___sys_sendmsg) [<8071ce3c>] (__sys_sendmsg) [<8071ce94>] (sys_sendmsg) The reason is that nft_quota_do_init() calls atomic64_set() on an atomic64_t which is only aligned to 32-bit, not 64-bit, because it succeeds struct nft_expr in memory which only contains a 32-bit pointer. Fix by aligning the nft_expr private data to 64-bit. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Lukas Wunner Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- include/net/netfilter/nf_tables.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 66f6b84df287..7ba9a624090f 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -705,7 +705,8 @@ struct nft_expr_ops { */ struct nft_expr { const struct nft_expr_ops *ops; - unsigned char data[]; + unsigned char data[] + __attribute__((aligned(__alignof__(u64)))); }; static inline void *nft_expr_priv(const struct nft_expr *expr) -- cgit v1.2.3 From a4b012c648780b19fea13c1601ba091713f84f58 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 23 Oct 2019 09:53:03 -0700 Subject: ipvs: move old_secure_tcp into struct netns_ipvs [ Upstream commit c24b75e0f9239e78105f81c5f03a751641eb07ef ] syzbot reported the following issue : BUG: KCSAN: data-race in update_defense_level / update_defense_level read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1: update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0: update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events defense_work_handler Indeed, old_secure_tcp is currently a static variable, while it needs to be a per netns variable. Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: Simon Horman Signed-off-by: Sasha Levin --- include/net/ip_vs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index cd6018a9ee24..a26165744d98 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -887,6 +887,7 @@ struct netns_ipvs { struct delayed_work defense_work; /* Work handler */ int drop_rate; int drop_counter; + int old_secure_tcp; atomic_t dropentry; /* locks in ctl.c */ spinlock_t dropentry_lock; /* drop entry handling */ -- cgit v1.2.3 From 4e8e9fd6a3a1c733ee9f0879fa4a3b2589406205 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 4 Nov 2019 21:38:43 -0800 Subject: net: prevent load/store tearing on sk->sk_stamp [ Upstream commit f75359f3ac855940c5718af10ba089b8977bf339 ] Add a couple of READ_ONCE() and WRITE_ONCE() to prevent load-tearing and store-tearing in sock_read_timestamp() and sock_write_timestamp() This might prevent another KCSAN report. Fixes: 3a0ed3e96197 ("sock: Make sock->sk_stamp thread-safe") Signed-off-by: Eric Dumazet Cc: Deepa Dinamani Acked-by: Deepa Dinamani Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/sock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index 469c012a6d01..d8d14ae8892a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2142,7 +2142,7 @@ static inline ktime_t sock_read_timestamp(struct sock *sk) return kt; #else - return sk->sk_stamp; + return READ_ONCE(sk->sk_stamp); #endif } @@ -2153,7 +2153,7 @@ static inline void sock_write_timestamp(struct sock *sk, ktime_t kt) sk->sk_stamp = kt; write_sequnlock(&sk->sk_stamp_seq); #else - sk->sk_stamp = kt; + WRITE_ONCE(sk->sk_stamp, kt); #endif } -- cgit v1.2.3 From 91b712cffa4e7be02706ea92368cc0cdb930a56c Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Thu, 12 Jul 2018 19:53:13 +0100 Subject: drm/i915: Prevent writing into a read-only object via a GGTT mmap commit 3e977ac6179b39faa3c0eda5fce4f00663ae298d upstream. If the user has created a read-only object, they should not be allowed to circumvent the write protection by using a GGTT mmapping. Deny it. Also most machines do not support read-only GGTT PTEs, so again we have to reject attempted writes. Fortunately, this is known a priori, so we can at least reject in the call to create the mmap (with a sanity check in the fault handler). v2: Check the vma->vm_flags during mmap() to allow readonly access. v3: Remove VM_MAYWRITE to curtail mprotect() Testcase: igt/gem_userptr_blits/readonly_mmap* Signed-off-by: Chris Wilson Cc: Jon Bloomfield Cc: Joonas Lahtinen Cc: Matthew Auld Cc: David Herrmann Reviewed-by: Matthew Auld #v1 Reviewed-by: Joonas Lahtinen Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi Reviewed-by: Jon Bloomfield Signed-off-by: Greg Kroah-Hartman --- include/drm/drm_vma_manager.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/drm/drm_vma_manager.h b/include/drm/drm_vma_manager.h index 9c03895dc479..72bf408f887f 100644 --- a/include/drm/drm_vma_manager.h +++ b/include/drm/drm_vma_manager.h @@ -42,6 +42,7 @@ struct drm_vma_offset_node { rwlock_t vm_lock; struct drm_mm_node vm_node; struct rb_root vm_files; + bool readonly:1; }; struct drm_vma_offset_manager { -- cgit v1.2.3 From 6e3683ebef352a0581805d499800be465ca84bc5 Mon Sep 17 00:00:00 2001 From: Jack Pham Date: Tue, 1 Aug 2017 02:00:56 -0700 Subject: usb: gadget: core: unmap request from DMA only if previously mapped commit 31fe084ffaaf8abece14f8ca28e5e3b4e2bf97b6 upstream. In the SG case this is already handled since a non-zero request->num_mapped_sgs is a clear indicator that dma_map_sg() had been called. While it would be nice to do the same for the singly mapped case by simply checking for non-zero request->dma, it's conceivable that 0 is a valid dma_addr_t handle. Hence add a flag 'dma_mapped' to struct usb_request and use this to determine the need to call dma_unmap_single(). Otherwise, if a request is not DMA mapped then the result of calling usb_request_unmap_request() would safely be a no-op. Signed-off-by: Jack Pham Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/gadget.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h index e4516e9ded0f..4b810bc7ae63 100644 --- a/include/linux/usb/gadget.h +++ b/include/linux/usb/gadget.h @@ -48,6 +48,7 @@ struct usb_ep; * by adding a zero length packet as needed; * @short_not_ok: When reading data, makes short packets be * treated as errors (queue stops advancing till cleanup). + * @dma_mapped: Indicates if request has been mapped to DMA (internal) * @complete: Function called when request completes, so this request and * its buffer may be re-used. The function will always be called with * interrupts disabled, and it must not sleep. @@ -103,6 +104,7 @@ struct usb_request { unsigned no_interrupt:1; unsigned zero:1; unsigned short_not_ok:1; + unsigned dma_mapped:1; void (*complete)(struct usb_ep *ep, struct usb_request *req); -- cgit v1.2.3 From 9392b2dda0aedff871f10eae4e9b1e7d7e7bc3f9 Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Wed, 23 Oct 2019 12:19:51 +0200 Subject: x86/speculation/taa: Add sysfs reporting for TSX Async Abort commit 6608b45ac5ecb56f9e171252229c39580cc85f0f upstream. Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Tested-by: Neelima Krishnan Reviewed-by: Mark Gross Reviewed-by: Tony Luck Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index b27c9b2e683f..6deb30ce96c4 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -56,6 +56,9 @@ extern ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_tsx_async_abort(struct device *dev, + struct device_attribute *attr, + char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit v1.2.3 From c6170b81e7b78942cb4b36fc72cbd75145fd08d5 Mon Sep 17 00:00:00 2001 From: Junaid Shahid Date: Thu, 3 Jan 2019 17:14:28 -0800 Subject: kvm: Convert kvm_lock to a mutex commit 0d9ce162cf46c99628cc5da9510b959c7976735b upstream. It doesn't seem as if there is any particular need for kvm_lock to be a spinlock, so convert the lock to a mutex so that sleepable functions (in particular cond_resched()) can be called while holding it. Signed-off-by: Junaid Shahid Signed-off-by: Paolo Bonzini [bwh: Backported to 4.9: - Drop changes in kvm_hyperv_tsc_notifier(), vm_stat_clear(), vcpu_stat_clear(), kvm_uevent_notify_change() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index eb55374b73f3..7ecb32b43dc7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -129,7 +129,7 @@ static inline bool is_error_page(struct page *page) extern struct kmem_cache *kvm_vcpu_cache; -extern spinlock_t kvm_lock; +extern struct mutex kvm_lock; extern struct list_head vm_list; struct kvm_io_range { -- cgit v1.2.3 From 12ceedb7604dfbe370a21df444819ece665c91db Mon Sep 17 00:00:00 2001 From: Vineela Tummalapalli Date: Mon, 4 Nov 2019 12:22:01 +0100 Subject: x86/bugs: Add ITLB_MULTIHIT bug infrastructure commit db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae upstream. Some processors may incur a machine check error possibly resulting in an unrecoverable CPU lockup when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here: https://bugzilla.kernel.org/show_bug.cgi?id=205195 There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. Signed-off-by: Vineela Tummalapalli Co-developed-by: Pawan Gupta Signed-off-by: Pawan Gupta Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner [bwh: Backported to 4.9: - No support for X86_VENDOR_HYGON, ATOM_AIRMONT_NP - Adjust context, indentation] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 6deb30ce96c4..539adb8d1662 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -59,6 +59,8 @@ extern ssize_t cpu_show_mds(struct device *dev, extern ssize_t cpu_show_tsx_async_abort(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_itlb_multihit(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit v1.2.3 From e2bd0778adc4b13e3874b48eaad689e4a3a35833 Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Mon, 4 Nov 2019 12:22:02 +0100 Subject: cpu/speculation: Uninline and export CPU mitigations helpers commit 731dc9df975a5da21237a18c3384f811a7a41cc6 upstream. A kernel module may need to check the value of the "mitigations=" kernel command line parameter as part of its setup when the module needs to perform software mitigations for a CPU flaw. Uninline and export the helper functions surrounding the cpu_mitigations enum to allow for their usage from a module. Lastly, privatize the enum and cpu_mitigations variable since the value of cpu_mitigations can be checked with the exported helper functions. Signed-off-by: Tyler Hicks Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 25 ++----------------------- 1 file changed, 2 insertions(+), 23 deletions(-) (limited to 'include') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 539adb8d1662..e19bbc38a722 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -287,28 +287,7 @@ static inline int cpuhp_smt_enable(void) { return 0; } static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } #endif -/* - * These are used for a global "mitigations=" cmdline option for toggling - * optional CPU mitigations. - */ -enum cpu_mitigations { - CPU_MITIGATIONS_OFF, - CPU_MITIGATIONS_AUTO, - CPU_MITIGATIONS_AUTO_NOSMT, -}; - -extern enum cpu_mitigations cpu_mitigations; - -/* mitigations=off */ -static inline bool cpu_mitigations_off(void) -{ - return cpu_mitigations == CPU_MITIGATIONS_OFF; -} - -/* mitigations=auto,nosmt */ -static inline bool cpu_mitigations_auto_nosmt(void) -{ - return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT; -} +extern bool cpu_mitigations_off(void); +extern bool cpu_mitigations_auto_nosmt(void); #endif /* _LINUX_CPU_H_ */ -- cgit v1.2.3 From 61e191b467f1edea6fae9123c37355133273a31a Mon Sep 17 00:00:00 2001 From: Junaid Shahid Date: Mon, 4 Nov 2019 12:22:02 +0100 Subject: kvm: Add helper function for creating VM worker threads commit c57c80467f90e5504c8df9ad3555d2c78800bf94 upstream. Add a function to create a kernel thread associated with a given VM. In particular, it ensures that the worker thread inherits the priority and cgroups of the calling thread. Signed-off-by: Junaid Shahid Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/kvm_host.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7ecb32b43dc7..0590e7d47b02 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1208,4 +1208,10 @@ static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu) } #endif /* CONFIG_HAVE_KVM_INVALID_WAKEUPS */ +typedef int (*kvm_vm_thread_fn_t)(struct kvm *kvm, uintptr_t data); + +int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn, + uintptr_t data, const char *name, + struct task_struct **thread_ptr); + #endif -- cgit v1.2.3 From 40192358bf34eb64685437d0c5a26aa54702431b Mon Sep 17 00:00:00 2001 From: Eric Auger Date: Fri, 8 Nov 2019 16:58:03 +0100 Subject: iommu/vt-d: Fix QI_DEV_IOTLB_PFSID and QI_DEV_EIOTLB_PFSID macros commit 4e7120d79edb31e4ee68e6f8421448e4603be1e9 upstream. For both PASID-based-Device-TLB Invalidate Descriptor and Device-TLB Invalidate Descriptor, the Physical Function Source-ID value is split according to this layout: PFSID[3:0] is set at offset 12 and PFSID[15:4] is put at offset 52. Fix the part laid out at offset 52. Fixes: 0f725561e1684 ("iommu/vt-d: Add definitions for PFSID") Signed-off-by: Eric Auger Acked-by: Jacob Pan Cc: stable@vger.kernel.org # v4.19+ Acked-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman --- include/linux/intel-iommu.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index e353f6600b0b..27dbab59f034 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -295,7 +295,8 @@ enum { #define QI_DEV_IOTLB_SID(sid) ((u64)((sid) & 0xffff) << 32) #define QI_DEV_IOTLB_QDEP(qdep) (((qdep) & 0x1f) << 16) #define QI_DEV_IOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) -#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52)) +#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \ + ((u64)((pfsid >> 4) & 0xfff) << 52)) #define QI_DEV_IOTLB_SIZE 1 #define QI_DEV_IOTLB_MAX_INVS 32 @@ -320,7 +321,8 @@ enum { #define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32) #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16) #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4) -#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52)) +#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \ + ((u64)((pfsid >> 4) & 0xfff) << 52)) #define QI_DEV_EIOTLB_MAX_INVS 32 #define QI_PGRP_IDX(idx) (((u64)(idx)) << 55) -- cgit v1.2.3 From 7228cb8d997fac5813e78cfd17740d51773962c0 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Tue, 11 Sep 2018 11:42:06 -0700 Subject: llc: avoid blocking in llc_sap_close() [ Upstream commit 9708d2b5b7c648e8e0a40d11e8cea12f6277f33c ] llc_sap_close() is called by llc_sap_put() which could be called in BH context in llc_rcv(). We can't block in BH. There is no reason to block it here, kfree_rcu() should be sufficient. Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/llc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/llc.h b/include/net/llc.h index 82d989995d18..95e5ced4c133 100644 --- a/include/net/llc.h +++ b/include/net/llc.h @@ -66,6 +66,7 @@ struct llc_sap { int sk_count; struct hlist_nulls_head sk_laddr_hash[LLC_SK_LADDR_HASH_ENTRIES]; struct hlist_head sk_dev_hash[LLC_SK_DEV_HASH_ENTRIES]; + struct rcu_head rcu; }; static inline -- cgit v1.2.3 From 63ce52b6ee1685f3a3bac8e62dd526662adc89a3 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Thu, 13 Sep 2018 15:16:22 -0500 Subject: libfdt: Ensure INT_MAX is defined in libfdt_env.h [ Upstream commit 53dd9dce6979bc54d64a3a09a2fb20187a025be7 ] The next update of libfdt has a new dependency on INT_MAX. Update the instances of libfdt_env.h in the kernel to either include the necessary header with the definition or define it locally. Cc: Russell King Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring Signed-off-by: Sasha Levin --- include/linux/libfdt_env.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/libfdt_env.h b/include/linux/libfdt_env.h index 2a663c6bb428..8850e243c940 100644 --- a/include/linux/libfdt_env.h +++ b/include/linux/libfdt_env.h @@ -1,6 +1,7 @@ #ifndef _LIBFDT_ENV_H #define _LIBFDT_ENV_H +#include /* For INT_MAX */ #include #include -- cgit v1.2.3 From 79a266352c41bc8076f94f14fd24a7c5cb7cbd9b Mon Sep 17 00:00:00 2001 From: Stefan Agner Date: Sat, 15 Sep 2018 21:38:24 -0700 Subject: cpufeature: avoid warning when compiling with clang [ Upstream commit c785896b21dd8e156326ff660050b0074d3431df ] The table id (second) argument to MODULE_DEVICE_TABLE is often referenced otherwise. This is not the case for CPU features. This leads to warnings when building the kernel with Clang: arch/arm/crypto/aes-ce-glue.c:450:1: warning: variable 'cpu_feature_match_AES' is not needed and will not be emitted [-Wunneeded-internal-declaration] module_cpu_feature_match(AES, aes_init); ^ Avoid warnings by using __maybe_unused, similar to commit 1f318a8bafcf ("modules: mark __inittest/__exittest as __maybe_unused"). Fixes: 67bad2fdb754 ("cpu: add generic support for CPU feature based module autoloading") Signed-off-by: Stefan Agner Acked-by: Ard Biesheuvel Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- include/linux/cpufeature.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/cpufeature.h b/include/linux/cpufeature.h index 986c06c88d81..84d3c81b5978 100644 --- a/include/linux/cpufeature.h +++ b/include/linux/cpufeature.h @@ -45,7 +45,7 @@ * 'asm/cpufeature.h' of your favorite architecture. */ #define module_cpu_feature_match(x, __initfunc) \ -static struct cpu_feature const cpu_feature_match_ ## x[] = \ +static struct cpu_feature const __maybe_unused cpu_feature_match_ ## x[] = \ { { .feature = cpu_feature(x) }, { } }; \ MODULE_DEVICE_TABLE(cpu, cpu_feature_match_ ## x); \ \ -- cgit v1.2.3 From 4d29f08baf4fab35db948ebd1046d8ecf5b86f57 Mon Sep 17 00:00:00 2001 From: Justin Ernst Date: Tue, 25 Sep 2018 09:34:49 -0500 Subject: EDAC: Raise the maximum number of memory controllers [ Upstream commit 6b58859419554fb824e09cfdd73151a195473cbc ] We observe an oops in the skx_edac module during boot: EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0 EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0 ... EDAC MC13: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 EDAC MC14: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0 EDAC MC15: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1 Too many memory controllers: 16 EDAC MC: Removed device 0 for skx_edac Skylake Socket#0 IMC#0 We observe there are two memory controllers per socket, with a limit of 16. Raise the maximum number of memory controllers from 16 to 2 * MAX_NUMNODES (1024). [ bp: This is just a band-aid fix until we've sorted out the whole issue with the bus_type association and handling in EDAC and can get rid of this arbitrary limit. ] Signed-off-by: Justin Ernst Signed-off-by: Borislav Petkov Acked-by: Russ Anderson Cc: Mauro Carvalho Chehab Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20180925143449.284634-1-justin.ernst@hpe.com Signed-off-by: Sasha Levin --- include/linux/edac.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/edac.h b/include/linux/edac.h index 9e0d78966552..c6233227720c 100644 --- a/include/linux/edac.h +++ b/include/linux/edac.h @@ -17,6 +17,7 @@ #include #include #include +#include struct device; @@ -778,6 +779,6 @@ struct mem_ctl_info { /* * Maximum number of memory controllers in the coherent fabric. */ -#define EDAC_MAX_MCS 16 +#define EDAC_MAX_MCS 2 * MAX_NUMNODES #endif -- cgit v1.2.3 From edb9044165918050c4429e5450f0ef44c95b3726 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Sun, 21 Jul 2019 22:19:56 +0200 Subject: fbdev: Ditch fb_edid_add_monspecs commit 3b8720e63f4a1fc6f422a49ecbaa3b59c86d5aaf upstream. It's dead code ever since commit 34280340b1dc74c521e636f45cd728f9abf56ee2 Author: Geert Uytterhoeven Date: Fri Dec 4 17:01:43 2015 +0100 fbdev: Remove unused SH-Mobile HDMI driver Also with this gone we can remove the cea_modes db. This entire thing is massively incomplete anyway, compared to the CEA parsing that drm_edid.c does. Acked-by: Linus Torvalds Cc: Tavis Ormandy Signed-off-by: Daniel Vetter Signed-off-by: Bartlomiej Zolnierkiewicz Link: https://patchwork.freedesktop.org/patch/msgid/20190721201956.941-1-daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman --- include/linux/fb.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/linux/fb.h b/include/linux/fb.h index a964d076b4dc..493d6d7794de 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -732,8 +732,6 @@ extern int fb_parse_edid(unsigned char *edid, struct fb_var_screeninfo *var); extern const unsigned char *fb_firmware_edid(struct device *device); extern void fb_edid_to_monspecs(unsigned char *edid, struct fb_monspecs *specs); -extern void fb_edid_add_monspecs(unsigned char *edid, - struct fb_monspecs *specs); extern void fb_destroy_modedb(struct fb_videomode *modedb); extern int fb_find_mode_cvt(struct fb_videomode *mode, int margins, int rb); extern unsigned char *fb_ddc_read(struct i2c_adapter *adapter); @@ -807,7 +805,6 @@ struct dmt_videomode { extern const char *fb_mode_option; extern const struct fb_videomode vesa_modes[]; -extern const struct fb_videomode cea_modes[65]; extern const struct dmt_videomode dmt_modes[]; struct fb_modelist { -- cgit v1.2.3 From f3216243df6758ee8c80f63b95b9be424abb0ce2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 31 Jan 2017 16:57:29 +0100 Subject: block: introduce blk_rq_is_passthrough commit 57292b58ddb58689e8c3b4c6eadbef10d9ca44dd upstream. This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe [only take the blkdev.h changes as we only want the function for backported patches - gregkh] Signed-off-by: Greg Kroah-Hartman --- include/linux/blkdev.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index bd738aafd432..7582a5712458 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -212,6 +212,11 @@ struct request { (req)->cmd_flags |= flags; \ } while (0) +static inline bool blk_rq_is_passthrough(struct request *rq) +{ + return rq->cmd_type != REQ_TYPE_FS; +} + static inline unsigned short req_get_ioprio(struct request *req) { return req->ioprio; @@ -663,7 +668,7 @@ static inline void blk_clear_rl_full(struct request_list *rl, bool sync) static inline bool rq_mergeable(struct request *rq) { - if (rq->cmd_type != REQ_TYPE_FS) + if (blk_rq_is_passthrough(rq)) return false; if (req_op(rq) == REQ_OP_FLUSH) @@ -910,7 +915,7 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq, { struct request_queue *q = rq->q; - if (unlikely(rq->cmd_type != REQ_TYPE_FS)) + if (blk_rq_is_passthrough(rq)) return q->limits.max_hw_sectors; if (!q->limits.chunk_sectors || -- cgit v1.2.3 From 3d827805a11e9d31681f9b32968e34bd9b01ff1d Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 8 Sep 2018 22:09:48 -0400 Subject: SUNRPC: Fix priority queue fairness [ Upstream commit f42f7c283078ce3c1e8368b140e270755b1ae313 ] Fix up the priority queue to not batch by owner, but by queue, so that we allow '1 << priority' elements to be dequeued before switching to the next priority queue. The owner field is still used to wake up requests in round robin order by owner to avoid single processes hogging the RPC layer by loading the queues. Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- include/linux/sunrpc/sched.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 7ba040c797ec..da2791b5fe87 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -185,7 +185,6 @@ struct rpc_timer { struct rpc_wait_queue { spinlock_t lock; struct list_head tasks[RPC_NR_PRIORITY]; /* task queue for each priority level */ - pid_t owner; /* process id of last task serviced */ unsigned char maxpriority; /* maximum priority (0 if queue is not a priority queue) */ unsigned char priority; /* current priority */ unsigned char nr; /* # tasks remaining for cookie */ @@ -201,7 +200,6 @@ struct rpc_wait_queue { * from a single cookie. The aim is to improve * performance of NFS operations such as read/write. */ -#define RPC_BATCH_COUNT 16 #define RPC_IS_PRIORITY(q) ((q)->maxpriority > 0) /* -- cgit v1.2.3 From 70390678b7090de4055231a8be0492c301609372 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Tue, 11 Sep 2018 16:40:20 -0700 Subject: dmaengine: ep93xx: Return proper enum in ep93xx_dma_chan_direction [ Upstream commit 9524d6b265f9b2b9a61fceb2ee2ce1c2a83e39ca ] Clang warns when implicitly converting from one enumerated type to another. Avoid this by using the equivalent value from the expected type. In file included from drivers/dma/ep93xx_dma.c:30: ./include/linux/platform_data/dma-ep93xx.h:88:10: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion] return DMA_NONE; ~~~~~~ ^~~~~~~~ 1 warning generated. Reported-by: Nick Desaulniers Signed-off-by: Nathan Chancellor Reviewed-by: Nick Desaulniers Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- include/linux/platform_data/dma-ep93xx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/platform_data/dma-ep93xx.h b/include/linux/platform_data/dma-ep93xx.h index e82c642fa53c..5913be0793a2 100644 --- a/include/linux/platform_data/dma-ep93xx.h +++ b/include/linux/platform_data/dma-ep93xx.h @@ -84,7 +84,7 @@ static inline enum dma_transfer_direction ep93xx_dma_chan_direction(struct dma_chan *chan) { if (!ep93xx_dma_chan_is_m2p(chan)) - return DMA_NONE; + return DMA_TRANS_NONE; /* even channels are for TX, odd for RX */ return (chan->chan_id % 2 == 0) ? DMA_MEM_TO_DEV : DMA_DEV_TO_MEM; -- cgit v1.2.3 From c7263e711e0f2cb388ac4c448c5b4bc6ed858983 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Mon, 24 Sep 2018 12:57:16 -0700 Subject: IB/mlx4: Avoid implicit enumerated type conversion [ Upstream commit b56511c15713ba6c7572e77a41f7ddba9c1053ec ] Clang warns when one enumerated type is implicitly converted to another. drivers/infiniband/hw/mlx4/mad.c:1811:41: warning: implicit conversion from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration type 'enum ib_qp_create_flags' [-Wenum-conversion] qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_TUNNEL_QP; ~ ^~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:1819:41: warning: implicit conversion from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration type 'enum ib_qp_create_flags' [-Wenum-conversion] qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_SQP; ~ ^~~~~~~~~~~~~~~~~ The type mlx4_ib_qp_flags explicitly provides supplemental values to the type ib_qp_create_flags. Make that clear to Clang by changing the create_flags type to u32. Reported-by: Nick Desaulniers Signed-off-by: Nathan Chancellor Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin --- include/rdma/ib_verbs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 0638b8aeba34..1d9c701b1c7b 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1026,7 +1026,7 @@ struct ib_qp_init_attr { struct ib_qp_cap cap; enum ib_sig_type sq_sig_type; enum ib_qp_type qp_type; - enum ib_qp_create_flags create_flags; + u32 create_flags; /* * Only needed for special QP types, or when using the RW API. -- cgit v1.2.3 From 37dcea4d102f323d40dc81ee915f2c017565d217 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Tue, 28 Aug 2018 17:02:40 -0300 Subject: mfd: mc13xxx-core: Fix PMIC shutdown when reading ADC values [ Upstream commit 55143439b7b501882bea9d95a54adfe00ffc79a3 ] When trying to read any MC13892 ADC channel on a imx51-babbage board: The MC13892 PMIC shutdowns completely. After debugging this issue and comparing the MC13892 and MC13783 initializations done in the vendor kernel, it was noticed that the CHRGRAWDIV bit of the ADC0 register was not being set. This bit is set by default after power on, but the driver was clearing it. After setting this bit it is possible to read the ADC values correctly. Signed-off-by: Fabio Estevam Tested-by: Chris Healy Signed-off-by: Lee Jones Signed-off-by: Sasha Levin --- include/linux/mfd/mc13xxx.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/mfd/mc13xxx.h b/include/linux/mfd/mc13xxx.h index 638222e43e48..93011c61aafd 100644 --- a/include/linux/mfd/mc13xxx.h +++ b/include/linux/mfd/mc13xxx.h @@ -247,6 +247,7 @@ struct mc13xxx_platform_data { #define MC13XXX_ADC0_TSMOD0 (1 << 12) #define MC13XXX_ADC0_TSMOD1 (1 << 13) #define MC13XXX_ADC0_TSMOD2 (1 << 14) +#define MC13XXX_ADC0_CHRGRAWDIV (1 << 15) #define MC13XXX_ADC0_ADINC1 (1 << 16) #define MC13XXX_ADC0_ADINC2 (1 << 17) -- cgit v1.2.3 From 9499c51f7ab1bffdb8e309d69484f3fe8caf0e0b Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Wed, 5 Sep 2018 13:54:07 +0200 Subject: mfd: max8997: Enale irq-wakeup unconditionally [ Upstream commit efddff27c886e729a7f84a7205bd84d7d4af7336 ] IRQ wake up support for MAX8997 driver was initially configured by respective property in pdata. However, after the driver conversion to device-tree, setting it was left as 'todo'. Nowadays most of other PMIC MFD drivers initialized from device-tree assume that they can be an irq wakeup source, so enable it also for MAX8997. This fixes support for wakeup from MAX8997 RTC alarm. Signed-off-by: Marek Szyprowski Reviewed-by: Krzysztof Kozlowski Signed-off-by: Lee Jones Signed-off-by: Sasha Levin --- include/linux/mfd/max8997.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/mfd/max8997.h b/include/linux/mfd/max8997.h index cf815577bd68..3ae1fe743bc3 100644 --- a/include/linux/mfd/max8997.h +++ b/include/linux/mfd/max8997.h @@ -178,7 +178,6 @@ struct max8997_led_platform_data { struct max8997_platform_data { /* IRQ */ int ono; - int wakeup; /* ---- PMIC ---- */ struct max8997_regulator_data *regulators; -- cgit v1.2.3 From 835c18375a16d0cb8bc56576821d23f8e66dab73 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Tue, 30 Oct 2018 15:04:59 -0700 Subject: linux/bitmap.h: handle constant zero-size bitmaps correctly [ Upstream commit 7275b097851a5e2e0dd4da039c7e96b59ac5314e ] The static inlines in bitmap.h do not handle a compile-time constant nbits==0 correctly (they dereference the passed src or dst pointers, despite only 0 words being valid to access). I had the 0-day buildbot chew on a patch [1] that would cause build failures for such cases without complaining, suggesting that we don't have any such users currently, at least for the 70 .config/arch combinations that was built. Should any turn up, make sure they use the out-of-line versions, which do handle nbits==0 correctly. This is of course not the most efficient, but it's much less churn than teaching all the static inlines an "if (zero_const_nbits())", and since we don't have any current instances, this doesn't affect existing code at all. [1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes Reviewed-by: Andy Shevchenko Cc: Yury Norov Cc: Rasmus Villemoes Cc: Sudeep Holla Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/bitmap.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 3b77588a9360..dc56304ac829 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -185,8 +185,13 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1))) +/* + * The static inlines below do not handle constant nbits==0 correctly, + * so make such users (should any ever turn up) call the out-of-line + * versions. + */ #define small_const_nbits(nbits) \ - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) + (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) { -- cgit v1.2.3 From b74d722d60354bf8a34bef72d8aa2eef40a75ddf Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Tue, 30 Oct 2018 15:05:07 -0700 Subject: linux/bitmap.h: fix type of nbits in bitmap_shift_right() [ Upstream commit d9873969fa8725dc6a5a21ab788c057fd8719751 ] Most other bitmap API, including the OOL version __bitmap_shift_right, take unsigned nbits. This was accidentally left out from 2fbad29917c98. Link: http://lkml.kernel.org/r/20180818131623.8755-5-linux@rasmusvillemoes.dk Fixes: 2fbad29917c98 ("lib: bitmap: change bitmap_shift_right to take unsigned parameters") Signed-off-by: Rasmus Villemoes Reported-by: Yury Norov Reviewed-by: Andy Shevchenko Cc: Rasmus Villemoes Cc: Sudeep Holla Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/bitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index dc56304ac829..dec03c0dbc21 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -321,7 +321,7 @@ static __always_inline int bitmap_weight(const unsigned long *src, unsigned int } static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *src, - unsigned int shift, int nbits) + unsigned int shift, unsigned int nbits) { if (small_const_nbits(nbits)) *dst = (*src & BITMAP_LAST_WORD_MASK(nbits)) >> shift; -- cgit v1.2.3 From e2e7b55178b11f18ebedc2060cee72c4b3742c41 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 30 Oct 2018 15:10:24 -0700 Subject: mm/memory_hotplug: make add_memory() take the device_hotplug_lock [ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ] add_memory() currently does not take the device_hotplug_lock, however is aleady called under the lock from arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c to synchronize against CPU hot-remove and similar. In general, we should hold the device_hotplug_lock when adding memory to synchronize against online/offline request (e.g. from user space) - which already resulted in lock inversions due to device_lock() and mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do. Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space, once the memory has been fully added to the system. The lock is not held yet in drivers/xen/balloon.c arch/powerpc/platforms/powernv/memtrace.c drivers/s390/char/sclp_cmd.c drivers/hv/hv_balloon.c So, let's either use the locked variants or take the lock. Don't export add_memory_resource(), as it once was exported to be used by XEN, which is never built as a module. If somebody requires it, we also have to export a locked variant (as device_hotplug_lock is never exported). Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Pavel Tatashin Reviewed-by: Rafael J. Wysocki Reviewed-by: Rashmica Gupta Reviewed-by: Oscar Salvador Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Nathan Fontenot Cc: John Allen Cc: Michal Hocko Cc: Dan Williams Cc: Joonsoo Kim Cc: Vlastimil Babka Cc: Mathieu Malaterre Cc: Pavel Tatashin Cc: YASUAKI ISHIMATSU Cc: Balbir Singh Cc: Haiyang Zhang Cc: Heiko Carstens Cc: Jonathan Corbet Cc: Kate Stewart Cc: "K. Y. Srinivasan" Cc: Martin Schwidefsky Cc: Michael Neuling Cc: Philippe Ombredanne Cc: Stephen Hemminger Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/memory_hotplug.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 134a2f69c21a..9469eef30095 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -272,6 +272,7 @@ static inline void remove_memory(int nid, u64 start, u64 size) {} extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, void *arg, int (*func)(struct memory_block *, void *)); +extern int __add_memory(int nid, u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 size); extern int add_memory_resource(int nid, struct resource *resource, bool online); extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default, -- cgit v1.2.3 From e528acd31a13f4115ab5a0bf804fd2356a8d3c32 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 11 Nov 2019 14:12:27 -0800 Subject: KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream. Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages. This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup(). Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned. [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl Reported-by: Adam Borowski Analyzed-by: David Hildenbrand Acked-by: Dan Williams Cc: stable@vger.kernel.org Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman [sean: backport to 4.x; resolve conflict in mmu.c] Signed-off-by: Sean Christopherson --- include/linux/kvm_host.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0590e7d47b02..ab90a8541aaa 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -843,6 +843,7 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); void kvm_vcpu_kick(struct kvm_vcpu *vcpu); bool kvm_is_reserved_pfn(kvm_pfn_t pfn); +bool kvm_is_zone_device_pfn(kvm_pfn_t pfn); struct kvm_irq_ack_notifier { struct hlist_node link; -- cgit v1.2.3 From 1f9509845f2f462f778c640a7c3e475ed2894502 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 22 Oct 2019 20:57:06 -0700 Subject: reset: fix reset_control_ops kerneldoc comment [ Upstream commit f430c7ed8bc22992ed528b518da465b060b9223f ] Add a missing short description to the reset_control_ops documentation. Signed-off-by: Randy Dunlap [p.zabel@pengutronix.de: rebased and updated commit message] Signed-off-by: Philipp Zabel Signed-off-by: Sasha Levin --- include/linux/reset-controller.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/reset-controller.h b/include/linux/reset-controller.h index db1fe6772ad5..b5542aff192c 100644 --- a/include/linux/reset-controller.h +++ b/include/linux/reset-controller.h @@ -6,7 +6,7 @@ struct reset_controller_dev; /** - * struct reset_control_ops + * struct reset_control_ops - reset controller driver callbacks * * @reset: for self-deasserting resets, does all necessary * things to reset the device -- cgit v1.2.3 From c4c41610a3470655eddce6ebfe54f4b9cd5ffbdd Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Thu, 6 Dec 2018 10:45:49 +0100 Subject: gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB [ Upstream commit c5510b8dafce5f3f5a039c9b262ebcae0092c462 ] If CONFIG_GPOILIB is not set, the stub of gpio_to_desc() should return the same type of error as regular version: NULL. All the callers compare the return value of gpio_to_desc() against NULL, so returned ERR_PTR would be treated as non-error case leading to dereferencing of error value. Fixes: 79a9becda894 ("gpiolib: export descriptor-based GPIO interface") Signed-off-by: Krzysztof Kozlowski Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- include/linux/gpio/consumer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h index fb0fde686cb1..4a9838feb086 100644 --- a/include/linux/gpio/consumer.h +++ b/include/linux/gpio/consumer.h @@ -398,7 +398,7 @@ static inline int gpiod_to_irq(const struct gpio_desc *desc) static inline struct gpio_desc *gpio_to_desc(unsigned gpio) { - return ERR_PTR(-EINVAL); + return NULL; } static inline int desc_to_gpio(const struct gpio_desc *desc) -- cgit v1.2.3 From af7fee14ff3d3327e832e496e676aa2efc7ec1aa Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Fri, 28 Dec 2018 00:34:36 -0800 Subject: vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n [ Upstream commit 8b09549c2bfd9f3f8f4cdad74107ef4f4ff9cdd7 ] Commit fa5e084e43eb ("vmscan: do not unconditionally treat zones that fail zone_reclaim() as full") changed the return value of node_reclaim(). The original return value 0 means NODE_RECLAIM_SOME after this commit. While the return value of node_reclaim() when CONFIG_NUMA is n is not changed. This will leads to call zone_watermark_ok() again. This patch fixes the return value by adjusting to NODE_RECLAIM_NOSCAN. Since node_reclaim() is only called in page_alloc.c, move it to mm/internal.h. Link: http://lkml.kernel.org/r/20181113080436.22078-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang Acked-by: Michal Hocko Reviewed-by: Matthew Wilcox Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/swap.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include') diff --git a/include/linux/swap.h b/include/linux/swap.h index 2228907d08ff..d13617c7bcc4 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -335,14 +335,8 @@ extern unsigned long vm_total_pages; extern int node_reclaim_mode; extern int sysctl_min_unmapped_ratio; extern int sysctl_min_slab_ratio; -extern int node_reclaim(struct pglist_data *, gfp_t, unsigned int); #else #define node_reclaim_mode 0 -static inline int node_reclaim(struct pglist_data *pgdat, gfp_t mask, - unsigned int order) -{ - return 0; -} #endif extern int page_evictable(struct page *page); -- cgit v1.2.3 From bd5d9cd97ca55bba497a446f7a5d6a47b00869d5 Mon Sep 17 00:00:00 2001 From: Alexey Skidanov Date: Thu, 3 Jan 2019 15:26:44 -0800 Subject: lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk [ Upstream commit 52fbf1134d479234d7e64ba9dcbaea23405f229e ] gen_pool_alloc_algo() uses different allocation functions implementing different allocation algorithms. With gen_pool_first_fit_align() allocation function, the returned address should be aligned on the requested boundary. If chunk start address isn't aligned on the requested boundary, the returned address isn't aligned too. The only way to get properly aligned address is to initialize the pool with chunks aligned on the requested boundary. If want to have an ability to allocate buffers aligned on different boundaries (for example, 4K, 1MB, ...), the chunk start address should be aligned on the max possible alignment. This happens because gen_pool_first_fit_align() looks for properly aligned memory block without taking into account the chunk start address alignment. To fix this, we provide chunk start address to gen_pool_first_fit_align() and change its implementation such that it starts looking for properly aligned block with appropriate offset (exactly as is done in CMA). Link: https://lkml.kernel.org/lkml/a170cf65-6884-3592-1de9-4c235888cc8a@intel.com Link: http://lkml.kernel.org/r/1541690953-4623-1-git-send-email-alexey.skidanov@intel.com Signed-off-by: Alexey Skidanov Reviewed-by: Andrew Morton Cc: Logan Gunthorpe Cc: Daniel Mentz Cc: Mathieu Desnoyers Cc: Laura Abbott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/genalloc.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 206fe3bccccc..575f2dee8cf7 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -50,7 +50,8 @@ typedef unsigned long (*genpool_algo_t)(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, - void *data, struct gen_pool *pool); + void *data, struct gen_pool *pool, + unsigned long start_addr); /* * General purpose special memory pool descriptor. @@ -130,24 +131,24 @@ extern void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo, extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, void *data, - struct gen_pool *pool); + struct gen_pool *pool, unsigned long start_addr); extern unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, - void *data, struct gen_pool *pool); + void *data, struct gen_pool *pool, unsigned long start_addr); extern unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, - void *data, struct gen_pool *pool); + void *data, struct gen_pool *pool, unsigned long start_addr); extern unsigned long gen_pool_first_fit_order_align(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, - void *data, struct gen_pool *pool); + void *data, struct gen_pool *pool, unsigned long start_addr); extern unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, void *data, - struct gen_pool *pool); + struct gen_pool *pool, unsigned long start_addr); extern struct gen_pool *devm_gen_pool_create(struct device *dev, -- cgit v1.2.3 From 9d3fcde90f85fce9e769825730020e1f97a80bbf Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 12 Feb 2019 12:26:27 -0800 Subject: net: fix possible overflow in __sk_mem_raise_allocated() [ Upstream commit 5bf325a53202b8728cf7013b72688c46071e212e ] With many active TCP sockets, fat TCP sockets could fool __sk_mem_raise_allocated() thanks to an overflow. They would increase their share of the memory, instead of decreasing it. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/sock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index d8d14ae8892a..aed436567d70 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1201,7 +1201,7 @@ static inline void sk_sockets_allocated_inc(struct sock *sk) percpu_counter_inc(sk->sk_prot->sockets_allocated); } -static inline int +static inline u64 sk_sockets_allocated_read_positive(struct sock *sk) { return percpu_counter_read_positive(sk->sk_prot->sockets_allocated); -- cgit v1.2.3 From 6b2540ebc763fd76e7564e798e9ec2be97f769ae Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 27 Feb 2019 13:37:26 +0300 Subject: net: dev: Use unsigned integer as an argument to left-shift [ Upstream commit f4d7b3e23d259c44f1f1c39645450680fcd935d6 ] 1 << 31 is Undefined Behaviour according to the C standard. Use U type modifier to avoid theoretical overflow. Signed-off-by: Andy Shevchenko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/linux/netdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2ecf0f32444e..29ed5977ac04 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3565,7 +3565,7 @@ static inline u32 netif_msg_init(int debug_value, int default_msg_enable_bits) if (debug_value == 0) /* no output */ return 0; /* set low N bits */ - return (1 << debug_value) - 1; + return (1U << debug_value) - 1; } static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu) -- cgit v1.2.3 From 4d9210904e0fc73c33c04668dd89bffdbf161b24 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 23 Nov 2019 11:56:49 +0800 Subject: sctp: cache netns in sctp_ep_common [ Upstream commit 312434617cb16be5166316cf9d08ba760b1042a1 ] This patch is to fix a data-race reported by syzbot: BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1: sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091 sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465 sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916 inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734 __sys_accept4+0x224/0x430 net/socket.c:1754 __do_sys_accept net/socket.c:1795 [inline] __se_sys_accept net/socket.c:1792 [inline] __x64_sys_accept+0x4e/0x60 net/socket.c:1792 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0: sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894 rht_key_get_hash include/linux/rhashtable.h:133 [inline] rht_key_hashfn include/linux/rhashtable.h:159 [inline] rht_head_hashfn include/linux/rhashtable.h:174 [inline] head_hashfn lib/rhashtable.c:41 [inline] rhashtable_rehash_one lib/rhashtable.c:245 [inline] rhashtable_rehash_chain lib/rhashtable.c:276 [inline] rhashtable_rehash_table lib/rhashtable.c:316 [inline] rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 It was caused by rhashtable access asoc->base.sk when sctp_assoc_migrate is changing its value. However, what rhashtable wants is netns from asoc base.sk, and for an asoc, its netns won't change once set. So we can simply fix it by caching netns since created. Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable") Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- include/net/sctp/structs.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 11c3bf262a85..b46133a41f55 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1202,6 +1202,9 @@ struct sctp_ep_common { /* What socket does this endpoint belong to? */ struct sock *sk; + /* Cache netns and it won't change once set */ + struct net *net; + /* This is where we receive inbound chunks. */ struct sctp_inq inqueue; -- cgit v1.2.3 From 8bddce881ac1ab6dd3da2e1504601eeb2e84b170 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Tue, 30 Oct 2018 15:11:04 -0700 Subject: serial: core: Allow processing sysrq at port unlock time [ Upstream commit d6e1935819db0c91ce4a5af82466f3ab50d17346 ] Right now serial drivers process sysrq keys deep in their character receiving code. This means that they've already grabbed their port->lock spinlock. This can end up getting in the way if we've go to do serial stuff (especially kgdb) in response to the sysrq. Serial drivers have various hacks in them to handle this. Looking at '8250_port.c' you can see that the console_write() skips locking if we're in the sysrq handler. Looking at 'msm_serial.c' you can see that the port lock is dropped around uart_handle_sysrq_char(). It turns out that these hacks aren't exactly perfect. If you have lockdep turned on and use something like the 8250_port hack you'll get a splat that looks like: WARNING: possible circular locking dependency detected [...] is trying to acquire lock: ... (console_owner){-.-.}, at: console_unlock+0x2e0/0x5e4 but task is already holding lock: ... (&port_lock_key){-.-.}, at: serial8250_handle_irq+0x30/0xe4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&port_lock_key){-.-.}: _raw_spin_lock_irqsave+0x58/0x70 serial8250_console_write+0xa8/0x250 univ8250_console_write+0x40/0x4c console_unlock+0x528/0x5e4 register_console+0x2c4/0x3b0 uart_add_one_port+0x350/0x478 serial8250_register_8250_port+0x350/0x3a8 dw8250_probe+0x67c/0x754 platform_drv_probe+0x58/0xa4 really_probe+0x150/0x294 driver_probe_device+0xac/0xe8 __driver_attach+0x98/0xd0 bus_for_each_dev+0x84/0xc8 driver_attach+0x2c/0x34 bus_add_driver+0xf0/0x1ec driver_register+0xb4/0x100 __platform_driver_register+0x60/0x6c dw8250_platform_driver_init+0x20/0x28 ... -> #0 (console_owner){-.-.}: lock_acquire+0x1e8/0x214 console_unlock+0x35c/0x5e4 vprintk_emit+0x230/0x274 vprintk_default+0x7c/0x84 vprintk_func+0x190/0x1bc printk+0x80/0xa0 __handle_sysrq+0x104/0x21c handle_sysrq+0x30/0x3c serial8250_read_char+0x15c/0x18c serial8250_rx_chars+0x34/0x74 serial8250_handle_irq+0x9c/0xe4 dw8250_handle_irq+0x98/0xcc serial8250_interrupt+0x50/0xe8 ... other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&port_lock_key); lock(console_owner); lock(&port_lock_key); lock(console_owner); *** DEADLOCK *** The hack used in 'msm_serial.c' doesn't cause the above splats but it seems a bit ugly to unlock / lock our spinlock deep in our irq handler. It seems like we could defer processing the sysrq until the end of the interrupt handler right after we've unlocked the port. With this scheme if a whole batch of sysrq characters comes in one irq then we won't handle them all, but that seems like it should be a fine compromise. Signed-off-by: Douglas Anderson Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- include/linux/serial_core.h | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index eb4f6456521e..cd95b5e395a3 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -161,6 +161,7 @@ struct uart_port { struct console *cons; /* struct console, if any */ #if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ) unsigned long sysrq; /* sysrq timeout */ + unsigned int sysrq_ch; /* char for sysrq */ #endif /* flags must be updated while holding port mutex */ @@ -470,8 +471,42 @@ uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) } return 0; } +static inline int +uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) +{ + if (port->sysrq) { + if (ch && time_before(jiffies, port->sysrq)) { + port->sysrq_ch = ch; + port->sysrq = 0; + return 1; + } + port->sysrq = 0; + } + return 0; +} +static inline void +uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags) +{ + int sysrq_ch; + + sysrq_ch = port->sysrq_ch; + port->sysrq_ch = 0; + + spin_unlock_irqrestore(&port->lock, irqflags); + + if (sysrq_ch) + handle_sysrq(sysrq_ch); +} #else -#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; }) +static inline int +uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; } +static inline int +uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; } +static inline void +uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags) +{ + spin_unlock_irqrestore(&port->lock, irqflags); +} #endif /* -- cgit v1.2.3 From a2b503b16518d9c65786f53e0cb259edda1a0f80 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Fri, 16 Nov 2018 19:19:30 -0800 Subject: regulator: Fix return value of _set_load() stub [ Upstream commit f1abf67217de91f5cd3c757ae857632ca565099a ] The stub implementation of _set_load() returns a mode value which is within the bounds of valid return codes for success (the documentation just says that failures are negative error codes) but not sensible or what the actual implementation does. Fix it to just return 0. Reported-by: Cheng-Yi Chiang Signed-off-by: Mark Brown Reviewed-by: Douglas Anderson Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- include/linux/regulator/consumer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/regulator/consumer.h b/include/linux/regulator/consumer.h index 692108222271..bab9236e4367 100644 --- a/include/linux/regulator/consumer.h +++ b/include/linux/regulator/consumer.h @@ -479,7 +479,7 @@ static inline unsigned int regulator_get_mode(struct regulator *regulator) static inline int regulator_set_load(struct regulator *regulator, int load_uA) { - return REGULATOR_MODE_NORMAL; + return 0; } static inline int regulator_allow_bypass(struct regulator *regulator, -- cgit v1.2.3 From d518e7b3f0ba3bc919daded8452be088be4e6961 Mon Sep 17 00:00:00 2001 From: Vincent Chen Date: Thu, 22 Nov 2018 11:14:38 +0800 Subject: math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning [ Upstream commit 83312f1b7ae205dca647bf52bbe2d51303cdedfb ] _FP_ROUND_ZERO is defined as 0 and used as a statemente in macro _FP_ROUND. This generates "error: statement with no effect [-Werror=unused-value]" from gcc. Defining _FP_ROUND_ZERO as (void)0 to fix it. This modification is quoted from glibc 'commit (8ed1e7d5894000c155acbd06f)' Signed-off-by: Vincent Chen Acked-by: Greentime Hu Signed-off-by: Greentime Hu Signed-off-by: Sasha Levin --- include/math-emu/soft-fp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/math-emu/soft-fp.h b/include/math-emu/soft-fp.h index 3f284bc03180..5650c1628383 100644 --- a/include/math-emu/soft-fp.h +++ b/include/math-emu/soft-fp.h @@ -138,7 +138,7 @@ do { \ _FP_FRAC_ADDI_##wc(X, _FP_WORK_ROUND); \ } while (0) -#define _FP_ROUND_ZERO(wc, X) 0 +#define _FP_ROUND_ZERO(wc, X) (void)0 #define _FP_ROUND_PINF(wc, X) \ do { \ -- cgit v1.2.3 From b67aadca61dad09dfff3291914e4233e0b186602 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 23 Nov 2018 23:07:14 +0300 Subject: ACPI: fix acpi_find_child_device() invocation in acpi_preset_companion() [ Upstream commit f8c6d1402b89f22a3647705d63cbd171aa19a77e ] acpi_find_child_device() accepts boolean not pointer as last argument. Signed-off-by: Alexey Dobriyan [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- include/linux/acpi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 719eb97217a3..5670bb9788bb 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -77,7 +77,7 @@ static inline bool has_acpi_companion(struct device *dev) static inline void acpi_preset_companion(struct device *dev, struct acpi_device *parent, u64 addr) { - ACPI_COMPANION_SET(dev, acpi_find_child_device(parent, addr, NULL)); + ACPI_COMPANION_SET(dev, acpi_find_child_device(parent, addr, false)); } static inline const char *acpi_dev_name(struct acpi_device *adev) -- cgit v1.2.3 From dafd9e94580c6fdcc27b5750b6d2c1987ca46f5a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Niklas=20S=C3=B6derlund?= Date: Wed, 29 Aug 2018 23:29:21 +0200 Subject: dma-mapping: fix return type of dma_set_max_seg_size() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit c9d76d0655c06b8c1f944e46c4fd9e9cf4b331c0 ] The function dma_set_max_seg_size() can return either 0 on success or -EIO on error. Change its return type from unsigned int to int to capture this. Signed-off-by: Niklas Söderlund Reviewed-by: Geert Uytterhoeven Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- include/linux/dma-mapping.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 704caae69c42..97f817f4eb78 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -618,8 +618,7 @@ static inline unsigned int dma_get_max_seg_size(struct device *dev) return SZ_64K; } -static inline unsigned int dma_set_max_seg_size(struct device *dev, - unsigned int size) +static inline int dma_set_max_seg_size(struct device *dev, unsigned int size) { if (dev->dma_parms) { dev->dma_parms->max_segment_size = size; -- cgit v1.2.3 From 1bd5701bbe69f2cf3a7f297d9db61d88a3400d4d Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Sun, 18 Nov 2018 21:18:30 +0100 Subject: mtd: fix mtd_oobavail() incoherent returned value [ Upstream commit 4348433d8c0234f44adb6e12112e69343f50f0c5 ] mtd_oobavail() returns either mtd->oovabail or mtd->oobsize. Both values are unsigned 32-bit entities, so there is no reason to pretend returning a signed one. Signed-off-by: Miquel Raynal Signed-off-by: Boris Brezillon Signed-off-by: Sasha Levin --- include/linux/mtd/mtd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 13f8052b9ff9..13ddba5e531d 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -392,7 +392,7 @@ static inline struct device_node *mtd_get_of_node(struct mtd_info *mtd) return mtd->dev.of_node; } -static inline int mtd_oobavail(struct mtd_info *mtd, struct mtd_oob_ops *ops) +static inline u32 mtd_oobavail(struct mtd_info *mtd, struct mtd_oob_ops *ops) { return ops->mode == MTD_OPS_AUTO_OOB ? mtd->oobavail : mtd->oobsize; } -- cgit v1.2.3 From 95595c8f60ca995e6ab7e10130265762d504f553 Mon Sep 17 00:00:00 2001 From: Dmitry Safonov Date: Thu, 1 Nov 2018 00:24:48 +0000 Subject: tty: Don't block on IO when ldisc change is pending [ Upstream commit c96cf923a98d1b094df9f0cf97a83e118817e31b ] There might be situations where tty_ldisc_lock() has blocked, but there is already IO on tty and it prevents line discipline changes. It might theoretically turn into dead-lock. Basically, provide more priority to pending tty_ldisc_lock() than to servicing reads/writes over tty. User-visible issue was reported by Mikulas where on pa-risc with Debian 5 reboot took either 80 seconds, 3 minutes or 3:25 after proper locking in tty_reopen(). Cc: Jiri Slaby Reported-by: Mikulas Patocka Signed-off-by: Dmitry Safonov Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- include/linux/tty.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/tty.h b/include/linux/tty.h index fe1b8623a3a1..fe483976b119 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -356,6 +356,7 @@ struct tty_file_private { #define TTY_NO_WRITE_SPLIT 17 /* Preserve write boundaries to driver */ #define TTY_HUPPED 18 /* Post driver->hangup() */ #define TTY_HUPPING 19 /* Hangup in progress */ +#define TTY_LDISC_CHANGING 20 /* Change pending - non-block IO */ #define TTY_LDISC_HALTED 22 /* Line discipline is halted */ /* Values for tty->flow_change */ @@ -373,6 +374,12 @@ static inline void tty_set_flow_change(struct tty_struct *tty, int val) smp_mb(); } +static inline bool tty_io_nonblock(struct tty_struct *tty, struct file *file) +{ + return file->f_flags & O_NONBLOCK || + test_bit(TTY_LDISC_CHANGING, &tty->flags); +} + static inline bool tty_io_error(struct tty_struct *tty) { return test_bit(TTY_IO_ERROR, &tty->flags); -- cgit v1.2.3 From cbf58157fb5a319f28d859cc072d3091d593e507 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 5 Nov 2019 17:44:07 +0100 Subject: jbd2: Fix possible overflow in jbd2_log_space_left() commit add3efdd78b8a0478ce423bb9d4df6bd95e8b335 upstream. When number of free space in the journal is very low, the arithmetic in jbd2_log_space_left() could underflow resulting in very high number of free blocks and thus triggering assertion failure in transaction commit code complaining there's not enough space in the journal: J_ASSERT(journal->j_free > 1); Properly check for the low number of free blocks. CC: stable@vger.kernel.org Reviewed-by: Theodore Ts'o Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20191105164437.32602-1-jack@suse.cz Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- include/linux/jbd2.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index d073470cb342..344eb873f6f5 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1560,7 +1560,7 @@ static inline int jbd2_space_needed(journal_t *journal) static inline unsigned long jbd2_log_space_left(journal_t *journal) { /* Allow for rounding errors */ - unsigned long free = journal->j_free - 32; + long free = journal->j_free - 32; if (journal->j_committing_transaction) { unsigned long committing = atomic_read(&journal-> @@ -1569,7 +1569,7 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal) /* Transaction + control blocks */ free -= committing + (committing >> JBD2_CONTROL_BLOCKS_SHIFT); } - return free; + return max_t(long, free, 0); } /* -- cgit v1.2.3 From 540b341012ba9b8e2963be3241cea229d4e4804d Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 14 Mar 2019 13:47:59 +0800 Subject: appletalk: Fix potential NULL pointer dereference in unregister_snap_client commit 9804501fa1228048857910a6bf23e085aade37cc upstream. register_snap_client may return NULL, all the callers check it, but only print a warning. This will result in NULL pointer dereference in unregister_snap_client and other places. It has always been used like this since v2.6 Reported-by: Dan Carpenter Signed-off-by: YueHaibing Signed-off-by: David S. Miller [bwh: Backported to <4.15: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/linux/atalk.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/atalk.h b/include/linux/atalk.h index af43ed404ff4..4be0e14b38fc 100644 --- a/include/linux/atalk.h +++ b/include/linux/atalk.h @@ -107,7 +107,7 @@ static __inline__ struct elapaarp *aarp_hdr(struct sk_buff *skb) #define AARP_RESOLVE_TIME (10 * HZ) extern struct datalink_proto *ddp_dl, *aarp_dl; -extern void aarp_proto_init(void); +extern int aarp_proto_init(void); /* Inter module exports */ -- cgit v1.2.3 From 072d24ef6cb1f309f6c45c71e5b0f04a2e9d48a6 Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov Date: Thu, 31 Oct 2019 10:39:20 +0000 Subject: quota: Check that quota is not dirty before release commit df4bb5d128e2c44848aeb36b7ceceba3ac85080d upstream. There is a race window where quota was redirted once we drop dq_list_lock inside dqput(), but before we grab dquot->dq_lock inside dquot_release() TASK1 TASK2 (chowner) ->dqput() we_slept: spin_lock(&dq_list_lock) if (dquot_dirty(dquot)) { spin_unlock(&dq_list_lock); dquot->dq_sb->dq_op->write_dquot(dquot); goto we_slept if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) { spin_unlock(&dq_list_lock); dquot->dq_sb->dq_op->release_dquot(dquot); dqget() mark_dquot_dirty() dqput() goto we_slept; } So dquot dirty quota will be released by TASK1, but on next we_sleept loop we detect this and call ->write_dquot() for it. XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107 Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org CC: stable@vger.kernel.org Signed-off-by: Dmitry Monakhov Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- include/linux/quotaops.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include') diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 87733344768c..0a60fe354bc5 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -54,6 +54,16 @@ static inline struct dquot *dqgrab(struct dquot *dquot) atomic_inc(&dquot->dq_count); return dquot; } + +static inline bool dquot_is_busy(struct dquot *dquot) +{ + if (test_bit(DQ_MOD_B, &dquot->dq_flags)) + return true; + if (atomic_read(&dquot->dq_count) > 1) + return true; + return false; +} + void dqput(struct dquot *dquot); int dquot_scan_active(struct super_block *sb, int (*fn)(struct dquot *dquot, unsigned long priv), -- cgit v1.2.3 From 67b02e37c1ba5b7aea48a750731a6731784e8c8c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 5 Dec 2019 20:43:46 -0800 Subject: inet: protect against too small mtu values. [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ] syzbot was once again able to crash a host by setting a very small mtu on loopback device. Let's make inetdev_valid_mtu() available in include/net/ip.h, and use it in ip_setup_cork(), so that we protect both ip_append_page() and __ip_append_data() Also add a READ_ONCE() when the device mtu is read. Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(), even if other code paths might write over this field. Add a big comment in include/linux/netdevice.h about dev->mtu needing READ_ONCE()/WRITE_ONCE() annotations. Hopefully we will add the missing ones in followup patches. [1] refcount_t: saturated; leaking memory. WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x3e kernel/panic.c:582 report_bug+0x289/0x300 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:174 [inline] fixup_bug arch/x86/kernel/traps.c:169 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89 RSP: 0018:ffff88809689f550 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1 R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001 R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40 refcount_add include/linux/refcount.h:193 [inline] skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999 sock_wmalloc+0xf1/0x120 net/core/sock.c:2096 ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383 udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276 inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821 kernel_sendpage+0x92/0xf0 net/socket.c:3794 sock_sendpage+0x8b/0xc0 net/socket.c:936 pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636 splice_from_pipe+0x108/0x170 fs/splice.c:671 generic_splice_sendpage+0x3c/0x50 fs/splice.c:842 do_splice_from fs/splice.c:861 [inline] direct_splice_actor+0x123/0x190 fs/splice.c:1035 splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990 do_splice_direct+0x1da/0x2a0 fs/splice.c:1078 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441409 Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409 RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005 RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010 R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180 R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/netdevice.h | 5 +++++ include/net/ip.h | 5 +++++ 2 files changed, 10 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 29ed5977ac04..81c85ba6e2b8 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1730,6 +1730,11 @@ struct net_device { unsigned char if_port; unsigned char dma; + /* Note : dev->mtu is often read without holding a lock. + * Writers usually hold RTNL. + * It is recommended to use READ_ONCE() to annotate the reads, + * and to use WRITE_ONCE() to annotate the writes. + */ unsigned int mtu; unsigned short type; unsigned short hard_header_len; diff --git a/include/net/ip.h b/include/net/ip.h index a3c1b9dfc9a1..d577fb5647c5 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -620,4 +620,9 @@ extern int sysctl_icmp_msgs_burst; int ip_misc_proc_init(void); #endif +static inline bool inetdev_valid_mtu(unsigned int mtu) +{ + return likely(mtu >= IPV4_MIN_MTU); +} + #endif /* _IP_H */ -- cgit v1.2.3 From 3b4a534f2a58b7f191bf0189ee55e284f513b23e Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Fri, 6 Dec 2019 12:38:36 +0100 Subject: tcp: fix rejected syncookies due to stale timestamps [ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ] If no synflood happens for a long enough period of time, then the synflood timestamp isn't refreshed and jiffies can advance so much that time_after32() can't accurately compare them any more. Therefore, we can end up in a situation where time_after32(now, last_overflow + HZ) returns false, just because these two values are too far apart. In that case, the synflood timestamp isn't updated as it should be, which can trick tcp_synq_no_recent_overflow() into rejecting valid syncookies. For example, let's consider the following scenario on a system with HZ=1000: * The synflood timestamp is 0, either because that's the timestamp of the last synflood or, more commonly, because we're working with a freshly created socket. * We receive a new SYN, which triggers synflood protection. Let's say that this happens when jiffies == 2147484649 (that is, 'synflood timestamp' + HZ + 2^31 + 1). * Then tcp_synq_overflow() doesn't update the synflood timestamp, because time_after32(2147484649, 1000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 1000: the value of 'last_overflow' + HZ. * A bit later, we receive the ACK completing the 3WHS. But cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow() says that we're not under synflood. That's because time_after32(2147484649, 120000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID. Of course, in reality jiffies would have increased a bit, but this condition will last for the next 119 seconds, which is far enough to accommodate for jiffie's growth. Fix this by updating the overflow timestamp whenever jiffies isn't within the [last_overflow, last_overflow + HZ] range. That shouldn't have any performance impact since the update still happens at most once per second. Now we're guaranteed to have fresh timestamps while under synflood, so tcp_synq_no_recent_overflow() can safely use it with time_after32() in such situations. Stale timestamps can still make tcp_synq_no_recent_overflow() return the wrong verdict when not under synflood. This will be handled in the next patch. For 64 bits architectures, the problem was introduced with the conversion of ->tw_ts_recent_stamp to 32 bits integer by commit cca9bab1b72c ("tcp: use monotonic timestamps for PAWS"). The problem has always been there on 32 bits architectures. Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/time.h | 12 ++++++++++++ include/net/tcp.h | 2 +- 2 files changed, 13 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/time.h b/include/linux/time.h index 4cea09d94208..60fd50559241 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -275,4 +275,16 @@ static __always_inline void timespec_add_ns(struct timespec *a, u64 ns) a->tv_nsec = ns; } +/** + * time_between32 - check if a 32-bit timestamp is within a given time range + * @t: the time which may be within [l,h] + * @l: the lower bound of the range + * @h: the higher bound of the range + * + * time_before32(t, l, h) returns true if @l <= @t <= @h. All operands are + * treated as 32-bit integers. + * + * Equivalent to !(time_before32(@t, @l) || time_after32(@t, @h)). + */ +#define time_between32(t, l, h) ((u32)(h) - (u32)(l) >= (u32)(t) - (u32)(l)) #endif diff --git a/include/net/tcp.h b/include/net/tcp.h index 23814d997e86..d2962abad6ef 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -497,7 +497,7 @@ static inline void tcp_synq_overflow(const struct sock *sk) unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; unsigned long now = jiffies; - if (time_after(now, last_overflow + HZ)) + if (!time_between32(now, last_overflow, last_overflow + HZ)) tcp_sk(sk)->rx_opt.ts_recent_stamp = now; } -- cgit v1.2.3 From 0c8cd7f6bb8a53b244ecc498dc733204099b6cf3 Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Fri, 6 Dec 2019 12:38:43 +0100 Subject: tcp: tighten acceptance of ACKs not matching a child socket [ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ] When no synflood occurs, the synflood timestamp isn't updated. Therefore it can be so old that time_after32() can consider it to be in the future. That's a problem for tcp_synq_no_recent_overflow() as it may report that a recent overflow occurred while, in fact, it's just that jiffies has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31. Spurious detection of recent overflows lead to extra syncookie verification in cookie_v[46]_check(). At that point, the verification should fail and the packet dropped. But we should have dropped the packet earlier as we didn't even send a syncookie. Let's refine tcp_synq_no_recent_overflow() to report a recent overflow only if jiffies is within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This way, no spurious recent overflow is reported when jiffies wraps and 'last_overflow' becomes in the future from the point of view of time_after32(). However, if jiffies wraps and enters the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with 'last_overflow' being a stale synflood timestamp), then tcp_synq_no_recent_overflow() still erroneously reports an overflow. In such cases, we have to rely on syncookie verification to drop the packet. We unfortunately have no way to differentiate between a fresh and a stale syncookie timestamp. In practice, using last_overflow as lower bound is problematic. If the synflood timestamp is concurrently updated between the time we read jiffies and the moment we store the timestamp in 'last_overflow', then 'now' becomes smaller than 'last_overflow' and tcp_synq_no_recent_overflow() returns true, potentially dropping a valid syncookie. Reading jiffies after loading the timestamp could fix the problem, but that'd require a memory barrier. Let's just accommodate for potential timestamp growth instead and extend the interval using 'last_overflow - HZ' as lower bound. Signed-off-by: Guillaume Nault Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index d2962abad6ef..c446bfa2d039 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -506,7 +506,15 @@ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk) { unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; - return time_after(jiffies, last_overflow + TCP_SYNCOOKIE_VALID); + /* If last_overflow <= jiffies <= last_overflow + TCP_SYNCOOKIE_VALID, + * then we're under synflood. However, we have to use + * 'last_overflow - HZ' as lower bound. That's because a concurrent + * tcp_synq_overflow() could update .ts_recent_stamp after we read + * jiffies but before we store .ts_recent_stamp into last_overflow, + * which could lead to rejecting a valid syncookie. + */ + return !time_between32(jiffies, last_overflow - HZ, + last_overflow + TCP_SYNCOOKIE_VALID); } static inline u32 tcp_cookie_time(void) -- cgit v1.2.3 From e7a3b025b8c387fa65dc28c50db88642b1add300 Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Fri, 6 Dec 2019 12:38:49 +0100 Subject: tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() [ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ] Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the timestamp of the last synflood. Protect them with READ_ONCE() and WRITE_ONCE() since reads and writes aren't serialised. Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from struct tcp_sock"). But unprotected accesses were already there when timestamp was stored in .last_synq_overflow. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index c446bfa2d039..0e3a88f808c6 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -494,17 +494,17 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb); */ static inline void tcp_synq_overflow(const struct sock *sk) { - unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; + unsigned long last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp); unsigned long now = jiffies; if (!time_between32(now, last_overflow, last_overflow + HZ)) - tcp_sk(sk)->rx_opt.ts_recent_stamp = now; + WRITE_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp, now); } /* syncookies: no recent synqueue overflow on this listening socket? */ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk) { - unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; + unsigned long last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp); /* If last_overflow <= jiffies <= last_overflow + TCP_SYNCOOKIE_VALID, * then we're under synflood. However, we have to use -- cgit v1.2.3 From 56d26381f7ebf943456fd1d81a9d5e47f17de50c Mon Sep 17 00:00:00 2001 From: Sean Paul Date: Thu, 29 Aug 2019 12:52:19 -0400 Subject: drm: mst: Fix query_payload ack reply struct [ Upstream commit 268de6530aa18fe5773062367fd119f0045f6e88 ] Spec says[1] Allocated_PBN is 16 bits [1]- DisplayPort 1.2 Spec, Section 2.11.9.8, Table 2-98 Fixes: ad7f8a1f9ced ("drm/helper: add Displayport multi-stream helper (v0.6)") Cc: Lyude Paul Cc: Todd Previte Cc: Dave Airlie Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Sean Paul Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Reviewed-by: Lyude Paul Signed-off-by: Sean Paul Link: https://patchwork.freedesktop.org/patch/msgid/20190829165223.129662-1-sean@poorly.run Signed-off-by: Sasha Levin --- include/drm/drm_dp_mst_helper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h index 003207670597..c0542de64690 100644 --- a/include/drm/drm_dp_mst_helper.h +++ b/include/drm/drm_dp_mst_helper.h @@ -312,7 +312,7 @@ struct drm_dp_resource_status_notify { struct drm_dp_query_payload_ack_reply { u8 port_number; - u8 allocated_pbn; + u16 allocated_pbn; }; struct drm_dp_sideband_msg_req_body { -- cgit v1.2.3 From fcaaac2e7aa61f08af7aa6d673fcca5edcc931a0 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Tue, 1 Oct 2019 04:56:38 -0300 Subject: media: cec-funcs.h: add status_req checks [ Upstream commit 9b211f9c5a0b67afc435b86f75d78273b97db1c5 ] The CEC_MSG_GIVE_DECK_STATUS and CEC_MSG_GIVE_TUNER_DEVICE_STATUS commands both have a status_req argument: ON, OFF, ONCE. If ON or ONCE, then the follower will reply with a STATUS message. Either once or whenever the status changes (status_req == ON). If status_req == OFF, then it will stop sending continuous status updates, but the follower will *not* send a STATUS message in that case. This means that if status_req == OFF, then msg->reply should be 0 as well since no reply is expected in that case. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- include/linux/cec-funcs.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/cec-funcs.h b/include/linux/cec-funcs.h index 138bbf721e70..a844749a2855 100644 --- a/include/linux/cec-funcs.h +++ b/include/linux/cec-funcs.h @@ -956,7 +956,8 @@ static inline void cec_msg_give_deck_status(struct cec_msg *msg, msg->len = 3; msg->msg[1] = CEC_MSG_GIVE_DECK_STATUS; msg->msg[2] = status_req; - msg->reply = reply ? CEC_MSG_DECK_STATUS : 0; + msg->reply = (reply && status_req != CEC_OP_STATUS_REQ_OFF) ? + CEC_MSG_DECK_STATUS : 0; } static inline void cec_ops_give_deck_status(const struct cec_msg *msg, @@ -1060,7 +1061,8 @@ static inline void cec_msg_give_tuner_device_status(struct cec_msg *msg, msg->len = 3; msg->msg[1] = CEC_MSG_GIVE_TUNER_DEVICE_STATUS; msg->msg[2] = status_req; - msg->reply = reply ? CEC_MSG_TUNER_DEVICE_STATUS : 0; + msg->reply = (reply && status_req != CEC_OP_STATUS_REQ_OFF) ? + CEC_MSG_TUNER_DEVICE_STATUS : 0; } static inline void cec_ops_give_tuner_device_status(const struct cec_msg *msg, -- cgit v1.2.3 From 5cf2e755be098036a343ebe11a8bc5f8934367d6 Mon Sep 17 00:00:00 2001 From: Russell King Date: Thu, 19 Dec 2019 23:24:47 +0000 Subject: mod_devicetable: fix PHY module format [ Upstream commit d2ed49cf6c13e379c5819aa5ac20e1f9674ebc89 ] When a PHY is probed, if the top bit is set, we end up requesting a module with the string "mdio:-10101110000000100101000101010001" - the top bit is printed to a signed -1 value. This leads to the module not being loaded. Fix the module format string and the macro generating the values for it to ensure that we only print unsigned types and the top bit is always 0/1. We correctly end up with "mdio:10101110000000100101000101010001". Fixes: 8626d3b43280 ("phylib: Support phy module autoloading") Reviewed-by: Andrew Lunn Signed-off-by: Russell King Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/mod_devicetable.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h index ed84c07f6a51..1abfe37314a0 100644 --- a/include/linux/mod_devicetable.h +++ b/include/linux/mod_devicetable.h @@ -502,9 +502,9 @@ struct platform_device_id { #define MDIO_MODULE_PREFIX "mdio:" -#define MDIO_ID_FMT "%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d" +#define MDIO_ID_FMT "%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u%u" #define MDIO_ID_ARGS(_id) \ - (_id)>>31, ((_id)>>30) & 1, ((_id)>>29) & 1, ((_id)>>28) & 1, \ + ((_id)>>31) & 1, ((_id)>>30) & 1, ((_id)>>29) & 1, ((_id)>>28) & 1, \ ((_id)>>27) & 1, ((_id)>>26) & 1, ((_id)>>25) & 1, ((_id)>>24) & 1, \ ((_id)>>23) & 1, ((_id)>>22) & 1, ((_id)>>21) & 1, ((_id)>>20) & 1, \ ((_id)>>19) & 1, ((_id)>>18) & 1, ((_id)>>17) & 1, ((_id)>>16) & 1, \ -- cgit v1.2.3 From 3420bd514a701b344146623a7800f88ac5dbbd07 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 20 Dec 2019 14:31:40 +0100 Subject: net: dst: Force 4-byte alignment of dst_metrics [ Upstream commit 258a980d1ec23e2c786e9536a7dd260bea74bae6 ] When storing a pointer to a dst_metrics structure in dst_entry._metrics, two flags are added in the least significant bits of the pointer value. Hence this assumes all pointers to dst_metrics structures have at least 4-byte alignment. However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not 4 bytes. Hence in some kernel builds, dst_default_metrics may be only 2-byte aligned, leading to obscure boot warnings like: WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W 5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261 Stack from 10835e6c: 10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea 00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000 04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c 00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001 003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84 003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a Call Trace: [<00023fa6>] __warn+0xb2/0xb4 [<00023fea>] warn_slowpath_fmt+0x42/0x64 [<001a70f8>] refcount_warn_saturate+0x44/0x9a [<00043aa8>] printk+0x0/0x18 [<001a70f8>] refcount_warn_saturate+0x44/0x9a [<0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e [<0026d5a8>] ipv4_dst_destroy+0x5e/0x7e [<00025e84>] __local_bh_enable_ip+0x0/0x8e [<002416a8>] dst_destroy+0x40/0xae Fix this by forcing 4-byte alignment of all dst_metrics structures. Fixes: e5fd387ad5b30ca3 ("ipv6: do not overwrite inetpeer metrics prematurely") Signed-off-by: Geert Uytterhoeven Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/dst.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/dst.h b/include/net/dst.h index ddcff17615da..e57e8fb9a43d 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -110,7 +110,7 @@ struct dst_entry { struct dst_metrics { u32 metrics[RTAX_MAX]; atomic_t refcnt; -}; +} __aligned(4); /* Low pointer bits contain DST_METRICS_FLAGS */ extern const struct dst_metrics dst_default_metrics; u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old); -- cgit v1.2.3 From 4cfee666f088d635700316ea28bbc0bfa2bf89c8 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Sun, 10 Nov 2019 12:49:06 +0300 Subject: fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long [ Upstream commit 6fcbcec9cfc7b3c6a2c1f1a23ebacedff7073e0a ] Quota statistics counted as 64-bit per-cpu counter. Reading sums per-cpu fractions as signed 64-bit int, filters negative values and then reports lower half as signed 32-bit int. Result may looks like: fs.quota.allocated_dquots = 22327 fs.quota.cache_hits = -489852115 fs.quota.drops = -487288718 fs.quota.free_dquots = 22083 fs.quota.lookups = -486883485 fs.quota.reads = 22327 fs.quota.syncs = 335064 fs.quota.writes = 3088689 Values bigger than 2^31-1 reported as negative. All counters except "allocated_dquots" and "free_dquots" are monotonic, thus they should be reported as is without filtering negative values. Kernel doesn't have generic helper for 64-bit sysctl yet, let's use at least unsigned long. Link: https://lore.kernel.org/r/157337934693.2078.9842146413181153727.stgit@buzz Signed-off-by: Konstantin Khlebnikov Signed-off-by: Jan Kara Signed-off-by: Sasha Levin --- include/linux/quota.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/quota.h b/include/linux/quota.h index 55107a8ff887..23eb8ea07def 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -263,7 +263,7 @@ enum { }; struct dqstats { - int stat[_DQST_DQSTAT_LAST]; + unsigned long stat[_DQST_DQSTAT_LAST]; struct percpu_counter counter[_DQST_DQSTAT_LAST]; }; -- cgit v1.2.3 From 6d94f0de742a581bcc6dda31e7d114df6a98e9de Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 13 Nov 2019 14:05:08 -0800 Subject: scsi: target: iscsi: Wait for all commands to finish before freeing a session [ Upstream commit e9d3009cb936bd0faf0719f68d98ad8afb1e613b ] The iSCSI target driver is the only target driver that does not wait for ongoing commands to finish before freeing a session. Make the iSCSI target driver wait for ongoing commands to finish before freeing a session. This patch fixes the following KASAN complaint: BUG: KASAN: use-after-free in __lock_acquire+0xb1a/0x2710 Read of size 8 at addr ffff8881154eca70 by task kworker/0:2/247 CPU: 0 PID: 247 Comm: kworker/0:2 Not tainted 5.4.0-rc1-dbg+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: target_completion target_complete_ok_work [target_core_mod] Call Trace: dump_stack+0x8a/0xd6 print_address_description.constprop.0+0x40/0x60 __kasan_report.cold+0x1b/0x33 kasan_report+0x16/0x20 __asan_load8+0x58/0x90 __lock_acquire+0xb1a/0x2710 lock_acquire+0xd3/0x200 _raw_spin_lock_irqsave+0x43/0x60 target_release_cmd_kref+0x162/0x7f0 [target_core_mod] target_put_sess_cmd+0x2e/0x40 [target_core_mod] lio_check_stop_free+0x12/0x20 [iscsi_target_mod] transport_cmd_check_stop_to_fabric+0xd8/0xe0 [target_core_mod] target_complete_ok_work+0x1b0/0x790 [target_core_mod] process_one_work+0x549/0xa40 worker_thread+0x7a/0x5d0 kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 Allocated by task 889: save_stack+0x23/0x90 __kasan_kmalloc.constprop.0+0xcf/0xe0 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0xf6/0x360 transport_alloc_session+0x29/0x80 [target_core_mod] iscsi_target_login_thread+0xcd6/0x18f0 [iscsi_target_mod] kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 Freed by task 1025: save_stack+0x23/0x90 __kasan_slab_free+0x13a/0x190 kasan_slab_free+0x12/0x20 kmem_cache_free+0x146/0x400 transport_free_session+0x179/0x2f0 [target_core_mod] transport_deregister_session+0x130/0x180 [target_core_mod] iscsit_close_session+0x12c/0x350 [iscsi_target_mod] iscsit_logout_post_handler+0x136/0x380 [iscsi_target_mod] iscsit_response_queue+0x8de/0xbe0 [iscsi_target_mod] iscsi_target_tx_thread+0x27f/0x370 [iscsi_target_mod] kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 The buggy address belongs to the object at ffff8881154ec9c0 which belongs to the cache se_sess_cache of size 352 The buggy address is located 176 bytes inside of 352-byte region [ffff8881154ec9c0, ffff8881154ecb20) The buggy address belongs to the page: page:ffffea0004553b00 refcount:1 mapcount:0 mapping:ffff888101755400 index:0x0 compound_mapcount: 0 flags: 0x2fff000000010200(slab|head) raw: 2fff000000010200 dead000000000100 dead000000000122 ffff888101755400 raw: 0000000000000000 0000000080130013 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881154ec900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881154ec980: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8881154eca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881154eca80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881154ecb00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc Cc: Mike Christie Link: https://lore.kernel.org/r/20191113220508.198257-3-bvanassche@acm.org Reviewed-by: Roman Bolshakov Signed-off-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- include/scsi/iscsi_proto.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h index c1260d80ef30..1a2ae0862e23 100644 --- a/include/scsi/iscsi_proto.h +++ b/include/scsi/iscsi_proto.h @@ -638,6 +638,7 @@ struct iscsi_reject { #define ISCSI_REASON_BOOKMARK_INVALID 9 #define ISCSI_REASON_BOOKMARK_NO_RESOURCES 10 #define ISCSI_REASON_NEGOTIATION_RESET 11 +#define ISCSI_REASON_WAITING_FOR_LOGOUT 12 /* Max. number of Key=Value pairs in a text message */ #define MAX_KEY_VALUE_PAIRS 8192 -- cgit v1.2.3 From 6a60df8ec13270fc14f5e16ac67f3e4a029c82f6 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 13 Nov 2019 16:12:02 +0900 Subject: libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h [ Upstream commit a8de1304b7df30e3a14f2a8b9709bb4ff31a0385 ] The DTC v1.5.1 added references to (U)INT32_MAX. This is no problem for user-space programs since defines (U)INT32_MAX along with (u)int32_t. For the kernel space, libfdt_env.h needs to be adjusted before we pull in the changes. In the kernel, we usually use s/u32 instead of (u)int32_t for the fixed-width types. Accordingly, we already have S/U32_MAX for their max values. So, we should not add (U)INT32_MAX to any more. Instead, add them to the in-kernel libfdt_env.h to compile the latest libfdt. Signed-off-by: Masahiro Yamada Signed-off-by: Rob Herring Signed-off-by: Sasha Levin --- include/linux/libfdt_env.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/libfdt_env.h b/include/linux/libfdt_env.h index 8850e243c940..bd0a55821177 100644 --- a/include/linux/libfdt_env.h +++ b/include/linux/libfdt_env.h @@ -6,6 +6,9 @@ #include +#define INT32_MAX S32_MAX +#define UINT32_MAX U32_MAX + typedef __be16 fdt16_t; typedef __be32 fdt32_t; typedef __be64 fdt64_t; -- cgit v1.2.3 From 190d14f3ddc66f1cfe7f48e83f91d2e3a8b12372 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 6 Nov 2019 09:48:04 -0800 Subject: hrtimer: Annotate lockless access to timer->state commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream. syzbot reported various data-race caused by hrtimer_is_queued() reading timer->state. A READ_ONCE() is required there to silence the warning. Also add the corresponding WRITE_ONCE() when timer->state is set. In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid loading timer->state twice. KCSAN reported these cases: BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0: __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991 __run_hrtimer kernel/time/hrtimer.c:1496 [inline] __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1: tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline] tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2435 release_sock+0x61/0x160 net/core/sock.c:2951 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0: __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991 __run_hrtimer kernel/time/hrtimer.c:1496 [inline] __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1: __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline] tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2435 release_sock+0x61/0x160 net/core/sock.c:2951 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 __sys_sendto+0x21f/0x320 net/socket.c:1952 __do_sys_sendto net/socket.c:1964 [inline] __se_sys_sendto net/socket.c:1960 [inline] __x64_sys_sendto+0x89/0xb0 net/socket.c:1960 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ tglx: Added comments ] Reported-by: syzbot Signed-off-by: Eric Dumazet Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/hrtimer.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 5e00f80b1535..5ae70d2f9a2f 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -424,12 +424,18 @@ extern u64 hrtimer_get_next_event(void); extern bool hrtimer_active(const struct hrtimer *timer); -/* - * Helper function to check, whether the timer is on one of the queues +/** + * hrtimer_is_queued = check, whether the timer is on one of the queues + * @timer: Timer to check + * + * Returns: True if the timer is queued, false otherwise + * + * The function can be used lockless, but it gives only a current snapshot. */ -static inline int hrtimer_is_queued(struct hrtimer *timer) +static inline bool hrtimer_is_queued(struct hrtimer *timer) { - return timer->state & HRTIMER_STATE_ENQUEUED; + /* The READ_ONCE pairs with the update functions of timer->state */ + return !!(READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED); } /* -- cgit v1.2.3 From 792365bfcf3c753a792a4150314d24dd6a7721ea Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 13 Dec 2019 18:20:41 -0800 Subject: tcp/dccp: fix possible race __inet_lookup_established() commit 8dbd76e79a16b45b2ccb01d2f2e08dbf64e71e40 upstream. Michal Kubecek and Firo Yang did a very nice analysis of crashes happening in __inet_lookup_established(). Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN (via a close()/socket()/listen() cycle) without a RCU grace period, I should not have changed listeners linkage in their hash table. They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt), so that a lookup can detect a socket in a hash list was moved in another one. Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix"), we have to add hlist_nulls_add_tail_rcu() helper. Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet Reported-by: Michal Kubecek Reported-by: Firo Yang Reviewed-by: Michal Kubecek Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/ Signed-off-by: Jakub Kicinski [stable-4.9: we also need to update code in __inet_lookup_listener() and inet6_lookup_listener() which has been removed in 5.0-rc1.] Signed-off-by: Michal Kubecek Signed-off-by: Greg Kroah-Hartman --- include/linux/rculist_nulls.h | 37 +++++++++++++++++++++++++++++++++++++ include/net/inet_hashtables.h | 12 +++++++++--- include/net/sock.h | 5 +++++ 3 files changed, 51 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h index 2720b2fbfb86..106f4e0d7bd3 100644 --- a/include/linux/rculist_nulls.h +++ b/include/linux/rculist_nulls.h @@ -99,6 +99,43 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, first->pprev = &n->next; } +/** + * hlist_nulls_add_tail_rcu + * @n: the element to add to the hash list. + * @h: the list to add to. + * + * Description: + * Adds the specified element to the specified hlist_nulls, + * while permitting racing traversals. + * + * The caller must take whatever precautions are necessary + * (such as holding appropriate locks) to avoid racing + * with another list-mutation primitive, such as hlist_nulls_add_head_rcu() + * or hlist_nulls_del_rcu(), running on this same list. + * However, it is perfectly legal to run concurrently with + * the _rcu list-traversal primitives, such as + * hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency + * problems on Alpha CPUs. Regardless of the type of CPU, the + * list-traversal primitive must be guarded by rcu_read_lock(). + */ +static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, + struct hlist_nulls_head *h) +{ + struct hlist_nulls_node *i, *last = NULL; + + /* Note: write side code, so rcu accessors are not needed. */ + for (i = h->first; !is_a_nulls(i); i = i->next) + last = i; + + if (last) { + n->next = last->next; + n->pprev = &last->next; + rcu_assign_pointer(hlist_next_rcu(last), n); + } else { + hlist_nulls_add_head_rcu(n, h); + } +} + /** * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type * @tpos: the type * to use as a loop cursor. diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 0574493e3899..fc445e7ccadf 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -98,12 +98,18 @@ struct inet_bind_hashbucket { struct hlist_head chain; }; -/* - * Sockets can be hashed in established or listening table +/* Sockets can be hashed in established or listening table. + * We must use different 'nulls' end-of-chain value for all hash buckets : + * A socket might transition from ESTABLISH to LISTEN state without + * RCU grace period. A lookup in ehash table needs to handle this case. */ +#define LISTENING_NULLS_BASE (1U << 29) struct inet_listen_hashbucket { spinlock_t lock; - struct hlist_head head; + union { + struct hlist_head head; + struct hlist_nulls_head nulls_head; + }; }; /* This is for listening sockets, thus all sockets which possess wildcards. */ diff --git a/include/net/sock.h b/include/net/sock.h index aed436567d70..d6bce19ca261 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -661,6 +661,11 @@ static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_h hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list); } +static inline void __sk_nulls_add_node_tail_rcu(struct sock *sk, struct hlist_nulls_head *list) +{ + hlist_nulls_add_tail_rcu(&sk->sk_nulls_node, list); +} + static inline void sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) { sock_hold(sk); -- cgit v1.2.3 From 189ca92688236a1f72fb5a088e9bb10cd2b7ba15 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Thu, 5 Dec 2019 12:54:49 +0100 Subject: dmaengine: Fix access to uninitialized dma_slave_caps commit 53a256a9b925b47c7e67fc1f16ca41561a7b877c upstream. dmaengine_desc_set_reuse() allocates a struct dma_slave_caps on the stack, populates it using dma_get_slave_caps() and then accesses one of its members. However dma_get_slave_caps() may fail and this isn't accounted for, leading to a legitimate warning of gcc-4.9 (but not newer versions): In file included from drivers/spi/spi-bcm2835.c:19:0: drivers/spi/spi-bcm2835.c: In function 'dmaengine_desc_set_reuse': >> include/linux/dmaengine.h:1370:10: warning: 'caps.descriptor_reuse' is used uninitialized in this function [-Wuninitialized] if (caps.descriptor_reuse) { Fix it, thereby also silencing the gcc-4.9 warning. The issue has been present for 4 years but surfaces only now that the first caller of dmaengine_desc_set_reuse() has been added in spi-bcm2835.c. Another user of reusable DMA descriptors has existed for a while in pxa_camera.c, but it sets the DMA_CTRL_REUSE flag directly instead of calling dmaengine_desc_set_reuse(). Nevertheless, tag this commit for stable in case there are out-of-tree users. Fixes: 272420214d26 ("dmaengine: Add DMA_CTRL_REUSE") Reported-by: kbuild test robot Signed-off-by: Lukas Wunner Cc: stable@vger.kernel.org # v4.3+ Link: https://lore.kernel.org/r/ca92998ccc054b4f2bfd60ef3adbab2913171eac.1575546234.git.lukas@wunner.de Signed-off-by: Vinod Koul Signed-off-by: Greg Kroah-Hartman --- include/linux/dmaengine.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index cc535a478bae..710f269764dc 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -1358,8 +1358,11 @@ static inline int dma_get_slave_caps(struct dma_chan *chan, static inline int dmaengine_desc_set_reuse(struct dma_async_tx_descriptor *tx) { struct dma_slave_caps caps; + int ret; - dma_get_slave_caps(tx->chan, &caps); + ret = dma_get_slave_caps(tx->chan, &caps); + if (ret) + return ret; if (caps.descriptor_reuse) { tx->flags |= DMA_CTRL_REUSE; -- cgit v1.2.3 From 14167f448e2ebcbd8024e0dfac1a484cba69f63d Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 10 Dec 2019 10:53:44 -0800 Subject: ata: libahci_platform: Export again ahci_platform_able_phys() commit 84b032dbfdf1c139cd2b864e43959510646975f8 upstream. This reverts commit 6bb86fefa086faba7b60bb452300b76a47cde1a5 ("libahci_platform: Staticize ahci_platform_able_phys()") we are going to need ahci_platform_{enable,disable}_phys() in a subsequent commit for ahci_brcm.c in order to properly control the PHY initialization order. Also make sure the function prototypes are declared in include/linux/ahci_platform.h as a result. Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede Signed-off-by: Florian Fainelli Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/ahci_platform.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/ahci_platform.h b/include/linux/ahci_platform.h index a270f25ee7c7..1a527e40d601 100644 --- a/include/linux/ahci_platform.h +++ b/include/linux/ahci_platform.h @@ -23,6 +23,8 @@ struct ahci_host_priv; struct platform_device; struct scsi_host_template; +int ahci_platform_enable_phys(struct ahci_host_priv *hpriv); +void ahci_platform_disable_phys(struct ahci_host_priv *hpriv); int ahci_platform_enable_clks(struct ahci_host_priv *hpriv); void ahci_platform_disable_clks(struct ahci_host_priv *hpriv); int ahci_platform_enable_regulators(struct ahci_host_priv *hpriv); -- cgit v1.2.3 From bb7f07bbad4dffe388a46731639d0be4e5bea530 Mon Sep 17 00:00:00 2001 From: Stephan Gerhold Date: Wed, 6 Nov 2019 18:31:24 +0100 Subject: regulator: ab8500: Remove AB8505 USB regulator commit 99c4f70df3a6446c56ca817c2d0f9c12d85d4e7c upstream. The USB regulator was removed for AB8500 in commit 41a06aa738ad ("regulator: ab8500: Remove USB regulator"). It was then added for AB8505 in commit 547f384f33db ("regulator: ab8500: add support for ab8505"). However, there was never an entry added for it in ab8505_regulator_match. This causes all regulators after it to be initialized with the wrong device tree data, eventually leading to an out-of-bounds array read. Given that it is not used anywhere in the kernel, it seems likely that similar arguments against supporting it exist for AB8505 (it is controlled by hardware). Therefore, simply remove it like for AB8500 instead of adding an entry in ab8505_regulator_match. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij Signed-off-by: Stephan Gerhold Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20191106173125.14496-1-stephan@gerhold.net Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- include/linux/regulator/ab8500.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/regulator/ab8500.h b/include/linux/regulator/ab8500.h index d8ecefaf63ca..260c4aa1d976 100644 --- a/include/linux/regulator/ab8500.h +++ b/include/linux/regulator/ab8500.h @@ -38,7 +38,6 @@ enum ab8505_regulator_id { AB8505_LDO_AUX6, AB8505_LDO_INTCORE, AB8505_LDO_ADC, - AB8505_LDO_USB, AB8505_LDO_AUDIO, AB8505_LDO_ANAMIC1, AB8505_LDO_ANAMIC2, -- cgit v1.2.3 From 36d6d826d57173ec5e29df85942ca7055d555020 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 7 Nov 2019 18:29:11 -0800 Subject: net: add annotations on hh->hh_len lockless accesses [ Upstream commit c305c6ae79e2ce20c22660ceda94f0d86d639a82 ] KCSAN reported a data-race [1] While we can use READ_ONCE() on the read sides, we need to make sure hh->hh_len is written last. [1] BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0: eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247 neigh_hh_init net/core/neighbour.c:1463 [inline] neigh_resolve_output net/core/neighbour.c:1480 [inline] neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470 neigh_output include/net/neighbour.h:511 [inline] ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116 __ip6_finish_output net/ipv6/ip6_output.c:142 [inline] __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127 ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175 dst_output include/net/dst.h:436 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505 ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647 rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1: neigh_resolve_output net/core/neighbour.c:1479 [inline] neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470 neigh_output include/net/neighbour.h:511 [inline] ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116 __ip6_finish_output net/ipv6/ip6_output.c:142 [inline] __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127 ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175 dst_output include/net/dst.h:436 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505 ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647 rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events rt6_probe_deferred Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/neighbour.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/neighbour.h b/include/net/neighbour.h index 1c0d07376125..a68a460fa4f3 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -454,7 +454,7 @@ static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb do { seq = read_seqbegin(&hh->hh_lock); - hh_len = hh->hh_len; + hh_len = READ_ONCE(hh->hh_len); if (likely(hh_len <= HH_DATA_MOD)) { hh_alen = HH_DATA_MOD; -- cgit v1.2.3 From 714b98566f4a85e9ff7ed5151bf8dd4b5cdff85d Mon Sep 17 00:00:00 2001 From: Phil Sutter Date: Thu, 5 Dec 2019 13:35:11 +0100 Subject: netfilter: uapi: Avoid undefined left-shift in xt_sctp.h [ Upstream commit 164166558aacea01b99c8c8ffb710d930405ba69 ] With 'bytes(__u32)' being 32, a left-shift of 31 may happen which is undefined for the signed 32-bit value 1. Avoid this by declaring 1 as unsigned. Signed-off-by: Phil Sutter Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- include/uapi/linux/netfilter/xt_sctp.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/netfilter/xt_sctp.h b/include/uapi/linux/netfilter/xt_sctp.h index 58ffcfb7978e..c2b0886c7c25 100644 --- a/include/uapi/linux/netfilter/xt_sctp.h +++ b/include/uapi/linux/netfilter/xt_sctp.h @@ -40,19 +40,19 @@ struct xt_sctp_info { #define SCTP_CHUNKMAP_SET(chunkmap, type) \ do { \ (chunkmap)[type / bytes(__u32)] |= \ - 1 << (type % bytes(__u32)); \ + 1u << (type % bytes(__u32)); \ } while (0) #define SCTP_CHUNKMAP_CLEAR(chunkmap, type) \ do { \ (chunkmap)[type / bytes(__u32)] &= \ - ~(1 << (type % bytes(__u32))); \ + ~(1u << (type % bytes(__u32))); \ } while (0) #define SCTP_CHUNKMAP_IS_SET(chunkmap, type) \ ({ \ ((chunkmap)[type / bytes (__u32)] & \ - (1 << (type % bytes (__u32)))) ? 1: 0; \ + (1u << (type % bytes (__u32)))) ? 1: 0; \ }) #define SCTP_CHUNKMAP_RESET(chunkmap) \ -- cgit v1.2.3 From 9b266c6c12b055d51f5004e9b7285a4c97627311 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 Jan 2020 12:30:48 -0800 Subject: macvlan: do not assume mac_header is set in macvlan_broadcast() [ Upstream commit 96cc4b69581db68efc9749ef32e9cf8e0160c509 ] Use of eth_hdr() in tx path is error prone. Many drivers call skb_reset_mac_header() before using it, but others do not. Commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") attempted to fix this generically, but commit d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") brought back the macvlan bug. Lets add a new helper, so that tx paths no longer have to call skb_reset_mac_header() only to get a pointer to skb->data. Hopefully we will be able to revert 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") and save few cycles in transmit fast path. BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline] BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579 CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145 __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] mc_hash drivers/net/macvlan.c:251 [inline] macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 macvlan_queue_xmit drivers/net/macvlan.c:520 [inline] macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559 __netdev_start_xmit include/linux/netdevice.h:4447 [inline] netdev_start_xmit include/linux/netdevice.h:4461 [inline] dev_direct_xmit+0x419/0x630 net/core/dev.c:4079 packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240 packet_snd net/packet/af_packet.c:2966 [inline] packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 __sys_sendto+0x262/0x380 net/socket.c:1985 __do_sys_sendto net/socket.c:1997 [inline] __se_sys_sendto net/socket.c:1993 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x442639 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639 RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x163/0x770 mm/slab.c:3665 kmalloc include/linux/slab.h:561 [inline] tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a4932000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1025 bytes inside of 4096-byte region [ffff8880a4932000, ffff8880a4933000) The buggy address belongs to the page: page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000 raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: b863ceb7ddce ("[NET]: Add macvlan driver") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/if_ether.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h index 548fd535fd02..d433f5e292c9 100644 --- a/include/linux/if_ether.h +++ b/include/linux/if_ether.h @@ -28,6 +28,14 @@ static inline struct ethhdr *eth_hdr(const struct sk_buff *skb) return (struct ethhdr *)skb_mac_header(skb); } +/* Prefer this version in TX path, instead of + * skb_reset_mac_header() + eth_hdr() + */ +static inline struct ethhdr *skb_eth_hdr(const struct sk_buff *skb) +{ + return (struct ethhdr *)skb->data; +} + static inline struct ethhdr *inner_eth_hdr(const struct sk_buff *skb) { return (struct ethhdr *)skb_inner_mac_header(skb); -- cgit v1.2.3 From 475c1471f11a455a5332879cb28a3ae03d353598 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Mar 2017 01:37:01 +0100 Subject: kobject: Export kobject_get_unless_zero() commit c70c176ff8c3ff0ac6ef9a831cd591ea9a66bd1a upstream. Make the function available for outside use and fortify it against NULL kobject. CC: Greg Kroah-Hartman Reviewed-by: Bart Van Assche Acked-by: Tejun Heo Signed-off-by: Jan Kara Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/kobject.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/kobject.h b/include/linux/kobject.h index 5957c6a3fd7f..d9d4485ebad2 100644 --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -108,6 +108,8 @@ extern int __must_check kobject_rename(struct kobject *, const char *new_name); extern int __must_check kobject_move(struct kobject *, struct kobject *); extern struct kobject *kobject_get(struct kobject *kobj); +extern struct kobject * __must_check kobject_get_unless_zero( + struct kobject *kobj); extern void kobject_put(struct kobject *kobj); extern const void *kobject_namespace(struct kobject *kobj); -- cgit v1.2.3 From b6b09649213b7ee081a08166128bc41204c5b441 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Sat, 7 Dec 2019 19:34:18 +0100 Subject: can: can_dropped_invalid_skb(): ensure an initialized headroom in outgoing CAN sk_buffs commit e7153bf70c3496bac00e7e4f395bb8d8394ac0ea upstream. KMSAN sysbot detected a read access to an untinitialized value in the headroom of an outgoing CAN related sk_buff. When using CAN sockets this area is filled appropriately - but when using a packet socket this initialization is missing. The problematic read access occurs in the CAN receive path which can only be triggered when the sk_buff is sent through a (virtual) CAN interface. So we check in the sending path whether we need to perform the missing initializations. Fixes: d3b58c47d330d ("can: replace timestamp as unique skb attribute") Reported-by: syzbot+b02ff0707a97e4e79ebb@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp Tested-by: Oliver Hartkopp Cc: linux-stable # >= v4.1 Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- include/linux/can/dev.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include') diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index f7178f44825b..d2a497950639 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -17,6 +17,7 @@ #include #include #include +#include #include /* @@ -81,6 +82,36 @@ struct can_priv { #define get_can_dlc(i) (min_t(__u8, (i), CAN_MAX_DLC)) #define get_canfd_dlc(i) (min_t(__u8, (i), CANFD_MAX_DLC)) +/* Check for outgoing skbs that have not been created by the CAN subsystem */ +static inline bool can_skb_headroom_valid(struct net_device *dev, + struct sk_buff *skb) +{ + /* af_packet creates a headroom of HH_DATA_MOD bytes which is fine */ + if (WARN_ON_ONCE(skb_headroom(skb) < sizeof(struct can_skb_priv))) + return false; + + /* af_packet does not apply CAN skb specific settings */ + if (skb->ip_summed == CHECKSUM_NONE) { + /* init headroom */ + can_skb_prv(skb)->ifindex = dev->ifindex; + can_skb_prv(skb)->skbcnt = 0; + + skb->ip_summed = CHECKSUM_UNNECESSARY; + + /* preform proper loopback on capable devices */ + if (dev->flags & IFF_ECHO) + skb->pkt_type = PACKET_LOOPBACK; + else + skb->pkt_type = PACKET_HOST; + + skb_reset_mac_header(skb); + skb_reset_network_header(skb); + skb_reset_transport_header(skb); + } + + return true; +} + /* Drop a given socketbuffer if it does not contain a valid CAN frame. */ static inline bool can_dropped_invalid_skb(struct net_device *dev, struct sk_buff *skb) @@ -98,6 +129,9 @@ static inline bool can_dropped_invalid_skb(struct net_device *dev, } else goto inval_skb; + if (!can_skb_headroom_valid(dev, skb)) + goto inval_skb; + return false; inval_skb: -- cgit v1.2.3 From cbad0dc270e7e2c8cf3b28e5daeac42edc399ad0 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 14 May 2019 15:41:42 -0700 Subject: fs/select: avoid clang stack usage warning commit ad312f95d41c9de19313c51e388c4984451c010f upstream. The select() implementation is carefully tuned to put a sensible amount of data on the stack for holding a copy of the user space fd_set, but not too large to risk overflowing the kernel stack. When building a 32-bit kernel with clang, we need a little more space than with gcc, which often triggers a warning: fs/select.c:619:5: error: stack frame size of 1048 bytes in function 'core_sys_select' [-Werror,-Wframe-larger-than=] int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, I experimentally found that for 32-bit ARM, reducing the maximum stack usage by 64 bytes keeps us reliably under the warning limit again. Link: http://lkml.kernel.org/r/20190307090146.1874906-1-arnd@arndb.de Signed-off-by: Arnd Bergmann Reviewed-by: Andi Kleen Cc: Nick Desaulniers Cc: Alexander Viro Cc: Christoph Hellwig Cc: Eric Dumazet Cc: "Darrick J. Wong" Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Miles Chen Signed-off-by: Greg Kroah-Hartman --- include/linux/poll.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/poll.h b/include/linux/poll.h index 37b057b63b46..4bee398ed2fa 100644 --- a/include/linux/poll.h +++ b/include/linux/poll.h @@ -14,7 +14,11 @@ extern struct ctl_table epoll_table[]; /* for sysctl */ /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating additional memory. */ +#ifdef __clang__ +#define MAX_STACK_ALLOC 768 +#else #define MAX_STACK_ALLOC 832 +#endif #define FRONTEND_STACK_ALLOC 256 #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC #define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC -- cgit v1.2.3 From ab567894b3c1b80b6aa3df2eb85e5528c2834fdd Mon Sep 17 00:00:00 2001 From: Dedy Lansky Date: Sun, 29 Jul 2018 14:59:16 +0300 Subject: cfg80211/mac80211: make ieee80211_send_layer2_update a public function commit 30ca1aa536211f5ac3de0173513a7a99a98a97f3 upstream. Make ieee80211_send_layer2_update() a common function so other drivers can re-use it. Signed-off-by: Dedy Lansky Signed-off-by: Johannes Berg [bwh: Backported to 4.9 as dependency of commit 3e493173b784 "mac80211: Do not send Layer 2 Update frame before authorization": - Retain type-casting of skb_put() return value - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- include/net/cfg80211.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 9d57639223c3..28a0d7a8c142 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -4181,6 +4181,17 @@ static inline const u8 *cfg80211_find_ie(u8 eid, const u8 *ies, int len) const u8 *cfg80211_find_vendor_ie(unsigned int oui, int oui_type, const u8 *ies, int len); +/** + * cfg80211_send_layer2_update - send layer 2 update frame + * + * @dev: network device + * @addr: STA MAC address + * + * Wireless drivers can use this function to update forwarding tables in bridge + * devices upon STA association. + */ +void cfg80211_send_layer2_update(struct net_device *dev, const u8 *addr); + /** * DOC: Regulatory enforcement infrastructure * -- cgit v1.2.3 From 1a087b517c3acb8330d5f9da5ab59ee080492aeb Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Sat, 30 Nov 2019 19:53:37 +0100 Subject: dt-bindings: reset: meson8b: fix duplicate reset IDs commit 4881873f4cc1460f63d85fa81363d56be328ccdc upstream. According to the public S805 datasheet the RESET2 register uses the following bits for the PIC_DC, PSC and NAND reset lines: - PIC_DC is at bit 3 (meaning: RESET_VD_RMEM + 3) - PSC is at bit 4 (meaning: RESET_VD_RMEM + 4) - NAND is at bit 5 (meaning: RESET_VD_RMEM + 4) Update the reset IDs of these three reset lines so they don't conflict with PIC_DC and map to the actual hardware reset lines. Fixes: 79795e20a184eb ("dt-bindings: reset: Add bindings for the Meson SoC Reset Controller") Signed-off-by: Martin Blumenstingl Signed-off-by: Kevin Hilman Signed-off-by: Greg Kroah-Hartman --- include/dt-bindings/reset/amlogic,meson8b-reset.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/dt-bindings/reset/amlogic,meson8b-reset.h b/include/dt-bindings/reset/amlogic,meson8b-reset.h index 614aff2c7aff..a03e86fe2c57 100644 --- a/include/dt-bindings/reset/amlogic,meson8b-reset.h +++ b/include/dt-bindings/reset/amlogic,meson8b-reset.h @@ -95,9 +95,9 @@ #define RESET_VD_RMEM 64 #define RESET_AUDIN 65 #define RESET_DBLK 66 -#define RESET_PIC_DC 66 -#define RESET_PSC 66 -#define RESET_NAND 66 +#define RESET_PIC_DC 67 +#define RESET_PSC 68 +#define RESET_NAND 69 #define RESET_GE2D 70 #define RESET_PARSER_REG 71 #define RESET_PARSER_FETCH 72 -- cgit v1.2.3 From 5dbde467ccd6c401ca35acf0c57296d1ffaa06f3 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 15 Jan 2020 08:35:25 -0500 Subject: block: fix an integer overflow in logical block size commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream. Logical block size has type unsigned short. That means that it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with 64k block size. For exmaple (run this on an architecture with 64k pages): Mount will fail with this error because it tries to read the superblock using 2-sector access: device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536 EXT4-fs (dm-0): unable to read superblock This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow. Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen Reviewed-by: Ming Lei Signed-off-by: Mikulas Patocka Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/blkdev.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 7582a5712458..2fc4ba6fa07f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -277,6 +277,7 @@ struct queue_limits { unsigned int max_sectors; unsigned int max_segment_size; unsigned int physical_block_size; + unsigned int logical_block_size; unsigned int alignment_offset; unsigned int io_min; unsigned int io_opt; @@ -286,7 +287,6 @@ struct queue_limits { unsigned int discard_granularity; unsigned int discard_alignment; - unsigned short logical_block_size; unsigned short max_segments; unsigned short max_integrity_segments; @@ -996,7 +996,7 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q, unsigned int max_discard_sectors); extern void blk_queue_max_write_same_sectors(struct request_queue *q, unsigned int max_write_same_sectors); -extern void blk_queue_logical_block_size(struct request_queue *, unsigned short); +extern void blk_queue_logical_block_size(struct request_queue *, unsigned int); extern void blk_queue_physical_block_size(struct request_queue *, unsigned int); extern void blk_queue_alignment_offset(struct request_queue *q, unsigned int alignment); @@ -1221,7 +1221,7 @@ static inline unsigned int queue_max_segment_size(struct request_queue *q) return q->limits.max_segment_size; } -static inline unsigned short queue_logical_block_size(struct request_queue *q) +static inline unsigned queue_logical_block_size(struct request_queue *q) { int retval = 512; @@ -1231,7 +1231,7 @@ static inline unsigned short queue_logical_block_size(struct request_queue *q) return retval; } -static inline unsigned short bdev_logical_block_size(struct block_device *bdev) +static inline unsigned int bdev_logical_block_size(struct block_device *bdev) { return queue_logical_block_size(bdev_get_queue(bdev)); } -- cgit v1.2.3 From 1bdac6c2fed0fc0789b9c29d7f5e1ab2b200182a Mon Sep 17 00:00:00 2001 From: Stephan Gerhold Date: Wed, 6 Nov 2019 18:31:25 +0100 Subject: regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_id commit 458ea3ad033fc86e291712ce50cbe60c3428cf30 upstream. Those regulators are not actually supported by the AB8500 regulator driver. There is no ab8500_regulator_info for them and no entry in ab8505_regulator_match. As such, they cannot be registered successfully, and looking them up in ab8505_regulator_match causes an out-of-bounds array read. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij Signed-off-by: Stephan Gerhold Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20191106173125.14496-2-stephan@gerhold.net Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- include/linux/regulator/ab8500.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/regulator/ab8500.h b/include/linux/regulator/ab8500.h index 260c4aa1d976..3f6b8b9ef49d 100644 --- a/include/linux/regulator/ab8500.h +++ b/include/linux/regulator/ab8500.h @@ -43,8 +43,6 @@ enum ab8505_regulator_id { AB8505_LDO_ANAMIC2, AB8505_LDO_AUX8, AB8505_LDO_ANA, - AB8505_SYSCLKREQ_2, - AB8505_SYSCLKREQ_4, AB8505_NUM_REGULATORS, }; -- cgit v1.2.3 From 190c1d6c90dc347ea55467d51b5b536ec68f4370 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 24 Apr 2019 05:46:27 -0400 Subject: media: davinci/vpbe: array underflow in vpbe_enum_outputs() [ Upstream commit b72845ee5577b227131b1fef23f9d9a296621d7b ] In vpbe_enum_outputs() we check if (temp_index >= cfg->num_outputs) but the problem is that "temp_index" can be negative. This patch changes the types to unsigned to address this array underflow bug. Fixes: 66715cdc3224 ("[media] davinci vpbe: VPBE display driver") Signed-off-by: Dan Carpenter Acked-by: "Lad, Prabhakar" Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- include/media/davinci/vpbe.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/media/davinci/vpbe.h b/include/media/davinci/vpbe.h index 4376beeb28c2..5d8ceeddc797 100644 --- a/include/media/davinci/vpbe.h +++ b/include/media/davinci/vpbe.h @@ -96,7 +96,7 @@ struct vpbe_config { struct encoder_config_info *ext_encoders; /* amplifier information goes here */ struct amp_config_info *amp; - int num_outputs; + unsigned int num_outputs; /* Order is venc outputs followed by LCD and then external encoders */ struct vpbe_output *outputs; }; -- cgit v1.2.3 From db62dded6dcb7c78d65a588882a7f3da39d007e2 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 28 Jun 2019 16:59:45 +0200 Subject: devres: allow const resource arguments [ Upstream commit 9dea44c91469512d346e638694c22c30a5273992 ] devm_ioremap_resource() does not currently take 'const' arguments, which results in a warning from the first driver trying to do it anyway: drivers/gpio/gpio-amd-fch.c: In function 'amd_fch_gpio_probe': drivers/gpio/gpio-amd-fch.c:171:49: error: passing argument 2 of 'devm_ioremap_resource' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] priv->base = devm_ioremap_resource(&pdev->dev, &amd_fch_gpio_iores); ^~~~~~~~~~~~~~~~~~~ Change the prototype to allow it, as there is no real reason not to. Fixes: 9bb2e0452508 ("gpio: amd: Make resource struct const") Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20190628150049.1108048-1-arnd@arndb.de Acked-by: Greg Kroah-Hartman Reviwed-By: Enrico Weigelt Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- include/linux/device.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/device.h b/include/linux/device.h index 8d732965fab7..eb865b461acc 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -682,7 +682,8 @@ extern unsigned long devm_get_free_pages(struct device *dev, gfp_t gfp_mask, unsigned int order); extern void devm_free_pages(struct device *dev, unsigned long addr); -void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res); +void __iomem *devm_ioremap_resource(struct device *dev, + const struct resource *res); /* allows to add/remove a custom action to devres stack */ int devm_add_action(struct device *dev, void (*action)(void *), void *data); -- cgit v1.2.3 From ef25c0172fc4e30473a80d3f6c99a7e0e97c44c7 Mon Sep 17 00:00:00 2001 From: Mark Zhang Date: Wed, 31 Jul 2019 14:40:13 +0300 Subject: net/mlx5: Fix mlx5_ifc_query_lag_out_bits [ Upstream commit ea77388b02270b0af8dc57f668f311235ea068f0 ] Remove the "reserved_at_40" field to match the device specification. Fixes: 84df61ebc69b ("net/mlx5: Add HW interfaces used by LAG") Signed-off-by: Mark Zhang Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- include/linux/mlx5/mlx5_ifc.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 20ee90c47cd5..6dd276227217 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -7909,8 +7909,6 @@ struct mlx5_ifc_query_lag_out_bits { u8 syndrome[0x20]; - u8 reserved_at_40[0x40]; - struct mlx5_ifc_lagc_bits ctx; }; -- cgit v1.2.3 From a371457c19ae0e24e72b2c0a7635fc65a046b98e Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 16 Aug 2019 12:33:54 -0500 Subject: signal: Allow cifs and drbd to receive their terminating signals MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 33da8e7c814f77310250bb54a9db36a44c5de784 ] My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg Reported-by: Christoph Böhmwalder Tested-by: Christoph Böhmwalder Cc: Steve French Cc: Philipp Reisner Cc: David Laight Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" Signed-off-by: Sasha Levin --- include/linux/signal.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/signal.h b/include/linux/signal.h index 5308304993be..ffa58ff53e22 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -313,6 +313,9 @@ extern void signal_setup_done(int failed, struct ksignal *ksig, int stepping); extern void exit_signals(struct task_struct *tsk); extern void kernel_sigaction(int, __sighandler_t); +#define SIG_KTHREAD ((__force __sighandler_t)2) +#define SIG_KTHREAD_KERNEL ((__force __sighandler_t)3) + static inline void allow_signal(int sig) { /* @@ -320,7 +323,17 @@ static inline void allow_signal(int sig) * know it'll be handled, so that they don't get converted to * SIGKILL or just silently dropped. */ - kernel_sigaction(sig, (__force __sighandler_t)2); + kernel_sigaction(sig, SIG_KTHREAD); +} + +static inline void allow_kernel_signal(int sig) +{ + /* + * Kernel threads handle their own signals. Let the signal code + * know signals sent by the kernel will be handled, so that they + * don't get silently dropped. + */ + kernel_sigaction(sig, SIG_KTHREAD_KERNEL); } static inline void disallow_signal(int sig) -- cgit v1.2.3 From 9e0951ca586149e4b8b133ff137050ad502912b4 Mon Sep 17 00:00:00 2001 From: Robin Gong Date: Tue, 24 Sep 2019 09:49:18 +0000 Subject: dmaengine: imx-sdma: fix size check for sdma script_number [ Upstream commit bd73dfabdda280fc5f05bdec79b6721b4b2f035f ] Illegal memory will be touch if SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3 (41) exceed the size of structure sdma_script_start_addrs(40), thus cause memory corrupt such as slob block header so that kernel trap into while() loop forever in slob_free(). Please refer to below code piece in imx-sdma.c: for (i = 0; i < sdma->script_number; i++) if (addr_arr[i] > 0) saddr_arr[i] = addr_arr[i]; /* memory corrupt here */ That issue was brought by commit a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware") because SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3 (38->41 3 scripts added) not align with script number added in sdma_script_start_addrs(2 scripts). Fixes: a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware") Cc: stable@vger.kernel Link: https://www.spinics.net/lists/arm-kernel/msg754895.html Signed-off-by: Robin Gong Reported-by: Jurgen Lambrecht Link: https://lore.kernel.org/r/1569347584-3478-1-git-send-email-yibin.gong@nxp.com [vkoul: update the patch title] Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- include/linux/platform_data/dma-imx-sdma.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/platform_data/dma-imx-sdma.h b/include/linux/platform_data/dma-imx-sdma.h index 2d08816720f6..5bb0a119f39a 100644 --- a/include/linux/platform_data/dma-imx-sdma.h +++ b/include/linux/platform_data/dma-imx-sdma.h @@ -50,7 +50,10 @@ struct sdma_script_start_addrs { /* End of v2 array */ s32 zcanfd_2_mcu_addr; s32 zqspi_2_mcu_addr; + s32 mcu_2_ecspi_addr; /* End of v3 array */ + s32 mcu_2_zqspi_addr; + /* End of v4 array */ }; /** -- cgit v1.2.3 From e9cd6721460c59dc289d169b5d411bf8b9cef57a Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Wed, 20 Sep 2017 15:52:13 -0700 Subject: net: ethtool: Add back transceiver type commit 19cab8872692960535aa6d12e3a295ac51d1a648 upstream. Commit 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") deprecated the ethtool_cmd::transceiver field, which was fine in premise, except that the PHY library was actually using it to report the type of transceiver: internal or external. Use the first word of the reserved field to put this __u8 transceiver field back in. It is made read-only, and we don't expect the ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this is mostly for the legacy path where we do: ethtool_get_settings() -> dev->ethtool_ops->get_link_ksettings() -> convert_link_ksettings_to_legacy_settings() to have no information loss compared to the legacy get_settings API. Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/ethtool.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 8c5335bde212..5dd3332ebc66 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1687,6 +1687,8 @@ enum ethtool_reset_flags { * %ethtool_link_mode_bit_indices for the link modes, and other * link features that the link partner advertised through * autonegotiation; 0 if unknown or not applicable. Read-only. + * @transceiver: Used to distinguish different possible PHY types, + * reported consistently by PHYLIB. Read-only. * * If autonegotiation is disabled, the speed and @duplex represent the * fixed link mode and are writable if the driver supports multiple @@ -1738,7 +1740,9 @@ struct ethtool_link_settings { __u8 eth_tp_mdix; __u8 eth_tp_mdix_ctrl; __s8 link_mode_masks_nwords; - __u32 reserved[8]; + __u8 transceiver; + __u8 reserved1[3]; + __u32 reserved[7]; __u32 link_mode_masks[0]; /* layout of link_mode_masks fields: * __u32 map_supported[link_mode_masks_nwords]; -- cgit v1.2.3 From d0a73d27b927026832a15d3d982e126fe79a2588 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Sun, 12 Jan 2020 11:42:31 +0800 Subject: tracing: xen: Ordered comparison of function pointers commit d0695e2351102affd8efae83989056bc4b275917 upstream. Just as commit 0566e40ce7 ("tracing: initcall: Ordered comparison of function pointers"), this patch fixes another remaining one in xen.h found by clang-9. In file included from arch/x86/xen/trace.c:21: In file included from ./include/trace/events/xen.h:475: In file included from ./include/trace/define_trace.h:102: In file included from ./include/trace/trace_events.h:473: ./include/trace/events/xen.h:69:7: warning: ordered comparison of function \ pointers ('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 'xen_mc_callback_fn_t') [-Wordered-compare-function-pointers] __field(xen_mc_callback_fn_t, fn) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/trace/trace_events.h:421:29: note: expanded from macro '__field' ^ ./include/trace/trace_events.h:407:6: note: expanded from macro '__field_ext' is_signed_type(type), filter_type); \ ^ ./include/linux/trace_events.h:554:44: note: expanded from macro 'is_signed_type' ^ Fixes: c796f213a6934 ("xen/trace: add multicall tracing") Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- include/trace/events/xen.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/trace/events/xen.h b/include/trace/events/xen.h index d6be935caa50..ecd1a0f7bd3e 100644 --- a/include/trace/events/xen.h +++ b/include/trace/events/xen.h @@ -63,7 +63,11 @@ TRACE_EVENT(xen_mc_callback, TP_PROTO(xen_mc_callback_fn_t fn, void *data), TP_ARGS(fn, data), TP_STRUCT__entry( - __field(xen_mc_callback_fn_t, fn) + /* + * Use field_struct to avoid is_signed_type() + * comparison of a function pointer. + */ + __field_struct(xen_mc_callback_fn_t, fn) __field(void *, data) ), TP_fast_assign( -- cgit v1.2.3 From 33a451d9d8ba52c2ffe6c1690fc49d798868ba2a Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 1 Aug 2018 15:42:56 -0700 Subject: bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free() commit c42b65e363ce97a828f81b59033c3558f8fa7f70 upstream. A lot of code become ugly because of open coding allocations for bitmaps. Introduce three helpers to allow users be more clear of intention and keep their code neat. Note, due to multiple circular dependencies we may not provide the helpers as inliners. For now we keep them exported and, perhaps, at some point in the future we will sort out header inclusion and inheritance. Signed-off-by: Andy Shevchenko Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- include/linux/bitmap.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index dec03c0dbc21..26347cc717e8 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -85,6 +85,14 @@ * contain all bit positions from 0 to 'bits' - 1. */ +/* + * Allocation and deallocation of bitmap. + * Provided in lib/bitmap.c to avoid circular dependency. + */ +extern unsigned long *bitmap_alloc(unsigned int nbits, gfp_t flags); +extern unsigned long *bitmap_zalloc(unsigned int nbits, gfp_t flags); +extern void bitmap_free(const unsigned long *bitmap); + /* * lib/bitmap.c provides these functions: */ -- cgit v1.2.3 From 1bbbcf6d2321acd6f9a16f8455f03bd48b343f5e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kadlecsik=20J=C3=B3zsef?= Date: Sun, 19 Jan 2020 22:06:49 +0100 Subject: netfilter: ipset: use bitmap infrastructure completely commit 32c72165dbd0e246e69d16a3ad348a4851afd415 upstream. The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- include/linux/netfilter/ipset/ip_set.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include') diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index 83b9a2e0d8d4..e3a91b24a31c 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -537,13 +537,6 @@ ip6addrptr(const struct sk_buff *skb, bool src, struct in6_addr *addr) sizeof(*addr)); } -/* Calculate the bytes required to store the inclusive range of a-b */ -static inline int -bitmap_bytes(u32 a, u32 b) -{ - return 4 * ((((b - a + 8) / 8) + 3) / 4); -} - #include #include -- cgit v1.2.3 From 10d24acd8d283aff9cf7065a9ee768ad27b81bad Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 22 Jan 2020 11:15:27 +0100 Subject: USB: serial: ir-usb: fix link-speed handling commit 17a0184ca17e288decdca8b2841531e34d49285f upstream. Commit e0d795e4f36c ("usb: irda: cleanup on ir-usb module") added a USB IrDA header with common defines, but mistakingly switched to using the class-descriptor baud-rate bitmask values for the outbound header. This broke link-speed handling for rates above 9600 baud, but a device would also be able to operate at the default 9600 baud until a link-speed request was issued (e.g. using the TCGETS ioctl). Fixes: e0d795e4f36c ("usb: irda: cleanup on ir-usb module") Cc: stable # 2.6.27 Cc: Felipe Balbi Reviewed-by: Greg Kroah-Hartman Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/irda.h | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/usb/irda.h b/include/linux/usb/irda.h index e345ceaf72d6..9dc46010a067 100644 --- a/include/linux/usb/irda.h +++ b/include/linux/usb/irda.h @@ -118,11 +118,22 @@ struct usb_irda_cs_descriptor { * 6 - 115200 bps * 7 - 576000 bps * 8 - 1.152 Mbps - * 9 - 5 mbps + * 9 - 4 Mbps * 10..15 - Reserved */ #define USB_IRDA_STATUS_LINK_SPEED 0x0f +#define USB_IRDA_LS_NO_CHANGE 0 +#define USB_IRDA_LS_2400 1 +#define USB_IRDA_LS_9600 2 +#define USB_IRDA_LS_19200 3 +#define USB_IRDA_LS_38400 4 +#define USB_IRDA_LS_57600 5 +#define USB_IRDA_LS_115200 6 +#define USB_IRDA_LS_576000 7 +#define USB_IRDA_LS_1152000 8 +#define USB_IRDA_LS_4000000 9 + /* The following is a 4-bit value used only for * outbound header: * -- cgit v1.2.3 From 30e402b922767b80ce8fb0d1c7734d2d22df071c Mon Sep 17 00:00:00 2001 From: Helen Koike Date: Tue, 17 Dec 2019 21:00:22 +0100 Subject: media: v4l2-rect.h: fix v4l2_rect_map_inside() top/left adjustments commit f51e50db4c20d46930b33be3f208851265694f3e upstream. boundary->width and boundary->height are sizes relative to boundary->left and boundary->top coordinates, but they were not being taken into consideration to adjust r->left and r->top, leading to the following error: Consider the follow as initial values for boundary and r: struct v4l2_rect boundary = { .left = 100, .top = 100, .width = 800, .height = 600, } struct v4l2_rect r = { .left = 0, .top = 0, .width = 1920, .height = 960, } calling v4l2_rect_map_inside(&r, &boundary) was modifying r to: r = { .left = 0, .top = 0, .width = 800, .height = 600, } Which is wrongly outside the boundary rectangle, because: v4l2_rect_set_max_size(r, boundary); // r->width = 800, r->height = 600 ... if (r->left + r->width > boundary->width) // true r->left = boundary->width - r->width; // r->left = 800 - 800 if (r->top + r->height > boundary->height) // true r->top = boundary->height - r->height; // r->height = 600 - 600 Fix this by considering top/left coordinates from boundary. Fixes: ac49de8c49d7 ("[media] v4l2-rect.h: new header with struct v4l2_rect helper functions") Signed-off-by: Helen Koike Cc: # for v4.7 and up Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- include/media/v4l2-rect.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/media/v4l2-rect.h b/include/media/v4l2-rect.h index d2125f0cc7cd..1584c760b993 100644 --- a/include/media/v4l2-rect.h +++ b/include/media/v4l2-rect.h @@ -75,10 +75,10 @@ static inline void v4l2_rect_map_inside(struct v4l2_rect *r, r->left = boundary->left; if (r->top < boundary->top) r->top = boundary->top; - if (r->left + r->width > boundary->width) - r->left = boundary->width - r->width; - if (r->top + r->height > boundary->height) - r->top = boundary->height - r->height; + if (r->left + r->width > boundary->left + boundary->width) + r->left = boundary->left + boundary->width - r->width; + if (r->top + r->height > boundary->top + boundary->height) + r->top = boundary->top + boundary->height - r->height; } /** -- cgit v1.2.3 From 39919a2ac1d99348a400d7e87dabcf9f142f4df0 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Sun, 8 Dec 2019 22:11:40 +0100 Subject: media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros [ Upstream commit afb34781620274236bd9fc9246e22f6963ef5262 ] When building with Clang + -Wtautological-constant-compare, several of the ivtv and cx18 drivers warn along the lines of: drivers/media/pci/cx18/cx18-driver.c:1005:21: warning: converting the result of '<<' to a boolean always evaluates to true [-Wtautological-constant-compare] cx18_call_hw(cx, CX18_HW_GPIO_RESET_CTRL, ^ drivers/media/pci/cx18/cx18-cards.h:18:37: note: expanded from macro 'CX18_HW_GPIO_RESET_CTRL' #define CX18_HW_GPIO_RESET_CTRL (1 << 6) ^ 1 warning generated. This warning happens because the shift operation is implicitly converted to a boolean in v4l2_device_mask_call_all before being negated. This can be solved by just comparing the mask result to 0 explicitly so that there is no boolean conversion. The ultimate goal is to enable -Wtautological-compare globally because there are several subwarnings that would be helpful to have. For visual consistency and avoidance of these warnings in the future, all of the implicitly boolean conversions in the v4l2_device macros are converted to explicit ones as well. Link: https://github.com/ClangBuiltLinux/linux/issues/752 Reviewed-by: Ezequiel Garcia Reviewed-by: Nick Desaulniers Signed-off-by: Nathan Chancellor Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- include/media/v4l2-device.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/media/v4l2-device.h b/include/media/v4l2-device.h index 8ffa94009d1a..76002416cead 100644 --- a/include/media/v4l2-device.h +++ b/include/media/v4l2-device.h @@ -268,7 +268,7 @@ static inline void v4l2_subdev_notify(struct v4l2_subdev *sd, struct v4l2_subdev *__sd; \ \ __v4l2_device_call_subdevs_p(v4l2_dev, __sd, \ - !(grpid) || __sd->grp_id == (grpid), o, f , \ + (grpid) == 0 || __sd->grp_id == (grpid), o, f , \ ##args); \ } while (0) @@ -280,7 +280,7 @@ static inline void v4l2_subdev_notify(struct v4l2_subdev *sd, ({ \ struct v4l2_subdev *__sd; \ __v4l2_device_call_subdevs_until_err_p(v4l2_dev, __sd, \ - !(grpid) || __sd->grp_id == (grpid), o, f , \ + (grpid) == 0 || __sd->grp_id == (grpid), o, f , \ ##args); \ }) @@ -294,8 +294,8 @@ static inline void v4l2_subdev_notify(struct v4l2_subdev *sd, struct v4l2_subdev *__sd; \ \ __v4l2_device_call_subdevs_p(v4l2_dev, __sd, \ - !(grpmsk) || (__sd->grp_id & (grpmsk)), o, f , \ - ##args); \ + (grpmsk) == 0 || (__sd->grp_id & (grpmsk)), o, \ + f , ##args); \ } while (0) /* @@ -308,8 +308,8 @@ static inline void v4l2_subdev_notify(struct v4l2_subdev *sd, ({ \ struct v4l2_subdev *__sd; \ __v4l2_device_call_subdevs_until_err_p(v4l2_dev, __sd, \ - !(grpmsk) || (__sd->grp_id & (grpmsk)), o, f , \ - ##args); \ + (grpmsk) == 0 || (__sd->grp_id & (grpmsk)), o, \ + f , ##args); \ }) /* -- cgit v1.2.3 From cd9ea1f4d9a7cb3b1fdd59e48723d2c43df8bf3f Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sat, 9 Nov 2019 09:42:13 -0800 Subject: rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls [ Upstream commit 860c8802ace14c646864795e057349c9fb2d60ad ] Eric Dumazet supplied a KCSAN report of a bug that forces use of hlist_unhashed_lockless() from sk_unhashed(): ------------------------------------------------------------------------ BUG: KCSAN: data-race in inet_unhash / inet_unhash write to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 1: __hlist_nulls_del include/linux/list_nulls.h:88 [inline] hlist_nulls_del_init_rcu include/linux/rculist_nulls.h:36 [inline] __sk_nulls_del_node_init_rcu include/net/sock.h:676 [inline] inet_unhash+0x38f/0x4a0 net/ipv4/inet_hashtables.c:612 tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249 tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854 tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56 tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619 call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 read to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 0: sk_unhashed include/net/sock.h:607 [inline] inet_unhash+0x3d/0x4a0 net/ipv4/inet_hashtables.c:592 tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249 tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854 tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56 tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619 call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 rest_init+0xec/0xf6 init/main.c:452 arch_call_rest_init+0x17/0x37 start_kernel+0x838/0x85e init/main.c:786 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ------------------------------------------------------------------------ This commit therefore replaces C-language assignments with WRITE_ONCE() in include/linux/list_nulls.h and include/linux/rculist_nulls.h. Reported-by: Eric Dumazet # For KCSAN Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- include/linux/list_nulls.h | 8 ++++---- include/linux/rculist_nulls.h | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h index 87ff4f58a2f0..9e20bf7f46a2 100644 --- a/include/linux/list_nulls.h +++ b/include/linux/list_nulls.h @@ -71,10 +71,10 @@ static inline void hlist_nulls_add_head(struct hlist_nulls_node *n, struct hlist_nulls_node *first = h->first; n->next = first; - n->pprev = &h->first; + WRITE_ONCE(n->pprev, &h->first); h->first = n; if (!is_a_nulls(first)) - first->pprev = &n->next; + WRITE_ONCE(first->pprev, &n->next); } static inline void __hlist_nulls_del(struct hlist_nulls_node *n) @@ -84,13 +84,13 @@ static inline void __hlist_nulls_del(struct hlist_nulls_node *n) WRITE_ONCE(*pprev, next); if (!is_a_nulls(next)) - next->pprev = pprev; + WRITE_ONCE(next->pprev, pprev); } static inline void hlist_nulls_del(struct hlist_nulls_node *n) { __hlist_nulls_del(n); - n->pprev = LIST_POISON2; + WRITE_ONCE(n->pprev, LIST_POISON2); } /** diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h index 106f4e0d7bd3..4d71e3687d1e 100644 --- a/include/linux/rculist_nulls.h +++ b/include/linux/rculist_nulls.h @@ -33,7 +33,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n) { if (!hlist_nulls_unhashed(n)) { __hlist_nulls_del(n); - n->pprev = NULL; + WRITE_ONCE(n->pprev, NULL); } } @@ -65,7 +65,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n) static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n) { __hlist_nulls_del(n); - n->pprev = LIST_POISON2; + WRITE_ONCE(n->pprev, LIST_POISON2); } /** @@ -93,10 +93,10 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, struct hlist_nulls_node *first = h->first; n->next = first; - n->pprev = &h->first; + WRITE_ONCE(n->pprev, &h->first); rcu_assign_pointer(hlist_nulls_first_rcu(h), n); if (!is_a_nulls(first)) - first->pprev = &n->next; + WRITE_ONCE(first->pprev, &n->next); } /** -- cgit v1.2.3 From 67c33346df53fd543c740d4f6c2b8e8bda467a54 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 12 Feb 2020 21:09:00 -0800 Subject: scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session" commit 807b9515b7d044cf77df31f1af9d842a76ecd5cb upstream. Since commit e9d3009cb936 introduced a regression and since the fix for that regression was not perfect, revert this commit. Link: https://marc.info/?l=target-devel&m=158157054906195 Cc: Rahul Kundu Cc: Mike Marciniszyn Cc: Sagi Grimberg Reported-by: Dakshaja Uppalapati Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session") Signed-off-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- include/scsi/iscsi_proto.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h index 1a2ae0862e23..c1260d80ef30 100644 --- a/include/scsi/iscsi_proto.h +++ b/include/scsi/iscsi_proto.h @@ -638,7 +638,6 @@ struct iscsi_reject { #define ISCSI_REASON_BOOKMARK_INVALID 9 #define ISCSI_REASON_BOOKMARK_NO_RESOURCES 10 #define ISCSI_REASON_NEGOTIATION_RESET 11 -#define ISCSI_REASON_WAITING_FOR_LOGOUT 12 /* Max. number of Key=Value pairs in a text message */ #define MAX_KEY_VALUE_PAIRS 8192 -- cgit v1.2.3 From 0aa6ce52d38e3b4d257c9d314208f83781ef4854 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 14 Feb 2020 12:13:16 +0100 Subject: ALSA: rawmidi: Avoid bit fields for state flags commit dfa9a5efe8b932a84b3b319250aa3ac60c20f876 upstream. The rawmidi state flags (opened, append, active_sensing) are stored in bit fields that can be potentially racy when concurrently accessed without any locks. Although the current code should be fine, there is also no any real benefit by keeping the bitfields for this kind of short number of members. This patch changes those bit fields flags to the simple bool fields. There should be no size increase of the snd_rawmidi_substream by this change. Reported-by: syzbot+576cc007eb9f2c968200@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20200214111316.26939-4-tiwai@suse.de Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- include/sound/rawmidi.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/sound/rawmidi.h b/include/sound/rawmidi.h index f730b91e472f..5432111c8761 100644 --- a/include/sound/rawmidi.h +++ b/include/sound/rawmidi.h @@ -92,9 +92,9 @@ struct snd_rawmidi_substream { struct list_head list; /* list of all substream for given stream */ int stream; /* direction */ int number; /* substream number */ - unsigned int opened: 1, /* open flag */ - append: 1, /* append flag (merge more streams) */ - active_sensing: 1; /* send active sensing when close */ + bool opened; /* open flag */ + bool append; /* append flag (merge more streams) */ + bool active_sensing; /* send active sensing when close */ int use_count; /* use counter (for output) */ size_t bytes; struct snd_rawmidi *rmidi; -- cgit v1.2.3 From dfbf24128630ed4f38f18f7ca12b58a7ba6964ad Mon Sep 17 00:00:00 2001 From: Prabhakar Kushwaha Date: Sat, 25 Jan 2020 03:37:29 +0000 Subject: ata: ahci: Add shutdown to freeze hardware resources of ahci commit 10a663a1b15134a5a714aa515e11425a44d4fdf7 upstream. device_shutdown() called from reboot or power_shutdown expect all devices to be shutdown. Same is true for even ahci pci driver. As no ahci shutdown function is implemented, the ata subsystem always remains alive with DMA & interrupt support. File system related calls should not be honored after device_shutdown(). So defining ahci pci driver shutdown to freeze hardware (mask interrupt, stop DMA engine and free DMA resources). Signed-off-by: Prabhakar Kushwaha Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/libata.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/libata.h b/include/linux/libata.h index df58b01e6962..cdfb67b22317 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1222,6 +1222,7 @@ struct pci_bits { }; extern int pci_test_config_bits(struct pci_dev *pdev, const struct pci_bits *bits); +extern void ata_pci_shutdown_one(struct pci_dev *pdev); extern void ata_pci_remove_one(struct pci_dev *pdev); #ifdef CONFIG_PM -- cgit v1.2.3 From 259384b9b2c27b63ed94851932b4c5bac7f9046c Mon Sep 17 00:00:00 2001 From: Jason Baron Date: Mon, 17 Feb 2020 15:38:09 -0500 Subject: net: sched: correct flower port blocking [ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ] tc flower rules that are based on src or dst port blocking are sometimes ineffective due to uninitialized stack data. __skb_flow_dissect() extracts ports from the skb for tc flower to match against. However, the port dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in key_control->flags. All callers of __skb_flow_dissect(), zero-out the key_control field except for fl_classify() as used by the flower classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to __skb_flow_dissect(), since key_control is allocated on the stack and may not be initialized. Since key_basic and key_control are present for all flow keys, let's make sure they are initialized. Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments") Co-developed-by: Eric Dumazet Signed-off-by: Eric Dumazet Acked-by: Cong Wang Signed-off-by: Jason Baron Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/flow_dissector.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index 1505cf7a4aaf..7a85a4ef6868 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -4,6 +4,7 @@ #include #include #include +#include #include /** @@ -204,4 +205,12 @@ static inline void *skb_flow_dissector_target(struct flow_dissector *flow_dissec return ((char *)target_container) + flow_dissector->offset[key_id]; } +static inline void +flow_dissector_init_keys(struct flow_dissector_key_control *key_control, + struct flow_dissector_key_basic *key_basic) +{ + memset(key_control, 0, sizeof(*key_control)); + memset(key_basic, 0, sizeof(*key_basic)); +} + #endif -- cgit v1.2.3 From ba7e258550e2fd5387181f30538b137381904eb3 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Wed, 12 Feb 2020 17:59:39 +0300 Subject: ACPICA: Introduce ACPI_ACCESS_BYTE_WIDTH() macro commit 1dade3a7048ccfc675650cd2cf13d578b095e5fb upstream. Sometimes it is useful to find the access_width field value in bytes and not in bits so add a helper that can be used for this purpose. Suggested-by: Jean Delvare Signed-off-by: Mika Westerberg Reviewed-by: Jean Delvare Cc: 4.16+ # 4.16+ Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- include/acpi/actypes.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/acpi/actypes.h b/include/acpi/actypes.h index 1d798abae710..f502d257d494 100644 --- a/include/acpi/actypes.h +++ b/include/acpi/actypes.h @@ -551,6 +551,8 @@ typedef u64 acpi_integer; #define ACPI_VALIDATE_RSDP_SIG(a) (!strncmp (ACPI_CAST_PTR (char, (a)), ACPI_SIG_RSDP, 8)) #define ACPI_MAKE_RSDP_SIG(dest) (memcpy (ACPI_CAST_PTR (char, (dest)), ACPI_SIG_RSDP, 8)) +#define ACPI_ACCESS_BYTE_WIDTH(size) (1 << ((size) - 1)) + /******************************************************************************* * * Miscellaneous constants -- cgit v1.2.3 From 3adb1f8cbfa6addf70ac9fa2f8dc94abf0c5e4b4 Mon Sep 17 00:00:00 2001 From: Johan Korsnes Date: Fri, 17 Jan 2020 13:08:36 +0100 Subject: HID: core: increase HID report buffer size to 8KiB commit 84a4062632462c4320704fcdf8e99e89e94c0aba upstream. We have a HID touch device that reports its opens and shorts test results in HID buffers of size 8184 bytes. The maximum size of the HID buffer is currently set to 4096 bytes, causing probe of this device to fail. With this patch we increase the maximum size of the HID buffer to 8192 bytes, making device probe and acquisition of said buffers succeed. Signed-off-by: Johan Korsnes Cc: Alan Stern Cc: Armando Visconti Cc: Jiri Kosina Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman --- include/linux/hid.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/hid.h b/include/linux/hid.h index 04bdf5477ec5..eda06f7ee84a 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -453,7 +453,7 @@ struct hid_report_enum { }; #define HID_MIN_BUFFER_SIZE 64 /* make sure there is at least a packet size of space */ -#define HID_MAX_BUFFER_SIZE 4096 /* 4kb */ +#define HID_MAX_BUFFER_SIZE 8192 /* 8kb */ #define HID_CONTROL_FIFO_SIZE 256 /* to init devices with >100 reports */ #define HID_OUTPUT_FIFO_SIZE 64 -- cgit v1.2.3 From 366d368729d65e9ebb7bf0e113560a2f496a5935 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Tue, 21 Aug 2018 21:57:03 -0700 Subject: include/linux/bitops.h: introduce BITS_PER_TYPE commit 9144d75e22cad3c89e6b2ccab551db9ee28d250a upstream. net_dim.h has a rather useful extension to BITS_PER_BYTE to compute the number of bits in a type (BITS_PER_BYTE * sizeof(T)), so promote the macro to bitops.h, alongside BITS_PER_BYTE, for wider usage. Link: http://lkml.kernel.org/r/20180706094458.14116-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson Reviewed-by: Jani Nikula Cc: Randy Dunlap Cc: Andy Gospodarek Cc: David S. Miller Cc: Thomas Gleixner Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [only take the bitops.h portion for stable kernels - gregkh] Signed-off-by: Greg Kroah-Hartman --- include/linux/bitops.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 76ad8a957ffa..cee74a52b9eb 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -3,7 +3,8 @@ #include #include -#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) +#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE) +#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(long)) extern unsigned int __sw_hweight8(unsigned int w); extern unsigned int __sw_hweight16(unsigned int w); -- cgit v1.2.3 From 0b21c9cbf647a8f1b4d9d45d6b30dfd47c8a5731 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 2 Mar 2020 21:05:13 -0800 Subject: fib: add missing attribute validation for tun_id [ Upstream commit 4c16d64ea04056f1b1b324ab6916019f6a064114 ] Add missing netlink policy entry for FRA_TUN_ID. Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id") Signed-off-by: Jakub Kicinski Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/fib_rules.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h index 456e4a6006ab..0b0ad792dd5c 100644 --- a/include/net/fib_rules.h +++ b/include/net/fib_rules.h @@ -87,6 +87,7 @@ struct fib_rules_ops { [FRA_OIFNAME] = { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, \ [FRA_PRIORITY] = { .type = NLA_U32 }, \ [FRA_FWMARK] = { .type = NLA_U32 }, \ + [FRA_TUN_ID] = { .type = NLA_U64 }, \ [FRA_FWMASK] = { .type = NLA_U32 }, \ [FRA_TABLE] = { .type = NLA_U32 }, \ [FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \ -- cgit v1.2.3 From e9ed467f390d24a15722387bfd5e8a3dea3972a9 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 12 Mar 2020 22:25:20 +0100 Subject: net: phy: fix MDIO bus PM PHY resuming [ Upstream commit 611d779af7cad2b87487ff58e4931a90c20b113c ] So far we have the unfortunate situation that mdio_bus_phy_may_suspend() is called in suspend AND resume path, assuming that function result is the same. After the original change this is no longer the case, resulting in broken resume as reported by Geert. To fix this call mdio_bus_phy_may_suspend() in the suspend path only, and let the phy_device store the info whether it was suspended by MDIO bus PM. Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends") Reported-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Signed-off-by: Heiner Kallweit Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/phy.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 867110c9d707..8eafced47540 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -333,6 +333,7 @@ struct phy_c45_device_ids { * is_pseudo_fixed_link: Set to true if this phy is an Ethernet switch, etc. * has_fixups: Set to true if this phy has fixups/quirks. * suspended: Set to true if this phy has been suspended successfully. + * suspended_by_mdio_bus: Set to true if this phy was suspended by MDIO bus. * state: state of the PHY for management purposes * dev_flags: Device-specific flags used by the PHY driver. * link_timeout: The number of timer firings to wait before the @@ -369,6 +370,7 @@ struct phy_device { bool is_pseudo_fixed_link; bool has_fixups; bool suspended; + bool suspended_by_mdio_bus; enum phy_state state; -- cgit v1.2.3 From 8c59bdceffbc8f7485ac4e68a1eb3d618154fc35 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Sat, 21 Mar 2020 18:22:41 -0700 Subject: x86/mm: split vmalloc_sync_all() commit 763802b53a427ed3cbd419dbba255c414fdd9e7c upstream. Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot Reported-by: Shile Zhang Signed-off-by: Joerg Roedel Signed-off-by: Andrew Morton Tested-by: Borislav Petkov Acked-by: Rafael J. Wysocki [GHES] Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/vmalloc.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 3d9d786a943c..d868a735545b 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -93,8 +93,9 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma, extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff); -void vmalloc_sync_all(void); - +void vmalloc_sync_mappings(void); +void vmalloc_sync_unmappings(void); + /* * Lowlevel-APIs (not for driver use!) */ -- cgit v1.2.3 From fb099f3bb477a0ee2d0669a753f7ffcdf8884c2d Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 4 Mar 2020 11:28:31 +0100 Subject: futex: Fix inode life-time issue commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream. As reported by Jann, ihold() does not in fact guarantee inode persistence. And instead of making it so, replace the usage of inode pointers with a per boot, machine wide, unique inode identifier. This sequence number is global, but shared (file backed) futexes are rare enough that this should not become a performance issue. Reported-by: Jann Horn Suggested-by: Linus Torvalds Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Greg Kroah-Hartman --- include/linux/fs.h | 1 + include/linux/futex.h | 17 ++++++++++------- 2 files changed, 11 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/linux/fs.h b/include/linux/fs.h index 5244df520bed..c2c04f891785 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -679,6 +679,7 @@ struct inode { struct rcu_head i_rcu; }; u64 i_version; + atomic64_t i_sequence; /* see futex */ atomic_t i_count; atomic_t i_dio_count; atomic_t i_writecount; diff --git a/include/linux/futex.h b/include/linux/futex.h index 6435f46d6e13..c015fa91e7cc 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -34,23 +34,26 @@ handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi); union futex_key { struct { + u64 i_seq; unsigned long pgoff; - struct inode *inode; - int offset; + unsigned int offset; } shared; struct { + union { + struct mm_struct *mm; + u64 __tmp; + }; unsigned long address; - struct mm_struct *mm; - int offset; + unsigned int offset; } private; struct { + u64 ptr; unsigned long word; - void *ptr; - int offset; + unsigned int offset; } both; }; -#define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = NULL } } +#define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = 0ULL } } #ifdef CONFIG_FUTEX extern void exit_robust_list(struct task_struct *curr); -- cgit v1.2.3 From fc3d6dd88e21c84550aa807d4cb1182bd161f383 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 19 Feb 2020 08:39:43 +0100 Subject: vt: selection, introduce vc_is_sel commit dce05aa6eec977f1472abed95ccd71276b9a3864 upstream. Avoid global variables (namely sel_cons) by introducing vc_is_sel. It checks whether the parameter is the current selection console. This will help putting sel_cons to a struct later. Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20200219073951.16151-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- include/linux/selection.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/selection.h b/include/linux/selection.h index 8e4624efdb6f..f1070e4180ee 100644 --- a/include/linux/selection.h +++ b/include/linux/selection.h @@ -12,8 +12,8 @@ struct tty_struct; -extern struct vc_data *sel_cons; struct tty_struct; +struct vc_data; extern void clear_selection(void); extern int set_selection(const struct tiocl_selection __user *sel, struct tty_struct *tty); @@ -22,6 +22,8 @@ extern int sel_loadlut(char __user *p); extern int mouse_reporting(void); extern void mouse_report(struct tty_struct * tty, int butt, int mrx, int mry); +bool vc_is_sel(struct vc_data *vc); + extern int console_blanked; extern const unsigned char color_table[]; -- cgit v1.2.3 From 2e1c84e1109566b3dbff34ad8cbbdbb0d472712c Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 19 Feb 2020 08:39:48 +0100 Subject: vt: switch vt_dont_switch to bool commit f400991bf872debffb01c46da882dc97d7e3248e upstream. vt_dont_switch is pure boolean, no need for whole char. Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20200219073951.16151-6-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/vt_kern.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/vt_kern.h b/include/linux/vt_kern.h index 6abd24f258bc..362db91266e6 100644 --- a/include/linux/vt_kern.h +++ b/include/linux/vt_kern.h @@ -141,7 +141,7 @@ static inline bool vt_force_oops_output(struct vc_data *vc) return false; } -extern char vt_dont_switch; +extern bool vt_dont_switch; extern int default_utf8; extern int global_cursor_default; -- cgit v1.2.3 From ea9df3c8cc1cae78d32ad25caa64d83435a564b2 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 14 Nov 2016 17:29:48 +0100 Subject: locking/atomic, kref: Add kref_read() commit 2c935bc57221cc2edc787c72ea0e2d30cdcd3d5e upstream. Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Greg Kroah-Hartman Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar [only add kref_read() to kref.h for stable backports - gregkh] Signed-off-by: Greg Kroah-Hartman --- include/linux/kref.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/kref.h b/include/linux/kref.h index e15828fd71f1..e2df6d397ff0 100644 --- a/include/linux/kref.h +++ b/include/linux/kref.h @@ -33,6 +33,11 @@ static inline void kref_init(struct kref *kref) atomic_set(&kref->refcount, 1); } +static inline int kref_read(const struct kref *kref) +{ + return atomic_read(&kref->refcount); +} + /** * kref_get - increment refcount for object. * @kref: object. -- cgit v1.2.3 From 1b62259d8e9021f2b701f5259a8fcec07fbcd237 Mon Sep 17 00:00:00 2001 From: Eugene Syromiatnikov Date: Tue, 24 Mar 2020 05:22:13 +0100 Subject: coresight: do not use the BIT() macro in the UAPI header commit 9b6eaaf3db5e5888df7bca7fed7752a90f7fd871 upstream. The BIT() macro definition is not available for the UAPI headers (moreover, it can be defined differently in the user space); replace its usage with the _BITUL() macro that is defined in . Fixes: 237483aa5cf4 ("coresight: stm: adding driver for CoreSight STM component") Signed-off-by: Eugene Syromiatnikov Cc: stable Reviewed-by: Mathieu Poirier Link: https://lore.kernel.org/r/20200324042213.GA10452@asgard.redhat.com Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/coresight-stm.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/coresight-stm.h b/include/uapi/linux/coresight-stm.h index 7e4272cf1fb2..741309cedd2c 100644 --- a/include/uapi/linux/coresight-stm.h +++ b/include/uapi/linux/coresight-stm.h @@ -1,8 +1,10 @@ #ifndef __UAPI_CORESIGHT_STM_H_ #define __UAPI_CORESIGHT_STM_H_ -#define STM_FLAG_TIMESTAMPED BIT(3) -#define STM_FLAG_GUARANTEED BIT(7) +#include + +#define STM_FLAG_TIMESTAMPED _BITUL(3) +#define STM_FLAG_GUARANTEED _BITUL(7) /* * The CoreSight STM supports guaranteed and invariant timing -- cgit v1.2.3 From acca1bdda6bdc3858ce963eec6487db56cd2e2c0 Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Fri, 3 Apr 2020 22:51:33 +0200 Subject: thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n commit 3f5b9959041e0db6dacbea80bb833bff5900999f upstream. When CONFIG_DEVFREQ_THERMAL is disabled all functions except of_devfreq_cooling_register_power() were already inlined. Also inline the last function to avoid compile errors when multiple drivers call of_devfreq_cooling_register_power() when CONFIG_DEVFREQ_THERMAL is not set. Compilation failed with the following message: multiple definition of `of_devfreq_cooling_register_power' (which then lists all usages of of_devfreq_cooling_register_power()) Thomas Zimmermann reported this problem [0] on a kernel config with CONFIG_DRM_LIMA={m,y}, CONFIG_DRM_PANFROST={m,y} and CONFIG_DEVFREQ_THERMAL=n after both, the lima and panfrost drivers gained devfreq cooling support. [0] https://www.spinics.net/lists/dri-devel/msg252825.html Fixes: a76caf55e5b356 ("thermal: Add devfreq cooling") Cc: stable@vger.kernel.org Reported-by: Thomas Zimmermann Signed-off-by: Martin Blumenstingl Tested-by: Thomas Zimmermann Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20200403205133.1101808-1-martin.blumenstingl@googlemail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/devfreq_cooling.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h index 7adf6cc4b305..633346b84cae 100644 --- a/include/linux/devfreq_cooling.h +++ b/include/linux/devfreq_cooling.h @@ -53,7 +53,7 @@ void devfreq_cooling_unregister(struct thermal_cooling_device *dfc); #else /* !CONFIG_DEVFREQ_THERMAL */ -struct thermal_cooling_device * +static inline struct thermal_cooling_device * of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, struct devfreq_cooling_power *dfc_power) { -- cgit v1.2.3 From 110012a2c94ad4fa28234a1b39e54fd4114fbaf2 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 30 Mar 2020 19:01:04 -0500 Subject: signal: Extend exec_id to 64bits commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream. Replace the 32bit exec_id with a 64bit exec_id to make it impossible to wrap the exec_id counter. With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent. This bypasses the signal sending checks if the parent changes their credentials during exec. The severity of this problem can been seen that in my limited testing of a 32bit exec_id it can take as little as 19s to exec 65536 times. Which means that it can take as little as 14 days to wrap a 32bit exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7 days. Even my slower timing is in the uptime of a typical server. Which means self_exec_id is simply a speed bump today, and if exec gets noticably faster self_exec_id won't even be a speed bump. Extending self_exec_id to 64bits introduces a problem on 32bit architectures where reading self_exec_id is no longer atomic and can take two read instructions. Which means that is is possible to hit a window where the read value of exec_id does not match the written value. So with very lucky timing after this change this still remains expoiltable. I have updated the update of exec_id on exec to use WRITE_ONCE and the read of exec_id in do_notify_parent to use READ_ONCE to make it clear that there is no locking between these two locations. Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl Fixes: 2.3.23pre2 Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman --- include/linux/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/sched.h b/include/linux/sched.h index 275511b60978..1872d4e9acbe 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1720,8 +1720,8 @@ struct task_struct { struct seccomp seccomp; /* Thread group tracking */ - u32 parent_exec_id; - u32 self_exec_id; + u64 parent_exec_id; + u64 self_exec_id; /* Protection of (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, * mempolicy */ spinlock_t alloc_lock; -- cgit v1.2.3 From a6047aaf91f76d36dc762a511b6493d1afc7a0da Mon Sep 17 00:00:00 2001 From: Tim Stallard Date: Fri, 3 Apr 2020 21:26:21 +0100 Subject: net: ipv6: do not consider routes via gateways for anycast address check [ Upstream commit 03e2a984b6165621f287fadf5f4b5cd8b58dcaba ] The behaviour for what is considered an anycast address changed in commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception"). This now considers the first address in a subnet where there is a route via a gateway to be an anycast address. This breaks path MTU discovery and traceroutes when a host in a remote network uses the address at the start of a prefix (eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors will not be sent to anycast addresses. This patch excludes any routes with a gateway, or via point to point links, like the behaviour previously from rt6_is_gw_or_nonexthop in net/ipv6/route.c. This can be tested with: ip link add v1 type veth peer name v2 ip netns add test ip netns exec test ip link set lo up ip link set v2 netns test ip link set v1 up ip netns exec test ip link set v2 up ip addr add 2001:db8::1/64 dev v1 nodad ip addr add 2001:db8:100:: dev lo nodad ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad ip netns exec test ip route add unreachable 2001:db8:1::1 ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1 ip netns exec test sysctl net.ipv6.conf.all.forwarding=1 ip route add 2001:db8:1::1 via 2001:db8::2 ping -I 2001:db8::1 2001:db8:1::1 -c1 ping -I 2001:db8:100:: 2001:db8:1::1 -c1 ip addr delete 2001:db8:100:: dev lo ip netns delete test Currently the first ping will get back a destination unreachable ICMP error, but the second will never get a response, with "icmp6_send: acast source" logged. After this patch, both get destination unreachable ICMP replies. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Tim Stallard Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/ip6_route.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 2c43993e079c..e45b6286983c 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -195,6 +195,7 @@ static inline bool ipv6_anycast_destination(const struct dst_entry *dst, return rt->rt6i_flags & RTF_ANYCAST || (rt->rt6i_dst.plen != 128 && + !(rt->rt6i_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) && ipv6_addr_equal(&rt->rt6i_dst.addr, daddr)); } -- cgit v1.2.3 From 5fadf30330d9dd4725d57cf06e7799c1f9c69edf Mon Sep 17 00:00:00 2001 From: Maurizio Lombardi Date: Fri, 13 Mar 2020 18:06:55 +0100 Subject: scsi: target: fix hang when multiple threads try to destroy the same iscsi session [ Upstream commit 57c46e9f33da530a2485fa01aa27b6d18c28c796 ] A number of hangs have been reported against the target driver; they are due to the fact that multiple threads may try to destroy the iscsi session at the same time. This may be reproduced for example when a "targetcli iscsi/iqn.../tpg1 disable" command is executed while a logout operation is underway. When this happens, two or more threads may end up sleeping and waiting for iscsit_close_connection() to execute "complete(session_wait_comp)". Only one of the threads will wake up and proceed to destroy the session structure, the remaining threads will hang forever. Note that if the blocked threads are somehow forced to wake up with complete_all(), they will try to free the same iscsi session structure destroyed by the first thread, causing double frees, memory corruptions etc... With this patch, the threads that want to destroy the iscsi session will increase the session refcount and will set the "session_close" flag to 1; then they wait for the driver to close the remaining active connections. When the last connection is closed, iscsit_close_connection() will wake up all the threads and will wait for the session's refcount to reach zero; when this happens, iscsit_close_connection() will destroy the session structure because no one is referencing it anymore. INFO: task targetcli:5971 blocked for more than 120 seconds. Tainted: P OE 4.15.0-72-generic #81~16.04.1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. targetcli D 0 5971 1 0x00000080 Call Trace: __schedule+0x3d6/0x8b0 ? vprintk_func+0x44/0xe0 schedule+0x36/0x80 schedule_timeout+0x1db/0x370 ? __dynamic_pr_debug+0x8a/0xb0 wait_for_completion+0xb4/0x140 ? wake_up_q+0x70/0x70 iscsit_free_session+0x13d/0x1a0 [iscsi_target_mod] iscsit_release_sessions_for_tpg+0x16b/0x1e0 [iscsi_target_mod] iscsit_tpg_disable_portal_group+0xca/0x1c0 [iscsi_target_mod] lio_target_tpg_enable_store+0x66/0xe0 [iscsi_target_mod] configfs_write_file+0xb9/0x120 __vfs_write+0x1b/0x40 vfs_write+0xb8/0x1b0 SyS_write+0x5c/0xe0 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Link: https://lore.kernel.org/r/20200313170656.9716-3-mlombard@redhat.com Reported-by: Matt Coleman Tested-by: Matt Coleman Tested-by: Rahul Kundu Signed-off-by: Maurizio Lombardi Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- include/target/iscsi/iscsi_target_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h index 6021c3acb6c5..b1814d2762bd 100644 --- a/include/target/iscsi/iscsi_target_core.h +++ b/include/target/iscsi/iscsi_target_core.h @@ -671,7 +671,7 @@ struct iscsi_session { atomic_t session_logout; atomic_t session_reinstatement; atomic_t session_stop_active; - atomic_t sleep_on_sess_wait_comp; + atomic_t session_close; /* connection list */ struct list_head sess_conn_list; struct list_head cr_active_list; -- cgit v1.2.3 From 9729be90f7b18d9ef191039dae34e232e220429c Mon Sep 17 00:00:00 2001 From: Qian Cai Date: Mon, 6 Apr 2020 20:10:25 -0700 Subject: percpu_counter: fix a data race at vm_committed_as [ Upstream commit 7e2345200262e4a6056580f0231cccdaffc825f3 ] "vm_committed_as.count" could be accessed concurrently as reported by KCSAN, BUG: KCSAN: data-race in __vm_enough_memory / percpu_counter_add_batch write to 0xffffffff9451c538 of 8 bytes by task 65879 on cpu 35: percpu_counter_add_batch+0x83/0xd0 percpu_counter_add_batch at lib/percpu_counter.c:91 __vm_enough_memory+0xb9/0x260 dup_mm+0x3a4/0x8f0 copy_process+0x2458/0x3240 _do_fork+0xaa/0x9f0 __do_sys_clone+0x125/0x160 __x64_sys_clone+0x70/0x90 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffffffff9451c538 of 8 bytes by task 66773 on cpu 19: __vm_enough_memory+0x199/0x260 percpu_counter_read_positive at include/linux/percpu_counter.h:81 (inlined by) __vm_enough_memory at mm/util.c:839 mmap_region+0x1b2/0xa10 do_mmap+0x45c/0x700 vm_mmap_pgoff+0xc0/0x130 ksys_mmap_pgoff+0x6e/0x300 __x64_sys_mmap+0x33/0x40 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe The read is outside percpu_counter::lock critical section which results in a data race. Fix it by adding a READ_ONCE() in percpu_counter_read_positive() which could also service as the existing compiler memory barrier. Signed-off-by: Qian Cai Signed-off-by: Andrew Morton Acked-by: Marco Elver Link: http://lkml.kernel.org/r/1582302724-2804-1-git-send-email-cai@lca.pw Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/percpu_counter.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h index 84a109449610..b6332cb761a4 100644 --- a/include/linux/percpu_counter.h +++ b/include/linux/percpu_counter.h @@ -76,9 +76,9 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc) */ static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc) { - s64 ret = fbc->count; + /* Prevent reloads of fbc->count */ + s64 ret = READ_ONCE(fbc->count); - barrier(); /* Prevent reloads of fbc->count */ if (ret >= 0) return ret; return 0; -- cgit v1.2.3 From 8feaf69773e0953dfe956ddae46610b97ea39b1d Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Mon, 6 Apr 2020 20:09:37 -0700 Subject: compiler.h: fix error in BUILD_BUG_ON() reporting [ Upstream commit af9c5d2e3b355854ff0e4acfbfbfadcd5198a349 ] compiletime_assert() uses __LINE__ to create a unique function name. This means that if you have more than one BUILD_BUG_ON() in the same source line (which can happen if they appear e.g. in a macro), then the error message from the compiler might output the wrong condition. For this source file: #include #define macro() \ BUILD_BUG_ON(1); \ BUILD_BUG_ON(0); void foo() { macro(); } gcc would output: ./include/linux/compiler.h:350:38: error: call to `__compiletime_assert_9' declared with attribute error: BUILD_BUG_ON failed: 0 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) However, it was not the BUILD_BUG_ON(0) that failed, so it should say 1 instead of 0. With this patch, we use __COUNTER__ instead of __LINE__, so each BUILD_BUG_ON() gets a different function name and the correct condition is printed: ./include/linux/compiler.h:350:38: error: call to `__compiletime_assert_0' declared with attribute error: BUILD_BUG_ON failed: 1 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) Signed-off-by: Vegard Nossum Signed-off-by: Andrew Morton Reviewed-by: Masahiro Yamada Reviewed-by: Daniel Santos Cc: Rasmus Villemoes Cc: Ian Abbott Cc: Joe Perches Link: http://lkml.kernel.org/r/20200331112637.25047-1-vegard.nossum@oracle.com Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- include/linux/compiler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 0020ee1cab37..7837afabbd78 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -546,7 +546,7 @@ unsigned long read_word_at_a_time(const void *addr) * compiler has support to do so. */ #define compiletime_assert(condition, msg) \ - _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) + _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) #define compiletime_assert_atomic_type(t) \ compiletime_assert(__native_word(t), \ -- cgit v1.2.3