From 223f5f79f2ce8facd9d77dd44e9f403343630bfc Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Fri, 23 Jun 2023 18:45:59 -0700 Subject: bpf, net: Check skb ownership against full socket. Check skb ownership of an skb against full sockets instead of request_sock. The filters were called only if an skb is owned by the sock that the skb is sent out through. In another words, skb->sk should point to the sock that it is sending through its egress. However, the filters would miss SYN/ACK skbs that they are owned by a request_sock but sent through the listener sock, that is the socket listening incoming connections. However, the listener socket is also the full socket of the request socket. We should use the full socket as the owner socket of an skb instead. What is the ownership check for? ================================ BPF_CGROUP_RUN_PROG_INET_EGRESS() checked sk == skb->sk to ensure the ownership of an skb. Alexei referred to a mailing list conversation [0] that took place a few years ago. In that conversation, Daniel Borkmann stated that: Wouldn't that mean however, when you go through stacked devices that you'd run the same eBPF cgroup program for skb->sk multiple times? According to what Daniel said, the ownership check mentioned earlier presumably prevents multiple calls of egress filters caused by an skb. A test that reproduce this scenario shows that the BPF cgroup egress programs can be called multiple times for one skb if this ownership check is not there. So, we can not just remove this check. Test Stacked Devices ==================== We use L2TP to build an environment of stacked devices. L2TP (Layer 2 Tunneling Protocol) is a tunneling protocol used to support virtual private networks (VPNs). It relays encapsulated packets; for example in UDP, to its peer by using a socket. Using L2TP, packets are first sent through the IP stack and should then arrive at an L2TP device. The device will expand its skb header to encapsulate the packet. The skb will be sent back to the IP stack using the socket that was made for the L2TP session. After that, the routing process will occur once more, but this time for a new destination. We changed tools/testing/selftests/net/l2tp.sh to set up a test environment using L2TP. The run_ping() function in l2tp.sh is where the main change occurred. run_ping() { local desc="$1" sleep 10 run_cmd host-1 ${ping6} -s 227 -c 4 -i 10 -I fc00:101::1 fc00:101::2 log_test $? 0 "IPv6 route through L2TP tunnel ${desc}" sleep 10 } The test will use L2TP devices to send PING messages. These messages will have a message size of 227 bytes as a special label to distinguish them. This is not an ideal solution, but works. During the execution of the test script, bpftrace was attached to ip6_finish_output() and l2tp_xmit_skb(): bpftrace -e ' kfunc:ip6_finish_output { time("%H:%M:%S: "); printf("ip6_finish_output skb=%p skb->len=%d cgroup=%p sk=%p skb->sk=%p\n", args->skb, args->skb->len, args->sk->sk_cgrp_data.cgroup, args->sk, args->skb->sk); } kfunc:l2tp_xmit_skb { time("%H:%M:%S: "); printf("l2tp_xmit_skb skb=%p sk=%p\n", args->skb, args->session->tunnel->sock); }' The following is part of the output messages printed by bpftrace: 16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=275 cgroup=0xffff88810741f800 sk=0xffff888105f3b900 skb->sk=0xffff888105f3b900 16:35:20: l2tp_xmit_skb skb=0xffff888103d8e600 sk=0xffff888103dd6300 16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=337 cgroup=0xffff88810741f800 sk=0xffff888103dd6300 skb->sk=0xffff888105f3b900 16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=337 cgroup=(nil) sk=(nil) skb->sk=(nil) 16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=275 cgroup=0xffffffff837741d0 sk=0xffff888101fe0000 skb->sk=0xffff888101fe0000 16:35:20: l2tp_xmit_skb skb=0xffff888103d8e000 sk=0xffff888103483180 16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=337 cgroup=0xffff88810741f800 sk=0xffff888103483180 skb->sk=0xffff888101fe0000 16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=337 cgroup=(nil) sk=(nil) skb->sk=(nil) The first four entries describe a PING message that was sent using the ping command, whereas the following four entries describe the response received. Multiple sockets are used to send one skb, including the socket used by the L2TP session. This can be observed. Based on this information, it seems that the ownership check is designed to avoid multiple calls of egress filters caused by a single skb. [0] https://lore.kernel.org/all/58193E9D.7040201@iogearbox.net/ Signed-off-by: Kui-Feng Lee Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230624014600.576756-2-kuifeng@meta.com --- include/linux/bpf-cgroup.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 57e9e109257e..8506690dbb9c 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -199,9 +199,9 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb) \ ({ \ int __ret = 0; \ - if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \ + if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) { \ typeof(sk) __sk = sk_to_full_sk(sk); \ - if (sk_fullsock(__sk) && \ + if (sk_fullsock(__sk) && __sk == skb_to_full_sk(skb) && \ cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS)) \ __ret = __cgroup_bpf_run_filter_skb(__sk, skb, \ CGROUP_INET_EGRESS); \ -- cgit v1.2.3 From 25954730461af01f66afa9e17036b051986b007e Mon Sep 17 00:00:00 2001 From: Anton Protopopov Date: Thu, 6 Jul 2023 13:39:28 +0000 Subject: bpf: add percpu stats for bpf_map elements insertions/deletions Add a generic percpu stats for bpf_map elements insertions/deletions in order to keep track of both, the current (approximate) number of elements in a map and per-cpu statistics on update/delete operations. To expose these stats a particular map implementation should initialize the counter and adjust it as needed using the 'bpf_map_*_elem_count' helpers provided by this commit. Signed-off-by: Anton Protopopov Link: https://lore.kernel.org/r/20230706133932.45883-2-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f58895830ada..360433f14496 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -275,6 +275,7 @@ struct bpf_map { } owner; bool bypass_spec_v1; bool frozen; /* write-once; write-protected by freeze_mutex */ + s64 __percpu *elem_count; }; static inline const char *btf_field_type_name(enum btf_field_type type) @@ -2040,6 +2041,35 @@ bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align, } #endif +static inline int +bpf_map_init_elem_count(struct bpf_map *map) +{ + size_t size = sizeof(*map->elem_count), align = size; + gfp_t flags = GFP_USER | __GFP_NOWARN; + + map->elem_count = bpf_map_alloc_percpu(map, size, align, flags); + if (!map->elem_count) + return -ENOMEM; + + return 0; +} + +static inline void +bpf_map_free_elem_count(struct bpf_map *map) +{ + free_percpu(map->elem_count); +} + +static inline void bpf_map_inc_elem_count(struct bpf_map *map) +{ + this_cpu_inc(*map->elem_count); +} + +static inline void bpf_map_dec_elem_count(struct bpf_map *map) +{ + this_cpu_dec(*map->elem_count); +} + extern int sysctl_unprivileged_bpf_disabled; static inline bool bpf_allow_ptr_leaks(void) -- cgit v1.2.3 From 1680ac7a5ad578d7acf819912557ceea4b4a5a88 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Fri, 7 Jul 2023 16:18:29 -0700 Subject: Input: gameport - use IS_REACHABLE() instead of open-coding it Replace an open-coded preprocessor conditional with an equivalent helper. Reviewed-by: Randy Dunlap Link: https://lore.kernel.org/r/ZKYLLmsdCH0Gp7TO@google.com Signed-off-by: Dmitry Torokhov --- include/linux/gameport.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gameport.h b/include/linux/gameport.h index 0a221e768ea4..07e370113b2b 100644 --- a/include/linux/gameport.h +++ b/include/linux/gameport.h @@ -63,7 +63,7 @@ struct gameport_driver { int gameport_open(struct gameport *gameport, struct gameport_driver *drv, int mode); void gameport_close(struct gameport *gameport); -#if defined(CONFIG_GAMEPORT) || (defined(MODULE) && defined(CONFIG_GAMEPORT_MODULE)) +#if IS_REACHABLE(CONFIG_GAMEPORT) void __gameport_register_port(struct gameport *gameport, struct module *owner); /* use a define to avoid include chaining to get THIS_MODULE */ -- cgit v1.2.3 From f97fa3dcb2db02013e6904c032a1d2d45707ee40 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 3 Jul 2023 16:52:08 +0300 Subject: lib/math: Move dvb_math.c into lib/math/int_log.c Some existing and new users may benefit from the intlog2() and intlog10() APIs, make them wide available. Reviewed-by: Mauro Carvalho Chehab Acked-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/20230619172019.21457-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Reviewed-by: Randy Dunlap Link: https://lore.kernel.org/r/20230703135211.87416-2-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/int_log.h | 65 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 include/linux/int_log.h (limited to 'include/linux') diff --git a/include/linux/int_log.h b/include/linux/int_log.h new file mode 100644 index 000000000000..332306202464 --- /dev/null +++ b/include/linux/int_log.h @@ -0,0 +1,65 @@ +/* + * Provides fixed-point logarithm operations. + * + * Copyright (C) 2006 Christoph Pfister (christophpfister@gmail.com) + * + * This library is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as + * published by the Free Software Foundation; either version 2.1 of + * the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ + +#ifndef __LINUX_INT_LOG_H +#define __LINUX_INT_LOG_H + +#include + +/** + * intlog2 - computes log2 of a value; the result is shifted left by 24 bits + * + * @value: The value (must be != 0) + * + * to use rational values you can use the following method: + * + * intlog2(value) = intlog2(value * 2^x) - x * 2^24 + * + * Some usecase examples: + * + * intlog2(8) will give 3 << 24 = 3 * 2^24 + * + * intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24 + * + * intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24 + * + * + * return: log2(value) * 2^24 + */ +extern unsigned int intlog2(u32 value); + +/** + * intlog10 - computes log10 of a value; the result is shifted left by 24 bits + * + * @value: The value (must be != 0) + * + * to use rational values you can use the following method: + * + * intlog10(value) = intlog10(value * 10^x) - x * 2^24 + * + * An usecase example: + * + * intlog10(1000) will give 3 << 24 = 3 * 2^24 + * + * due to the implementation intlog10(1000) might be not exactly 3 * 2^24 + * + * look at intlog2 for similar examples + * + * return: log10(value) * 2^24 + */ +extern unsigned int intlog10(u32 value); + +#endif -- cgit v1.2.3 From 9ab04d7ed8bdd395b0617a1647dd475681f99151 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 3 Jul 2023 16:52:10 +0300 Subject: lib/math/int_log: Replace LGPL-2.1-or-later boilerplate with SPDX identifier Replace license boilerplate in udftime.c with SPDX identifier for LGPL-2.1-or-later. Acked-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/20230619172019.21457-4-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230703135211.87416-4-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/int_log.h | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/int_log.h b/include/linux/int_log.h index 332306202464..0a6f58c38b61 100644 --- a/include/linux/int_log.h +++ b/include/linux/int_log.h @@ -1,17 +1,8 @@ +/* SPDX-License-Identifier: LGPL-2.1-or-later */ /* * Provides fixed-point logarithm operations. * * Copyright (C) 2006 Christoph Pfister (christophpfister@gmail.com) - * - * This library is free software; you can redistribute it and/or modify - * it under the terms of the GNU Lesser General Public License as - * published by the Free Software Foundation; either version 2.1 of - * the License, or (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU Lesser General Public License for more details. */ #ifndef __LINUX_INT_LOG_H -- cgit v1.2.3 From 1e1b4fbd6d0f8c54af14dcf18bd3136816153b12 Mon Sep 17 00:00:00 2001 From: Herve Codina Date: Fri, 23 Jun 2023 10:58:21 +0200 Subject: iio: consumer.h: Fix raw values documentation notes The raw values notes mention 'ADC counts' and are not fully accurate. Reword the notes in order to remove the 'ADC counts' and describe the conversion needed between a raw value and a value in the standard units. Signed-off-by: Herve Codina Acked-by: Jonathan Cameron Reviewed-by: Andy Shevchenko Reviewed-by: Christophe Leroy Link: https://lore.kernel.org/r/20230623085830.749991-5-herve.codina@bootlin.com Signed-off-by: Mark Brown --- include/linux/iio/consumer.h | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/consumer.h b/include/linux/iio/consumer.h index 6802596b017c..f536820b9cf2 100644 --- a/include/linux/iio/consumer.h +++ b/include/linux/iio/consumer.h @@ -201,8 +201,9 @@ struct iio_dev * @chan: The channel being queried. * @val: Value read back. * - * Note raw reads from iio channels are in adc counts and hence - * scale will need to be applied if standard units required. + * Note, if standard units are required, raw reads from iio channels + * need the offset (default 0) and scale (default 1) to be applied + * as (raw + offset) * scale. */ int iio_read_channel_raw(struct iio_channel *chan, int *val); @@ -212,8 +213,9 @@ int iio_read_channel_raw(struct iio_channel *chan, * @chan: The channel being queried. * @val: Value read back. * - * Note raw reads from iio channels are in adc counts and hence - * scale will need to be applied if standard units required. + * Note, if standard units are required, raw reads from iio channels + * need the offset (default 0) and scale (default 1) to be applied + * as (raw + offset) * scale. * * In opposit to the normal iio_read_channel_raw this function * returns the average of multiple reads. @@ -281,8 +283,9 @@ int iio_read_channel_attribute(struct iio_channel *chan, int *val, * @chan: The channel being queried. * @val: Value being written. * - * Note raw writes to iio channels are in dac counts and hence - * scale will need to be applied if standard units required. + * Note that for raw writes to iio channels, if the value provided is + * in standard units, the affect of the scale and offset must be removed + * as (value / scale) - offset. */ int iio_write_channel_raw(struct iio_channel *chan, int val); @@ -292,8 +295,9 @@ int iio_write_channel_raw(struct iio_channel *chan, int val); * @chan: The channel being queried. * @val: Value read back. * - * Note raw reads from iio channels are in adc counts and hence - * scale will need to be applied if standard units are required. + * Note, if standard units are required, raw reads from iio channels + * need the offset (default 0) and scale (default 1) to be applied + * as (raw + offset) * scale. */ int iio_read_max_channel_raw(struct iio_channel *chan, int *val); @@ -308,8 +312,9 @@ int iio_read_max_channel_raw(struct iio_channel *chan, int *val); * For ranges, three vals are always returned; min, step and max. * For lists, all the possible values are enumerated. * - * Note raw available values from iio channels are in adc counts and - * hence scale will need to be applied if standard units are required. + * Note, if standard units are required, raw available values from iio + * channels need the offset (default 0) and scale (default 1) to be applied + * as (raw + offset) * scale. */ int iio_read_avail_channel_raw(struct iio_channel *chan, const int **vals, int *length); -- cgit v1.2.3 From c952c748c7a983a8bda9112984e6f2c1f6e441a5 Mon Sep 17 00:00:00 2001 From: Herve Codina Date: Fri, 23 Jun 2023 10:58:24 +0200 Subject: minmax: Introduce {min,max}_array() Introduce min_array() (resp max_array()) in order to get the minimal (resp maximum) of values present in an array. Signed-off-by: Herve Codina Reviewed-by: Andy Shevchenko Reviewed-by: Christophe Leroy Link: https://lore.kernel.org/r/20230623085830.749991-8-herve.codina@bootlin.com Signed-off-by: Mark Brown --- include/linux/minmax.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) (limited to 'include/linux') diff --git a/include/linux/minmax.h b/include/linux/minmax.h index 396df1121bff..798c6963909f 100644 --- a/include/linux/minmax.h +++ b/include/linux/minmax.h @@ -133,6 +133,70 @@ */ #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >) +/* + * Remove a const qualifier from integer types + * _Generic(foo, type-name: association, ..., default: association) performs a + * comparison against the foo type (not the qualified type). + * Do not use the const keyword in the type-name as it will not match the + * unqualified type of foo. + */ +#define __unconst_integer_type_cases(type) \ + unsigned type: (unsigned type)0, \ + signed type: (signed type)0 + +#define __unconst_integer_typeof(x) typeof( \ + _Generic((x), \ + char: (char)0, \ + __unconst_integer_type_cases(char), \ + __unconst_integer_type_cases(short), \ + __unconst_integer_type_cases(int), \ + __unconst_integer_type_cases(long), \ + __unconst_integer_type_cases(long long), \ + default: (x))) + +/* + * Do not check the array parameter using __must_be_array(). + * In the following legit use-case where the "array" passed is a simple pointer, + * __must_be_array() will return a failure. + * --- 8< --- + * int *buff + * ... + * min = min_array(buff, nb_items); + * --- 8< --- + * + * The first typeof(&(array)[0]) is needed in order to support arrays of both + * 'int *buff' and 'int buff[N]' types. + * + * The array can be an array of const items. + * typeof() keeps the const qualifier. Use __unconst_integer_typeof() in order + * to discard the const qualifier for the __element variable. + */ +#define __minmax_array(op, array, len) ({ \ + typeof(&(array)[0]) __array = (array); \ + typeof(len) __len = (len); \ + __unconst_integer_typeof(__array[0]) __element = __array[--__len]; \ + while (__len--) \ + __element = op(__element, __array[__len]); \ + __element; }) + +/** + * min_array - return minimum of values present in an array + * @array: array + * @len: array length + * + * Note that @len must not be zero (empty array). + */ +#define min_array(array, len) __minmax_array(min, array, len) + +/** + * max_array - return maximum of values present in an array + * @array: array + * @len: array length + * + * Note that @len must not be zero (empty array). + */ +#define max_array(array, len) __minmax_array(max, array, len) + /** * clamp_t - return a value clamped to a given range using a given type * @type: the type of variable to use -- cgit v1.2.3 From 7560418078b939e1e83f7dce502ec3c1ca8c152f Mon Sep 17 00:00:00 2001 From: Herve Codina Date: Fri, 23 Jun 2023 10:58:27 +0200 Subject: iio: inkern: Add a helper to query an available minimum raw value A helper, iio_read_max_channel_raw() exists to read the available maximum raw value of a channel but nothing similar exists to read the available minimum raw value. This new helper, iio_read_min_channel_raw(), fills the hole and can be used for reading the available minimum raw value of a channel. It is fully based on the existing iio_read_max_channel_raw(). Signed-off-by: Herve Codina Reviewed-by: Andy Shevchenko Acked-by: Jonathan Cameron Reviewed-by: Christophe Leroy Link: https://lore.kernel.org/r/20230623085830.749991-11-herve.codina@bootlin.com Signed-off-by: Mark Brown --- include/linux/iio/consumer.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/consumer.h b/include/linux/iio/consumer.h index f536820b9cf2..e9910b41d48e 100644 --- a/include/linux/iio/consumer.h +++ b/include/linux/iio/consumer.h @@ -301,6 +301,18 @@ int iio_write_channel_raw(struct iio_channel *chan, int val); */ int iio_read_max_channel_raw(struct iio_channel *chan, int *val); +/** + * iio_read_min_channel_raw() - read minimum available raw value from a given + * channel, i.e. the minimum possible value. + * @chan: The channel being queried. + * @val: Value read back. + * + * Note, if standard units are required, raw reads from iio channels + * need the offset (default 0) and scale (default 1) to be applied + * as (raw + offset) * scale. + */ +int iio_read_min_channel_raw(struct iio_channel *chan, int *val); + /** * iio_read_avail_channel_raw() - read available raw values from a given channel * @chan: The channel being queried. -- cgit v1.2.3 From 9b6304c1d53745c300b86f202d0dcff395e2d2db Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 5 Jul 2023 14:58:10 -0400 Subject: fs: add ctime accessors infrastructure struct timespec64 has unused bits in the tv_nsec field that can be used for other purposes. In future patches, we're going to change how the inode->i_ctime is accessed in certain inodes in order to make use of them. In order to do that safely though, we'll need to eradicate raw accesses of the inode->i_ctime field from the kernel. Add new accessor functions for the ctime that we use to replace them. Reviewed-by: Jan Kara Reviewed-by: Luis Chamberlain Signed-off-by: Jeff Layton Reviewed-by: Damien Le Moal Message-Id: <20230705185812.579118-2-jlayton@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 45 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..d41bfcb26da0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1474,7 +1474,50 @@ static inline bool fsuidgid_has_mapping(struct super_block *sb, kgid_has_mapping(fs_userns, kgid); } -extern struct timespec64 current_time(struct inode *inode); +struct timespec64 current_time(struct inode *inode); +struct timespec64 inode_set_ctime_current(struct inode *inode); + +/** + * inode_get_ctime - fetch the current ctime from the inode + * @inode: inode from which to fetch ctime + * + * Grab the current ctime from the inode and return it. + */ +static inline struct timespec64 inode_get_ctime(const struct inode *inode) +{ + return inode->i_ctime; +} + +/** + * inode_set_ctime_to_ts - set the ctime in the inode + * @inode: inode in which to set the ctime + * @ts: value to set in the ctime field + * + * Set the ctime in @inode to @ts + */ +static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode, + struct timespec64 ts) +{ + inode->i_ctime = ts; + return ts; +} + +/** + * inode_set_ctime - set the ctime in the inode + * @inode: inode in which to set the ctime + * @sec: tv_sec value to set + * @nsec: tv_nsec value to set + * + * Set the ctime in @inode to { @sec, @nsec } + */ +static inline struct timespec64 inode_set_ctime(struct inode *inode, + time64_t sec, long nsec) +{ + struct timespec64 ts = { .tv_sec = sec, + .tv_nsec = nsec }; + + return inode_set_ctime_to_ts(inode, ts); +} /* * Snapshotting support. -- cgit v1.2.3 From 0c4767923ed6964d279309744cdb248890e95ec2 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 5 Jul 2023 14:58:11 -0400 Subject: fs: new helper: simple_rename_timestamp A rename potentially involves updating 4 different inode timestamps. Add a function that handles the details sanely, and convert the libfs.c callers to use it. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Message-Id: <20230705185812.579118-3-jlayton@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index d41bfcb26da0..42755cb7d55b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2977,6 +2977,8 @@ extern int simple_open(struct inode *inode, struct file *file); extern int simple_link(struct dentry *, struct inode *, struct dentry *); extern int simple_unlink(struct inode *, struct dentry *); extern int simple_rmdir(struct inode *, struct dentry *); +void simple_rename_timestamp(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry); extern int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry); extern int simple_rename(struct mnt_idmap *, struct inode *, -- cgit v1.2.3 From bccb5c397fbfe348e22c045f5e09c35780edb817 Mon Sep 17 00:00:00 2001 From: Luca Vizzarro Date: Thu, 2 Feb 2023 11:58:31 +0000 Subject: fcntl: Cast commands with int args explicitly According to the fcntl API specification commands that expect an integer, hence not a pointer, always take an int and not long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro Cc: Christian Brauner Cc: Jeff Layton Cc: Chuck Lever Cc: Kevin Brodsky Cc: Vincenzo Frascino Cc: Szabolcs Nagy Cc: "Theodore Ts'o" Cc: David Laight Cc: Mark Rutland Cc: linux-fsdevel@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro Message-Id: <20230414152459.816046-2-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..8e546433c781 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1069,7 +1069,7 @@ extern void fasync_free(struct fasync_struct *); extern void kill_fasync(struct fasync_struct **, int, int); extern void __f_setown(struct file *filp, struct pid *, enum pid_type, int force); -extern int f_setown(struct file *filp, unsigned long arg, int force); +extern int f_setown(struct file *filp, int who, int force); extern void f_delown(struct file *filp); extern pid_t f_getown(struct file *filp); extern int send_sigurg(struct fown_struct *fown); -- cgit v1.2.3 From ed5f17f66ef39273dcf83ca89b2f1d52a52b22a5 Mon Sep 17 00:00:00 2001 From: Luca Vizzarro Date: Wed, 1 Feb 2023 15:05:33 +0000 Subject: fs: Pass argument to fcntl_setlease as int The interface for fcntl expects the argument passed for the command F_SETLEASE to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro Cc: Christian Brauner Cc: Jeff Layton Cc: Chuck Lever Cc: Trond Myklebust Cc: Anna Schumaker Cc: Kevin Brodsky Cc: Vincenzo Frascino Cc: Szabolcs Nagy Cc: "Theodore Ts'o" Cc: David Laight Cc: Mark Rutland Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro Message-Id: <20230414152459.816046-3-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner --- include/linux/filelock.h | 12 ++++++------ include/linux/fs.h | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filelock.h b/include/linux/filelock.h index efcdd1631d9b..95e868e09e29 100644 --- a/include/linux/filelock.h +++ b/include/linux/filelock.h @@ -144,7 +144,7 @@ int fcntl_setlk64(unsigned int, struct file *, unsigned int, struct flock64 *); #endif -int fcntl_setlease(unsigned int fd, struct file *filp, long arg); +int fcntl_setlease(unsigned int fd, struct file *filp, int arg); int fcntl_getlease(struct file *filp); /* fs/locks.c */ @@ -167,8 +167,8 @@ bool vfs_inode_has_locks(struct inode *inode); int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl); int __break_lease(struct inode *inode, unsigned int flags, unsigned int type); void lease_get_mtime(struct inode *, struct timespec64 *time); -int generic_setlease(struct file *, long, struct file_lock **, void **priv); -int vfs_setlease(struct file *, long, struct file_lock **, void **); +int generic_setlease(struct file *, int, struct file_lock **, void **priv); +int vfs_setlease(struct file *, int, struct file_lock **, void **); int lease_modify(struct file_lock *, int, struct list_head *); struct notifier_block; @@ -213,7 +213,7 @@ static inline int fcntl_setlk64(unsigned int fd, struct file *file, return -EACCES; } #endif -static inline int fcntl_setlease(unsigned int fd, struct file *filp, long arg) +static inline int fcntl_setlease(unsigned int fd, struct file *filp, int arg) { return -EINVAL; } @@ -306,13 +306,13 @@ static inline void lease_get_mtime(struct inode *inode, return; } -static inline int generic_setlease(struct file *filp, long arg, +static inline int generic_setlease(struct file *filp, int arg, struct file_lock **flp, void **priv) { return -EINVAL; } -static inline int vfs_setlease(struct file *filp, long arg, +static inline int vfs_setlease(struct file *filp, int arg, struct file_lock **lease, void **priv) { return -EINVAL; diff --git a/include/linux/fs.h b/include/linux/fs.h index 8e546433c781..11055d00fab8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1799,7 +1799,7 @@ struct file_operations { ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); void (*splice_eof)(struct file *file); - int (*setlease)(struct file *, long, struct file_lock **, void **); + int (*setlease)(struct file *, int, struct file_lock **, void **); long (*fallocate)(struct file *file, int mode, loff_t offset, loff_t len); void (*show_fdinfo)(struct seq_file *m, struct file *f); @@ -2950,7 +2950,7 @@ extern int simple_write_begin(struct file *file, struct address_space *mapping, extern const struct address_space_operations ram_aops; extern int always_delete_dentry(const struct dentry *); extern struct inode *alloc_anon_inode(struct super_block *); -extern int simple_nosetlease(struct file *, long, struct file_lock **, void **); +extern int simple_nosetlease(struct file *, int, struct file_lock **, void **); extern const struct dentry_operations simple_dentry_operations; extern struct dentry *simple_lookup(struct inode *, struct dentry *, unsigned int flags); -- cgit v1.2.3 From 515c5046650d8f56f93546d3867b64c1906162bc Mon Sep 17 00:00:00 2001 From: Luca Vizzarro Date: Wed, 1 Feb 2023 16:18:53 +0000 Subject: pipe: Pass argument of pipe_fcntl as int The interface for fcntl expects the argument passed for the command F_SETPIPE_SZ to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro Cc: Christian Brauner Cc: Jeff Layton Cc: Chuck Lever Cc: Kevin Brodsky Cc: Vincenzo Frascino Cc: Szabolcs Nagy Cc: "Theodore Ts'o" Cc: David Laight Cc: Mark Rutland Cc: linux-fsdevel@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro Message-Id: <20230414152459.816046-4-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner --- include/linux/pipe_fs_i.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 02e0086b10f6..608a9eb86bff 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -269,10 +269,10 @@ bool pipe_is_unprivileged_user(void); /* for F_SETPIPE_SZ and F_GETPIPE_SZ */ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots); -long pipe_fcntl(struct file *, unsigned int, unsigned long arg); +long pipe_fcntl(struct file *, unsigned int, unsigned int arg); struct pipe_inode_info *get_pipe_info(struct file *file, bool for_splice); int create_pipe_files(struct file **, int); -unsigned int round_pipe_size(unsigned long size); +unsigned int round_pipe_size(unsigned int size); #endif -- cgit v1.2.3 From f4ae4081e5a871b4c4e2f802ccf73f8d69566073 Mon Sep 17 00:00:00 2001 From: Luca Vizzarro Date: Wed, 15 Feb 2023 15:50:05 +0000 Subject: dnotify: Pass argument of fcntl_dirnotify as int The interface for fcntl expects the argument passed for the command F_DIRNOTIFY to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Jan Kara Cc: Amir Goldstein Cc: Alexander Viro Cc: Christian Brauner Cc: Jeff Layton Cc: Chuck Lever Cc: Kevin Brodsky Cc: Vincenzo Frascino Cc: Szabolcs Nagy Cc: "Theodore Ts'o" Cc: David Laight Cc: Mark Rutland Cc: linux-fsdevel@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Acked-by: Jan Kara Signed-off-by: Luca Vizzarro Message-Id: <20230414152459.816046-6-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner --- include/linux/dnotify.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h index b1d26f9f1c9f..9f183a679277 100644 --- a/include/linux/dnotify.h +++ b/include/linux/dnotify.h @@ -30,7 +30,7 @@ struct dnotify_struct { FS_MOVED_FROM | FS_MOVED_TO) extern void dnotify_flush(struct file *, fl_owner_t); -extern int fcntl_dirnotify(int, struct file *, unsigned long); +extern int fcntl_dirnotify(int, struct file *, unsigned int); #else @@ -38,7 +38,7 @@ static inline void dnotify_flush(struct file *filp, fl_owner_t id) { } -static inline int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) +static inline int fcntl_dirnotify(int fd, struct file *filp, unsigned int arg) { return -EINVAL; } -- cgit v1.2.3 From edf6a864c996f9a9f5299a3b3e574a37e64000c5 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 10 Jul 2023 18:49:24 +0300 Subject: spi: Sort headers alphabetically Sorting headers alphabetically helps locating duplicates, and make it easier to figure out where to insert new headers. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230710154932.68377-8-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 32c94eae8926..21b77bdfac29 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -6,18 +6,18 @@ #ifndef __LINUX_SPI_H #define __LINUX_SPI_H +#include #include +#include #include -#include -#include +#include #include -#include +#include #include -#include +#include +#include #include -#include -#include struct dma_chan; struct software_node; -- cgit v1.2.3 From 6bcdfd2cac5559c680aef8dd4c5facada55ab623 Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Sat, 10 Jun 2023 09:57:35 +0200 Subject: security: Allow all LSMs to provide xattrs for inode_init_security hook Currently, the LSM infrastructure supports only one LSM providing an xattr and EVM calculating the HMAC on that xattr, plus other inode metadata. Allow all LSMs to provide one or multiple xattrs, by extending the security blob reservation mechanism. Introduce the new lbs_xattr_count field of the lsm_blob_sizes structure, so that each LSM can specify how many xattrs it needs, and the LSM infrastructure knows how many xattr slots it should allocate. Modify the inode_init_security hook definition, by passing the full xattr array allocated in security_inode_init_security(), and the current number of xattr slots in that array filled by LSMs. The first parameter would allow EVM to access and calculate the HMAC on xattrs supplied by other LSMs, the second to not leave gaps in the xattr array, when an LSM requested but did not provide xattrs (e.g. if it is not initialized). Introduce lsm_get_xattr_slot(), which LSMs can call as many times as the number specified in the lbs_xattr_count field of the lsm_blob_sizes structure. During each call, lsm_get_xattr_slot() increments the number of filled xattrs, so that at the next invocation it returns the next xattr slot to fill. Cleanup security_inode_init_security(). Unify the !initxattrs and initxattrs case by simply not allocating the new_xattrs array in the former. Update the documentation to reflect the changes, and fix the description of the xattr name, as it is not allocated anymore. Adapt both SELinux and Smack to use the new definition of the inode_init_security hook, and to call lsm_get_xattr_slot() to obtain and fill the reserved slots in the xattr array. Move the xattr->name assignment after the xattr->value one, so that it is done only in case of successful memory allocation. Finally, change the default return value of the inode_init_security hook from zero to -EOPNOTSUPP, so that BPF LSM correctly follows the hook conventions. Reported-by: Nicolas Bouchinet Link: https://lore.kernel.org/linux-integrity/Y1FTSIo+1x+4X0LS@archlinux/ Signed-off-by: Roberto Sassu Acked-by: Casey Schaufler [PM: minor comment and variable tweaks, approved by RS] Signed-off-by: Paul Moore --- include/linux/lsm_hook_defs.h | 6 +++--- include/linux/lsm_hooks.h | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 7308a1a7599b..920d8f16fa99 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -111,9 +111,9 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask, unsigned int obj_type) LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode) LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode) -LSM_HOOK(int, 0, inode_init_security, struct inode *inode, - struct inode *dir, const struct qstr *qstr, const char **name, - void **value, size_t *len) +LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode, + struct inode *dir, const struct qstr *qstr, struct xattr *xattrs, + int *xattr_count) LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode, const struct qstr *name, const struct inode *context_inode) LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry, diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index ab2b2fafa4a4..dcb5e5b5eb13 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -28,6 +28,7 @@ #include #include #include +#include union security_list_options { #define LSM_HOOK(RET, DEFAULT, NAME, ...) RET (*NAME)(__VA_ARGS__); @@ -63,8 +64,27 @@ struct lsm_blob_sizes { int lbs_ipc; int lbs_msg_msg; int lbs_task; + int lbs_xattr_count; /* number of xattr slots in new_xattrs array */ }; +/** + * lsm_get_xattr_slot - Return the next available slot and increment the index + * @xattrs: array storing LSM-provided xattrs + * @xattr_count: number of already stored xattrs (updated) + * + * Retrieve the first available slot in the @xattrs array to fill with an xattr, + * and increment @xattr_count. + * + * Return: The slot to fill in @xattrs if non-NULL, NULL otherwise. + */ +static inline struct xattr *lsm_get_xattr_slot(struct xattr *xattrs, + int *xattr_count) +{ + if (unlikely(!xattrs)) + return NULL; + return &xattrs[(*xattr_count)++]; +} + /* * LSM_RET_VOID is used as the default value in LSM_HOOK definitions for void * LSM hooks (in include/linux/lsm_hook_defs.h). -- cgit v1.2.3 From 6db7d1dee8003921b353d7e613471fe8995f46b5 Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Sat, 10 Jun 2023 09:57:37 +0200 Subject: evm: Align evm_inode_init_security() definition with LSM infrastructure Change the evm_inode_init_security() definition to align with the LSM infrastructure. Keep the existing behavior of including in the HMAC calculation only the first xattr provided by LSMs. Changing the evm_inode_init_security() definition requires passing the xattr array allocated by security_inode_init_security(), and the number of xattrs filled by previously invoked LSMs. Use the newly introduced lsm_get_xattr_slot() to position EVM correctly in the xattrs array, like a regular LSM, and to increment the number of filled slots. For now, the LSM infrastructure allocates enough xattrs slots to store the EVM xattr, without using the reservation mechanism. Signed-off-by: Roberto Sassu Reviewed-by: Mimi Zohar Acked-by: Mimi Zohar Signed-off-by: Paul Moore --- include/linux/evm.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/evm.h b/include/linux/evm.h index 7dc1ee74169f..01fc495a83e2 100644 --- a/include/linux/evm.h +++ b/include/linux/evm.h @@ -56,9 +56,10 @@ static inline void evm_inode_post_set_acl(struct dentry *dentry, { return evm_inode_post_setxattr(dentry, acl_name, NULL, 0); } -extern int evm_inode_init_security(struct inode *inode, - const struct xattr *xattr_array, - struct xattr *evm); + +int evm_inode_init_security(struct inode *inode, struct inode *dir, + const struct qstr *qstr, struct xattr *xattrs, + int *xattr_count); extern bool evm_revalidate_status(const char *xattr_name); extern int evm_protected_xattr_if_enabled(const char *req_xattr_name); extern int evm_read_protected_xattrs(struct dentry *dentry, u8 *buffer, @@ -157,9 +158,10 @@ static inline void evm_inode_post_set_acl(struct dentry *dentry, return; } -static inline int evm_inode_init_security(struct inode *inode, - const struct xattr *xattr_array, - struct xattr *evm) +static inline int evm_inode_init_security(struct inode *inode, struct inode *dir, + const struct qstr *qstr, + struct xattr *xattrs, + int *xattr_count) { return 0; } -- cgit v1.2.3 From c05780ef3c190c2dafbf0be8e65d4f01103ad577 Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Fri, 7 Jul 2023 09:00:51 -0700 Subject: module: Ignore RISC-V mapping symbols too RISC-V has an extended form of mapping symbols that we use to encode the ISA when it changes in the middle of an ELF. This trips up modpost as a build failure, I haven't yet verified it yet but I believe the kallsyms difference should result in stacks looking sane again. Reported-by: Randy Dunlap Closes: https://lore.kernel.org/all/9d9e2902-5489-4bf0-d9cb-556c8e5d71c2@infradead.org/ Signed-off-by: Palmer Dabbelt Reviewed-by: Randy Dunlap Tested-by: Randy Dunlap # build-tested Signed-off-by: Luis Chamberlain --- include/linux/module_symbol.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/module_symbol.h b/include/linux/module_symbol.h index 7ace7ba30203..5b799942b243 100644 --- a/include/linux/module_symbol.h +++ b/include/linux/module_symbol.h @@ -3,12 +3,22 @@ #define _LINUX_MODULE_SYMBOL_H /* This ignores the intensely annoying "mapping symbols" found in ELF files. */ -static inline int is_mapping_symbol(const char *str) +static inline int is_mapping_symbol(const char *str, int is_riscv) { if (str[0] == '.' && str[1] == 'L') return true; if (str[0] == 'L' && str[1] == '0') return true; + /* + * RISC-V defines various special symbols that start with "$".  The + * mapping symbols, which exist to differentiate between incompatible + * instruction encodings when disassembling, show up all over the place + * and are generally not meant to be treated like other symbols.  So + * just ignore any of the special symbols. + */ + if (is_riscv) + return str[0] == '$'; + return str[0] == '$' && (str[1] == 'a' || str[1] == 'd' || str[1] == 't' || str[1] == 'x') && (str[2] == '\0' || str[2] == '.'); -- cgit v1.2.3 From 20bdedafd2f63e0ba70991127f9b5c0826ebdb32 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Fri, 30 Jun 2023 21:28:53 +0900 Subject: workqueue: Warn attempt to flush system-wide workqueues. Based on commit c4f135d643823a86 ("workqueue: Wrap flush_workqueue() using a macro"), all in-tree users stopped flushing system-wide workqueues. Therefore, start emitting runtime message so that all out-of-tree users will understand that they need to update their code. Signed-off-by: Tetsuo Handa Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 44 +++----------------------------------------- 1 file changed, 3 insertions(+), 41 deletions(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 683efe29fa69..1a4223cbdb5f 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -569,6 +569,7 @@ static inline bool schedule_work(struct work_struct *work) /* * Detect attempt to flush system-wide workqueues at compile time when possible. + * Warn attempt to flush system-wide workqueues at runtime. * * See https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp * for reasons and steps for converting system-wide workqueues into local workqueues. @@ -576,52 +577,13 @@ static inline bool schedule_work(struct work_struct *work) extern void __warn_flushing_systemwide_wq(void) __compiletime_warning("Please avoid flushing system-wide workqueues."); -/** - * flush_scheduled_work - ensure that any scheduled work has run to completion. - * - * Forces execution of the kernel-global workqueue and blocks until its - * completion. - * - * It's very easy to get into trouble if you don't take great care. - * Either of the following situations will lead to deadlock: - * - * One of the work items currently on the workqueue needs to acquire - * a lock held by your code or its caller. - * - * Your code is running in the context of a work routine. - * - * They will be detected by lockdep when they occur, but the first might not - * occur very often. It depends on what work items are on the workqueue and - * what locks they need, which you have no control over. - * - * In most situations flushing the entire workqueue is overkill; you merely - * need to know that a particular work item isn't queued and isn't running. - * In such cases you should use cancel_delayed_work_sync() or - * cancel_work_sync() instead. - * - * Please stop calling this function! A conversion to stop flushing system-wide - * workqueues is in progress. This function will be removed after all in-tree - * users stopped calling this function. - */ -/* - * The background of commit 771c035372a036f8 ("deprecate the - * '__deprecated' attribute warnings entirely and for good") is that, - * since Linus builds all modules between every single pull he does, - * the standard kernel build needs to be _clean_ in order to be able to - * notice when new problems happen. Therefore, don't emit warning while - * there are in-tree users. - */ +/* Please stop using this function, for this function will be removed in near future. */ #define flush_scheduled_work() \ ({ \ - if (0) \ - __warn_flushing_systemwide_wq(); \ + __warn_flushing_systemwide_wq(); \ __flush_workqueue(system_wq); \ }) -/* - * Although there is no longer in-tree caller, for now just emit warning - * in order to give out-of-tree callers time to update. - */ #define flush_workqueue(wq) \ ({ \ struct workqueue_struct *_wq = (wq); \ -- cgit v1.2.3 From be22255360f80d3af789daad00025171a65424a5 Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Tue, 6 Jun 2023 21:59:24 +0800 Subject: jbd2: remove t_checkpoint_io_list Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint() now, it's time to remove the whole t_checkpoint_io_list logic. Signed-off-by: Zhang Yi Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20230606135928.434610-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index d860499e15e4..bd660aac8e07 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -613,12 +613,6 @@ struct transaction_s */ struct journal_head *t_checkpoint_list; - /* - * Doubly-linked circular list of all buffers submitted for IO while - * checkpointing. [j_list_lock] - */ - struct journal_head *t_checkpoint_io_list; - /* * Doubly-linked circular list of metadata buffers being * shadowed by log IO. The IO buffers on the iobuf list and -- cgit v1.2.3 From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Tue, 6 Jun 2023 21:59:27 +0800 Subject: jbd2: fix a race when checking checkpoint buffer busy Before removing checkpoint buffer from the t_checkpoint_list, we have to check both BH_Dirty and BH_Lock bits together to distinguish buffers have not been or were being written back. But __cp_buffer_busy() checks them separately, it first check lock state and then check dirty, the window between these two checks could be raced by writing back procedure, which locks buffer and clears buffer dirty before I/O completes. So it cannot guarantee checkpointing buffers been written back to disk if some error happens later. Finally, it may clean checkpoint transactions and lead to inconsistent filesystem. jbd2_journal_forget() and __journal_try_to_free_buffer() also have the same problem (journal_unmap_buffer() escape from this issue since it's running under the buffer lock), so fix them through introducing a new helper to try holding the buffer lock and remove really clean buffer. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Cc: stable@vger.kernel.org Suggested-by: Jan Kara Signed-off-by: Zhang Yi Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index bd660aac8e07..44c298aa58d4 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *); void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy); unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan); int __jbd2_journal_remove_checkpoint(struct journal_head *); +int jbd2_journal_try_remove_checkpoint(struct journal_head *jh); void jbd2_journal_destroy_checkpoint(journal_t *journal); void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *); -- cgit v1.2.3 From c397f09e5498994790503a64486213ef85e58db9 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 10 Jul 2023 18:49:28 +0300 Subject: spi: Get rid of old SPI_MASTER_NO_TX & SPI_MASTER_NO_RX Convert the users from SPI_MASTER_NO_TX and/or SPI_MASTER_NO_RX to SPI_CONTROLLER_NO_TX and/or SPI_CONTROLLER_NO_RX respectively and kill the not used anymore definitions. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230710154932.68377-12-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 21b77bdfac29..debb54f37603 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -1623,8 +1623,6 @@ spi_transfer_is_last(struct spi_controller *ctlr, struct spi_transfer *xfer) #define spi_master spi_controller #define SPI_MASTER_HALF_DUPLEX SPI_CONTROLLER_HALF_DUPLEX -#define SPI_MASTER_NO_RX SPI_CONTROLLER_NO_RX -#define SPI_MASTER_NO_TX SPI_CONTROLLER_NO_TX #define SPI_MASTER_MUST_RX SPI_CONTROLLER_MUST_RX #define SPI_MASTER_MUST_TX SPI_CONTROLLER_MUST_TX -- cgit v1.2.3 From 90366cd60133a9f5b6a2f31360367c658585e125 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 10 Jul 2023 18:49:29 +0300 Subject: spi: Get rid of old SPI_MASTER_MUST_TX & SPI_MASTER_MUST_RX Convert the users from SPI_MASTER_MUST_TX and/or SPI_MASTER_MUST_RX to SPI_CONTROLLER_MUST_TX and/or SPI_CONTROLLER_MUST_RX respectively and kill the not used anymore definitions. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230710154932.68377-13-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index debb54f37603..becad31aeea2 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -1623,8 +1623,6 @@ spi_transfer_is_last(struct spi_controller *ctlr, struct spi_transfer *xfer) #define spi_master spi_controller #define SPI_MASTER_HALF_DUPLEX SPI_CONTROLLER_HALF_DUPLEX -#define SPI_MASTER_MUST_RX SPI_CONTROLLER_MUST_RX -#define SPI_MASTER_MUST_TX SPI_CONTROLLER_MUST_TX #define spi_master_get_devdata(_ctlr) spi_controller_get_devdata(_ctlr) #define spi_master_set_devdata(_ctlr, _data) \ -- cgit v1.2.3 From 82238d2cbd99ebd09dda48fb7c1c8802097da6a2 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 10 Jul 2023 18:49:30 +0300 Subject: spi: Rename SPI_MASTER_GPIO_SS to SPI_CONTROLLER_GPIO_SS Rename SPI_MASTER_GPIO_SS to SPI_CONTROLLER_GPIO_SS and convert the users to SPI_CONTROLLER_GPIO_SS to follow the new naming shema. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230710154932.68377-14-andriy.shevchenko@linux.intel.com Reviewed-by: Serge Semin Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index becad31aeea2..39a2e6108694 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -562,8 +562,7 @@ struct spi_controller { #define SPI_CONTROLLER_NO_TX BIT(2) /* Can't do buffer write */ #define SPI_CONTROLLER_MUST_RX BIT(3) /* Requires rx */ #define SPI_CONTROLLER_MUST_TX BIT(4) /* Requires tx */ - -#define SPI_MASTER_GPIO_SS BIT(5) /* GPIO CS must select slave */ +#define SPI_CONTROLLER_GPIO_SS BIT(5) /* GPIO CS must select slave */ /* Flag indicating if the allocation of this struct is devres-managed */ bool devm_allocated; -- cgit v1.2.3 From 702ca0269ed56e2d8dae7874a4d8af268e2a382e Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 10 Jul 2023 18:49:32 +0300 Subject: spi: Fix spelling typos and acronyms capitalization Fix - spelling typos - capitalization of acronyms in the comments. While at it, fix the multi-line comment style. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230710154932.68377-16-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 137 ++++++++++++++++++++++++++---------------------- 1 file changed, 75 insertions(+), 62 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 39a2e6108694..04daf61dfd3f 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -36,7 +36,7 @@ extern struct bus_type spi_bus_type; /** * struct spi_statistics - statistics for spi transfers - * @syncp: seqcount to protect members in this struct for per-cpu udate + * @syncp: seqcount to protect members in this struct for per-cpu update * on 32-bit systems * * @messages: number of spi-messages handled @@ -55,7 +55,7 @@ extern struct bus_type spi_bus_type; * @bytes_rx: number of bytes received from device * * @transfer_bytes_histo: - * transfer bytes histogramm + * transfer bytes histogram * * @transfers_split_maxsize: * number of transfers that have been split because of @@ -156,7 +156,7 @@ extern void spi_transfer_cs_change_delay_exec(struct spi_message *msg, * the device will bind to the named driver and only the named driver. * Do not set directly, because core frees it; use driver_set_override() to * set or clear it. - * @cs_gpiod: gpio descriptor of the chipselect line (optional, NULL when + * @cs_gpiod: GPIO descriptor of the chipselect line (optional, NULL when * not using a GPIO line) * @word_delay: delay to be inserted between consecutive * words of a transfer @@ -212,7 +212,7 @@ struct spi_device { void *controller_data; char modalias[SPI_NAME_SIZE]; const char *driver_override; - struct gpio_desc *cs_gpiod; /* Chip select gpio desc */ + struct gpio_desc *cs_gpiod; /* Chip select GPIO descriptor */ struct spi_delay word_delay; /* Inter-word delay */ /* CS delays */ struct spi_delay cs_setup; @@ -223,7 +223,7 @@ struct spi_device { struct spi_statistics __percpu *pcpu_statistics; /* - * likely need more hooks for more protocol options affecting how + * Likely need more hooks for more protocol options affecting how * the controller talks to each chip, like: * - memory packing (12 bit samples into low bits, others zeroed) * - priority @@ -299,11 +299,11 @@ static inline void spi_set_csgpiod(struct spi_device *spi, u8 idx, struct gpio_d /** * struct spi_driver - Host side "protocol" driver * @id_table: List of SPI devices supported by this driver - * @probe: Binds this driver to the spi device. Drivers can verify + * @probe: Binds this driver to the SPI device. Drivers can verify * that the device is actually present, and may need to configure * characteristics (such as bits_per_word) which weren't needed for * the initial configuration done during system setup. - * @remove: Unbinds this driver from the spi device + * @remove: Unbinds this driver from the SPI device * @shutdown: Standard shutdown callback used during system state * transitions such as powerdown/halt and kexec * @driver: SPI device drivers should initialize the name and owner @@ -415,7 +415,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @queued: whether this controller is providing an internal message queue * @kworker: pointer to thread struct for message pump * @pump_messages: work struct for scheduling work to the message pump - * @queue_lock: spinlock to syncronise access to message queue + * @queue_lock: spinlock to synchronise access to message queue * @queue: message queue * @cur_msg: the currently in-flight message * @cur_msg_completion: a completion for the current in-flight message @@ -473,7 +473,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @unprepare_message: undo any work done by prepare_message(). * @slave_abort: abort the ongoing transfer request on an SPI slave controller * @target_abort: abort the ongoing transfer request on an SPI target controller - * @cs_gpiods: Array of GPIO descs to use as chip select lines; one per CS + * @cs_gpiods: Array of GPIO descriptors to use as chip select lines; one per CS * number. Any individual value may be NULL for CS lines that * are not GPIOs (driven by the SPI controller itself). * @use_gpio_descriptors: Turns on the code in the SPI core to parse and grab @@ -500,7 +500,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * If the driver does not set this, the SPI core takes the snapshot as * close to the driver hand-over as possible. * @irq_flags: Interrupt enable state during PTP system timestamping - * @fallback: fallback to pio if dma transfer return failure with + * @fallback: fallback to PIO if DMA transfer return failure with * SPI_TRANS_FAIL_NO_START. * @queue_empty: signal green light for opportunistically skipping the queue * for spi_sync transfers. @@ -522,15 +522,17 @@ struct spi_controller { struct list_head list; - /* Other than negative (== assign one dynamically), bus_num is fully - * board-specific. usually that simplifies to being SOC-specific. - * example: one SOC has three SPI controllers, numbered 0..2, - * and one board's schematics might show it using SPI-2. software + /* + * Other than negative (== assign one dynamically), bus_num is fully + * board-specific. Usually that simplifies to being SoC-specific. + * example: one SoC has three SPI controllers, numbered 0..2, + * and one board's schematics might show it using SPI-2. Software * would normally use bus_num=2 for that controller. */ s16 bus_num; - /* chipselects will be integral to many controllers; some others + /* + * Chipselects will be integral to many controllers; some others * might use board-specific GPIOs. */ u16 num_chipselect; @@ -575,8 +577,8 @@ struct spi_controller { }; /* - * on some hardware transfer / message size may be constrained - * the limit may depend on device transfer settings + * On some hardware transfer / message size may be constrained + * the limit may depend on device transfer settings. */ size_t (*max_transfer_size)(struct spi_device *spi); size_t (*max_message_size)(struct spi_device *spi); @@ -594,7 +596,8 @@ struct spi_controller { /* Flag indicating that the SPI bus is locked for exclusive use */ bool bus_lock_flag; - /* Setup mode and clock, etc (spi driver may call many times). + /* + * Setup mode and clock, etc (SPI driver may call many times). * * IMPORTANT: this may be called when transfers to another * device are active. DO NOT UPDATE SHARED REGISTERS in ways @@ -612,18 +615,19 @@ struct spi_controller { */ int (*set_cs_timing)(struct spi_device *spi); - /* Bidirectional bulk transfers + /* + * Bidirectional bulk transfers * * + The transfer() method may not sleep; its main role is * just to add the message to the queue. * + For now there's no remove-from-queue operation, or * any other request management - * + To a given spi_device, message queueing is pure fifo + * + To a given spi_device, message queueing is pure FIFO * * + The controller's main job is to process its message queue, * selecting a chip (for masters), then transferring data * + If there are multiple spi_device children, the i/o queue - * arbitration algorithm is unspecified (round robin, fifo, + * arbitration algorithm is unspecified (round robin, FIFO, * priority, reservations, preemption, etc) * * + Chipselect stays active during the entire message @@ -704,7 +708,7 @@ struct spi_controller { const struct spi_controller_mem_ops *mem_ops; const struct spi_controller_mem_caps *mem_caps; - /* gpio chip select */ + /* GPIO chip select */ struct gpio_desc **cs_gpiods; bool use_gpio_descriptors; s8 unused_native_cs; @@ -788,7 +792,7 @@ void spi_take_timestamp_post(struct spi_controller *ctlr, struct spi_transfer *xfer, size_t progress, bool irqs_off); -/* The spi driver core manages memory for the spi_controller classdev */ +/* The SPI driver core manages memory for the spi_controller classdev */ extern struct spi_controller *__spi_alloc_controller(struct device *host, unsigned int size, bool slave); @@ -877,13 +881,13 @@ typedef void (*spi_res_release_t)(struct spi_controller *ctlr, void *res); /** - * struct spi_res - spi resource management structure + * struct spi_res - SPI resource management structure * @entry: list entry * @release: release code called prior to freeing this resource * @data: extra data allocated for the specific use-case * - * this is based on ideas from devres, but focused on life-cycle - * management during spi_message processing + * This is based on ideas from devres, but focused on life-cycle + * management during spi_message processing. */ struct spi_res { struct list_head entry; @@ -901,7 +905,7 @@ struct spi_res { * * The spi_messages themselves consist of a series of read+write transfer * segments. Those segments always read the same number of bits as they - * write; but one or the other is easily ignored by passing a null buffer + * write; but one or the other is easily ignored by passing a NULL buffer * pointer. (This is unlike most types of I/O API, because SPI hardware * is full duplex.) * @@ -912,8 +916,8 @@ struct spi_res { /** * struct spi_transfer - a read/write buffer pair - * @tx_buf: data to be written (dma-safe memory), or NULL - * @rx_buf: data to be read (dma-safe memory), or NULL + * @tx_buf: data to be written (DMA-safe memory), or NULL + * @rx_buf: data to be read (DMA-safe memory), or NULL * @tx_dma: DMA address of tx_buf, if @spi_message.is_dma_mapped * @rx_dma: DMA address of rx_buf, if @spi_message.is_dma_mapped * @tx_nbits: number of bits used for writing. If 0 the default @@ -936,7 +940,7 @@ struct spi_res { * @word_delay: inter word delay to be introduced after each word size * (set by bits_per_word) transmission. * @effective_speed_hz: the effective SCK-speed that was used to - * transfer this transfer. Set to 0 if the spi bus driver does + * transfer this transfer. Set to 0 if the SPI bus driver does * not support it. * @transfer_list: transfers are sequenced through @spi_message.transfers * @tx_sg: Scatterlist for transmit, currently not for client use @@ -965,16 +969,16 @@ struct spi_res { * transmitting the "pre" word, and the "post" timestamp after receiving * transmit confirmation from the controller for the "post" word. * @timestamped: true if the transfer has been timestamped - * @error: Error status logged by spi controller driver. + * @error: Error status logged by SPI controller driver. * * SPI transfers always write the same number of bytes as they read. * Protocol drivers should always provide @rx_buf and/or @tx_buf. * In some cases, they may also want to provide DMA addresses for * the data being transferred; that may reduce overhead, when the - * underlying driver uses dma. + * underlying driver uses DMA. * - * If the transmit buffer is null, zeroes will be shifted out - * while filling @rx_buf. If the receive buffer is null, the data + * If the transmit buffer is NULL, zeroes will be shifted out + * while filling @rx_buf. If the receive buffer is NULL, the data * shifted in will be discarded. Only "len" bytes shift out (or in). * It's an error to try to shift out a partial word. (For example, by * shifting out three bytes with word size of sixteen or twenty bits; @@ -1008,7 +1012,7 @@ struct spi_res { * Some devices need protocol transactions to be built from a series of * spi_message submissions, where the content of one message is determined * by the results of previous messages and where the whole transaction - * ends when the chipselect goes intactive. + * ends when the chipselect goes inactive. * * When SPI can transfer in 1x,2x or 4x. It can get this transfer information * from device through @tx_nbits and @rx_nbits. In Bi-direction, these @@ -1022,10 +1026,11 @@ struct spi_res { * and its transfers, ignore them until its completion callback. */ struct spi_transfer { - /* It's ok if tx_buf == rx_buf (right?) - * for MicroWire, one buffer must be null - * buffers must work with dma_*map_single() calls, unless - * spi_message.is_dma_mapped reports a pre-existing mapping + /* + * It's okay if tx_buf == rx_buf (right?). + * For MicroWire, one buffer must be NULL. + * Buffers must work with dma_*map_single() calls, unless + * spi_message.is_dma_mapped reports a pre-existing mapping. */ const void *tx_buf; void *rx_buf; @@ -1045,9 +1050,9 @@ struct spi_transfer { unsigned tx_nbits:3; unsigned rx_nbits:3; unsigned timestamped:1; -#define SPI_NBITS_SINGLE 0x01 /* 1bit transfer */ -#define SPI_NBITS_DUAL 0x02 /* 2bits transfer */ -#define SPI_NBITS_QUAD 0x04 /* 4bits transfer */ +#define SPI_NBITS_SINGLE 0x01 /* 1-bit transfer */ +#define SPI_NBITS_DUAL 0x02 /* 2-bit transfer */ +#define SPI_NBITS_QUAD 0x04 /* 4-bit transfer */ u8 bits_per_word; struct spi_delay delay; struct spi_delay cs_change_delay; @@ -1068,7 +1073,7 @@ struct spi_transfer { * struct spi_message - one multi-segment SPI transaction * @transfers: list of transfer segments in this transaction * @spi: SPI device to which the transaction is queued - * @is_dma_mapped: if true, the caller provided both dma and cpu virtual + * @is_dma_mapped: if true, the caller provided both DMA and CPU virtual * addresses for each transfer buffer * @complete: called to report transaction completions * @context: the argument to complete() when it's called @@ -1078,7 +1083,7 @@ struct spi_transfer { * @status: zero for success, else negative errno * @queue: for use by whichever driver currently owns the message * @state: for use by whichever driver currently owns the message - * @resources: for resource management when the spi message is processed + * @resources: for resource management when the SPI message is processed * @prepared: spi_prepare_message was called for the this message * * A @spi_message is used to execute an atomic sequence of data transfers, @@ -1105,7 +1110,8 @@ struct spi_message { /* spi_prepare_message() was called for this message */ bool prepared; - /* REVISIT: we might want a flag affecting the behavior of the + /* + * REVISIT: we might want a flag affecting the behavior of the * last transfer ... allowing things like "read 16 bit length L" * immediately followed by "read L bytes". Basically imposing * a specific message scheduling algorithm. @@ -1123,14 +1129,15 @@ struct spi_message { unsigned frame_length; unsigned actual_length; - /* For optional use by whatever driver currently owns the + /* + * For optional use by whatever driver currently owns the * spi_message ... between calls to spi_async and then later * complete(), that's the spi_controller controller driver. */ struct list_head queue; void *state; - /* List of spi_res reources when the spi message is processed */ + /* List of spi_res resources when the SPI message is processed */ struct list_head resources; }; @@ -1167,7 +1174,7 @@ spi_transfer_delay_exec(struct spi_transfer *t) /** * spi_message_init_with_transfers - Initialize spi_message and append transfers * @m: spi_message to be initialized - * @xfers: An array of spi transfers + * @xfers: An array of SPI transfers * @num_xfers: Number of items in the xfer array * * This function initializes the given spi_message and adds each spi_transfer in @@ -1184,10 +1191,10 @@ struct spi_transfer *xfers, unsigned int num_xfers) spi_message_add_tail(&xfers[i], m); } -/* It's fine to embed message and transaction structures in other data +/* + * It's fine to embed message and transaction structures in other data * structures so long as you don't free them while they're in use. */ - static inline struct spi_message *spi_message_alloc(unsigned ntrans, gfp_t flags) { struct spi_message *m; @@ -1290,7 +1297,7 @@ typedef void (*spi_replaced_release_t)(struct spi_controller *ctlr, * replacements that have occurred * so that they can get reverted * @release: some extra release code to get executed prior to - * relasing this structure + * releasing this structure * @extradata: pointer to some extra data if requested or NULL * @replaced_transfers: transfers that have been replaced and which need * to get restored @@ -1300,9 +1307,9 @@ typedef void (*spi_replaced_release_t)(struct spi_controller *ctlr, * @inserted_transfers: array of spi_transfers of array-size @inserted, * that have been replacing replaced_transfers * - * note: that @extradata will point to @inserted_transfers[@inserted] + * Note: that @extradata will point to @inserted_transfers[@inserted] * if some extra allocation is requested, so alignment will be the same - * as for spi_transfers + * as for spi_transfers. */ struct spi_replaced_transfers { spi_replaced_release_t release; @@ -1328,7 +1335,8 @@ extern int spi_split_transfers_maxwords(struct spi_controller *ctlr, /*---------------------------------------------------------------------------*/ -/* All these synchronous SPI transfer routines are utilities layered +/* + * All these synchronous SPI transfer routines are utilities layered * over the core async transfer primitive. Here, "synchronous" means * they will sleep uninterruptibly until the async transfer completes. */ @@ -1471,7 +1479,7 @@ static inline ssize_t spi_w8r16(struct spi_device *spi, u8 cmd) * * Callable only from contexts that can sleep. * - * Return: the (unsigned) sixteen bit number returned by the device in cpu + * Return: the (unsigned) sixteen bit number returned by the device in CPU * endianness, or else a negative error code. */ static inline ssize_t spi_w8r16be(struct spi_device *spi, u8 cmd) @@ -1499,7 +1507,7 @@ static inline ssize_t spi_w8r16be(struct spi_device *spi, u8 cmd) * As a rule, SPI devices can't be probed. Instead, board init code * provides a table listing the devices which are present, with enough * information to bind and set up the device's driver. There's basic - * support for nonstatic configurations too; enough to handle adding + * support for non-static configurations too; enough to handle adding * parport adapters, or microcontrollers acting as USB-to-SPI bridges. */ @@ -1536,12 +1544,13 @@ static inline ssize_t spi_w8r16be(struct spi_device *spi, u8 cmd) * are active in some dynamic board configuration models. */ struct spi_board_info { - /* The device name and module name are coupled, like platform_bus; + /* + * The device name and module name are coupled, like platform_bus; * "modalias" is normally the driver name. * * platform_data goes to spi_device.dev.platform_data, * controller_data goes to spi_device.controller_data, - * irq is copied too + * IRQ is copied too. */ char modalias[SPI_NAME_SIZE]; const void *platform_data; @@ -1553,7 +1562,8 @@ struct spi_board_info { u32 max_speed_hz; - /* bus_num is board specific and matches the bus_num of some + /* + * bus_num is board specific and matches the bus_num of some * spi_controller that will probably be registered later. * * chip_select reflects how this chip is wired to that master; @@ -1562,12 +1572,14 @@ struct spi_board_info { u16 bus_num; u16 chip_select; - /* mode becomes spi_device.mode, and is essential for chips + /* + * mode becomes spi_device.mode, and is essential for chips * where the default of SPI_CS_HIGH = 0 is wrong. */ u32 mode; - /* ... may need additional spi_device chip config data here. + /* + * ... may need additional spi_device chip config data here. * avoid stuff protocol drivers can set; but include stuff * needed to behave without being bound to a driver: * - quirks like clock rate mattering when not selected @@ -1584,7 +1596,8 @@ spi_register_board_info(struct spi_board_info const *info, unsigned n) { return 0; } #endif -/* If you're hotplugging an adapter with devices (parport, usb, etc) +/* + * If you're hotplugging an adapter with devices (parport, USB, etc) * use spi_new_device() to describe each device. You can also call * spi_unregister_device() to start making that device vanish, but * normally that would be handled by spi_unregister_controller(). -- cgit v1.2.3 From 06a0213977dc5f8345c1222183b8fafac1a8a524 Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Tue, 11 Jul 2023 18:16:03 +0200 Subject: Non-functional cleanup of a "__user * filename" The next patch defines a very similar interface, which I copied from this definition. Since I'm touching it anyway I don't see any reason not to just go fix this one up. Signed-off-by: Palmer Dabbelt Acked-by: Arnd Bergmann Message-Id: Signed-off-by: Christian Brauner --- include/linux/syscalls.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 03e3d0121d5e..584f404bf868 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -438,7 +438,7 @@ asmlinkage long sys_chdir(const char __user *filename); asmlinkage long sys_fchdir(unsigned int fd); asmlinkage long sys_chroot(const char __user *filename); asmlinkage long sys_fchmod(unsigned int fd, umode_t mode); -asmlinkage long sys_fchmodat(int dfd, const char __user * filename, +asmlinkage long sys_fchmodat(int dfd, const char __user *filename, umode_t mode); asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group, int flag); -- cgit v1.2.3 From 2f0584f3f4bd60bcc8735172981fb0bff86e74e0 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:10:27 -0700 Subject: mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma() The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable or shadow stack mappings. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(), create a new arch config HAS_HUGE_PAGE that can be used to tell if pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the compiler would not be able to find pmd_mkwrite_novma(). No functional change. Suggested-by: Linus Torvalds Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Mike Rapoport (IBM) Acked-by: Geert Uytterhoeven Acked-by: David Hildenbrand Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com --- include/linux/pgtable.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5063b482e34f..e6ea6e0d7d8d 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -515,6 +515,20 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, pud_t *pudp); #endif +#ifndef pte_mkwrite +static inline pte_t pte_mkwrite(pte_t pte) +{ + return pte_mkwrite_novma(pte); +} +#endif + +#if defined(CONFIG_ARCH_WANT_PMD_MKWRITE) && !defined(pmd_mkwrite) +static inline pmd_t pmd_mkwrite(pmd_t pmd) +{ + return pmd_mkwrite_novma(pmd); +} +#endif + #ifndef __HAVE_ARCH_PTEP_SET_WRPROTECT struct mm_struct; static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep) -- cgit v1.2.3 From 161e393c0f63592a3b95bdd8b55752653763fc6d Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:10:29 -0700 Subject: mm: Make pte_mkwrite() take a VMA The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable or shadow stack mappings. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Previous work pte_mkwrite() renamed pte_mkwrite_novma() and converted callers that don't have a VMA were to use pte_mkwrite_novma(). So now change pte_mkwrite() to take a VMA and change the remaining callers to pass a VMA. Apply the same changes for pmd_mkwrite(). No functional change. Suggested-by: David Hildenbrand Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Link: https://lore.kernel.org/all/20230613001108.3040476-4-rick.p.edgecombe%40intel.com --- include/linux/mm.h | 2 +- include/linux/pgtable.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 2dd73e4f3d8e..d40fa0feb9dc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1277,7 +1277,7 @@ void free_compound_page(struct page *page); static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e6ea6e0d7d8d..9462f4a87d42 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -516,14 +516,14 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, #endif #ifndef pte_mkwrite -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return pte_mkwrite_novma(pte); } #endif #if defined(CONFIG_ARCH_WANT_PMD_MKWRITE) && !defined(pmd_mkwrite) -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { return pmd_mkwrite_novma(pmd); } -- cgit v1.2.3 From 592b5fad1677aa98a578ae50eb81d7383752c9c8 Mon Sep 17 00:00:00 2001 From: Yu-cheng Yu Date: Mon, 12 Jun 2023 17:10:30 -0700 Subject: mm: Re-introduce vm_flags to do_mmap() There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Co-developed-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Peter Collingbourne Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Reviewed-by: Mark Brown Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Tested-by: Mark Brown Link: https://lore.kernel.org/all/20230613001108.3040476-5-rick.p.edgecombe%40intel.com --- include/linux/mm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d40fa0feb9dc..f9a627c492f2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3176,7 +3176,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool unlock); -- cgit v1.2.3 From fb47a799cc5ccc469c63e9174f2ad555a21ba2a1 Mon Sep 17 00:00:00 2001 From: Yu-cheng Yu Date: Mon, 12 Jun 2023 17:10:31 -0700 Subject: mm: Move VM_UFFD_MINOR_BIT from 37 to 38 The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Future patches will introduce a new VM flag VM_SHADOW_STACK that will be VM_HIGH_ARCH_BIT_5. VM_HIGH_ARCH_BIT_1 through VM_HIGH_ARCH_BIT_4 are bits 32-36, and bit 37 is the unrelated VM_UFFD_MINOR_BIT. For the sake of order, make all VM_HIGH_ARCH_BITs stay together by moving VM_UFFD_MINOR_BIT from 37 to 38. This will allow VM_SHADOW_STACK to be introduced as 37. Co-developed-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Reviewed-by: Axel Rasmussen Acked-by: Mike Rapoport (IBM) Acked-by: Peter Xu Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-6-rick.p.edgecombe%40intel.com --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index f9a627c492f2..82990f3390f6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -370,7 +370,7 @@ extern unsigned int kobjsize(const void *objp); #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR -# define VM_UFFD_MINOR_BIT 37 +# define VM_UFFD_MINOR_BIT 38 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */ #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ # define VM_UFFD_MINOR VM_NONE -- cgit v1.2.3 From 54007f818206dc27309ca423df4c87dd160a7208 Mon Sep 17 00:00:00 2001 From: Yu-cheng Yu Date: Mon, 12 Jun 2023 17:10:40 -0700 Subject: mm: Introduce VM_SHADOW_STACK for shadow stack memory New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to identify these areas, for example, to be used to properly indicate shadow stack PTEs to the hardware. Shadow stack VMA creation will be tightly controlled and limited to anonymous memory to make the implementation simpler and since that is all that is required. The solution will rely on pte_mkwrite() to create the shadow stack PTEs, so it will not be required for vm_get_page_prot() to learn how to create shadow stack memory. For this reason document that VM_SHADOW_STACK should not be mixed with VM_SHARED. Co-developed-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Reviewed-by: Mark Brown Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Mark Brown Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-15-rick.p.edgecombe%40intel.com --- include/linux/mm.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 82990f3390f6..f6c2ebde62b3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -319,11 +319,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -339,6 +341,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */ +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) -- cgit v1.2.3 From 0266e7c53647fbc18be2d0da98d5c9e92922d866 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:10:42 -0700 Subject: mm: Add guard pages around a shadow stack. The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. The architecture of shadow stack constrains the ability of userspace to move the shadow stack pointer (SSP) in order to prevent corrupting or switching to other shadow stacks. The RSTORSSP instruction can move the SSP to different shadow stacks, but it requires a specially placed token in order to do this. However, the architecture does not prevent incrementing the stack pointer to wander onto an adjacent shadow stack. To prevent this in software, enforce guard pages at the beginning of shadow stack VMAs, such that there will always be a gap between adjacent shadow stacks. Make the gap big enough so that no userspace SSP changing operations (besides RSTORSSP), can move the SSP from one stack to the next. The SSP can be incremented or decremented by CALL, RET and INCSSP. CALL and RET can move the SSP by a maximum of 8 bytes, at which point the shadow stack would be accessed. The INCSSP instruction can also increment the shadow stack pointer. It is the shadow stack analog of an instruction like: addq $0x80, %rsp However, there is one important difference between an ADD on %rsp and INCSSP. In addition to modifying SSP, INCSSP also reads from the memory of the first and last elements that were "popped". It can be thought of as acting like this: READ_ONCE(ssp); // read+discard top element on stack ssp += nr_to_pop * 8; // move the shadow stack READ_ONCE(ssp-8); // read+discard last popped stack element The maximum distance INCSSP can move the SSP is 2040 bytes, before it would read the memory. Therefore, a single page gap will be enough to prevent any operation from shifting the SSP to an adjacent stack, since it would have to land in the gap at least once, causing a fault. This could be accomplished by using VM_GROWSDOWN, but this has a downside. The behavior would allow shadow stacks to grow, which is unneeded and adds a strange difference to how most regular stacks work. In the maple tree code, there is some logic for retrying the unmapped area search if a guard gap is violated. This retry should happen for shadow stack guard gap violations as well. This logic currently only checks for VM_GROWSDOWN for start gaps. Since shadow stacks also have a start gap as well, create an new define VM_STARTGAP_FLAGS to hold all the VM flag bits that have start gaps, and make mmap use it. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Reviewed-by: Mark Brown Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-17-rick.p.edgecombe%40intel.com --- include/linux/mm.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 48 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index f6c2ebde62b3..97eddc83d19c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -342,7 +342,36 @@ extern unsigned int kobjsize(const void *objp); #endif /* CONFIG_ARCH_HAS_PKEYS */ #ifdef CONFIG_X86_USER_SHADOW_STACK -# define VM_SHADOW_STACK VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */ +/* + * This flag should not be set with VM_SHARED because of lack of support + * core mm. It will also get a guard page. This helps userspace protect + * itself from attacks. The reasoning is as follows: + * + * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The + * INCSSP instruction can increment the shadow stack pointer. It is the + * shadow stack analog of an instruction like: + * + * addq $0x80, %rsp + * + * However, there is one important difference between an ADD on %rsp + * and INCSSP. In addition to modifying SSP, INCSSP also reads from the + * memory of the first and last elements that were "popped". It can be + * thought of as acting like this: + * + * READ_ONCE(ssp); // read+discard top element on stack + * ssp += nr_to_pop * 8; // move the shadow stack + * READ_ONCE(ssp-8); // read+discard last popped stack element + * + * The maximum distance INCSSP can move the SSP is 2040 bytes, before + * it would read the memory. Therefore a single page gap will be enough + * to prevent any operation from shifting the SSP to an adjacent stack, + * since it would have to land in the gap at least once, causing a + * fault. + * + * Prevent using INCSSP to move the SSP between shadow stacks by + * having a PAGE_SIZE guard gap. + */ +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 #else # define VM_SHADOW_STACK VM_NONE #endif @@ -405,6 +434,8 @@ extern unsigned int kobjsize(const void *objp); #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS #endif +#define VM_STARTGAP_FLAGS (VM_GROWSDOWN | VM_SHADOW_STACK) + #ifdef CONFIG_STACK_GROWSUP #define VM_STACK VM_GROWSUP #define VM_STACK_EARLY VM_GROWSDOWN @@ -3273,15 +3304,26 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr) return mtree_load(&mm->mm_mt, addr); } +static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_GROWSDOWN) + return stack_guard_gap; + + /* See reasoning around the VM_SHADOW_STACK definition */ + if (vma->vm_flags & VM_SHADOW_STACK) + return PAGE_SIZE; + + return 0; +} + static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { + unsigned long gap = stack_guard_start_gap(vma); unsigned long vm_start = vma->vm_start; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; - if (vm_start > vma->vm_start) - vm_start = 0; - } + vm_start -= gap; + if (vm_start > vma->vm_start) + vm_start = 0; return vm_start; } -- cgit v1.2.3 From e5136e876581ba5b63220378e25fec9dcec7bad1 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:10:43 -0700 Subject: mm: Warn on shadow stack memory in wrong vma The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One sharp edge is that PTEs that are both Write=0 and Dirty=1 are treated as shadow by the CPU, but this combination used to be created by the kernel on x86. Previous patches have changed the kernel to now avoid creating these PTEs unless they are for shadow stack memory. In case any missed corners of the kernel are still creating PTEs like this for non-shadow stack memory, and to catch any re-introductions of the logic, warn if any shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs when they are being zapped. This won't catch transient cases but should have decent coverage. In order to check if a PTE is shadow stack in core mm code, add two arch breakouts arch_check_zapped_pte/pmd(). This will allow shadow stack specific code to be kept in arch/x86. Only do the check if shadow stack is supported by the CPU and configured because in rare cases older CPUs may write Dirty=1 to a Write=0 CPU on older CPUs. This check is handled in pte_shstk()/pmd_shstk(). Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Mark Brown Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-18-rick.p.edgecombe%40intel.com --- include/linux/pgtable.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 9462f4a87d42..dd4637d6cfaa 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -313,6 +313,20 @@ static inline bool arch_has_hw_pte_young(void) } #endif +#ifndef arch_check_zapped_pte +static inline void arch_check_zapped_pte(struct vm_area_struct *vma, + pte_t pte) +{ +} +#endif + +#ifndef arch_check_zapped_pmd +static inline void arch_check_zapped_pmd(struct vm_area_struct *vma, + pmd_t pmd) +{ +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, -- cgit v1.2.3 From 29f890d1050fc099fd578d9db844d6c0375902b6 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:10:46 -0700 Subject: x86/mm: Introduce MAP_ABOVE4G The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly. One of the properties is that the shadow stack pointer (SSP), which is a CPU register that points to the shadow stack like the stack pointer points to the stack, can't be pointing outside of the 32 bit address space when the CPU is executing in 32 bit mode. It is desirable to prevent executing in 32 bit mode when shadow stack is enabled because the kernel can't easily support 32 bit signals. On x86 it is possible to transition to 32 bit mode without any special interaction with the kernel, by doing a "far call" to a 32 bit segment. So the shadow stack implementation can use this address space behavior as a feature, by enforcing that shadow stack memory is always mapped outside of the 32 bit address space. This way userspace will trigger a general protection fault which will in turn trigger a segfault if it tries to transition to 32 bit mode with shadow stack enabled. This provides a clean error generating border for the user if they try attempt to do 32 bit mode shadow stack, rather than leave the kernel in a half working state for userspace to be surprised by. So to allow future shadow stack enabling patches to map shadow stacks out of the 32 bit address space, introduce MAP_ABOVE4G. The behavior is pretty much like MAP_32BIT, except that it has the opposite address range. The are a few differences though. If both MAP_32BIT and MAP_ABOVE4G are provided, the kernel will use the MAP_ABOVE4G behavior. Like MAP_32BIT, MAP_ABOVE4G is ignored in a 32 bit syscall. Since the default search behavior is top down, the normal kaslr base can be used for MAP_ABOVE4G. This is unlike MAP_32BIT which has to add its own randomization in the bottom up case. For MAP_32BIT, only the bottom up search path is used. For MAP_ABOVE4G both are potentially valid, so both are used. In the bottomup search path, the default behavior is already consistent with MAP_ABOVE4G since mmap base should be above 4GB. Without MAP_ABOVE4G, the shadow stack will already normally be above 4GB. So without introducing MAP_ABOVE4G, trying to transition to 32 bit mode with shadow stack enabled would usually segfault anyway. This is already pretty decent guard rails. But the addition of MAP_ABOVE4G is some small complexity spent to make it make it more complete. Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-21-rick.p.edgecombe%40intel.com --- include/linux/mman.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mman.h b/include/linux/mman.h index cee1e4b566d8..40d94411d492 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -15,6 +15,9 @@ #ifndef MAP_32BIT #define MAP_32BIT 0 #endif +#ifndef MAP_ABOVE4G +#define MAP_ABOVE4G 0 +#endif #ifndef MAP_HUGE_2MB #define MAP_HUGE_2MB 0 #endif @@ -50,6 +53,7 @@ | MAP_STACK \ | MAP_HUGETLB \ | MAP_32BIT \ + | MAP_ABOVE4G \ | MAP_HUGE_2MB \ | MAP_HUGE_1GB) -- cgit v1.2.3 From 5125e757e62f6c1d5478db4c2b61a744060ddf3f Mon Sep 17 00:00:00 2001 From: Yafang Shao Date: Sun, 9 Jul 2023 02:56:25 +0000 Subject: bpf: Clear the probe_addr for uprobe To avoid returning uninitialized or random values when querying the file descriptor (fd) and accessing probe_addr, it is necessary to clear the variable prior to its use. Fixes: 41bdc4b40ed6 ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") Signed-off-by: Yafang Shao Acked-by: Yonghong Song Acked-by: Jiri Olsa Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/trace_events.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 7c4a0b72334e..36de9ebec440 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -864,7 +864,8 @@ extern int perf_uprobe_init(struct perf_event *event, extern void perf_uprobe_destroy(struct perf_event *event); extern int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type, const char **filename, - u64 *probe_offset, bool perf_type_tracepoint); + u64 *probe_offset, u64 *probe_addr, + bool perf_type_tracepoint); #endif extern int ftrace_profile_set_filter(struct perf_event *event, int event_id, char *filter_str); -- cgit v1.2.3 From 19bfa9ebebb5ec0695def57eb1d80de7e9cab369 Mon Sep 17 00:00:00 2001 From: Tomas Winkler Date: Tue, 20 Jun 2023 16:19:04 +0300 Subject: mtd: use refcount to prevent corruption When underlying device is removed mtd core will crash in case user space is holding open handle. Need to use proper refcounting so device is release only when has no users. Signed-off-by: Tomas Winkler Signed-off-by: Alexander Usyskin Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20230620131905.648089-2-alexander.usyskin@intel.com --- include/linux/mtd/mtd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 7c58c44662b8..914a9f974baa 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -379,7 +379,7 @@ struct mtd_info { struct module *owner; struct device dev; - int usecount; + struct kref refcnt; struct mtd_debug_info dbg; struct nvmem_device *nvmem; struct nvmem_device *otp_user_nvmem; -- cgit v1.2.3 From 079c8d9da26ed041a54706de68b754337e6df17e Mon Sep 17 00:00:00 2001 From: Arseniy Krasnov Date: Wed, 5 Jul 2023 13:43:57 +0300 Subject: mtd: rawnand: export 'nand_exit_status_op()' Export this function to work in pair with 'nand_status_op()' which is already exported. Signed-off-by: Arseniy Krasnov Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20230705104403.696680-2-AVKrasnov@sberdevices.ru --- include/linux/mtd/rawnand.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h index 5159d692f9ce..90a141ba2a5a 100644 --- a/include/linux/mtd/rawnand.h +++ b/include/linux/mtd/rawnand.h @@ -1540,6 +1540,7 @@ int nand_reset_op(struct nand_chip *chip); int nand_readid_op(struct nand_chip *chip, u8 addr, void *buf, unsigned int len); int nand_status_op(struct nand_chip *chip, u8 *status); +int nand_exit_status_op(struct nand_chip *chip); int nand_erase_op(struct nand_chip *chip, unsigned int eraseblock); int nand_read_page_op(struct nand_chip *chip, unsigned int page, unsigned int offset_in_page, void *buf, unsigned int len); -- cgit v1.2.3 From 77ee9d299e6d69a75e9a3e15a1f09cd7c8f88ff0 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 30 Jun 2023 17:35:45 +1200 Subject: platform/x86: asus-wmi: add support for showing charger mode Expose a WMI method in sysfs platform for showing which connected charger the laptop is currently using. Signed-off-by: Luke D. Jones Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230630053552.976579-2-luke@ljones.dev Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 28234dc9fa6a..f90cafe26af1 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -95,6 +95,9 @@ /* Keyboard dock */ #define ASUS_WMI_DEVID_KBD_DOCK 0x00120063 +/* Charging mode - 1=Barrel, 2=USB */ +#define ASUS_WMI_DEVID_CHARGE_MODE 0x0012006C + /* dgpu on/off */ #define ASUS_WMI_DEVID_EGPU 0x00090019 -- cgit v1.2.3 From 536fce82d72933b9c60f1f0873f75ef890d80679 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 30 Jun 2023 17:35:46 +1200 Subject: platform/x86: asus-wmi: add support for showing middle fan RPM Some newer ASUS ROG laptops now have a middle/center fan in addition to the CPU and GPU fans. This new fan typically blows across the heatpipes and VRMs betweent eh CPU and GPU. This commit exposes that fan to PWM control plus showing RPM. Signed-off-by: Luke D. Jones Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230630053552.976579-3-luke@ljones.dev Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index f90cafe26af1..2c03bda7703f 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -80,6 +80,7 @@ #define ASUS_WMI_DEVID_FAN_CTRL 0x00110012 /* deprecated */ #define ASUS_WMI_DEVID_CPU_FAN_CTRL 0x00110013 #define ASUS_WMI_DEVID_GPU_FAN_CTRL 0x00110014 +#define ASUS_WMI_DEVID_MID_FAN_CTRL 0x00110031 #define ASUS_WMI_DEVID_CPU_FAN_CURVE 0x00110024 #define ASUS_WMI_DEVID_GPU_FAN_CURVE 0x00110025 -- cgit v1.2.3 From ee887807d05d3d6fb68917df59e450385fe630d3 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 30 Jun 2023 17:35:47 +1200 Subject: platform/x86: asus-wmi: support middle fan custom curves Adds support for fan curves defined for the middle fan which is available on some ASUS ROG laptops. Signed-off-by: Luke D. Jones Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230630053552.976579-4-luke@ljones.dev Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 2c03bda7703f..329efc086993 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -83,6 +83,7 @@ #define ASUS_WMI_DEVID_MID_FAN_CTRL 0x00110031 #define ASUS_WMI_DEVID_CPU_FAN_CURVE 0x00110024 #define ASUS_WMI_DEVID_GPU_FAN_CURVE 0x00110025 +#define ASUS_WMI_DEVID_MID_FAN_CURVE 0x00110032 /* Power */ #define ASUS_WMI_DEVID_PROCESSOR_STATE 0x00120012 -- cgit v1.2.3 From d4eca58aafe241f68e9d7e015d262abe8c8507ac Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 30 Jun 2023 17:35:48 +1200 Subject: platform/x86: asus-wmi: add WMI method to show if egpu connected Exposes the WMI method which tells if the eGPU is properly connected on the devices that support it. Signed-off-by: Luke D. Jones Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230630053552.976579-5-luke@ljones.dev Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 329efc086993..2034648f8cdf 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -100,7 +100,9 @@ /* Charging mode - 1=Barrel, 2=USB */ #define ASUS_WMI_DEVID_CHARGE_MODE 0x0012006C -/* dgpu on/off */ +/* epu is connected? 1 == true */ +#define ASUS_WMI_DEVID_EGPU_CONNECTED 0x00090018 +/* egpu on/off */ #define ASUS_WMI_DEVID_EGPU 0x00090019 /* dgpu on/off */ -- cgit v1.2.3 From abac4259fc0a2b691af6c3916e420f374f7fd900 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 30 Jun 2023 17:35:51 +1200 Subject: platform/x86: asus-wmi: support setting mini-LED mode Support changing the mini-LED mode on some of the newer ASUS laptops. Signed-off-by: Luke D. Jones Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230630053552.976579-8-luke@ljones.dev Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 2034648f8cdf..ea80361ac6c7 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -66,6 +66,7 @@ #define ASUS_WMI_DEVID_CAMERA 0x00060013 #define ASUS_WMI_DEVID_LID_FLIP 0x00060062 #define ASUS_WMI_DEVID_LID_FLIP_ROG 0x00060077 +#define ASUS_WMI_DEVID_MINI_LED_MODE 0x0005001E /* Storage */ #define ASUS_WMI_DEVID_CARDREADER 0x00080013 -- cgit v1.2.3 From e0b278e7b5da62c3ebb156a8b7d76a739da2d953 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 30 Jun 2023 17:35:52 +1200 Subject: platform/x86: asus-wmi: expose dGPU and CPU tunables for ROG Expose various CPU and dGPU tunables that are available on many ASUS ROG laptops. The tunables shown in sysfs will vary depending on the CPU and dGPU vendor. All of these variables are write only and there is no easy way to find what the defaults are. In general they seem to default to the max value the vendor sets for the CPU and dGPU package - this is not the same as the min/max writable value. Values written to these variables that are beyond the capabilities of the CPU are ignored by the laptop. Signed-off-by: Luke D. Jones Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230630053552.976579-9-luke@ljones.dev Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index ea80361ac6c7..16e99a1c37fc 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -86,6 +86,15 @@ #define ASUS_WMI_DEVID_GPU_FAN_CURVE 0x00110025 #define ASUS_WMI_DEVID_MID_FAN_CURVE 0x00110032 +/* Tunables for AUS ROG laptops */ +#define ASUS_WMI_DEVID_PPT_PL2_SPPT 0x001200A0 +#define ASUS_WMI_DEVID_PPT_PL1_SPL 0x001200A3 +#define ASUS_WMI_DEVID_PPT_APU_SPPT 0x001200B0 +#define ASUS_WMI_DEVID_PPT_PLAT_SPPT 0x001200B1 +#define ASUS_WMI_DEVID_PPT_FPPT 0x001200C1 +#define ASUS_WMI_DEVID_NV_DYN_BOOST 0x001200C0 +#define ASUS_WMI_DEVID_NV_THERM_TARGET 0x001200C2 + /* Power */ #define ASUS_WMI_DEVID_PROCESSOR_STATE 0x00120012 -- cgit v1.2.3 From 43a89baecfe200cb4530f42b9fcf904925d6d14a Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 5 Jul 2023 20:34:43 -0700 Subject: rcu: Export rcu_request_urgent_qs_task() If a CPU is executing a long series of non-sleeping system calls, RCU grace periods can be delayed for on the order of a couple hundred milliseconds. This is normally not a problem, but if each system call does a call_rcu(), those callbacks can stack up. RCU will eventually notice this callback storm, but use of rcu_request_urgent_qs_task() allows the code invoking call_rcu() to give RCU a heads up. This function is not for general use, not yet, anyway. Reported-by: Alexei Starovoitov Signed-off-by: Paul E. McKenney Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230706033447.54696-11-alexei.starovoitov@gmail.com --- include/linux/rcutiny.h | 2 ++ include/linux/rcutree.h | 1 + 2 files changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 7f17acf29dda..7b949292908a 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -138,6 +138,8 @@ static inline int rcu_needs_cpu(void) return 0; } +static inline void rcu_request_urgent_qs_task(struct task_struct *t) { } + /* * Take advantage of the fact that there is only one CPU, which * allows us to ignore virtualization-based context switches. diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 56bccb5a8fde..126f6b418f6a 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -21,6 +21,7 @@ void rcu_softirq_qs(void); void rcu_note_context_switch(bool preempt); int rcu_needs_cpu(void); void rcu_cpu_stall_reset(void); +void rcu_request_urgent_qs_task(struct task_struct *t); /* * Note a virtualization-based context switch. This is simply a -- cgit v1.2.3 From 5af6807bdb10d1af9d412d7d6c177ba8440adffb Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Wed, 5 Jul 2023 20:34:45 -0700 Subject: bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu(). Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu(). Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into per-cpu free list the _rcu() flavor waits for RCU grace period and then moves objects into free_by_rcu_ttrace list where they are waiting for RCU task trace grace period to be freed into slab. The life cycle of objects: alloc: dequeue free_llist free: enqeueu free_llist free_rcu: enqueue free_by_rcu -> waiting_for_gp free_llist above high watermark -> free_by_rcu_ttrace after RCU GP waiting_for_gp -> free_by_rcu_ttrace free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: Hou Tao Link: https://lore.kernel.org/bpf/20230706033447.54696-13-alexei.starovoitov@gmail.com --- include/linux/bpf_mem_alloc.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index 3929be5743f4..d644bbb298af 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -27,10 +27,12 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); /* kmalloc/kfree equivalent: */ void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size); void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr); +void bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr); /* kmem_cache_alloc/free equivalent: */ void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma); void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr); +void bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr); void bpf_mem_cache_raw_free(void *ptr); void *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags); -- cgit v1.2.3 From d243b34459cea30cfe5f3a9b2feb44e7daff9938 Mon Sep 17 00:00:00 2001 From: Wander Lairson Costa Date: Wed, 14 Jun 2023 09:23:21 -0300 Subject: kernel/fork: beware of __put_task_struct() calling context Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping locks. Therefore, it can't be called from an non-preemptible context. One practical example is splat inside inactive_task_timer(), which is called in a interrupt context: CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W --------- Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012 Call Trace: dump_stack_lvl+0x57/0x7d mark_lock_irq.cold+0x33/0xba mark_lock+0x1e7/0x400 mark_usage+0x11d/0x140 __lock_acquire+0x30d/0x930 lock_acquire.part.0+0x9c/0x210 rt_spin_lock+0x27/0xe0 refill_obj_stock+0x3d/0x3a0 kmem_cache_free+0x357/0x560 inactive_task_timer+0x1ad/0x340 __run_hrtimer+0x8a/0x1a0 __hrtimer_run_queues+0x91/0x130 hrtimer_interrupt+0x10f/0x220 __sysvec_apic_timer_interrupt+0x7b/0xd0 sysvec_apic_timer_interrupt+0x4f/0xd0 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0033:0x7fff196bf6f5 Instead of calling __put_task_struct() directly, we defer it using call_rcu(). A more natural approach would use a workqueue, but since in PREEMPT_RT, we can't allocate dynamic memory from atomic context, the code would become more complex because we would need to put the work_struct instance in the task_struct and initialize it when we allocate a new task_struct. The issue is reproducible with stress-ng: while true; do stress-ng --sched deadline --sched-period 1000000000 \ --sched-runtime 800000000 --sched-deadline \ 1000000000 --mmapfork 23 -t 20 done Reported-by: Hu Chunyu Suggested-by: Oleg Nesterov Suggested-by: Valentin Schneider Suggested-by: Peter Zijlstra Signed-off-by: Wander Lairson Costa Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20230614122323.37957-2-wander@redhat.com --- include/linux/sched/task.h | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index dd35ce28bb90..6b687c155fb6 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -118,10 +118,36 @@ static inline struct task_struct *get_task_struct(struct task_struct *t) } extern void __put_task_struct(struct task_struct *t); +extern void __put_task_struct_rcu_cb(struct rcu_head *rhp); static inline void put_task_struct(struct task_struct *t) { - if (refcount_dec_and_test(&t->usage)) + if (!refcount_dec_and_test(&t->usage)) + return; + + /* + * under PREEMPT_RT, we can't call put_task_struct + * in atomic context because it will indirectly + * acquire sleeping locks. + * + * call_rcu() will schedule delayed_put_task_struct_rcu() + * to be called in process context. + * + * __put_task_struct() is called when + * refcount_dec_and_test(&t->usage) succeeds. + * + * This means that it can't "conflict" with + * put_task_struct_rcu_user() which abuses ->rcu the same + * way; rcu_users has a reference so task->usage can't be + * zero after rcu_users 1 -> 0 transition. + * + * delayed_free_task() also uses ->rcu, but it is only called + * when it fails to fork a process. Therefore, there is no + * way it can conflict with put_task_struct(). + */ + if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible()) + call_rcu(&t->rcu, __put_task_struct_rcu_cb); + else __put_task_struct(t); } -- cgit v1.2.3 From 893cdaaa3977be6afb3a7f756fbfd7be83f68d8c Mon Sep 17 00:00:00 2001 From: Wander Lairson Costa Date: Wed, 14 Jun 2023 09:23:22 -0300 Subject: sched: avoid false lockdep splat in put_task_struct() In put_task_struct(), a spin_lock is indirectly acquired under the kernel stock. When running the kernel in real-time (RT) configuration, the operation is dispatched to a preemptible context call to ensure guaranteed preemption. However, if PROVE_RAW_LOCK_NESTING is enabled and __put_task_struct() is called while holding a raw_spinlock, lockdep incorrectly reports an "Invalid lock context" in the stock kernel. This false splat occurs because lockdep is unaware of the different route taken under RT. To address this issue, override the inner wait type to prevent the false lockdep splat. Suggested-by: Oleg Nesterov Suggested-by: Sebastian Andrzej Siewior Suggested-by: Peter Zijlstra Signed-off-by: Wander Lairson Costa Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20230614122323.37957-3-wander@redhat.com --- include/linux/sched/task.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 6b687c155fb6..a23af225c898 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -125,6 +125,19 @@ static inline void put_task_struct(struct task_struct *t) if (!refcount_dec_and_test(&t->usage)) return; + /* + * In !RT, it is always safe to call __put_task_struct(). + * Under RT, we can only call it in preemptible context. + */ + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) { + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); + + lock_map_acquire_try(&put_task_map); + __put_task_struct(t); + lock_map_release(&put_task_map); + return; + } + /* * under PREEMPT_RT, we can't call put_task_struct * in atomic context because it will indirectly @@ -145,10 +158,7 @@ static inline void put_task_struct(struct task_struct *t) * when it fails to fork a process. Therefore, there is no * way it can conflict with put_task_struct(). */ - if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible()) - call_rcu(&t->rcu, __put_task_struct_rcu_cb); - else - __put_task_struct(t); + call_rcu(&t->rcu, __put_task_struct_rcu_cb); } DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T)) -- cgit v1.2.3 From 677ea015f231aa38b3972aa7be54ecd2637e99fd Mon Sep 17 00:00:00 2001 From: Josh Don Date: Tue, 20 Jun 2023 11:32:47 -0700 Subject: sched: add throttled time stat for throttled children We currently export the total throttled time for cgroups that are given a bandwidth limit. This patch extends this accounting to also account the total time that each children cgroup has been throttled. This is useful to understand the degree to which children have been affected by the throttling control. Children which are not runnable during the entire throttled period, for example, will not show any self-throttling time during this period. Expose this in a new interface, 'cpu.stat.local', which is similar to how non-hierarchical events are accounted in 'memory.events.local'. Signed-off-by: Josh Don Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20230620183247.737942-2-joshdon@google.com --- include/linux/cgroup-defs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 8a0d5466c7be..ae20dbb885d6 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -661,6 +661,8 @@ struct cgroup_subsys { void (*css_rstat_flush)(struct cgroup_subsys_state *css, int cpu); int (*css_extra_stat_show)(struct seq_file *seq, struct cgroup_subsys_state *css); + int (*css_local_stat_show)(struct seq_file *seq, + struct cgroup_subsys_state *css); int (*can_attach)(struct cgroup_taskset *tset); void (*cancel_attach)(struct cgroup_taskset *tset); -- cgit v1.2.3 From 548796e2e70b44b4661fd7feee6eb239245ff1f8 Mon Sep 17 00:00:00 2001 From: Cruz Zhao Date: Thu, 29 Jun 2023 12:02:04 +0800 Subject: sched/core: introduce sched_core_idle_cpu() As core scheduling introduced, a new state of idle is defined as force idle, running idle task but nr_running greater than zero. If a cpu is in force idle state, idle_cpu() will return zero. This result makes sense in some scenarios, e.g., load balance, showacpu when dumping, and judge the RCU boost kthread is starving. But this will cause error in other scenarios, e.g., tick_irq_exit(): When force idle, rq->curr == rq->idle but rq->nr_running > 0, results that idle_cpu() returns 0. In function tick_irq_exit(), if idle_cpu() is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will not become 1, which became 0 in tick_nohz_irq_enter(). ts->idle_sleeptime won't update in function update_ts_time_stats(), if ts->idle_active is 0, which should be 1. And this bug will result that ts->idle_sleeptime is less than the actual value, and finally will result that the idle time in /proc/stat is less than the actual value. To solve this problem, we introduce sched_core_idle_cpu(), which returns 1 when force idle. We audit all users of idle_cpu(), and change idle_cpu() into sched_core_idle_cpu() in function tick_irq_exit(). v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in function tick_irq_exit(). And modify the corresponding commit log. Signed-off-by: Cruz Zhao Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Peter Zijlstra Reviewed-by: Frederic Weisbecker Reviewed-by: Joel Fernandes Link: https://lore.kernel.org/r/1688011324-42406-1-git-send-email-CruzZhao@linux.alibaba.com --- include/linux/sched.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 609bde814cb0..efc9f4bdc4ca 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2433,9 +2433,11 @@ extern void sched_core_free(struct task_struct *tsk); extern void sched_core_fork(struct task_struct *p); extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type, unsigned long uaddr); +extern int sched_core_idle_cpu(int cpu); #else static inline void sched_core_free(struct task_struct *tsk) { } static inline void sched_core_fork(struct task_struct *p) { } +static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); } #endif extern void sched_set_stop_task(int cpu, struct task_struct *stop); -- cgit v1.2.3 From 69b264df8a412820e98867dbab871c6526c5e5aa Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 10 Jul 2023 18:21:35 -0500 Subject: PCI/AER: Drop unused pci_disable_pcie_error_reporting() pci_disable_pcie_error_reporting() has no callers. Remove it. Link: https://lore.kernel.org/r/20230710232136.233034-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Christoph Hellwig Reviewed-by: Kuppuswamy Sathyanarayanan --- include/linux/aer.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/aer.h b/include/linux/aer.h index 3a3ab05e13fd..aadc9242cb20 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -43,17 +43,12 @@ struct aer_capability_regs { #if defined(CONFIG_PCIEAER) /* PCIe port driver needs this function to enable AER */ int pci_enable_pcie_error_reporting(struct pci_dev *dev); -int pci_disable_pcie_error_reporting(struct pci_dev *dev); int pci_aer_clear_nonfatal_status(struct pci_dev *dev); #else static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev) { return -EINVAL; } -static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev) -{ - return -EINVAL; -} static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) { return -EINVAL; -- cgit v1.2.3 From 7ec4b34be4234599cf1241ef807cdb7c3636f6fe Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 10 Jul 2023 18:21:36 -0500 Subject: PCI/AER: Unexport pci_enable_pcie_error_reporting() pci_enable_pcie_error_reporting() is used only inside aer.c. Stop exposing it outside the file. Link: https://lore.kernel.org/r/20230710232136.233034-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Reviewed-by: Christoph Hellwig Reviewed-by: Kuppuswamy Sathyanarayanan --- include/linux/aer.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/aer.h b/include/linux/aer.h index aadc9242cb20..2dd175f5debd 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -41,14 +41,8 @@ struct aer_capability_regs { }; #if defined(CONFIG_PCIEAER) -/* PCIe port driver needs this function to enable AER */ -int pci_enable_pcie_error_reporting(struct pci_dev *dev); int pci_aer_clear_nonfatal_status(struct pci_dev *dev); #else -static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev) -{ - return -EINVAL; -} static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev) { return -EINVAL; -- cgit v1.2.3 From 849846c41497e481c964b69ab1c65d65958aac28 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 18 Jun 2023 18:24:54 +0200 Subject: PCI: Reorder pci_dev fields to reduce holes Group some bitfield variables to reduce holes. On x86_64, this shrinks the size of 'struct pci_dev' by 16 bytes (from 3576 to 3560) when compiled with 'allmodconfig'. Link: https://lore.kernel.org/r/407b17c3e56764ef2c558898d4ff4c6c04b2d757.1687105455.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET Signed-off-by: Bjorn Helgaas --- include/linux/pci.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index c69a2cc1f412..106754757279 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -366,8 +366,8 @@ struct pci_dev { pci_power_t current_state; /* Current operating state. In ACPI, this is D0-D3, D0 being fully functional, and D3 being off. */ - unsigned int imm_ready:1; /* Supports Immediate Readiness */ u8 pm_cap; /* PM capability offset */ + unsigned int imm_ready:1; /* Supports Immediate Readiness */ unsigned int pme_support:5; /* Bitmask of states from which PME# can be generated */ unsigned int pme_poll:1; /* Poll device's PME status bit */ @@ -392,9 +392,9 @@ struct pci_dev { #ifdef CONFIG_PCIEASPM struct pcie_link_state *link_state; /* ASPM link state */ + u16 l1ss; /* L1SS Capability pointer */ unsigned int ltr_path:1; /* Latency Tolerance Reporting supported from root to here */ - u16 l1ss; /* L1SS Capability pointer */ #endif unsigned int pasid_no_tlp:1; /* PASID works without TLP Prefix */ unsigned int eetlp_prefix_path:1; /* End-to-End TLP Prefix */ -- cgit v1.2.3 From 091f9f7f3b819434eed5c1c3acad6c7b32bf13f6 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 18 Jun 2023 18:24:55 +0200 Subject: PCI: Change pdev->rom_attr_enabled to single bit Make 'rom_attr_enabled' a single bit in a bitfield and move it close to an existing bitfield so that they can be merged. This field is only used in 'drivers/pci/pci-sysfs.c' to store 0 or 1. On x86_64, this shrinks the size of 'struct pci_dev' by 8 bytes from 3560 to 3552. Link: https://lore.kernel.org/r/d7a34ad369336db73145c3efbade895e792a0ad3.1687105455.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET Signed-off-by: Bjorn Helgaas --- include/linux/pci.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 106754757279..0ff7500772e6 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -464,12 +464,12 @@ struct pci_dev { unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ unsigned int no_command_memory:1; /* No PCI_COMMAND_MEMORY */ unsigned int rom_bar_overlap:1; /* ROM BAR disable broken */ + unsigned int rom_attr_enabled:1; /* Display of ROM attribute enabled? */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ u32 saved_config_space[16]; /* Config space saved at suspend time */ struct hlist_head saved_cap_space; - int rom_attr_enabled; /* Display of ROM attribute enabled? */ struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */ struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */ -- cgit v1.2.3 From d26979f1cef71d6fa036f6cedfa0059715092503 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:50 +0200 Subject: net: stmmac: replace the has_integrated_pcs field with a flag struct plat_stmmacenet_data contains several boolean fields that could be easily replaced with a common integer 'flags' bitfield and bit defines. Start the process with the has_integrated_pcs field. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-2-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 06090538fe2d..8e7511071ef1 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -204,6 +204,8 @@ struct dwmac4_addrs { u32 mtl_low_cred_offset; }; +#define STMMAC_FLAG_HAS_INTEGRATED_PCS BIT(0) + struct plat_stmmacenet_data { int bus_id; int phy_addr; @@ -293,6 +295,6 @@ struct plat_stmmacenet_data { bool sph_disable; bool serdes_up_after_phy_linkup; const struct dwmac4_addrs *dwmac4_addrs; - bool has_integrated_pcs; + unsigned int flags; }; #endif -- cgit v1.2.3 From 309efe6eb499d04b7c09e57298c453b602efe3fd Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:51 +0200 Subject: net: stmmac: replace the sph_disable field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-3-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 8e7511071ef1..1b02f866316c 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -205,6 +205,7 @@ struct dwmac4_addrs { }; #define STMMAC_FLAG_HAS_INTEGRATED_PCS BIT(0) +#define STMMAC_FLAG_SPH_DISABLE BIT(1) struct plat_stmmacenet_data { int bus_id; @@ -292,7 +293,6 @@ struct plat_stmmacenet_data { int msi_rx_base_vec; int msi_tx_base_vec; bool use_phy_wol; - bool sph_disable; bool serdes_up_after_phy_linkup; const struct dwmac4_addrs *dwmac4_addrs; unsigned int flags; -- cgit v1.2.3 From fd1d62d80ebc1a68ba700e92c9da9d443c08f371 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:52 +0200 Subject: net: stmmac: replace the use_phy_wol field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-4-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 1b02f866316c..15fb07cc89c8 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -206,6 +206,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_HAS_INTEGRATED_PCS BIT(0) #define STMMAC_FLAG_SPH_DISABLE BIT(1) +#define STMMAC_FLAG_USE_PHY_WOL BIT(2) struct plat_stmmacenet_data { int bus_id; @@ -292,7 +293,6 @@ struct plat_stmmacenet_data { int msi_sfty_ue_vec; int msi_rx_base_vec; int msi_tx_base_vec; - bool use_phy_wol; bool serdes_up_after_phy_linkup; const struct dwmac4_addrs *dwmac4_addrs; unsigned int flags; -- cgit v1.2.3 From d8daff284e305409fd640feda7345d0221d782ce Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:53 +0200 Subject: net: stmmac: replace the has_sun8i field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-5-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 15fb07cc89c8..66dcf84d024a 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -207,6 +207,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_HAS_INTEGRATED_PCS BIT(0) #define STMMAC_FLAG_SPH_DISABLE BIT(1) #define STMMAC_FLAG_USE_PHY_WOL BIT(2) +#define STMMAC_FLAG_HAS_SUN8I BIT(3) struct plat_stmmacenet_data { int bus_id; @@ -270,7 +271,6 @@ struct plat_stmmacenet_data { struct reset_control *stmmac_ahb_rst; struct stmmac_axi *axi; int has_gmac4; - bool has_sun8i; bool tso_en; int rss_en; int mac_port_sel_speed; -- cgit v1.2.3 From 68861a3bcc1caf5c15a56e02090310271fd085e1 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:54 +0200 Subject: net: stmmac: replace the tso_en field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-6-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 66dcf84d024a..47ae29a98835 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -208,6 +208,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_SPH_DISABLE BIT(1) #define STMMAC_FLAG_USE_PHY_WOL BIT(2) #define STMMAC_FLAG_HAS_SUN8I BIT(3) +#define STMMAC_FLAG_TSO_EN BIT(4) struct plat_stmmacenet_data { int bus_id; @@ -271,7 +272,6 @@ struct plat_stmmacenet_data { struct reset_control *stmmac_ahb_rst; struct stmmac_axi *axi; int has_gmac4; - bool tso_en; int rss_en; int mac_port_sel_speed; bool en_tx_lpi_clockgating; -- cgit v1.2.3 From efe92571bfc30f251f6f2fa7828aa5f239736abf Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:55 +0200 Subject: net: stmmac: replace the serdes_up_after_phy_linkup field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-7-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 47ae29a98835..aeb3e75dc748 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -209,6 +209,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_USE_PHY_WOL BIT(2) #define STMMAC_FLAG_HAS_SUN8I BIT(3) #define STMMAC_FLAG_TSO_EN BIT(4) +#define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP BIT(5) struct plat_stmmacenet_data { int bus_id; @@ -293,7 +294,6 @@ struct plat_stmmacenet_data { int msi_sfty_ue_vec; int msi_rx_base_vec; int msi_tx_base_vec; - bool serdes_up_after_phy_linkup; const struct dwmac4_addrs *dwmac4_addrs; unsigned int flags; }; -- cgit v1.2.3 From fc02152bdbb28bd46df66ddcf4f469760c1b8df8 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:56 +0200 Subject: net: stmmac: replace the vlan_fail_q_en field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-8-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index aeb3e75dc748..155cb11b1c8a 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -210,6 +210,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_HAS_SUN8I BIT(3) #define STMMAC_FLAG_TSO_EN BIT(4) #define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP BIT(5) +#define STMMAC_FLAG_VLAN_FAIL_Q_EN BIT(6) struct plat_stmmacenet_data { int bus_id; @@ -278,7 +279,6 @@ struct plat_stmmacenet_data { bool en_tx_lpi_clockgating; bool rx_clk_runs_in_lpi; int has_xgmac; - bool vlan_fail_q_en; u8 vlan_fail_q; unsigned int eee_usecs_rate; struct pci_dev *pdev; -- cgit v1.2.3 From 956c3f09b9c4cc9567ca11c84007545b939e61aa Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:57 +0200 Subject: net: stmmac: replace the multi_msi_en field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-9-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 155cb11b1c8a..3365b8071686 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -211,6 +211,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_TSO_EN BIT(4) #define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP BIT(5) #define STMMAC_FLAG_VLAN_FAIL_Q_EN BIT(6) +#define STMMAC_FLAG_MULTI_MSI_EN BIT(7) struct plat_stmmacenet_data { int bus_id; @@ -286,7 +287,6 @@ struct plat_stmmacenet_data { int ext_snapshot_num; bool int_snapshot_en; bool ext_snapshot_en; - bool multi_msi_en; int msi_mac_vec; int msi_wol_vec; int msi_lpi_vec; -- cgit v1.2.3 From aa5513f5d95f6b5311de859f8f466a09863bedf6 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:58 +0200 Subject: net: stmmac: replace the ext_snapshot_en field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-10-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 3365b8071686..0a77e8b05d3a 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -212,6 +212,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP BIT(5) #define STMMAC_FLAG_VLAN_FAIL_Q_EN BIT(6) #define STMMAC_FLAG_MULTI_MSI_EN BIT(7) +#define STMMAC_FLAG_EXT_SNAPSHOT_EN BIT(8) struct plat_stmmacenet_data { int bus_id; @@ -286,7 +287,6 @@ struct plat_stmmacenet_data { int int_snapshot_num; int ext_snapshot_num; bool int_snapshot_en; - bool ext_snapshot_en; int msi_mac_vec; int msi_wol_vec; int msi_lpi_vec; -- cgit v1.2.3 From 621ba7ad7891b381baf9ebf3da4ec4e95c86ea4e Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 10:59:59 +0200 Subject: net: stmmac: replace the int_snapshot_en field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-11-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 0a77e8b05d3a..47708ddd57fd 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -213,6 +213,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_VLAN_FAIL_Q_EN BIT(6) #define STMMAC_FLAG_MULTI_MSI_EN BIT(7) #define STMMAC_FLAG_EXT_SNAPSHOT_EN BIT(8) +#define STMMAC_FLAG_INT_SNAPSHOT_EN BIT(9) struct plat_stmmacenet_data { int bus_id; @@ -286,7 +287,6 @@ struct plat_stmmacenet_data { struct pci_dev *pdev; int int_snapshot_num; int ext_snapshot_num; - bool int_snapshot_en; int msi_mac_vec; int msi_wol_vec; int msi_lpi_vec; -- cgit v1.2.3 From 743dd1db85f40be1e2c7416c83f0289aaa260ceb Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 11:00:00 +0200 Subject: net: stmmac: replace the rx_clk_runs_in_lpi field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-12-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 47708ddd57fd..c3769dad8238 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -214,6 +214,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_MULTI_MSI_EN BIT(7) #define STMMAC_FLAG_EXT_SNAPSHOT_EN BIT(8) #define STMMAC_FLAG_INT_SNAPSHOT_EN BIT(9) +#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI BIT(10) struct plat_stmmacenet_data { int bus_id; @@ -280,7 +281,6 @@ struct plat_stmmacenet_data { int rss_en; int mac_port_sel_speed; bool en_tx_lpi_clockgating; - bool rx_clk_runs_in_lpi; int has_xgmac; u8 vlan_fail_q; unsigned int eee_usecs_rate; -- cgit v1.2.3 From 9d0c0d5ebd635f914ab2ab691b68e8754fbe0a57 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 10 Jul 2023 11:00:01 +0200 Subject: net: stmmac: replace the en_tx_lpi_clockgating field with a flag Drop the boolean field of the plat_stmmacenet_data structure in favor of a simple bitfield flag. Signed-off-by: Bartosz Golaszewski Reviewed-by: Andrew Halaney Link: https://lore.kernel.org/r/20230710090001.303225-13-brgl@bgdev.pl Reviewed-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index c3769dad8238..ef67dba775d0 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -215,6 +215,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_EXT_SNAPSHOT_EN BIT(8) #define STMMAC_FLAG_INT_SNAPSHOT_EN BIT(9) #define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI BIT(10) +#define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING BIT(11) struct plat_stmmacenet_data { int bus_id; @@ -280,7 +281,6 @@ struct plat_stmmacenet_data { int has_gmac4; int rss_en; int mac_port_sel_speed; - bool en_tx_lpi_clockgating; int has_xgmac; u8 vlan_fail_q; unsigned int eee_usecs_rate; -- cgit v1.2.3 From 4dbb9e2322a3a9c912ce796c20c27045ae8dae22 Mon Sep 17 00:00:00 2001 From: Stephan Gerhold Date: Thu, 15 Jun 2023 18:50:40 +0200 Subject: soc: qcom: smem: Add qcom_smem_is_available() Avoid having to look up a dummy item from SMEM to detect if it is already available or if we need to defer probing. Reviewed-by: Konrad Dybcio Signed-off-by: Stephan Gerhold Link: https://lore.kernel.org/r/20230531-rpm-rproc-v3-7-a07dcdefd918@gerhold.net Signed-off-by: Bjorn Andersson --- include/linux/soc/qcom/smem.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/smem.h b/include/linux/soc/qcom/smem.h index 223db6a9c733..a36a3b9d4929 100644 --- a/include/linux/soc/qcom/smem.h +++ b/include/linux/soc/qcom/smem.h @@ -4,6 +4,7 @@ #define QCOM_SMEM_HOST_ANY -1 +bool qcom_smem_is_available(void); int qcom_smem_alloc(unsigned host, unsigned item, size_t size); void *qcom_smem_get(unsigned host, unsigned item, size_t *size); -- cgit v1.2.3 From 5b52ad34f9487b2c2d1e60fe37e5bd5656b4dac8 Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Tue, 11 Jul 2023 15:06:08 +0200 Subject: security: Constify sk in the sk_getsecid hook. The sk_getsecid hook shouldn't need to modify its socket argument. Make it const so that callers of security_sk_classify_flow() can use a const struct sock *. Signed-off-by: Guillaume Nault Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/lsm_hook_defs.h | 2 +- include/linux/security.h | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 7308a1a7599b..4f2621e87634 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -316,7 +316,7 @@ LSM_HOOK(int, 0, sk_alloc_security, struct sock *sk, int family, gfp_t priority) LSM_HOOK(void, LSM_RET_VOID, sk_free_security, struct sock *sk) LSM_HOOK(void, LSM_RET_VOID, sk_clone_security, const struct sock *sk, struct sock *newsk) -LSM_HOOK(void, LSM_RET_VOID, sk_getsecid, struct sock *sk, u32 *secid) +LSM_HOOK(void, LSM_RET_VOID, sk_getsecid, const struct sock *sk, u32 *secid) LSM_HOOK(void, LSM_RET_VOID, sock_graft, struct sock *sk, struct socket *parent) LSM_HOOK(int, 0, inet_conn_request, const struct sock *sk, struct sk_buff *skb, struct request_sock *req) diff --git a/include/linux/security.h b/include/linux/security.h index 32828502f09e..994cf099d9ac 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1439,7 +1439,8 @@ int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u int security_sk_alloc(struct sock *sk, int family, gfp_t priority); void security_sk_free(struct sock *sk); void security_sk_clone(const struct sock *sk, struct sock *newsk); -void security_sk_classify_flow(struct sock *sk, struct flowi_common *flic); +void security_sk_classify_flow(const struct sock *sk, + struct flowi_common *flic); void security_req_classify_flow(const struct request_sock *req, struct flowi_common *flic); void security_sock_graft(struct sock*sk, struct socket *parent); @@ -1597,7 +1598,7 @@ static inline void security_sk_clone(const struct sock *sk, struct sock *newsk) { } -static inline void security_sk_classify_flow(struct sock *sk, +static inline void security_sk_classify_flow(const struct sock *sk, struct flowi_common *flic) { } -- cgit v1.2.3 From 5bc67a854cb4982aa7746e8d2206a00b46a9cc0f Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Tue, 11 Jul 2023 15:06:21 +0200 Subject: ipv6: Constify the sk parameter of several helper functions. icmpv6_flow_init(), ip6_datagram_flow_key_init() and ip6_mc_hdr() don't need to modify their sk argument. Make that explicit using const. Signed-off-by: Guillaume Nault Reviewed-by: Simon Horman Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/icmpv6.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h index db0f4fcfdaf4..e3b3b0fa2a8f 100644 --- a/include/linux/icmpv6.h +++ b/include/linux/icmpv6.h @@ -85,12 +85,10 @@ extern void icmpv6_param_prob_reason(struct sk_buff *skb, struct flowi6; struct in6_addr; -extern void icmpv6_flow_init(struct sock *sk, - struct flowi6 *fl6, - u8 type, - const struct in6_addr *saddr, - const struct in6_addr *daddr, - int oif); + +void icmpv6_flow_init(const struct sock *sk, struct flowi6 *fl6, u8 type, + const struct in6_addr *saddr, + const struct in6_addr *daddr, int oif); static inline void icmpv6_param_prob(struct sk_buff *skb, u8 code, int pos) { -- cgit v1.2.3 From 90ef0a7b0622c62758b2638604927867775479ea Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:07 +0100 Subject: net: phylink: add pcs_enable()/pcs_disable() methods Add phylink PCS enable/disable callbacks that will allow us to place IEEE 802.3 register compliant PCS in power-down mode while not being used. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 1817940a3418..8e2fdd48a935 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -535,6 +535,8 @@ struct phylink_pcs { /** * struct phylink_pcs_ops - MAC PCS operations structure. * @pcs_validate: validate the link configuration. + * @pcs_enable: enable the PCS. + * @pcs_disable: disable the PCS. * @pcs_get_state: read the current MAC PCS link state from the hardware. * @pcs_config: configure the MAC PCS for the selected mode and state. * @pcs_an_restart: restart 802.3z BaseX autonegotiation. @@ -544,6 +546,8 @@ struct phylink_pcs { struct phylink_pcs_ops { int (*pcs_validate)(struct phylink_pcs *pcs, unsigned long *supported, const struct phylink_link_state *state); + int (*pcs_enable)(struct phylink_pcs *pcs); + void (*pcs_disable)(struct phylink_pcs *pcs); void (*pcs_get_state)(struct phylink_pcs *pcs, struct phylink_link_state *state); int (*pcs_config)(struct phylink_pcs *pcs, unsigned int neg_mode, @@ -573,6 +577,18 @@ struct phylink_pcs_ops { int pcs_validate(struct phylink_pcs *pcs, unsigned long *supported, const struct phylink_link_state *state); +/** + * pcs_enable() - enable the PCS. + * @pcs: a pointer to a &struct phylink_pcs. + */ +int pcs_enable(struct phylink_pcs *pcs); + +/** + * pcs_disable() - disable the PCS. + * @pcs: a pointer to a &struct phylink_pcs. + */ +void pcs_disable(struct phylink_pcs *pcs); + /** * pcs_get_state() - Read the current inband link state from the hardware * @pcs: a pointer to a &struct phylink_pcs. -- cgit v1.2.3 From aee6098822ed8a298ad817da8339ba4c7ea381fe Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:12 +0100 Subject: net: phylink: add pcs_pre_config()/pcs_post_config() methods Add hooks that are called before and after the mac_config() call, which will be needed to deal with errata workarounds for the Marvell 88e639x DSA switches. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 8e2fdd48a935..99fc2fa60695 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -537,6 +537,8 @@ struct phylink_pcs { * @pcs_validate: validate the link configuration. * @pcs_enable: enable the PCS. * @pcs_disable: disable the PCS. + * @pcs_pre_config: pre-mac_config method (for errata) + * @pcs_post_config: post-mac_config method (for arrata) * @pcs_get_state: read the current MAC PCS link state from the hardware. * @pcs_config: configure the MAC PCS for the selected mode and state. * @pcs_an_restart: restart 802.3z BaseX autonegotiation. @@ -548,6 +550,10 @@ struct phylink_pcs_ops { const struct phylink_link_state *state); int (*pcs_enable)(struct phylink_pcs *pcs); void (*pcs_disable)(struct phylink_pcs *pcs); + void (*pcs_pre_config)(struct phylink_pcs *pcs, + phy_interface_t interface); + int (*pcs_post_config)(struct phylink_pcs *pcs, + phy_interface_t interface); void (*pcs_get_state)(struct phylink_pcs *pcs, struct phylink_link_state *state); int (*pcs_config)(struct phylink_pcs *pcs, unsigned int neg_mode, -- cgit v1.2.3 From 24699cc1ff3e633d7c3a0d3ef394243db11757ec Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:17 +0100 Subject: net: phylink: add support for PCS link change notifications Add a function, phylink_pcs_change() which can be used by PCs drivers to notify phylink about changes to the PCS link state. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 99fc2fa60695..b28aa3eef7d5 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -9,6 +9,7 @@ struct device_node; struct ethtool_cmd; struct fwnode_handle; struct net_device; +struct phylink; enum { MLO_PAUSE_NONE, @@ -520,14 +521,19 @@ struct phylink_pcs_ops; /** * struct phylink_pcs - PHYLINK PCS instance * @ops: a pointer to the &struct phylink_pcs_ops structure + * @phylink: pointer to &struct phylink_config * @neg_mode: provide PCS neg mode via "mode" argument * @poll: poll the PCS for link changes * * This structure is designed to be embedded within the PCS private data, * and will be passed between phylink and the PCS. + * + * The @phylink member is private to phylink and must not be touched by + * the PCS driver. */ struct phylink_pcs { const struct phylink_pcs_ops *ops; + struct phylink *phylink; bool neg_mode; bool poll; }; @@ -699,6 +705,7 @@ int phylink_fwnode_phy_connect(struct phylink *pl, void phylink_disconnect_phy(struct phylink *); void phylink_mac_change(struct phylink *, bool up); +void phylink_pcs_change(struct phylink_pcs *, bool up); void phylink_start(struct phylink *); void phylink_stop(struct phylink *); -- cgit v1.2.3 From e6a45700e7e19b1c945ee56feab429ff8489370b Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:22 +0100 Subject: net: mdio: add unlocked mdiobus and mdiodev bus accessors Add the following unlocked accessors to complete the set: __mdiobus_modify() __mdiodev_read() __mdiodev_write() __mdiodev_modify() __mdiodev_modify_changed() which we will need for Marvell DSA PCS conversion. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/mdio.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mdio.h b/include/linux/mdio.h index c1b7008826e5..8fa23bdcedbf 100644 --- a/include/linux/mdio.h +++ b/include/linux/mdio.h @@ -537,6 +537,8 @@ static inline void mii_c73_mod_linkmode(unsigned long *adv, u16 *lpa) int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum); int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val); +int __mdiobus_modify(struct mii_bus *bus, int addr, u32 regnum, u16 mask, + u16 set); int __mdiobus_modify_changed(struct mii_bus *bus, int addr, u32 regnum, u16 mask, u16 set); @@ -564,6 +566,30 @@ int mdiobus_c45_modify(struct mii_bus *bus, int addr, int devad, u32 regnum, int mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad, u32 regnum, u16 mask, u16 set); +static inline int __mdiodev_read(struct mdio_device *mdiodev, u32 regnum) +{ + return __mdiobus_read(mdiodev->bus, mdiodev->addr, regnum); +} + +static inline int __mdiodev_write(struct mdio_device *mdiodev, u32 regnum, + u16 val) +{ + return __mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val); +} + +static inline int __mdiodev_modify(struct mdio_device *mdiodev, u32 regnum, + u16 mask, u16 set) +{ + return __mdiobus_modify(mdiodev->bus, mdiodev->addr, regnum, mask, set); +} + +static inline int __mdiodev_modify_changed(struct mdio_device *mdiodev, + u32 regnum, u16 mask, u16 set) +{ + return __mdiobus_modify_changed(mdiodev->bus, mdiodev->addr, regnum, + mask, set); +} + static inline int mdiodev_read(struct mdio_device *mdiodev, u32 regnum) { return mdiobus_read(mdiodev->bus, mdiodev->addr, regnum); -- cgit v1.2.3 From 1b3aa9701bd2975f7e08aac6f13fbd75d1df8eac Mon Sep 17 00:00:00 2001 From: Henning Schild Date: Thu, 13 Jul 2023 13:56:38 +0200 Subject: platform/x86: simatic-ipc: add another model BX-21A This adds support for the Siemens Simatic IPC model BX-21A. Actual drivers for that model will be sent in separate patches. Signed-off-by: Henning Schild Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230713115639.16419-2-henning.schild@siemens.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/simatic-ipc-base.h | 3 ++- include/linux/platform_data/x86/simatic-ipc.h | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc-base.h b/include/linux/platform_data/x86/simatic-ipc-base.h index 57d6a10dfc9e..68c455f5edad 100644 --- a/include/linux/platform_data/x86/simatic-ipc-base.h +++ b/include/linux/platform_data/x86/simatic-ipc-base.h @@ -2,7 +2,7 @@ /* * Siemens SIMATIC IPC drivers * - * Copyright (c) Siemens AG, 2018-2021 + * Copyright (c) Siemens AG, 2018-2023 * * Authors: * Henning Schild @@ -20,6 +20,7 @@ #define SIMATIC_IPC_DEVICE_127E 3 #define SIMATIC_IPC_DEVICE_227E 4 #define SIMATIC_IPC_DEVICE_227G 5 +#define SIMATIC_IPC_DEVICE_BX_21A 6 struct simatic_ipc_platform { u8 devmode; diff --git a/include/linux/platform_data/x86/simatic-ipc.h b/include/linux/platform_data/x86/simatic-ipc.h index a48bb5240977..1a8e4c1099e3 100644 --- a/include/linux/platform_data/x86/simatic-ipc.h +++ b/include/linux/platform_data/x86/simatic-ipc.h @@ -2,7 +2,7 @@ /* * Siemens SIMATIC IPC drivers * - * Copyright (c) Siemens AG, 2018-2021 + * Copyright (c) Siemens AG, 2018-2023 * * Authors: * Henning Schild @@ -34,6 +34,7 @@ enum simatic_ipc_station_ids { SIMATIC_IPC_IPC227G = 0x00000F01, SIMATIC_IPC_IPCBX_39A = 0x00001001, SIMATIC_IPC_IPCPX_39A = 0x00001002, + SIMATIC_IPC_IPCBX_21A = 0x00001101, }; static inline u32 simatic_ipc_get_station_id(u8 *data, int max_len) -- cgit v1.2.3 From a076a860acae77bbdcbd316541e5552e81fb1772 Mon Sep 17 00:00:00 2001 From: Luca Ceresoli Date: Mon, 19 Jun 2023 14:22:06 +0200 Subject: media: i2c: add I2C Address Translator (ATR) support An ATR is a device that looks similar to an i2c-mux: it has an I2C slave "upstream" port and N master "downstream" ports, and forwards transactions from upstream to the appropriate downstream port. But it is different in that the forwarded transaction has a different slave address. The address used on the upstream bus is called the "alias" and is (potentially) different from the physical slave address of the downstream chip. Add a helper file (just like i2c-mux.c for a mux or switch) to allow implementing ATR features in a device driver. The helper takes care of adapter creation/destruction and translates addresses at each transaction. Signed-off-by: Luca Ceresoli Signed-off-by: Tomi Valkeinen Reviewed-by: Andy Shevchenko Acked-by: Wolfram Sang Signed-off-by: Sakari Ailus Signed-off-by: Mauro Carvalho Chehab --- include/linux/i2c-atr.h | 116 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 include/linux/i2c-atr.h (limited to 'include/linux') diff --git a/include/linux/i2c-atr.h b/include/linux/i2c-atr.h new file mode 100644 index 000000000000..4d5da161c225 --- /dev/null +++ b/include/linux/i2c-atr.h @@ -0,0 +1,116 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * I2C Address Translator + * + * Copyright (c) 2019,2022 Luca Ceresoli + * Copyright (c) 2022,2023 Tomi Valkeinen + * + * Based on i2c-mux.h + */ + +#ifndef _LINUX_I2C_ATR_H +#define _LINUX_I2C_ATR_H + +#include +#include + +struct device; +struct fwnode_handle; +struct i2c_atr; + +/** + * struct i2c_atr_ops - Callbacks from ATR to the device driver. + * @attach_client: Notify the driver of a new device connected on a child + * bus, with the alias assigned to it. The driver must + * configure the hardware to use the alias. + * @detach_client: Notify the driver of a device getting disconnected. The + * driver must configure the hardware to stop using the + * alias. + * + * All these functions return 0 on success, a negative error code otherwise. + */ +struct i2c_atr_ops { + int (*attach_client)(struct i2c_atr *atr, u32 chan_id, + const struct i2c_client *client, u16 alias); + void (*detach_client)(struct i2c_atr *atr, u32 chan_id, + const struct i2c_client *client); +}; + +/** + * i2c_atr_new() - Allocate and initialize an I2C ATR helper. + * @parent: The parent (upstream) adapter + * @dev: The device acting as an ATR + * @ops: Driver-specific callbacks + * @max_adapters: Maximum number of child adapters + * + * The new ATR helper is connected to the parent adapter but has no child + * adapters. Call i2c_atr_add_adapter() to add some. + * + * Call i2c_atr_delete() to remove. + * + * Return: pointer to the new ATR helper object, or ERR_PTR + */ +struct i2c_atr *i2c_atr_new(struct i2c_adapter *parent, struct device *dev, + const struct i2c_atr_ops *ops, int max_adapters); + +/** + * i2c_atr_delete - Delete an I2C ATR helper. + * @atr: I2C ATR helper to be deleted. + * + * Precondition: all the adapters added with i2c_atr_add_adapter() must be + * removed by calling i2c_atr_del_adapter(). + */ +void i2c_atr_delete(struct i2c_atr *atr); + +/** + * i2c_atr_add_adapter - Create a child ("downstream") I2C bus. + * @atr: The I2C ATR + * @chan_id: Index of the new adapter (0 .. max_adapters-1). This value is + * passed to the callbacks in `struct i2c_atr_ops`. + * @adapter_parent: The device used as the parent of the new i2c adapter, or NULL + * to use the i2c-atr device as the parent. + * @bus_handle: The fwnode handle that points to the adapter's i2c + * peripherals, or NULL. + * + * After calling this function a new i2c bus will appear. Adding and removing + * devices on the downstream bus will result in calls to the + * &i2c_atr_ops->attach_client and &i2c_atr_ops->detach_client callbacks for the + * driver to assign an alias to the device. + * + * The adapter's fwnode is set to @bus_handle, or if @bus_handle is NULL the + * function looks for a child node whose 'reg' property matches the chan_id + * under the i2c-atr device's 'i2c-atr' node. + * + * Call i2c_atr_del_adapter() to remove the adapter. + * + * Return: 0 on success, a negative error code otherwise. + */ +int i2c_atr_add_adapter(struct i2c_atr *atr, u32 chan_id, + struct device *adapter_parent, + struct fwnode_handle *bus_handle); + +/** + * i2c_atr_del_adapter - Remove a child ("downstream") I2C bus added by + * i2c_atr_add_adapter(). If no I2C bus has been added + * this function is a no-op. + * @atr: The I2C ATR + * @chan_id: Index of the adapter to be removed (0 .. max_adapters-1) + */ +void i2c_atr_del_adapter(struct i2c_atr *atr, u32 chan_id); + +/** + * i2c_atr_set_driver_data - Set private driver data to the i2c-atr instance. + * @atr: The I2C ATR + * @data: Pointer to the data to store + */ +void i2c_atr_set_driver_data(struct i2c_atr *atr, void *data); + +/** + * i2c_atr_get_driver_data - Get the stored drive data. + * @atr: The I2C ATR + * + * Return: Pointer to the stored data + */ +void *i2c_atr_get_driver_data(struct i2c_atr *atr); + +#endif /* _LINUX_I2C_ATR_H */ -- cgit v1.2.3 From 917f543407947a007052edf7f89854259e6be008 Mon Sep 17 00:00:00 2001 From: Henning Schild Date: Thu, 6 Jul 2023 17:48:31 +0200 Subject: platform/x86: simatic-ipc: add CMOS battery monitoring Siemens Simatic Industrial PCs can monitor the voltage of the CMOS battery with two bits that indicate low or empty state. This can be GPIO or PortIO based. Here we model that as a hwmon voltage. The core driver does the PortIO and provides boilerplate for the GPIO versions. Which are split out to model runtime dependencies while allowing fine-grained kernel configuration. Signed-off-by: Henning Schild Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20230706154831.19100-3-henning.schild@siemens.com Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/simatic-ipc-base.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc-base.h b/include/linux/platform_data/x86/simatic-ipc-base.h index 68c455f5edad..4ca21065c862 100644 --- a/include/linux/platform_data/x86/simatic-ipc-base.h +++ b/include/linux/platform_data/x86/simatic-ipc-base.h @@ -21,6 +21,7 @@ #define SIMATIC_IPC_DEVICE_227E 4 #define SIMATIC_IPC_DEVICE_227G 5 #define SIMATIC_IPC_DEVICE_BX_21A 6 +#define SIMATIC_IPC_DEVICE_BX_39A 7 struct simatic_ipc_platform { u8 devmode; -- cgit v1.2.3 From 8529673adc2b022d40917b202b17bdabeef8e37a Mon Sep 17 00:00:00 2001 From: Henning Schild Date: Thu, 13 Jul 2023 16:48:30 +0200 Subject: platform/x86: simatic-ipc: add another model This is the panel variant of a device we already did have. All the same, just no LEDs. Signed-off-by: Henning Schild Link: https://lore.kernel.org/r/20230713144832.26473-2-henning.schild@siemens.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/simatic-ipc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc.h b/include/linux/platform_data/x86/simatic-ipc.h index 1a8e4c1099e3..f2eafa43a605 100644 --- a/include/linux/platform_data/x86/simatic-ipc.h +++ b/include/linux/platform_data/x86/simatic-ipc.h @@ -32,6 +32,7 @@ enum simatic_ipc_station_ids { SIMATIC_IPC_IPC477E = 0x00000A02, SIMATIC_IPC_IPC127E = 0x00000D01, SIMATIC_IPC_IPC227G = 0x00000F01, + SIMATIC_IPC_IPC277G = 0x00000F02, SIMATIC_IPC_IPCBX_39A = 0x00001001, SIMATIC_IPC_IPCPX_39A = 0x00001002, SIMATIC_IPC_IPCBX_21A = 0x00001101, -- cgit v1.2.3 From 61457949686fc4472e6136c72c255b4ad003e084 Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Wed, 12 Jul 2023 15:59:48 -0700 Subject: platform/x86/intel/tpmi: Read feature control status Some of the PM features can be locked or disabled. In that case, write interface can be locked. This status is read via a mailbox. There is one TPMI ID which provides base address for interface and data register for mail box operation. The mailbox operations is defined in the TPMI specification. Refer to https://github.com/intel/tpmi_power_management/ for TPMI specifications. An API is exposed to feature drivers to read feature control status. Signed-off-by: Srinivas Pandruvada Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230712225950.171326-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede --- include/linux/intel_tpmi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/intel_tpmi.h b/include/linux/intel_tpmi.h index f505788c05da..04d937ad4dc4 100644 --- a/include/linux/intel_tpmi.h +++ b/include/linux/intel_tpmi.h @@ -27,4 +27,6 @@ struct intel_tpmi_plat_info *tpmi_get_platform_data(struct auxiliary_device *aux struct resource *tpmi_get_resource_at_index(struct auxiliary_device *auxdev, int index); int tpmi_get_resource_count(struct auxiliary_device *auxdev); +int tpmi_get_feature_status(struct auxiliary_device *auxdev, int feature_id, int *locked, + int *disabled); #endif -- cgit v1.2.3 From 75e308ffc4f0d36b895f1110ece8b77d4116fdb1 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 14 Jul 2023 12:17:48 +0300 Subject: spi: Use struct_size() helper The Documentation/process/deprecated.rst suggests to use flexible array members to provide a way to declare having a dynamically sized set of trailing elements in a structure.This makes code robust agains bunch of the issues described in the documentation, main of which is about the correctness of the sizeof() calculation for this data structure. Due to above, prefer struct_size() over open-coded versions. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230714091748.89681-5-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 04daf61dfd3f..7f8b478fdeb3 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1085,6 +1086,8 @@ struct spi_transfer { * @state: for use by whichever driver currently owns the message * @resources: for resource management when the SPI message is processed * @prepared: spi_prepare_message was called for the this message + * @t: for use with spi_message_alloc() when message and transfers have + * been allocated together * * A @spi_message is used to execute an atomic sequence of data transfers, * each represented by a struct spi_transfer. The sequence is "atomic" @@ -1139,6 +1142,9 @@ struct spi_message { /* List of spi_res resources when the SPI message is processed */ struct list_head resources; + + /* For embedding transfers into the memory of the message */ + struct spi_transfer t[]; }; static inline void spi_message_init_no_memset(struct spi_message *m) @@ -1199,16 +1205,13 @@ static inline struct spi_message *spi_message_alloc(unsigned ntrans, gfp_t flags { struct spi_message *m; - m = kzalloc(sizeof(struct spi_message) - + ntrans * sizeof(struct spi_transfer), - flags); + m = kzalloc(struct_size(m, t, ntrans), flags); if (m) { unsigned i; - struct spi_transfer *t = (struct spi_transfer *)(m + 1); spi_message_init_no_memset(m); - for (i = 0; i < ntrans; i++, t++) - spi_message_add_tail(t, m); + for (i = 0; i < ntrans; i++) + spi_message_add_tail(&m->t[i], m); } return m; } -- cgit v1.2.3 From 791c2b17fb4023f21c3cbf5f268af01d9b8cb7cc Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Thu, 13 Apr 2023 14:40:25 +0100 Subject: iommu: Optimise PCI SAC address trick Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for PCI SAC address trick") and its subsequent revert, this mechanism no longer serves its original purpose, but now only works around broken hardware/drivers in a way that is unfortunately too impactful to remove. This does not, however, prevent us from solving the performance impact which that workaround has on large-scale systems that don't need it. Once the 32-bit IOVA space fills up and a workload starts allocating and freeing on both sides of the boundary, the opportunistic SAC allocation can then end up spending significant time hunting down scattered fragments of free 32-bit space, or just reestablishing max32_alloc_size. This can easily be exacerbated by a change in allocation pattern, such as by changing the network MTU, which can increase pressure on the 32-bit space by leaving a large quantity of cached IOVAs which are now the wrong size to be recycled, but also won't be freed since the non-opportunistic allocations can still be satisfied from the whole 64-bit space without triggering the reclaim path. However, in the context of a workaround where smaller DMA addresses aren't simply a preference but a necessity, if we get to that point at all then in fact it's already the endgame. The nature of the allocator is currently such that the first IOVA we give to a device after the 32-bit space runs out will be the highest possible address for that device, ever. If that works, then great, we know we can optimise for speed by always allocating from the full range. And if it doesn't, then the worst has already happened and any brokenness is now showing, so there's little point in continuing to try to hide it. To that end, implement a flag to refine the SAC business into a per-device policy that can automatically get itself out of the way if and when it stops being useful. CC: Linus Torvalds CC: Jakub Kicinski Reviewed-by: John Garry Signed-off-by: Robin Murphy Tested-by: Vasant Hegde Tested-by: Jakub Kicinski Link: https://lore.kernel.org/r/b8502b115b915d2a3fabde367e099e39106686c8.1681392791.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d31642596675..b1dcb1b9b170 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -409,6 +409,7 @@ struct iommu_fault_param { * @priv: IOMMU Driver private data * @max_pasids: number of PASIDs this device can consume * @attach_deferred: the dma domain attachment is deferred + * @pci_32bit_workaround: Limit DMA allocations to 32-bit IOVAs * * TODO: migrate other per device data pointers under iommu_dev_data, e.g. * struct iommu_group *iommu_group; @@ -422,6 +423,7 @@ struct dev_iommu { void *priv; u32 max_pasids; u32 attach_deferred:1; + u32 pci_32bit_workaround:1; }; int iommu_device_register(struct iommu_device *iommu, -- cgit v1.2.3 From 6d07a31fea53dd610da88054c839e03610560e5d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 3 Jul 2023 22:24:04 -0700 Subject: jiffies: add kernel-doc for all APIs Add kernel-doc notation in for interfaces that don't already have it (i.e. most interfaces). Fix some formatting and typos in other comments. Signed-off-by: Randy Dunlap Cc: John Stultz Cc: Thomas Gleixner Cc: Stephen Boyd Cc: Jonathan Corbet Cc: linux-doc@vger.kernel.org Acked-by: John Stultz Signed-off-by: Jonathan Corbet Link: https://lore.kernel.org/r/20230704052405.5089-2-rdunlap@infradead.org --- include/linux/jiffies.h | 197 ++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 176 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 5e13f801c902..e0ae2a43e0eb 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -72,9 +72,15 @@ extern int register_refined_jiffies(long clock_tick_rate); #endif /* - * The 64-bit value is not atomic - you MUST NOT read it + * The 64-bit value is not atomic on 32-bit systems - you MUST NOT read it * without sampling the sequence number in jiffies_lock. * get_jiffies_64() will do this for you as appropriate. + * + * jiffies and jiffies_64 are at the same address for little-endian systems + * and for 64-bit big-endian systems. + * On 32-bit big-endian systems, jiffies is the lower 32 bits of jiffies_64 + * (i.e., at address @jiffies_64 + 4). + * See arch/ARCH/kernel/vmlinux.lds.S */ extern u64 __cacheline_aligned_in_smp jiffies_64; extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies; @@ -82,6 +88,14 @@ extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffi #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void); #else +/** + * get_jiffies_64 - read the 64-bit non-atomic jiffies_64 value + * + * When BITS_PER_LONG < 64, this uses sequence number sampling using + * jiffies_lock to protect the 64-bit read. + * + * Return: current 64-bit jiffies value + */ static inline u64 get_jiffies_64(void) { return (u64)jiffies; @@ -89,39 +103,76 @@ static inline u64 get_jiffies_64(void) #endif /* - * These inlines deal with timer wrapping correctly. You are - * strongly encouraged to use them + * These inlines deal with timer wrapping correctly. You are + * strongly encouraged to use them: * 1. Because people otherwise forget * 2. Because if the timer wrap changes in future you won't have to * alter your driver code. - * - * time_after(a,b) returns true if the time a is after time b. + */ + +/** + * time_after - returns true if the time a is after time b. + * @a: first comparable as unsigned long + * @b: second comparable as unsigned long * * Do this with "<0" and ">=0" to only test the sign of the result. A * good compiler would generate better code (and a really good compiler * wouldn't care). Gcc is currently neither. + * + * Return: %true is time a is after time b, otherwise %false. */ #define time_after(a,b) \ (typecheck(unsigned long, a) && \ typecheck(unsigned long, b) && \ ((long)((b) - (a)) < 0)) +/** + * time_before - returns true if the time a is before time b. + * @a: first comparable as unsigned long + * @b: second comparable as unsigned long + * + * Return: %true is time a is before time b, otherwise %false. + */ #define time_before(a,b) time_after(b,a) +/** + * time_after_eq - returns true if the time a is after or the same as time b. + * @a: first comparable as unsigned long + * @b: second comparable as unsigned long + * + * Return: %true is time a is after or the same as time b, otherwise %false. + */ #define time_after_eq(a,b) \ (typecheck(unsigned long, a) && \ typecheck(unsigned long, b) && \ ((long)((a) - (b)) >= 0)) +/** + * time_before_eq - returns true if the time a is before or the same as time b. + * @a: first comparable as unsigned long + * @b: second comparable as unsigned long + * + * Return: %true is time a is before or the same as time b, otherwise %false. + */ #define time_before_eq(a,b) time_after_eq(b,a) -/* - * Calculate whether a is in the range of [b, c]. +/** + * time_in_range - Calculate whether a is in the range of [b, c]. + * @a: time to test + * @b: beginning of the range + * @c: end of the range + * + * Return: %true is time a is in the range [b, c], otherwise %false. */ #define time_in_range(a,b,c) \ (time_after_eq(a,b) && \ time_before_eq(a,c)) -/* - * Calculate whether a is in the range of [b, c). +/** + * time_in_range_open - Calculate whether a is in the range of [b, c). + * @a: time to test + * @b: beginning of the range + * @c: end of the range + * + * Return: %true is time a is in the range [b, c), otherwise %false. */ #define time_in_range_open(a,b,c) \ (time_after_eq(a,b) && \ @@ -129,45 +180,138 @@ static inline u64 get_jiffies_64(void) /* Same as above, but does so with platform independent 64bit types. * These must be used when utilizing jiffies_64 (i.e. return value of - * get_jiffies_64() */ + * get_jiffies_64()). */ + +/** + * time_after64 - returns true if the time a is after time b. + * @a: first comparable as __u64 + * @b: second comparable as __u64 + * + * This must be used when utilizing jiffies_64 (i.e. return value of + * get_jiffies_64()). + * + * Return: %true is time a is after time b, otherwise %false. + */ #define time_after64(a,b) \ (typecheck(__u64, a) && \ typecheck(__u64, b) && \ ((__s64)((b) - (a)) < 0)) +/** + * time_before64 - returns true if the time a is before time b. + * @a: first comparable as __u64 + * @b: second comparable as __u64 + * + * This must be used when utilizing jiffies_64 (i.e. return value of + * get_jiffies_64()). + * + * Return: %true is time a is before time b, otherwise %false. + */ #define time_before64(a,b) time_after64(b,a) +/** + * time_after_eq64 - returns true if the time a is after or the same as time b. + * @a: first comparable as __u64 + * @b: second comparable as __u64 + * + * This must be used when utilizing jiffies_64 (i.e. return value of + * get_jiffies_64()). + * + * Return: %true is time a is after or the same as time b, otherwise %false. + */ #define time_after_eq64(a,b) \ (typecheck(__u64, a) && \ typecheck(__u64, b) && \ ((__s64)((a) - (b)) >= 0)) +/** + * time_before_eq64 - returns true if the time a is before or the same as time b. + * @a: first comparable as __u64 + * @b: second comparable as __u64 + * + * This must be used when utilizing jiffies_64 (i.e. return value of + * get_jiffies_64()). + * + * Return: %true is time a is before or the same as time b, otherwise %false. + */ #define time_before_eq64(a,b) time_after_eq64(b,a) +/** + * time_in_range64 - Calculate whether a is in the range of [b, c]. + * @a: time to test + * @b: beginning of the range + * @c: end of the range + * + * Return: %true is time a is in the range [b, c], otherwise %false. + */ #define time_in_range64(a, b, c) \ (time_after_eq64(a, b) && \ time_before_eq64(a, c)) /* - * These four macros compare jiffies and 'a' for convenience. + * These eight macros compare jiffies[_64] and 'a' for convenience. */ -/* time_is_before_jiffies(a) return true if a is before jiffies */ +/** + * time_is_before_jiffies - return true if a is before jiffies + * @a: time (unsigned long) to compare to jiffies + * + * Return: %true is time a is before jiffies, otherwise %false. + */ #define time_is_before_jiffies(a) time_after(jiffies, a) +/** + * time_is_before_jiffies64 - return true if a is before jiffies_64 + * @a: time (__u64) to compare to jiffies_64 + * + * Return: %true is time a is before jiffies_64, otherwise %false. + */ #define time_is_before_jiffies64(a) time_after64(get_jiffies_64(), a) -/* time_is_after_jiffies(a) return true if a is after jiffies */ +/** + * time_is_after_jiffies - return true if a is after jiffies + * @a: time (unsigned long) to compare to jiffies + * + * Return: %true is time a is after jiffies, otherwise %false. + */ #define time_is_after_jiffies(a) time_before(jiffies, a) +/** + * time_is_after_jiffies64 - return true if a is after jiffies_64 + * @a: time (__u64) to compare to jiffies_64 + * + * Return: %true is time a is after jiffies_64, otherwise %false. + */ #define time_is_after_jiffies64(a) time_before64(get_jiffies_64(), a) -/* time_is_before_eq_jiffies(a) return true if a is before or equal to jiffies*/ +/** + * time_is_before_eq_jiffies - return true if a is before or equal to jiffies + * @a: time (unsigned long) to compare to jiffies + * + * Return: %true is time a is before or the same as jiffies, otherwise %false. + */ #define time_is_before_eq_jiffies(a) time_after_eq(jiffies, a) +/** + * time_is_before_eq_jiffies64 - return true if a is before or equal to jiffies_64 + * @a: time (__u64) to compare to jiffies_64 + * + * Return: %true is time a is before or the same jiffies_64, otherwise %false. + */ #define time_is_before_eq_jiffies64(a) time_after_eq64(get_jiffies_64(), a) -/* time_is_after_eq_jiffies(a) return true if a is after or equal to jiffies*/ +/** + * time_is_after_eq_jiffies - return true if a is after or equal to jiffies + * @a: time (unsigned long) to compare to jiffies + * + * Return: %true is time a is after or the same as jiffies, otherwise %false. + */ #define time_is_after_eq_jiffies(a) time_before_eq(jiffies, a) +/** + * time_is_after_eq_jiffies64 - return true if a is after or equal to jiffies_64 + * @a: time (__u64) to compare to jiffies_64 + * + * Return: %true is time a is after or the same as jiffies_64, otherwise %false. + */ #define time_is_after_eq_jiffies64(a) time_before_eq64(get_jiffies_64(), a) /* - * Have the 32 bit jiffies value wrap 5 minutes after boot + * Have the 32-bit jiffies value wrap 5 minutes after boot * so jiffies wrap bugs show up earlier. */ #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ)) @@ -278,7 +422,7 @@ extern unsigned long preset_lpj; #if BITS_PER_LONG < 64 # define MAX_SEC_IN_JIFFIES \ (long)((u64)((u64)MAX_JIFFY_OFFSET * TICK_NSEC) / NSEC_PER_SEC) -#else /* take care of overflow on 64 bits machines */ +#else /* take care of overflow on 64-bit machines */ # define MAX_SEC_IN_JIFFIES \ (SH_DIV((MAX_JIFFY_OFFSET >> SEC_JIFFIE_SC) * TICK_NSEC, NSEC_PER_SEC, 1) - 1) @@ -290,6 +434,12 @@ extern unsigned long preset_lpj; extern unsigned int jiffies_to_msecs(const unsigned long j); extern unsigned int jiffies_to_usecs(const unsigned long j); +/** + * jiffies_to_nsecs - Convert jiffies to nanoseconds + * @j: jiffies value + * + * Return: nanoseconds value + */ static inline u64 jiffies_to_nsecs(const unsigned long j) { return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC; @@ -353,12 +503,14 @@ static inline unsigned long _msecs_to_jiffies(const unsigned int m) * * msecs_to_jiffies() checks for the passed in value being a constant * via __builtin_constant_p() allowing gcc to eliminate most of the - * code, __msecs_to_jiffies() is called if the value passed does not + * code. __msecs_to_jiffies() is called if the value passed does not * allow constant folding and the actual conversion must be done at * runtime. - * the HZ range specific helpers _msecs_to_jiffies() are called both + * The HZ range specific helpers _msecs_to_jiffies() are called both * directly here and from __msecs_to_jiffies() in the case where * constant folding is not possible. + * + * Return: jiffies value */ static __always_inline unsigned long msecs_to_jiffies(const unsigned int m) { @@ -400,12 +552,14 @@ static inline unsigned long _usecs_to_jiffies(const unsigned int u) * * usecs_to_jiffies() checks for the passed in value being a constant * via __builtin_constant_p() allowing gcc to eliminate most of the - * code, __usecs_to_jiffies() is called if the value passed does not + * code. __usecs_to_jiffies() is called if the value passed does not * allow constant folding and the actual conversion must be done at * runtime. - * the HZ range specific helpers _usecs_to_jiffies() are called both + * The HZ range specific helpers _usecs_to_jiffies() are called both * directly here and from __msecs_to_jiffies() in the case where * constant folding is not possible. + * + * Return: jiffies value */ static __always_inline unsigned long usecs_to_jiffies(const unsigned int u) { @@ -422,6 +576,7 @@ extern unsigned long timespec64_to_jiffies(const struct timespec64 *value); extern void jiffies_to_timespec64(const unsigned long jiffies, struct timespec64 *value); extern clock_t jiffies_to_clock_t(unsigned long x); + static inline clock_t jiffies_delta_to_clock_t(long delta) { return jiffies_to_clock_t(max(0L, delta)); -- cgit v1.2.3 From 5f8e3202696f5e94d3fc54cfed75d270ae843ee9 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 17 May 2023 05:01:28 -0700 Subject: rcuscale: Measure grace-period kthread CPU time This commit adds the ability to output the CPU time consumed by the grace-period kthread for the RCU variant under test. The CPU time is whatever is in the designated task's current->stime field, and thus is controlled by whatever CPU-time accounting scheme is in effect. This output appears in microseconds as follows on the console: rcu_scale: Grace-period kthread CPU time: 42367.037 [ paulmck: Apply feedback from Stephen Rothwell and kernel test robot. ] Signed-off-by: Paul E. McKenney Tested-by: Yujie Liu --- include/linux/rcupdate_trace.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h index 9bc8cbb33340..eda493200663 100644 --- a/include/linux/rcupdate_trace.h +++ b/include/linux/rcupdate_trace.h @@ -87,6 +87,7 @@ static inline void rcu_read_unlock_trace(void) void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func); void synchronize_rcu_tasks_trace(void); void rcu_barrier_tasks_trace(void); +struct task_struct *get_rcu_tasks_trace_gp_kthread(void); #else /* * The BPF JIT forms these addresses even when it doesn't call these -- cgit v1.2.3 From cb0116090e4cff6da2e9abd1c29b8e16491af176 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Mon, 19 Jun 2023 15:04:27 +0200 Subject: soc: qcom: smd-rpm: Add QCOM_SMD_RPM_STATE_NUM Add a preprocessor define to indicate the number of RPM contexts/states. Signed-off-by: Konrad Dybcio Acked-by: Georgi Djakov Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-2-09c78c175546@linaro.org Signed-off-by: Bjorn Andersson --- include/linux/soc/qcom/smd-rpm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/smd-rpm.h b/include/linux/soc/qcom/smd-rpm.h index 2990f425fdef..e468f94fa323 100644 --- a/include/linux/soc/qcom/smd-rpm.h +++ b/include/linux/soc/qcom/smd-rpm.h @@ -6,6 +6,7 @@ struct qcom_smd_rpm; #define QCOM_SMD_RPM_ACTIVE_STATE 0 #define QCOM_SMD_RPM_SLEEP_STATE 1 +#define QCOM_SMD_RPM_STATE_NUM 2 /* * Constants used for addressing resources in the RPM. -- cgit v1.2.3 From 82a793e2d3e3da748f23a0fbe0b4615292625fe8 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Mon, 19 Jun 2023 15:04:28 +0200 Subject: soc: qcom: smd-rpm: Use tabs for defines Use tabs for defines to make things spaced consistently. Signed-off-by: Konrad Dybcio Acked-by: Georgi Djakov Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-3-09c78c175546@linaro.org Signed-off-by: Bjorn Andersson --- include/linux/soc/qcom/smd-rpm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/smd-rpm.h b/include/linux/soc/qcom/smd-rpm.h index e468f94fa323..99499e4b080e 100644 --- a/include/linux/soc/qcom/smd-rpm.h +++ b/include/linux/soc/qcom/smd-rpm.h @@ -4,8 +4,8 @@ struct qcom_smd_rpm; -#define QCOM_SMD_RPM_ACTIVE_STATE 0 -#define QCOM_SMD_RPM_SLEEP_STATE 1 +#define QCOM_SMD_RPM_ACTIVE_STATE 0 +#define QCOM_SMD_RPM_SLEEP_STATE 1 #define QCOM_SMD_RPM_STATE_NUM 2 /* -- cgit v1.2.3 From e1e1267413d2e9fbe3a34c5a5f701b0f5fb0bf2c Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Mon, 19 Jun 2023 15:04:29 +0200 Subject: clk: qcom: smd-rpm: Move some RPM resources to the common header In preparation for handling the bus clocks in the icc driver, carve out some defines and a struct definition to the common rpm header. Reviewed-by: Dmitry Baryshkov Acked-by: Stephen Boyd Signed-off-by: Konrad Dybcio Acked-by: Georgi Djakov Link: https://lore.kernel.org/r/20230526-topic-smd_icc-v7-4-09c78c175546@linaro.org Signed-off-by: Bjorn Andersson --- include/linux/soc/qcom/smd-rpm.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/smd-rpm.h b/include/linux/soc/qcom/smd-rpm.h index 99499e4b080e..8190878645f9 100644 --- a/include/linux/soc/qcom/smd-rpm.h +++ b/include/linux/soc/qcom/smd-rpm.h @@ -2,6 +2,8 @@ #ifndef __QCOM_SMD_RPM_H__ #define __QCOM_SMD_RPM_H__ +#include + struct qcom_smd_rpm; #define QCOM_SMD_RPM_ACTIVE_STATE 0 @@ -45,6 +47,19 @@ struct qcom_smd_rpm; #define QCOM_SMD_RPM_PKA_CLK 0x616b70 #define QCOM_SMD_RPM_MCFG_CLK 0x6766636d +#define QCOM_RPM_KEY_SOFTWARE_ENABLE 0x6e657773 +#define QCOM_RPM_KEY_PIN_CTRL_CLK_BUFFER_ENABLE_KEY 0x62636370 +#define QCOM_RPM_SMD_KEY_RATE 0x007a484b +#define QCOM_RPM_SMD_KEY_ENABLE 0x62616e45 +#define QCOM_RPM_SMD_KEY_STATE 0x54415453 +#define QCOM_RPM_SCALING_ENABLE_ID 0x2 + +struct clk_smd_rpm_req { + __le32 key; + __le32 nbytes; + __le32 value; +}; + int qcom_rpm_smd_write(struct qcom_smd_rpm *rpm, int state, u32 resource_type, u32 resource_id, -- cgit v1.2.3 From 8ce49c2a2aa53afde9a20a8ce02b069d3b262af0 Mon Sep 17 00:00:00 2001 From: Deepak Kumar Singh Date: Fri, 7 Jul 2023 03:11:36 +0530 Subject: rpmsg: core: Add signal API support Some transports like Glink support the state notifications between clients using flow control signals similar to serial protocol signals. Local glink client drivers can send and receive flow control status to glink clients running on remote processors. Add APIs to support sending and receiving of flow control status by rpmsg clients. Signed-off-by: Deepak Kumar Singh Signed-off-by: Sarannya S Acked-by: Arnaud Pouliquen Link: https://lore.kernel.org/r/1688679698-31274-2-git-send-email-quic_sarannya@quicinc.com Signed-off-by: Bjorn Andersson --- include/linux/rpmsg.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h index 523c98b96cb4..90d8e4475f80 100644 --- a/include/linux/rpmsg.h +++ b/include/linux/rpmsg.h @@ -64,12 +64,14 @@ struct rpmsg_device { }; typedef int (*rpmsg_rx_cb_t)(struct rpmsg_device *, void *, int, void *, u32); +typedef int (*rpmsg_flowcontrol_cb_t)(struct rpmsg_device *, void *, bool); /** * struct rpmsg_endpoint - binds a local rpmsg address to its user * @rpdev: rpmsg channel device * @refcount: when this drops to zero, the ept is deallocated * @cb: rx callback handler + * @flow_cb: remote flow control callback handler * @cb_lock: must be taken before accessing/changing @cb * @addr: local rpmsg address * @priv: private data for the driver's use @@ -92,6 +94,7 @@ struct rpmsg_endpoint { struct rpmsg_device *rpdev; struct kref refcount; rpmsg_rx_cb_t cb; + rpmsg_flowcontrol_cb_t flow_cb; struct mutex cb_lock; u32 addr; void *priv; @@ -106,6 +109,7 @@ struct rpmsg_endpoint { * @probe: invoked when a matching rpmsg channel (i.e. device) is found * @remove: invoked when the rpmsg channel is removed * @callback: invoked when an inbound message is received on the channel + * @flowcontrol: invoked when remote side flow control request is received */ struct rpmsg_driver { struct device_driver drv; @@ -113,6 +117,7 @@ struct rpmsg_driver { int (*probe)(struct rpmsg_device *dev); void (*remove)(struct rpmsg_device *dev); int (*callback)(struct rpmsg_device *, void *, int, void *, u32); + int (*flowcontrol)(struct rpmsg_device *, void *, bool); }; static inline u16 rpmsg16_to_cpu(struct rpmsg_device *rpdev, __rpmsg16 val) @@ -192,6 +197,8 @@ __poll_t rpmsg_poll(struct rpmsg_endpoint *ept, struct file *filp, ssize_t rpmsg_get_mtu(struct rpmsg_endpoint *ept); +int rpmsg_set_flow_control(struct rpmsg_endpoint *ept, bool pause, u32 dst); + #else static inline int rpmsg_register_device_override(struct rpmsg_device *rpdev, @@ -316,6 +323,14 @@ static inline ssize_t rpmsg_get_mtu(struct rpmsg_endpoint *ept) return -ENXIO; } +static inline int rpmsg_set_flow_control(struct rpmsg_endpoint *ept, bool pause, u32 dst) +{ + /* This shouldn't be possible */ + WARN_ON(1); + + return -ENXIO; +} + #endif /* IS_ENABLED(CONFIG_RPMSG) */ /* use a macro to avoid include chaining to get THIS_MODULE */ -- cgit v1.2.3 From f247f08da0ce822de0d6b2feec811dd6d4d599ce Mon Sep 17 00:00:00 2001 From: Siddharth Gupta Date: Fri, 24 Feb 2023 13:17:06 -0800 Subject: remoteproc: core: Export the rproc coredump APIs The remoteproc coredump APIs are currently only part of the internal remoteproc header. This prevents the remoteproc platform drivers from using these APIs when needed. This change moves the rproc_coredump() and rproc_coredump_cleanup() APIs to the linux header and marks them as exported symbols. Signed-off-by: Siddharth Gupta Signed-off-by: Gokul krishna Krishnakumar Link: https://lore.kernel.org/r/20230224211707.30916-2-quic_gokukris@quicinc.com Signed-off-by: Bjorn Andersson --- include/linux/remoteproc.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index fe8978eb69f1..b4795698d8c2 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -690,6 +690,10 @@ int rproc_detach(struct rproc *rproc); int rproc_set_firmware(struct rproc *rproc, const char *fw_name); void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type); void *rproc_da_to_va(struct rproc *rproc, u64 da, size_t len, bool *is_iomem); + +/* from remoteproc_coredump.c */ +void rproc_coredump_cleanup(struct rproc *rproc); +void rproc_coredump(struct rproc *rproc); void rproc_coredump_using_sections(struct rproc *rproc); int rproc_coredump_add_segment(struct rproc *rproc, dma_addr_t da, size_t size); int rproc_coredump_add_custom_segment(struct rproc *rproc, -- cgit v1.2.3 From 9fa0bba012c2dd6d2b0db893314a4cc252a91b5f Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 13 Jul 2023 15:19:05 -0700 Subject: net: phy: bcm7xxx: Add EPHY entry for 74165 74165 is a 16nm process SoC with a 10/100 integrated Ethernet PHY, utilize the recently defined 16nm EPHY macro to configure that PHY. Reviewed-by: Simon Horman Reviewed-by: Andrew Lunn Signed-off-by: Florian Fainelli Signed-off-by: Justin Chen Signed-off-by: David S. Miller --- include/linux/brcmphy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h index 5d732f48f787..c55810a43541 100644 --- a/include/linux/brcmphy.h +++ b/include/linux/brcmphy.h @@ -44,6 +44,7 @@ #define PHY_ID_BCM7366 0x600d8490 #define PHY_ID_BCM7346 0x600d8650 #define PHY_ID_BCM7362 0x600d84b0 +#define PHY_ID_BCM74165 0x359052c0 #define PHY_ID_BCM7425 0x600d86b0 #define PHY_ID_BCM7429 0x600d8730 #define PHY_ID_BCM7435 0x600d8750 -- cgit v1.2.3 From 43c9835b144c7ce29efe142d662529662a9eb376 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 7 Jul 2023 11:42:39 +0200 Subject: block: don't allow enabling a cache on devices that don't support it Currently the write_cache attribute allows enabling the QUEUE_FLAG_WC flag on devices that never claimed the capability. Fix that by adding a QUEUE_FLAG_HW_WC flag that is set by blk_queue_write_cache and guards re-enabling the cache through sysfs. Note that any rescan that calls blk_queue_write_cache will still re-enable the write cache as in the current code. Fixes: 93e9d8e836cb ("block: add ability to flag write back caching on a device") Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20230707094239.107968-3-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ed44a997f629..2f5371b8482c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -538,6 +538,7 @@ struct request_queue { #define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */ #define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */ #define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */ +#define QUEUE_FLAG_HW_WC 18 /* Write back caching supported */ #define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */ #define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */ #define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */ -- cgit v1.2.3 From 660e802c76c89e871c29cd3174c07c8d23e39c35 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Mon, 17 Jul 2023 12:00:55 +0800 Subject: blk-mq: use percpu csd to remote complete instead of per-rq csd If request need to be completed remotely, we insert it into percpu llist, and smp_call_function_single_async() if llist is empty previously. We don't need to use per-rq csd, percpu csd is enough. And the size of struct request is decreased by 24 bytes. This way is cleaner, and looks correct, given block softirq is guaranteed to be scheduled to consume the list if one new request is added to this percpu list, either smp_call_function_single_async() returns -EBUSY or 0. Signed-off-by: Chengming Zhou Reviewed-by: Ming Lei Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20230717040058.3993930-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index b96e00499f9e..67f810857634 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -182,10 +182,7 @@ struct request { rq_end_io_fn *saved_end_io; } flush; - union { - struct __call_single_data csd; - u64 fifo_time; - }; + u64 fifo_time; /* * completion callback. -- cgit v1.2.3 From 81ada09cc25e4bf2de7d2951925fb409338a545d Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Mon, 17 Jul 2023 12:00:58 +0800 Subject: blk-flush: reuse rq queuelist in flush state machine Since we don't need to maintain inflight flush_data requests list anymore, we can reuse rq->queuelist for flush pending list. Note in mq_flush_data_end_io(), we need to re-initialize rq->queuelist before reusing it in the state machine when end, since the rq->rq_next also reuse it, may have corrupted rq->queuelist by the driver. This patch decrease the size of struct request by 16 bytes. Signed-off-by: Chengming Zhou Reviewed-by: Christoph Hellwig Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20230717040058.3993930-5-chengming.zhou@linux.dev Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 67f810857634..01e8c31db665 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -178,7 +178,6 @@ struct request { struct { unsigned int seq; - struct list_head list; rq_end_io_fn *saved_end_io; } flush; -- cgit v1.2.3 From 880b9577855edddda1e732748e849c63199d489b Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 17 Jul 2023 09:00:09 -0700 Subject: fs: distinguish between user initiated freeze and kernel initiated freeze Userspace can freeze a filesystem using the FIFREEZE ioctl or by suspending the block device; this state persists until userspace thaws the filesystem with the FITHAW ioctl or resuming the block device. Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for the fsfreeze ioctl") we only allow the first freeze command to succeed. The kernel may decide that it is necessary to freeze a filesystem for its own internal purposes, such as suspends in progress, filesystem fsck activities, or quiescing a device prior to removal. Userspace thaw commands must never break a kernel freeze, and kernel thaw commands shouldn't undo userspace's freeze command. Introduce a couple of freeze holder flags and wire it into the sb_writers state. One kernel and one userspace freeze are allowed to coexist at the same time; the filesystem will not thaw until both are lifted. I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing behaviors. Cc: mcgrof@kernel.org Cc: jack@suse.cz Cc: hch@infradead.org Cc: ruansy.fnst@fujitsu.com Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Dave Chinner Reviewed-by: Jan Kara --- include/linux/fs.h | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..5e8d81212abc 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1147,7 +1147,8 @@ enum { #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1) struct sb_writers { - int frozen; /* Is sb frozen? */ + unsigned short frozen; /* Is sb frozen? */ + unsigned short freeze_holders; /* Who froze fs? */ struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; }; @@ -1902,6 +1903,10 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, struct file *dst_file, loff_t dst_pos, loff_t len, unsigned int remap_flags); +enum freeze_holder { + FREEZE_HOLDER_KERNEL = (1U << 0), + FREEZE_HOLDER_USERSPACE = (1U << 1), +}; struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); @@ -1914,9 +1919,9 @@ struct super_operations { void (*evict_inode) (struct inode *); void (*put_super) (struct super_block *); int (*sync_fs)(struct super_block *sb, int wait); - int (*freeze_super) (struct super_block *); + int (*freeze_super) (struct super_block *, enum freeze_holder who); int (*freeze_fs) (struct super_block *); - int (*thaw_super) (struct super_block *); + int (*thaw_super) (struct super_block *, enum freeze_holder who); int (*unfreeze_fs) (struct super_block *); int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); @@ -2290,8 +2295,8 @@ extern int unregister_filesystem(struct file_system_type *); extern int vfs_statfs(const struct path *, struct kstatfs *); extern int user_statfs(const char __user *, struct kstatfs *); extern int fd_statfs(int, struct kstatfs *); -extern int freeze_super(struct super_block *super); -extern int thaw_super(struct super_block *super); +int freeze_super(struct super_block *super, enum freeze_holder who); +int thaw_super(struct super_block *super, enum freeze_holder who); extern __printf(2, 3) int super_setup_bdi_name(struct super_block *sb, char *fmt, ...); extern int super_setup_bdi(struct super_block *sb); -- cgit v1.2.3 From dcb60f9c403e03133363563ac8ea5d8bba6c2be1 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Wed, 12 Jul 2023 20:08:32 -0700 Subject: cpumask: eliminate kernel-doc warnings Update lib/cpumask.c and to fix all kernel-doc warnings: include/linux/cpumask.h:185: warning: Function parameter or member 'srcp1' not described in 'cpumask_first_and' include/linux/cpumask.h:185: warning: Function parameter or member 'srcp2' not described in 'cpumask_first_and' include/linux/cpumask.h:185: warning: Excess function parameter 'src1p' description in 'cpumask_first_and' include/linux/cpumask.h:185: warning: Excess function parameter 'src2p' description in 'cpumask_first_and' lib/cpumask.c:59: warning: Function parameter or member 'node' not described in 'alloc_cpumask_var_node' lib/cpumask.c:169: warning: Function parameter or member 'src1p' not described in 'cpumask_any_and_distribute' lib/cpumask.c:169: warning: Function parameter or member 'src2p' not described in 'cpumask_any_and_distribute' Fixes: 7b4967c53204 ("cpumask: Add alloc_cpumask_var_node()") Fixes: 839cad5fa54b ("cpumask: fix function description kernel-doc notation") Fixes: 93ba139ba819 ("cpumask: use find_first_and_bit()") Signed-off-by: Randy Dunlap Reviewed-by: Andy Shevchenko Signed-off-by: Yury Norov --- include/linux/cpumask.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 0d2e2a38b92d..f10fb87d49db 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -175,8 +175,8 @@ static inline unsigned int cpumask_first_zero(const struct cpumask *srcp) /** * cpumask_first_and - return the first cpu from *srcp1 & *srcp2 - * @src1p: the first input - * @src2p: the second input + * @srcp1: the first input + * @srcp2: the second input * * Returns >= nr_cpu_ids if no cpus set in both. See also cpumask_next_and(). */ @@ -1197,6 +1197,10 @@ cpumap_print_bitmask_to_buf(char *buf, const struct cpumask *mask, /** * cpumap_print_list_to_buf - copies the cpumask into the buffer as * comma-separated list of cpus + * @buf: the buffer to copy into + * @mask: the cpumask to copy + * @off: in the string from which we are copying, we copy to @buf + * @count: the maximum number of bytes to print * * Everything is same with the above cpumap_print_bitmask_to_buf() * except the print format. -- cgit v1.2.3 From 6f63904c8f3edb65bd85c1be01d69214ff8ca4c5 Mon Sep 17 00:00:00 2001 From: Andrei Vagin Date: Tue, 7 Mar 2023 23:31:58 -0800 Subject: sched: add a few helpers to wake up tasks on the current cpu Add complete_on_current_cpu, wake_up_poll_on_current_cpu helpers to wake up tasks on the current CPU. These two helpers are useful when the task needs to make a synchronous context switch to another task. In this context, synchronous means it wakes up the target task and falls asleep right after that. One example of such workloads is seccomp user notifies. This mechanism allows the supervisor process handles system calls on behalf of a target process. While the supervisor is handling an intercepted system call, the target process will be blocked in the kernel, waiting for a response to come back. On-CPU context switches are much faster than regular ones. Signed-off-by: Andrei Vagin Acked-by: "Peter Zijlstra (Intel)" Link: https://lore.kernel.org/r/20230308073201.3102738-4-avagin@google.com Signed-off-by: Kees Cook --- include/linux/completion.h | 1 + include/linux/swait.h | 2 +- include/linux/wait.h | 3 +++ 3 files changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/completion.h b/include/linux/completion.h index 62b32b19e0a8..fb2915676574 100644 --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -116,6 +116,7 @@ extern bool try_wait_for_completion(struct completion *x); extern bool completion_done(struct completion *x); extern void complete(struct completion *); +extern void complete_on_current_cpu(struct completion *x); extern void complete_all(struct completion *); #endif diff --git a/include/linux/swait.h b/include/linux/swait.h index 6a8c22b8c2a5..d324419482a0 100644 --- a/include/linux/swait.h +++ b/include/linux/swait.h @@ -146,7 +146,7 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq) extern void swake_up_one(struct swait_queue_head *q); extern void swake_up_all(struct swait_queue_head *q); -extern void swake_up_locked(struct swait_queue_head *q); +extern void swake_up_locked(struct swait_queue_head *q, int wake_flags); extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state); extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state); diff --git a/include/linux/wait.h b/include/linux/wait.h index a0307b516b09..5ec7739400f4 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -210,6 +210,7 @@ __remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq } int __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key); +void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head, unsigned int mode, void *key, wait_queue_entry_t *bookmark); @@ -237,6 +238,8 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head); #define key_to_poll(m) ((__force __poll_t)(uintptr_t)(void *)(m)) #define wake_up_poll(x, m) \ __wake_up(x, TASK_NORMAL, 1, poll_to_key(m)) +#define wake_up_poll_on_current_cpu(x, m) \ + __wake_up_on_current_cpu(x, TASK_NORMAL, poll_to_key(m)) #define wake_up_locked_poll(x, m) \ __wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m)) #define wake_up_interruptible_poll(x, m) \ -- cgit v1.2.3 From 76226787e137962750241bb29a9572dfc10d9eb1 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 14 Jul 2023 10:12:17 +0100 Subject: net: phylink: remove legacy mac_an_restart() method The mac_an_restart() method is now completely unused, and has been superseded by phylink_pcs support. Remove this method. Since phylink_pcs_mac_an_restart() now only deals with the PCS, rename the function to remove the _mac infix. Signed-off-by: Russell King (Oracle) Reviewed-by: Florian Fainelli Signed-off-by: Paolo Abeni --- include/linux/phylink.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index b28aa3eef7d5..9e861c8316d0 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -234,7 +234,6 @@ struct phylink_config { * @mac_prepare: prepare for a major reconfiguration of the interface. * @mac_config: configure the MAC for the selected mode and state. * @mac_finish: finish a major reconfiguration of the interface. - * @mac_an_restart: restart 802.3z BaseX autonegotiation. * @mac_link_down: take the link down. * @mac_link_up: allow the link to come up. * @@ -254,7 +253,6 @@ struct phylink_mac_ops { const struct phylink_link_state *state); int (*mac_finish)(struct phylink_config *config, unsigned int mode, phy_interface_t iface); - void (*mac_an_restart)(struct phylink_config *config); void (*mac_link_down)(struct phylink_config *config, unsigned int mode, phy_interface_t interface); void (*mac_link_up)(struct phylink_config *config, @@ -459,16 +457,6 @@ void mac_config(struct phylink_config *config, unsigned int mode, int mac_finish(struct phylink_config *config, unsigned int mode, phy_interface_t iface); -/** - * mac_an_restart() - restart 802.3z BaseX autonegotiation - * @config: a pointer to a &struct phylink_config. - * - * Note: This is a legacy method. This function will not be called unless - * legacy_pre_march2020 is set in &struct phylink_config and there is no - * PCS attached. - */ -void mac_an_restart(struct phylink_config *config); - /** * mac_link_down() - take the link down * @config: a pointer to a &struct phylink_config. -- cgit v1.2.3 From 3c6152940584290668b35fa0800026f6a1ae05fe Mon Sep 17 00:00:00 2001 From: "GONG, Ruiqi" Date: Fri, 14 Jul 2023 14:44:22 +0800 Subject: Randomized slab caches for kmalloc() When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of vulnerable object by triggering allocation in other subsystems or modules and therefore getting a reference to the targeted memory location. It's usable on various types of vulnerablity including use after free (UAF), heap out- of-bound write and etc. There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill. To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed. Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups. The overhead of performance has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch based on the latest linux-next kernel, which shows minor difference. A subset of benchmarks are listed below: sched/ sched/ syscall/ mem/ mem/ messaging pipe basic memcpy memset (sec) (sec) (sec) (GB/sec) (GB/sec) control1 0.019 5.459 0.733 15.258789 51.398026 control2 0.019 5.439 0.730 16.009221 48.828125 control3 0.019 5.282 0.735 16.009221 48.828125 control_avg 0.019 5.393 0.733 15.759077 49.684759 experiment1 0.019 5.374 0.741 15.500992 46.502976 experiment2 0.019 5.440 0.746 16.276042 51.398026 experiment3 0.019 5.242 0.752 15.258789 51.398026 experiment_avg 0.019 5.352 0.746 15.678608 49.766343 The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies: control 4 copies 8 copies 16 copies total 969.8M 968.2M 968.2M 968.2M used 20.0M 21.9M 24.1M 26.7M free 936.9M 933.6M 931.4M 928.6M available 932.2M 928.8M 926.6M 923.9M Co-developed-by: Xiu Jianfeng Signed-off-by: Xiu Jianfeng Signed-off-by: GONG, Ruiqi Reviewed-by: Kees Cook Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou # percpu Signed-off-by: Vlastimil Babka --- include/linux/percpu.h | 12 +++++++++--- include/linux/slab.h | 23 ++++++++++++++++++++--- 2 files changed, 29 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/percpu.h b/include/linux/percpu.h index b3b458442330..68fac2e7cbe6 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -35,6 +35,12 @@ #define PCPU_BITMAP_BLOCK_BITS (PCPU_BITMAP_BLOCK_SIZE >> \ PCPU_MIN_ALLOC_SHIFT) +#ifdef CONFIG_RANDOM_KMALLOC_CACHES +#define PERCPU_DYNAMIC_SIZE_SHIFT 12 +#else +#define PERCPU_DYNAMIC_SIZE_SHIFT 10 +#endif + /* * Percpu allocator can serve percpu allocations before slab is * initialized which allows slab to depend on the percpu allocator. @@ -42,7 +48,7 @@ * for this. Keep PERCPU_DYNAMIC_RESERVE equal to or larger than * PERCPU_DYNAMIC_EARLY_SIZE. */ -#define PERCPU_DYNAMIC_EARLY_SIZE (20 << 10) +#define PERCPU_DYNAMIC_EARLY_SIZE (20 << PERCPU_DYNAMIC_SIZE_SHIFT) /* * PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy @@ -56,9 +62,9 @@ * intelligent way to determine this would be nice. */ #if BITS_PER_LONG > 32 -#define PERCPU_DYNAMIC_RESERVE (28 << 10) +#define PERCPU_DYNAMIC_RESERVE (28 << PERCPU_DYNAMIC_SIZE_SHIFT) #else -#define PERCPU_DYNAMIC_RESERVE (20 << 10) +#define PERCPU_DYNAMIC_RESERVE (20 << PERCPU_DYNAMIC_SIZE_SHIFT) #endif extern void *pcpu_base_addr; diff --git a/include/linux/slab.h b/include/linux/slab.h index 848c7c82ad5a..8228d1276a2f 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -19,6 +19,7 @@ #include #include #include +#include /* @@ -345,6 +346,12 @@ static inline unsigned int arch_slab_minalign(void) #define SLAB_OBJ_MIN_SIZE (KMALLOC_MIN_SIZE < 16 ? \ (KMALLOC_MIN_SIZE) : 16) +#ifdef CONFIG_RANDOM_KMALLOC_CACHES +#define RANDOM_KMALLOC_CACHES_NR 15 // # of cache copies +#else +#define RANDOM_KMALLOC_CACHES_NR 0 +#endif + /* * Whenever changing this, take care of that kmalloc_type() and * create_kmalloc_caches() still work as intended. @@ -361,6 +368,8 @@ enum kmalloc_cache_type { #ifndef CONFIG_MEMCG_KMEM KMALLOC_CGROUP = KMALLOC_NORMAL, #endif + KMALLOC_RANDOM_START = KMALLOC_NORMAL, + KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + RANDOM_KMALLOC_CACHES_NR, #ifdef CONFIG_SLUB_TINY KMALLOC_RECLAIM = KMALLOC_NORMAL, #else @@ -386,14 +395,22 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; (IS_ENABLED(CONFIG_ZONE_DMA) ? __GFP_DMA : 0) | \ (IS_ENABLED(CONFIG_MEMCG_KMEM) ? __GFP_ACCOUNT : 0)) -static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) +extern unsigned long random_kmalloc_seed; + +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller) { /* * The most common case is KMALLOC_NORMAL, so test for it * with a single branch for all the relevant flags. */ if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0)) +#ifdef CONFIG_RANDOM_KMALLOC_CACHES + /* RANDOM_KMALLOC_CACHES_NR (=15) copies + the KMALLOC_NORMAL */ + return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed, + ilog2(RANDOM_KMALLOC_CACHES_NR + 1)); +#else return KMALLOC_NORMAL; +#endif /* * At least one of the flags has to be set. Their priorities in @@ -580,7 +597,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags) index = kmalloc_index(size); return kmalloc_trace( - kmalloc_caches[kmalloc_type(flags)][index], + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index], flags, size); } return __kmalloc(size, flags); @@ -596,7 +613,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla index = kmalloc_index(size); return kmalloc_node_trace( - kmalloc_caches[kmalloc_type(flags)][index], + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index], flags, node, size); } return __kmalloc_node(size, flags, node); -- cgit v1.2.3 From 97efc0aa96f990966fe584149afa1aadcf7b8568 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Amadeusz=20S=C5=82awi=C5=84ski?= Date: Mon, 17 Jul 2023 13:44:57 +0200 Subject: PCI: Sort Intel PCI IDs by number MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some of the PCI IDs are not sorted correctly, reorder them by growing ID number. Reviewed-by: Andy Shevchenko Acked-by: Bjorn Helgaas Acked-by: Mark Brown Reviewed-by: Cezary Rojewski Signed-off-by: Amadeusz Sławiński Link: https://lore.kernel.org/r/20230717114511.484999-2-amadeuszx.slawinski@linux.intel.com Signed-off-by: Takashi Iwai --- include/linux/pci_ids.h | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 2dc75df1437f..add7fb6bd844 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2677,8 +2677,6 @@ #define PCI_DEVICE_ID_INTEL_82815_MC 0x1130 #define PCI_DEVICE_ID_INTEL_82815_CGC 0x1132 #define PCI_DEVICE_ID_INTEL_82092AA_0 0x1221 -#define PCI_DEVICE_ID_INTEL_7505_0 0x2550 -#define PCI_DEVICE_ID_INTEL_7205_0 0x255d #define PCI_DEVICE_ID_INTEL_82437 0x122d #define PCI_DEVICE_ID_INTEL_82371FB_0 0x122e #define PCI_DEVICE_ID_INTEL_82371FB_1 0x1230 @@ -2772,6 +2770,8 @@ #define PCI_DEVICE_ID_INTEL_82850_HB 0x2530 #define PCI_DEVICE_ID_INTEL_82860_HB 0x2531 #define PCI_DEVICE_ID_INTEL_E7501_MCH 0x254c +#define PCI_DEVICE_ID_INTEL_7505_0 0x2550 +#define PCI_DEVICE_ID_INTEL_7205_0 0x255d #define PCI_DEVICE_ID_INTEL_82845G_HB 0x2560 #define PCI_DEVICE_ID_INTEL_82845G_IG 0x2562 #define PCI_DEVICE_ID_INTEL_82865_HB 0x2570 @@ -2806,9 +2806,9 @@ #define PCI_DEVICE_ID_INTEL_3000_HB 0x2778 #define PCI_DEVICE_ID_INTEL_82945GM_HB 0x27a0 #define PCI_DEVICE_ID_INTEL_82945GM_IG 0x27a2 +#define PCI_DEVICE_ID_INTEL_ICH7_30 0x27b0 #define PCI_DEVICE_ID_INTEL_ICH7_0 0x27b8 #define PCI_DEVICE_ID_INTEL_ICH7_1 0x27b9 -#define PCI_DEVICE_ID_INTEL_ICH7_30 0x27b0 #define PCI_DEVICE_ID_INTEL_TGP_LPC 0x27bc #define PCI_DEVICE_ID_INTEL_ICH7_31 0x27bd #define PCI_DEVICE_ID_INTEL_ICH7_17 0x27da @@ -2824,14 +2824,14 @@ #define PCI_DEVICE_ID_INTEL_ICH8_6 0x2850 #define PCI_DEVICE_ID_INTEL_VMD_28C0 0x28c0 #define PCI_DEVICE_ID_INTEL_ICH9_0 0x2910 -#define PCI_DEVICE_ID_INTEL_ICH9_1 0x2917 #define PCI_DEVICE_ID_INTEL_ICH9_2 0x2912 #define PCI_DEVICE_ID_INTEL_ICH9_3 0x2913 #define PCI_DEVICE_ID_INTEL_ICH9_4 0x2914 -#define PCI_DEVICE_ID_INTEL_ICH9_5 0x2919 -#define PCI_DEVICE_ID_INTEL_ICH9_6 0x2930 #define PCI_DEVICE_ID_INTEL_ICH9_7 0x2916 +#define PCI_DEVICE_ID_INTEL_ICH9_1 0x2917 #define PCI_DEVICE_ID_INTEL_ICH9_8 0x2918 +#define PCI_DEVICE_ID_INTEL_ICH9_5 0x2919 +#define PCI_DEVICE_ID_INTEL_ICH9_6 0x2930 #define PCI_DEVICE_ID_INTEL_I7_MCR 0x2c18 #define PCI_DEVICE_ID_INTEL_I7_MC_TAD 0x2c19 #define PCI_DEVICE_ID_INTEL_I7_MC_RAS 0x2c1a @@ -2848,8 +2848,8 @@ #define PCI_DEVICE_ID_INTEL_I7_MC_CH2_ADDR 0x2c31 #define PCI_DEVICE_ID_INTEL_I7_MC_CH2_RANK 0x2c32 #define PCI_DEVICE_ID_INTEL_I7_MC_CH2_TC 0x2c33 -#define PCI_DEVICE_ID_INTEL_I7_NONCORE 0x2c41 #define PCI_DEVICE_ID_INTEL_I7_NONCORE_ALT 0x2c40 +#define PCI_DEVICE_ID_INTEL_I7_NONCORE 0x2c41 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_NONCORE 0x2c50 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_NONCORE_ALT 0x2c51 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_NONCORE_REV2 0x2c70 @@ -2895,10 +2895,10 @@ #define PCI_DEVICE_ID_INTEL_IOAT_TBG3 0x3433 #define PCI_DEVICE_ID_INTEL_82830_HB 0x3575 #define PCI_DEVICE_ID_INTEL_82830_CGC 0x3577 -#define PCI_DEVICE_ID_INTEL_82854_HB 0x358c -#define PCI_DEVICE_ID_INTEL_82854_IG 0x358e #define PCI_DEVICE_ID_INTEL_82855GM_HB 0x3580 #define PCI_DEVICE_ID_INTEL_82855GM_IG 0x3582 +#define PCI_DEVICE_ID_INTEL_82854_HB 0x358c +#define PCI_DEVICE_ID_INTEL_82854_IG 0x358e #define PCI_DEVICE_ID_INTEL_E7520_MCH 0x3590 #define PCI_DEVICE_ID_INTEL_E7320_MCH 0x3592 #define PCI_DEVICE_ID_INTEL_MCH_PA 0x3595 @@ -2908,11 +2908,11 @@ #define PCI_DEVICE_ID_INTEL_MCH_PC 0x3599 #define PCI_DEVICE_ID_INTEL_MCH_PC1 0x359a #define PCI_DEVICE_ID_INTEL_E7525_MCH 0x359e +#define PCI_DEVICE_ID_INTEL_IOAT_CNB 0x360b +#define PCI_DEVICE_ID_INTEL_FBD_CNB 0x360c #define PCI_DEVICE_ID_INTEL_I7300_MCH_ERR 0x360c #define PCI_DEVICE_ID_INTEL_I7300_MCH_FB0 0x360f #define PCI_DEVICE_ID_INTEL_I7300_MCH_FB1 0x3610 -#define PCI_DEVICE_ID_INTEL_IOAT_CNB 0x360b -#define PCI_DEVICE_ID_INTEL_FBD_CNB 0x360c #define PCI_DEVICE_ID_INTEL_IOAT_JSF0 0x3710 #define PCI_DEVICE_ID_INTEL_IOAT_JSF1 0x3711 #define PCI_DEVICE_ID_INTEL_IOAT_JSF2 0x3712 @@ -2943,16 +2943,12 @@ #define PCI_DEVICE_ID_INTEL_IOAT_SNB7 0x3c27 #define PCI_DEVICE_ID_INTEL_IOAT_SNB8 0x3c2e #define PCI_DEVICE_ID_INTEL_IOAT_SNB9 0x3c2f -#define PCI_DEVICE_ID_INTEL_UNC_HA 0x3c46 -#define PCI_DEVICE_ID_INTEL_UNC_IMC0 0x3cb0 -#define PCI_DEVICE_ID_INTEL_UNC_IMC1 0x3cb1 -#define PCI_DEVICE_ID_INTEL_UNC_IMC2 0x3cb4 -#define PCI_DEVICE_ID_INTEL_UNC_IMC3 0x3cb5 #define PCI_DEVICE_ID_INTEL_UNC_QPI0 0x3c41 #define PCI_DEVICE_ID_INTEL_UNC_QPI1 0x3c42 #define PCI_DEVICE_ID_INTEL_UNC_R2PCIE 0x3c43 #define PCI_DEVICE_ID_INTEL_UNC_R3QPI0 0x3c44 #define PCI_DEVICE_ID_INTEL_UNC_R3QPI1 0x3c45 +#define PCI_DEVICE_ID_INTEL_UNC_HA 0x3c46 #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_RAS 0x3c71 /* 15.1 */ #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR0 0x3c72 /* 16.2 */ #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR1 0x3c73 /* 16.3 */ @@ -2964,6 +2960,10 @@ #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1 0x3cab /* 15.3 */ #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2 0x3cac /* 15.4 */ #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3 0x3cad /* 15.5 */ +#define PCI_DEVICE_ID_INTEL_UNC_IMC0 0x3cb0 +#define PCI_DEVICE_ID_INTEL_UNC_IMC1 0x3cb1 +#define PCI_DEVICE_ID_INTEL_UNC_IMC2 0x3cb4 +#define PCI_DEVICE_ID_INTEL_UNC_IMC3 0x3cb5 #define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO 0x3cb8 /* 17.0 */ #define PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX 0x3ce0 #define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0 0x3cf4 /* 12.6 */ -- cgit v1.2.3 From 2407c45329ddfac0b760d2095c484b371dec8ec9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Amadeusz=20S=C5=82awi=C5=84ski?= Date: Mon, 17 Jul 2023 13:44:58 +0200 Subject: PCI: Add Intel Audio DSP devices to pci_ids.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Those IDs are mostly sprinkled between HDA, Skylake, SOF and avs drivers. Almost every use contains additional comments to identify to which platform those IDs refer to. Add those IDs to pci_ids.h header, so that there is one place which defines those names. Acked-by: Mark Brown Acked-by: Bjorn Helgaas Reviewed-by: Andy Shevchenko # for the Intel Tangier ID Reviewed-by: Cezary Rojewski Signed-off-by: Amadeusz Sławiński Link: https://lore.kernel.org/r/20230717114511.484999-3-amadeuszx.slawinski@linux.intel.com Signed-off-by: Takashi Iwai --- include/linux/pci_ids.h | 73 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index add7fb6bd844..3066660cd39b 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2644,6 +2644,7 @@ #define PCI_VENDOR_ID_INTEL 0x8086 #define PCI_DEVICE_ID_INTEL_EESSC 0x0008 +#define PCI_DEVICE_ID_INTEL_HDA_CML_LP 0x02c8 #define PCI_DEVICE_ID_INTEL_PXHD_0 0x0320 #define PCI_DEVICE_ID_INTEL_PXHD_1 0x0321 #define PCI_DEVICE_ID_INTEL_PXH_0 0x0329 @@ -2659,8 +2660,10 @@ #define PCI_DEVICE_ID_INTEL_82424 0x0483 #define PCI_DEVICE_ID_INTEL_82378 0x0484 #define PCI_DEVICE_ID_INTEL_82425 0x0486 +#define PCI_DEVICE_ID_INTEL_HDA_CML_H 0x06c8 #define PCI_DEVICE_ID_INTEL_MRST_SD0 0x0807 #define PCI_DEVICE_ID_INTEL_MRST_SD1 0x0808 +#define PCI_DEVICE_ID_INTEL_HDA_OAKTRAIL 0x080a #define PCI_DEVICE_ID_INTEL_MFD_SD 0x0820 #define PCI_DEVICE_ID_INTEL_MFD_SDIO1 0x0821 #define PCI_DEVICE_ID_INTEL_MFD_SDIO2 0x0822 @@ -2670,12 +2673,18 @@ #define PCI_DEVICE_ID_INTEL_QUARK_X1000_ILB 0x095e #define PCI_DEVICE_ID_INTEL_I960 0x0960 #define PCI_DEVICE_ID_INTEL_I960RM 0x0962 +#define PCI_DEVICE_ID_INTEL_HDA_HSW_0 0x0a0c +#define PCI_DEVICE_ID_INTEL_HDA_HSW_2 0x0c0c #define PCI_DEVICE_ID_INTEL_CENTERTON_ILB 0x0c60 +#define PCI_DEVICE_ID_INTEL_HDA_HSW_3 0x0d0c +#define PCI_DEVICE_ID_INTEL_HDA_BYT 0x0f04 +#define PCI_DEVICE_ID_INTEL_SST_BYT 0x0f28 #define PCI_DEVICE_ID_INTEL_8257X_SOL 0x1062 #define PCI_DEVICE_ID_INTEL_82573E_SOL 0x1085 #define PCI_DEVICE_ID_INTEL_82573L_SOL 0x108f #define PCI_DEVICE_ID_INTEL_82815_MC 0x1130 #define PCI_DEVICE_ID_INTEL_82815_CGC 0x1132 +#define PCI_DEVICE_ID_INTEL_SST_TNG 0x119a #define PCI_DEVICE_ID_INTEL_82092AA_0 0x1221 #define PCI_DEVICE_ID_INTEL_82437 0x122d #define PCI_DEVICE_ID_INTEL_82371FB_0 0x122e @@ -2702,20 +2711,26 @@ #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_2C_BRIDGE 0x1576 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 +#define PCI_DEVICE_ID_INTEL_HDA_BDW 0x160c #define PCI_DEVICE_ID_INTEL_80960_RP 0x1960 #define PCI_DEVICE_ID_INTEL_QAT_C3XXX 0x19e2 #define PCI_DEVICE_ID_INTEL_QAT_C3XXX_VF 0x19e3 #define PCI_DEVICE_ID_INTEL_82840_HB 0x1a21 #define PCI_DEVICE_ID_INTEL_82845_HB 0x1a30 #define PCI_DEVICE_ID_INTEL_IOAT 0x1a38 +#define PCI_DEVICE_ID_INTEL_HDA_CPT 0x1c20 #define PCI_DEVICE_ID_INTEL_COUGARPOINT_LPC_MIN 0x1c41 #define PCI_DEVICE_ID_INTEL_COUGARPOINT_LPC_MAX 0x1c5f +#define PCI_DEVICE_ID_INTEL_HDA_PBG 0x1d20 #define PCI_DEVICE_ID_INTEL_PATSBURG_LPC_0 0x1d40 #define PCI_DEVICE_ID_INTEL_PATSBURG_LPC_1 0x1d41 +#define PCI_DEVICE_ID_INTEL_HDA_PPT 0x1e20 #define PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI 0x1e31 #define PCI_DEVICE_ID_INTEL_PANTHERPOINT_LPC_MIN 0x1e40 #define PCI_DEVICE_ID_INTEL_PANTHERPOINT_LPC_MAX 0x1e5f #define PCI_DEVICE_ID_INTEL_VMD_201D 0x201d +#define PCI_DEVICE_ID_INTEL_HDA_BSW 0x2284 +#define PCI_DEVICE_ID_INTEL_SST_BSW 0x22a8 #define PCI_DEVICE_ID_INTEL_DH89XXCC_LPC_MIN 0x2310 #define PCI_DEVICE_ID_INTEL_DH89XXCC_LPC_MAX 0x231f #define PCI_DEVICE_ID_INTEL_82801AA_0 0x2410 @@ -2793,12 +2808,14 @@ #define PCI_DEVICE_ID_INTEL_ICH6_0 0x2640 #define PCI_DEVICE_ID_INTEL_ICH6_1 0x2641 #define PCI_DEVICE_ID_INTEL_ICH6_2 0x2642 +#define PCI_DEVICE_ID_INTEL_HDA_ICH6 0x2668 #define PCI_DEVICE_ID_INTEL_ICH6_16 0x266a #define PCI_DEVICE_ID_INTEL_ICH6_17 0x266d #define PCI_DEVICE_ID_INTEL_ICH6_18 0x266e #define PCI_DEVICE_ID_INTEL_ICH6_19 0x266f #define PCI_DEVICE_ID_INTEL_ESB2_0 0x2670 #define PCI_DEVICE_ID_INTEL_ESB2_14 0x2698 +#define PCI_DEVICE_ID_INTEL_HDA_ESB2 0x269a #define PCI_DEVICE_ID_INTEL_ESB2_17 0x269b #define PCI_DEVICE_ID_INTEL_ESB2_18 0x269e #define PCI_DEVICE_ID_INTEL_82945G_HB 0x2770 @@ -2811,6 +2828,7 @@ #define PCI_DEVICE_ID_INTEL_ICH7_1 0x27b9 #define PCI_DEVICE_ID_INTEL_TGP_LPC 0x27bc #define PCI_DEVICE_ID_INTEL_ICH7_31 0x27bd +#define PCI_DEVICE_ID_INTEL_HDA_ICH7 0x27d8 #define PCI_DEVICE_ID_INTEL_ICH7_17 0x27da #define PCI_DEVICE_ID_INTEL_ICH7_19 0x27dd #define PCI_DEVICE_ID_INTEL_ICH7_20 0x27de @@ -2821,6 +2839,7 @@ #define PCI_DEVICE_ID_INTEL_ICH8_3 0x2814 #define PCI_DEVICE_ID_INTEL_ICH8_4 0x2815 #define PCI_DEVICE_ID_INTEL_ICH8_5 0x283e +#define PCI_DEVICE_ID_INTEL_HDA_ICH8 0x284b #define PCI_DEVICE_ID_INTEL_ICH8_6 0x2850 #define PCI_DEVICE_ID_INTEL_VMD_28C0 0x28c0 #define PCI_DEVICE_ID_INTEL_ICH9_0 0x2910 @@ -2832,6 +2851,8 @@ #define PCI_DEVICE_ID_INTEL_ICH9_8 0x2918 #define PCI_DEVICE_ID_INTEL_ICH9_5 0x2919 #define PCI_DEVICE_ID_INTEL_ICH9_6 0x2930 +#define PCI_DEVICE_ID_INTEL_HDA_ICH9_0 0x293e +#define PCI_DEVICE_ID_INTEL_HDA_ICH9_1 0x293f #define PCI_DEVICE_ID_INTEL_I7_MCR 0x2c18 #define PCI_DEVICE_ID_INTEL_I7_MC_TAD 0x2c19 #define PCI_DEVICE_ID_INTEL_I7_MC_RAS 0x2c1a @@ -2883,6 +2904,7 @@ #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_ADDR_REV2 0x2db1 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2 0x2db2 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV2 0x2db3 +#define PCI_DEVICE_ID_INTEL_HDA_GML 0x3198 #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340 #define PCI_DEVICE_ID_INTEL_IOAT_TBG4 0x3429 #define PCI_DEVICE_ID_INTEL_IOAT_TBG5 0x342a @@ -2893,6 +2915,7 @@ #define PCI_DEVICE_ID_INTEL_IOAT_TBG1 0x3431 #define PCI_DEVICE_ID_INTEL_IOAT_TBG2 0x3432 #define PCI_DEVICE_ID_INTEL_IOAT_TBG3 0x3433 +#define PCI_DEVICE_ID_INTEL_HDA_ICL_LP 0x34c8 #define PCI_DEVICE_ID_INTEL_82830_HB 0x3575 #define PCI_DEVICE_ID_INTEL_82830_CGC 0x3577 #define PCI_DEVICE_ID_INTEL_82855GM_HB 0x3580 @@ -2925,14 +2948,19 @@ #define PCI_DEVICE_ID_INTEL_IOAT_JSF9 0x3719 #define PCI_DEVICE_ID_INTEL_QAT_C62X 0x37c8 #define PCI_DEVICE_ID_INTEL_QAT_C62X_VF 0x37c9 +#define PCI_DEVICE_ID_INTEL_HDA_ICL_N 0x38c8 #define PCI_DEVICE_ID_INTEL_ICH10_0 0x3a14 #define PCI_DEVICE_ID_INTEL_ICH10_1 0x3a16 #define PCI_DEVICE_ID_INTEL_ICH10_2 0x3a18 #define PCI_DEVICE_ID_INTEL_ICH10_3 0x3a1a #define PCI_DEVICE_ID_INTEL_ICH10_4 0x3a30 +#define PCI_DEVICE_ID_INTEL_HDA_ICH10_0 0x3a3e #define PCI_DEVICE_ID_INTEL_ICH10_5 0x3a60 +#define PCI_DEVICE_ID_INTEL_HDA_ICH10_1 0x3a6e #define PCI_DEVICE_ID_INTEL_5_3400_SERIES_LPC_MIN 0x3b00 #define PCI_DEVICE_ID_INTEL_5_3400_SERIES_LPC_MAX 0x3b1f +#define PCI_DEVICE_ID_INTEL_HDA_5_3400_SERIES_0 0x3b56 +#define PCI_DEVICE_ID_INTEL_HDA_5_3400_SERIES_1 0x3b57 #define PCI_DEVICE_ID_INTEL_IOAT_SNB0 0x3c20 #define PCI_DEVICE_ID_INTEL_IOAT_SNB1 0x3c21 #define PCI_DEVICE_ID_INTEL_IOAT_SNB2 0x3c22 @@ -2969,12 +2997,31 @@ #define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0 0x3cf4 /* 12.6 */ #define PCI_DEVICE_ID_INTEL_SBRIDGE_BR 0x3cf5 /* 13.6 */ #define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1 0x3cf6 /* 12.7 */ +#define PCI_DEVICE_ID_INTEL_HDA_ICL_H 0x3dc8 #define PCI_DEVICE_ID_INTEL_IOAT_SNB 0x402f #define PCI_DEVICE_ID_INTEL_5400_ERR 0x4030 #define PCI_DEVICE_ID_INTEL_5400_FBD0 0x4035 #define PCI_DEVICE_ID_INTEL_5400_FBD1 0x4036 +#define PCI_DEVICE_ID_INTEL_HDA_TGL_H 0x43c8 +#define PCI_DEVICE_ID_INTEL_HDA_DG1 0x490d +#define PCI_DEVICE_ID_INTEL_HDA_EHL_0 0x4b55 +#define PCI_DEVICE_ID_INTEL_HDA_EHL_3 0x4b58 +#define PCI_DEVICE_ID_INTEL_HDA_JSL_N 0x4dc8 +#define PCI_DEVICE_ID_INTEL_HDA_DG2_0 0x4f90 +#define PCI_DEVICE_ID_INTEL_HDA_DG2_1 0x4f91 +#define PCI_DEVICE_ID_INTEL_HDA_DG2_2 0x4f92 #define PCI_DEVICE_ID_INTEL_EP80579_0 0x5031 #define PCI_DEVICE_ID_INTEL_EP80579_1 0x5032 +#define PCI_DEVICE_ID_INTEL_HDA_ADL_P 0x51c8 +#define PCI_DEVICE_ID_INTEL_HDA_ADL_PS 0x51c9 +#define PCI_DEVICE_ID_INTEL_HDA_RPL_P_0 0x51ca +#define PCI_DEVICE_ID_INTEL_HDA_RPL_P_1 0x51cb +#define PCI_DEVICE_ID_INTEL_HDA_ADL_M 0x51cc +#define PCI_DEVICE_ID_INTEL_HDA_ADL_PX 0x51cd +#define PCI_DEVICE_ID_INTEL_HDA_RPL_M 0x51ce +#define PCI_DEVICE_ID_INTEL_HDA_RPL_PX 0x51cf +#define PCI_DEVICE_ID_INTEL_HDA_ADL_N 0x54c8 +#define PCI_DEVICE_ID_INTEL_HDA_APL 0x5a98 #define PCI_DEVICE_ID_INTEL_5100_16 0x65f0 #define PCI_DEVICE_ID_INTEL_5100_19 0x65f3 #define PCI_DEVICE_ID_INTEL_5100_21 0x65f5 @@ -3008,8 +3055,12 @@ #define PCI_DEVICE_ID_INTEL_82443GX_0 0x71a0 #define PCI_DEVICE_ID_INTEL_82443GX_2 0x71a2 #define PCI_DEVICE_ID_INTEL_82372FB_1 0x7601 +#define PCI_DEVICE_ID_INTEL_HDA_RPL_S 0x7a50 +#define PCI_DEVICE_ID_INTEL_HDA_ADL_S 0x7ad0 +#define PCI_DEVICE_ID_INTEL_HDA_MTL 0x7e28 #define PCI_DEVICE_ID_INTEL_SCH_LPC 0x8119 #define PCI_DEVICE_ID_INTEL_SCH_IDE 0x811a +#define PCI_DEVICE_ID_INTEL_HDA_POULSBO 0x811b #define PCI_DEVICE_ID_INTEL_E6XX_CU 0x8183 #define PCI_DEVICE_ID_INTEL_ITC_LPC 0x8186 #define PCI_DEVICE_ID_INTEL_82454GX 0x84c4 @@ -3018,9 +3069,31 @@ #define PCI_DEVICE_ID_INTEL_82454NX 0x84cb #define PCI_DEVICE_ID_INTEL_84460GX 0x84ea #define PCI_DEVICE_ID_INTEL_IXP4XX 0x8500 +#define PCI_DEVICE_ID_INTEL_HDA_LPT 0x8c20 +#define PCI_DEVICE_ID_INTEL_HDA_9_SERIES 0x8ca0 +#define PCI_DEVICE_ID_INTEL_HDA_WBG_0 0x8d20 +#define PCI_DEVICE_ID_INTEL_HDA_WBG_1 0x8d21 #define PCI_DEVICE_ID_INTEL_IXP2800 0x9004 +#define PCI_DEVICE_ID_INTEL_HDA_LKF 0x98c8 #define PCI_DEVICE_ID_INTEL_VMD_9A0B 0x9a0b +#define PCI_DEVICE_ID_INTEL_HDA_LPT_LP_0 0x9c20 +#define PCI_DEVICE_ID_INTEL_HDA_LPT_LP_1 0x9c21 +#define PCI_DEVICE_ID_INTEL_HDA_WPT_LP 0x9ca0 +#define PCI_DEVICE_ID_INTEL_HDA_SKL_LP 0x9d70 +#define PCI_DEVICE_ID_INTEL_HDA_KBL_LP 0x9d71 +#define PCI_DEVICE_ID_INTEL_HDA_CNL_LP 0x9dc8 +#define PCI_DEVICE_ID_INTEL_HDA_TGL_LP 0xa0c8 +#define PCI_DEVICE_ID_INTEL_HDA_SKL 0xa170 +#define PCI_DEVICE_ID_INTEL_HDA_KBL 0xa171 +#define PCI_DEVICE_ID_INTEL_HDA_LBG_0 0xa1f0 +#define PCI_DEVICE_ID_INTEL_HDA_LBG_1 0xa270 +#define PCI_DEVICE_ID_INTEL_HDA_KBL_H 0xa2f0 +#define PCI_DEVICE_ID_INTEL_HDA_CNL_H 0xa348 +#define PCI_DEVICE_ID_INTEL_HDA_CML_S 0xa3f0 +#define PCI_DEVICE_ID_INTEL_HDA_LNL_P 0xa828 #define PCI_DEVICE_ID_INTEL_S21152BB 0xb152 +#define PCI_DEVICE_ID_INTEL_HDA_CML_R 0xf0c8 +#define PCI_DEVICE_ID_INTEL_HDA_RKL_S 0xf1c8 #define PCI_VENDOR_ID_WANGXUN 0x8088 -- cgit v1.2.3 From 78908f45ccf1dc2f4d5fb395c460fdbbf7e9ac3a Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Mon, 17 Jul 2023 21:33:03 +0100 Subject: regmap: Let users check if a register is cached The HDA driver has a use case for checking if a register is cached which it bodges in awkwardly and unclearly. Provide an API which allows it to directly do what it's trying to do. Reviewed-by: Takashi Iwai Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20230717-regmap-cache-check-v1-1-73ef688afae3@kernel.org Signed-off-by: Mark Brown --- include/linux/regmap.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 8fc0b3ebce44..c9182a47736e 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1287,6 +1287,7 @@ int regcache_drop_region(struct regmap *map, unsigned int min, void regcache_cache_only(struct regmap *map, bool enable); void regcache_cache_bypass(struct regmap *map, bool enable); void regcache_mark_dirty(struct regmap *map); +bool regcache_reg_cached(struct regmap *map, unsigned int reg); bool regmap_check_range_table(struct regmap *map, unsigned int reg, const struct regmap_access_table *table); -- cgit v1.2.3 From da7c07b1083809888c82522e74370f962fb7685e Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Tue, 18 Jul 2023 01:28:42 +0100 Subject: driver core: Provide stubs for !IOMEM builds The various _ioremap_resource functions are not built when CONFIG_HAS_IOMEM is disabled but no stubs are provided. Given how widespread IOMEM usage is in drivers and how rare !IOMEM configurations are in practical use let's just provide some stubs so users will build without having to add explicit dependencies on HAS_IOMEM. The most likely use case is builds with UML for KUnit testing. Reviewed-by: Greg Kroah-Hartman Reviewed-by: David Gow Reviewed-by: Takashi Iwai Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20230718-asoc-topology-kunit-enable-v2-1-0ee11e662b92@kernel.org Signed-off-by: Mark Brown --- include/linux/device.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index bbaeabd04b0d..6731d7dc1a2a 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -349,6 +349,7 @@ unsigned long devm_get_free_pages(struct device *dev, gfp_t gfp_mask, unsigned int order); void devm_free_pages(struct device *dev, unsigned long addr); +#ifdef CONFIG_HAS_IOMEM void __iomem *devm_ioremap_resource(struct device *dev, const struct resource *res); void __iomem *devm_ioremap_resource_wc(struct device *dev, @@ -357,6 +358,31 @@ void __iomem *devm_ioremap_resource_wc(struct device *dev, void __iomem *devm_of_iomap(struct device *dev, struct device_node *node, int index, resource_size_t *size); +#else + +static inline +void __iomem *devm_ioremap_resource(struct device *dev, + const struct resource *res) +{ + return ERR_PTR(-EINVAL); +} + +static inline +void __iomem *devm_ioremap_resource_wc(struct device *dev, + const struct resource *res) +{ + return ERR_PTR(-EINVAL); +} + +static inline +void __iomem *devm_of_iomap(struct device *dev, + struct device_node *node, int index, + resource_size_t *size) +{ + return ERR_PTR(-EINVAL); +} + +#endif /* allows to add/remove a custom action to devres stack */ void devm_remove_action(struct device *dev, void (*action)(void *), void *data); -- cgit v1.2.3 From a0c74f6c9ea9cebd7a8f38142bf87e7c12c2905d Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Tue, 18 Jul 2023 01:28:43 +0100 Subject: platform: Provide stubs for !HAS_IOMEM builds The various _ioremap_resource functions are not built when CONFIG_HAS_IOMEM is disabled but no stubs are provided. Given how widespread IOMEM usage is in drivers and how rare !IOMEM configurations are in practical use let's just provide some stubs so users will build without having to add explicit dependencies on IOMEM. The most likely use case is builds with UML for KUnit testing. Reviewed-by: Greg Kroah-Hartman Reviewed-by: David Gow Reviewed-by: Takashi Iwai Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20230718-asoc-topology-kunit-enable-v2-2-0ee11e662b92@kernel.org Signed-off-by: Mark Brown --- include/linux/platform_device.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h index b845fd83f429..7a41c72c1959 100644 --- a/include/linux/platform_device.h +++ b/include/linux/platform_device.h @@ -63,6 +63,8 @@ extern struct resource *platform_get_mem_or_io(struct platform_device *, extern struct device * platform_find_device_by_driver(struct device *start, const struct device_driver *drv); + +#ifdef CONFIG_HAS_IOMEM extern void __iomem * devm_platform_get_and_ioremap_resource(struct platform_device *pdev, unsigned int index, struct resource **res); @@ -72,6 +74,32 @@ devm_platform_ioremap_resource(struct platform_device *pdev, extern void __iomem * devm_platform_ioremap_resource_byname(struct platform_device *pdev, const char *name); +#else + +static inline void __iomem * +devm_platform_get_and_ioremap_resource(struct platform_device *pdev, + unsigned int index, struct resource **res) +{ + return ERR_PTR(-EINVAL); +} + + +static inline void __iomem * +devm_platform_ioremap_resource(struct platform_device *pdev, + unsigned int index) +{ + return ERR_PTR(-EINVAL); +} + +static inline void __iomem * +devm_platform_ioremap_resource_byname(struct platform_device *pdev, + const char *name) +{ + return ERR_PTR(-EINVAL); +} + +#endif + extern int platform_get_irq(struct platform_device *, unsigned int); extern int platform_get_irq_optional(struct platform_device *, unsigned int); extern int platform_irq_count(struct platform_device *); -- cgit v1.2.3 From 62157e11d9a4ca7210bb2b0e8fa0557a6ada7fad Mon Sep 17 00:00:00 2001 From: Kamalesh Babulal Date: Tue, 18 Jul 2023 14:38:34 +0530 Subject: cgroup/misc: update struct members descriptions Update the miscellaneous controller's structure member's description of struct misc_res and struct misc_cg. Signed-off-by: Kamalesh Babulal Signed-off-by: Tejun Heo --- include/linux/misc_cgroup.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index c238207d1615..6555c0f57158 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -31,7 +31,7 @@ struct misc_cg; * struct misc_res: Per cgroup per misc type resource * @max: Maximum limit on the resource. * @usage: Current usage of the resource. - * @failed: True if charged failed for the resource in a cgroup. + * @events: Number of times, the resource limit exceeded. */ struct misc_res { unsigned long max; @@ -42,6 +42,7 @@ struct misc_res { /** * struct misc_cg - Miscellaneous controller's cgroup structure. * @css: cgroup subsys state object. + * @events_file: Handle for the misc resources events file. * @res: Array of misc resources usage in the cgroup. */ struct misc_cg { -- cgit v1.2.3 From 32bf85c60ca3584a7ba3bef19da2779b73b2e7d6 Mon Sep 17 00:00:00 2001 From: Haitao Huang Date: Mon, 17 Jul 2023 18:08:45 -0700 Subject: cgroup/misc: Change counters to be explicit 64bit types So the variables can account for resources of huge quantities even on 32-bit machines. Signed-off-by: Haitao Huang Signed-off-by: Tejun Heo --- include/linux/misc_cgroup.h | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index 6555c0f57158..e799b1f8d05b 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -34,9 +34,9 @@ struct misc_cg; * @events: Number of times, the resource limit exceeded. */ struct misc_res { - unsigned long max; - atomic_long_t usage; - atomic_long_t events; + u64 max; + atomic64_t usage; + atomic64_t events; }; /** @@ -54,12 +54,10 @@ struct misc_cg { struct misc_res res[MISC_CG_RES_TYPES]; }; -unsigned long misc_cg_res_total_usage(enum misc_res_type type); -int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); -int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, - unsigned long amount); -void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, - unsigned long amount); +u64 misc_cg_res_total_usage(enum misc_res_type type); +int misc_cg_set_capacity(enum misc_res_type type, u64 capacity); +int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount); +void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount); /** * css_misc() - Get misc cgroup from the css. @@ -100,27 +98,26 @@ static inline void put_misc_cg(struct misc_cg *cg) #else /* !CONFIG_CGROUP_MISC */ -static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type) +static inline u64 misc_cg_res_total_usage(enum misc_res_type type) { return 0; } -static inline int misc_cg_set_capacity(enum misc_res_type type, - unsigned long capacity) +static inline int misc_cg_set_capacity(enum misc_res_type type, u64 capacity) { return 0; } static inline int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, - unsigned long amount) + u64 amount) { return 0; } static inline void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, - unsigned long amount) + u64 amount) { } -- cgit v1.2.3 From 0a1f7bfe35a3e1302529fa900bf0574a5dfc8ea6 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Tue, 18 Jul 2023 01:38:09 -0700 Subject: bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node Structs bpf_rb_node and bpf_list_node are opaquely defined in uapi/linux/bpf.h, as BPF program writers are not expected to touch their fields - nor does the verifier allow them to do so. Currently these structs are simple wrappers around structs rb_node and list_head and linked_list / rbtree implementation just casts and passes to library functions for those data structures. Later patches in this series, though, will add an "owner" field to bpf_{rb,list}_node, such that they're not just wrapping an underlying node type. Moreover, the bpf linked_list and rbtree implementations will deal with these owner pointers directly in a few different places. To avoid having to do void *owner = (void*)bpf_list_node + sizeof(struct list_head) with opaque UAPI node types, add bpf_{list,rb}_node_kern struct definitions to internal headers and modify linked_list and rbtree to use the internal types where appropriate. Signed-off-by: Dave Marchevsky Link: https://lore.kernel.org/r/20230718083813.3416104-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 360433f14496..511ed49c3fe9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -228,6 +228,16 @@ struct btf_record { struct btf_field fields[]; }; +/* Non-opaque version of bpf_rb_node in uapi/linux/bpf.h */ +struct bpf_rb_node_kern { + struct rb_node rb_node; +} __attribute__((aligned(8))); + +/* Non-opaque version of bpf_list_node in uapi/linux/bpf.h */ +struct bpf_list_node_kern { + struct list_head list_head; +} __attribute__((aligned(8))); + struct bpf_map { /* The first two cachelines with read-mostly members of which some * are also accessed in fast-path (e.g. ops, max_entries). -- cgit v1.2.3 From c3c510ce431cd99fa10dcd50d995c8e89330ee5b Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Tue, 18 Jul 2023 01:38:10 -0700 Subject: bpf: Add 'owner' field to bpf_{list,rb}_node As described by Kumar in [0], in shared ownership scenarios it is necessary to do runtime tracking of {rb,list} node ownership - and synchronize updates using this ownership information - in order to prevent races. This patch adds an 'owner' field to struct bpf_list_node and bpf_rb_node to implement such runtime tracking. The owner field is a void * that describes the ownership state of a node. It can have the following values: NULL - the node is not owned by any data structure BPF_PTR_POISON - the node is in the process of being added to a data structure ptr_to_root - the pointee is a data structure 'root' (bpf_rb_root / bpf_list_head) which owns this node The field is initially NULL (set by bpf_obj_init_field default behavior) and transitions states in the following sequence: Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root Removal: ptr_to_root -> NULL Before a node has been successfully inserted, it is not protected by any root's lock, and therefore two programs can attempt to add the same node to different roots simultaneously. For this reason the intermediate BPF_PTR_POISON state is necessary. For removal, the node is protected by some root's lock so this intermediate hop isn't necessary. Note that bpf_list_pop_{front,back} helpers don't need to check owner before removing as the node-to-be-removed is not passed in as input and is instead taken directly from the list. Do the check anyways and WARN_ON_ONCE in this unexpected scenario. Selftest changes in this patch are entirely mechanical: some BTF tests have hardcoded struct sizes for structs that contain bpf_{list,rb}_node fields, those were adjusted to account for the new sizes. Selftest additions to validate the owner field are added in a further patch in the series. [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2 Signed-off-by: Dave Marchevsky Suggested-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 511ed49c3fe9..ceaa8c23287f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -231,11 +231,13 @@ struct btf_record { /* Non-opaque version of bpf_rb_node in uapi/linux/bpf.h */ struct bpf_rb_node_kern { struct rb_node rb_node; + void *owner; } __attribute__((aligned(8))); /* Non-opaque version of bpf_list_node in uapi/linux/bpf.h */ struct bpf_list_node_kern { struct list_head list_head; + void *owner; } __attribute__((aligned(8))); struct bpf_map { -- cgit v1.2.3 From dfa2f0483360d4d6f2324405464c9f281156bd87 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 17 Jul 2023 15:29:17 +0000 Subject: tcp: get rid of sysctl_tcp_adv_win_scale With modern NIC drivers shifting to full page allocations per received frame, we face the following issue: TCP has one per-netns sysctl used to tweak how to translate a memory use into an expected payload (RWIN), in RX path. tcp_win_from_space() implementation is limited to few cases. For hosts dealing with various MSS, we either under estimate or over estimate the RWIN we send to the remote peers. For instance with the default sysctl_tcp_adv_win_scale value, we expect to store 50% of payload per allocated chunk of memory. For the typical use of MTU=1500 traffic, and order-0 pages allocations by NIC drivers, we are sending too big RWIN, leading to potential tcp collapse operations, which are extremely expensive and source of latency spikes. This patch makes sysctl_tcp_adv_win_scale obsolete, and instead uses a per socket scaling factor, so that we can precisely adjust the RWIN based on effective skb->len/skb->truesize ratio. This patch alone can double TCP receive performance when receivers are too slow to drain their receive queue, or by allowing a bigger RWIN when MSS is close to PAGE_SIZE. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Link: https://lore.kernel.org/r/20230717152917.751987-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/tcp.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index b4c08ac86983..fbcb0ce13171 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -172,6 +172,8 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) return (struct tcp_request_sock *)req; } +#define TCP_RMEM_TO_WIN_SCALE 8 + struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; @@ -238,7 +240,7 @@ struct tcp_sock { u32 window_clamp; /* Maximal window to advertise */ u32 rcv_ssthresh; /* Current window clamp */ - + u8 scaling_ratio; /* see tcp_win_from_space() */ /* Information of the most recently (s)acked skb */ struct tcp_rack { u64 mstamp; /* (Re)sent time of the skb */ -- cgit v1.2.3 From 48b5583719cdfbdee238f9549a6a1a47af2b0469 Mon Sep 17 00:00:00 2001 From: Chin Yik Ming Date: Mon, 17 Jul 2023 14:49:52 +0800 Subject: sched/headers: Rename task_struct::state to task_struct::__state in the comments too The rename in 2f064a59a11f ("sched: Change task_struct::state") missed the comments. [ mingo: Improved the changelog. ] Signed-off-by: Chin Yik Ming Signed-off-by: Ingo Molnar Reviewed-by: Daniel Bristot de Oliveira Link: https://lore.kernel.org/r/20230717064952.2804-1-yikming2222@gmail.com --- include/linux/sched.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index efc9f4bdc4ca..2aab7be46f7e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -75,14 +75,14 @@ struct user_event_mm; * Task state bitmask. NOTE! These bits are also * encoded in fs/proc/array.c: get_task_state(). * - * We have two separate sets of flags: task->state + * We have two separate sets of flags: task->__state * is about runnability, while task->exit_state are * about the task exiting. Confusing, but this way * modifying one set can't modify the other one by * mistake. */ -/* Used in tsk->state: */ +/* Used in tsk->__state: */ #define TASK_RUNNING 0x00000000 #define TASK_INTERRUPTIBLE 0x00000001 #define TASK_UNINTERRUPTIBLE 0x00000002 @@ -92,7 +92,7 @@ struct user_event_mm; #define EXIT_DEAD 0x00000010 #define EXIT_ZOMBIE 0x00000020 #define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD) -/* Used in tsk->state again: */ +/* Used in tsk->__state again: */ #define TASK_PARKED 0x00000040 #define TASK_DEAD 0x00000080 #define TASK_WAKEKILL 0x00000100 @@ -173,7 +173,7 @@ struct user_event_mm; #endif /* - * set_current_state() includes a barrier so that the write of current->state + * set_current_state() includes a barrier so that the write of current->__state * is correctly serialised wrt the caller's subsequent test of whether to * actually sleep: * @@ -196,9 +196,9 @@ struct user_event_mm; * wake_up_state(p, TASK_UNINTERRUPTIBLE); * * where wake_up_state()/try_to_wake_up() executes a full memory barrier before - * accessing p->state. + * accessing p->__state. * - * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is, + * Wakeup will do: if (@state & p->__state) p->__state = TASK_RUNNING, that is, * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING). * -- cgit v1.2.3 From 86bfbb7ce4f67a88df2639198169b685668e7349 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 31 May 2023 13:58:42 +0200 Subject: sched/fair: Add lag based placement With the introduction of avg_vruntime, it is possible to approximate lag (the entire purpose of introducing it in fact). Use this to do lag based placement over sleep+wake. Specifically, the FAIR_SLEEPERS thing places things too far to the left and messes up the deadline aspect of EEVDF. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20230531124603.794929315@infradead.org --- include/linux/sched.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 2aab7be46f7e..ba1828b2a6a5 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -554,8 +554,9 @@ struct sched_entity { u64 exec_start; u64 sum_exec_runtime; - u64 vruntime; u64 prev_sum_exec_runtime; + u64 vruntime; + s64 vlag; u64 nr_migrations; -- cgit v1.2.3 From 99d4d26551b56f4e523dd04e4970b94aa796a64e Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 31 May 2023 13:58:43 +0200 Subject: rbtree: Add rb_add_augmented_cached() helper While slightly sub-optimal, updating the augmented data while going down the tree during lookup would be faster -- alas the augment interface does not currently allow for that, provide a generic helper to add a node to an augmented cached tree. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20230531124603.862983648@infradead.org --- include/linux/rbtree_augmented.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h index 7ee7ed5de722..6dbc5a1bf6a8 100644 --- a/include/linux/rbtree_augmented.h +++ b/include/linux/rbtree_augmented.h @@ -60,6 +60,32 @@ rb_insert_augmented_cached(struct rb_node *node, rb_insert_augmented(node, &root->rb_root, augment); } +static __always_inline struct rb_node * +rb_add_augmented_cached(struct rb_node *node, struct rb_root_cached *tree, + bool (*less)(struct rb_node *, const struct rb_node *), + const struct rb_augment_callbacks *augment) +{ + struct rb_node **link = &tree->rb_root.rb_node; + struct rb_node *parent = NULL; + bool leftmost = true; + + while (*link) { + parent = *link; + if (less(node, parent)) { + link = &parent->rb_left; + } else { + link = &parent->rb_right; + leftmost = false; + } + } + + rb_link_node(node, parent, link); + augment->propagate(parent, NULL); /* suboptimal */ + rb_insert_augmented_cached(node, tree, leftmost, augment); + + return leftmost ? node : NULL; +} + /* * Template for declaring augmented rbtree callbacks (generic case) * -- cgit v1.2.3 From 147f3efaa24182a21706bca15eab2f3f4630b5fe Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 31 May 2023 13:58:44 +0200 Subject: sched/fair: Implement an EEVDF-like scheduling policy Where CFS is currently a WFQ based scheduler with only a single knob, the weight. The addition of a second, latency oriented parameter, makes something like WF2Q or EEVDF based a much better fit. Specifically, EEVDF does EDF like scheduling in the left half of the tree -- those entities that are owed service. Except because this is a virtual time scheduler, the deadlines are in virtual time as well, which is what allows over-subscription. EEVDF has two parameters: - weight, or time-slope: which is mapped to nice just as before - request size, or slice length: which is used to compute the virtual deadline as: vd_i = ve_i + r_i/w_i Basically, by setting a smaller slice, the deadline will be earlier and the task will be more eligible and ran earlier. Tick driven preemption is driven by request/slice completion; while wakeup preemption is driven by the deadline. Because the tree is now effectively an interval tree, and the selection is no longer 'leftmost', over-scheduling is less of a problem. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20230531124603.931005524@infradead.org --- include/linux/sched.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index ba1828b2a6a5..177b3f3676ef 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -549,6 +549,9 @@ struct sched_entity { /* For load-balancing: */ struct load_weight load; struct rb_node run_node; + u64 deadline; + u64 min_deadline; + struct list_head group_node; unsigned int on_rq; @@ -557,6 +560,7 @@ struct sched_entity { u64 prev_sum_exec_runtime; u64 vruntime; s64 vlag; + u64 slice; u64 nr_migrations; -- cgit v1.2.3 From 5ba190c29cf92f157bd63c9909c7050d6dc43df7 Mon Sep 17 00:00:00 2001 From: Anton Protopopov Date: Wed, 19 Jul 2023 09:29:50 +0000 Subject: bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map Add the BTF id of struct bpf_map to the reg2btf_ids array. This makes the values of the CONST_PTR_TO_MAP type to be considered as trusted by kfuncs. This, in turn, allows users to execute trusted kfuncs which accept `struct bpf_map *` arguments from non-tracing programs. While exporting the btf_bpf_map_id variable, save some bytes by defining it as BTF_ID_LIST_GLOBAL_SINGLE (which is u32[1]) and not as BTF_ID_LIST (which is u32[64]). Signed-off-by: Anton Protopopov Link: https://lore.kernel.org/r/20230719092952.41202-3-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov --- include/linux/btf_ids.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index 00950cc03bff..a3462a9b8e18 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -267,5 +267,6 @@ MAX_BTF_TRACING_TYPE, extern u32 btf_tracing_ids[]; extern u32 bpf_cgroup_btf_id[]; extern u32 bpf_local_storage_map_btf_id[]; +extern u32 btf_bpf_map_id[]; #endif -- cgit v1.2.3 From 13ce2daa259a3bfbc9a5aeeee8b9a87058703731 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Wed, 19 Jul 2023 15:24:07 +0200 Subject: xsk: add new netlink attribute dedicated for ZC max frags Introduce new netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS that will carry maximum fragments that underlying ZC driver is able to handle on TX side. It is going to be included in netlink response only when driver supports ZC. Any value higher than 1 implies multi-buffer ZC support on underlying device. Signed-off-by: Maciej Fijalkowski Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b828c7a75be2..b12477ea4032 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2250,6 +2250,7 @@ struct net_device { #define GRO_MAX_SIZE (8 * 65535u) unsigned int gro_max_size; unsigned int gro_ipv4_max_size; + unsigned int xdp_zc_max_segs; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; -- cgit v1.2.3 From 053c8e1f235dc3f69d13375b32f4209228e1cb96 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 19 Jul 2023 16:08:51 +0200 Subject: bpf: Add generic attach/detach/query API for multi-progs This adds a generic layer called bpf_mprog which can be reused by different attachment layers to enable multi-program attachment and dependency resolution. In-kernel users of the bpf_mprog don't need to care about the dependency resolution internals, they can just consume it with few API calls. The initial idea of having a generic API sparked out of discussion [0] from an earlier revision of this work where tc's priority was reused and exposed via BPF uapi as a way to coordinate dependencies among tc BPF programs, similar as-is for classic tc BPF. The feedback was that priority provides a bad user experience and is hard to use [1], e.g.: I cannot help but feel that priority logic copy-paste from old tc, netfilter and friends is done because "that's how things were done in the past". [...] Priority gets exposed everywhere in uapi all the way to bpftool when it's right there for users to understand. And that's the main problem with it. The user don't want to and don't need to be aware of it, but uapi forces them to pick the priority. [...] Your cover letter [0] example proves that in real life different service pick the same priority. They simply don't know any better. Priority is an unnecessary magic that apps _have_ to pick, so they just copy-paste and everyone ends up using the same. The course of the discussion showed more and more the need for a generic, reusable API where the "same look and feel" can be applied for various other program types beyond just tc BPF, for example XDP today does not have multi- program support in kernel, but also there was interest around this API for improving management of cgroup program types. Such common multi-program management concept is useful for BPF management daemons or user space BPF applications coordinating internally about their attachments. Both from Cilium and Meta side [2], we've collected the following requirements for a generic attach/detach/query API for multi-progs which has been implemented as part of this work: - Support prog-based attach/detach and link API - Dependency directives (can also be combined): - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none} - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user space application does not need CAP_SYS_ADMIN to retrieve foreign fds via bpf_*_get_fd_by_id() - BPF_F_LINK flag as {prog,link} toggle - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and BPF_F_AFTER will just append for attaching - Enforced only at attach time - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their own infra for replacing their internal prog - If no flags are set, then it's default append behavior for attaching - Internal revision counter and optionally being able to pass expected_revision - User space application can query current state with revision, and pass it along for attachment to assert current state before doing updates - Query also gets extension for link_ids array and link_attach_flags: - prog_ids are always filled with program IDs - link_ids are filled with link IDs when link was used, otherwise 0 - {prog,link}_attach_flags for holding {prog,link}-specific flags - Must be easy to integrate/reuse for in-kernel users The uapi-side changes needed for supporting bpf_mprog are rather minimal, consisting of the additions of the attachment flags, revision counter, and expanding existing union with relative_{fd,id} member. The bpf_mprog framework consists of an bpf_mprog_entry object which holds an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path structure) is part of bpf_mprog_bundle. Both have been separated, so that fast-path gets efficient packing of bpf_prog pointers for maximum cache efficiency. Also, array has been chosen instead of linked list or other structures to remove unnecessary indirections for a fast point-to-entry in tc for BPF. The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of updates the peer bpf_mprog_entry is populated and then just swapped which avoids additional allocations that could otherwise fail, for example, in detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could be converted to dynamic allocation if necessary at a point in future. Locking is deferred to the in-kernel user of bpf_mprog, for example, in case of tcx which uses this API in the next patch, it piggybacks on rtnl. An extensive test suite for checking all aspects of this API for prog-based attach/detach and link API comes as BPF selftests in this series. Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog management. [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mprog.h | 318 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 318 insertions(+) create mode 100644 include/linux/bpf_mprog.h (limited to 'include/linux') diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h new file mode 100644 index 000000000000..6feefec43422 --- /dev/null +++ b/include/linux/bpf_mprog.h @@ -0,0 +1,318 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (c) 2023 Isovalent */ +#ifndef __BPF_MPROG_H +#define __BPF_MPROG_H + +#include + +/* bpf_mprog framework: + * + * bpf_mprog is a generic layer for multi-program attachment. In-kernel users + * of the bpf_mprog don't need to care about the dependency resolution + * internals, they can just consume it with few API calls. Currently available + * dependency directives are BPF_F_{BEFORE,AFTER} which enable insertion of + * a BPF program or BPF link relative to an existing BPF program or BPF link + * inside the multi-program array as well as prepend and append behavior if + * no relative object was specified, see corresponding selftests for concrete + * examples (e.g. tc_links and tc_opts test cases of test_progs). + * + * Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code: + * + * Attach case: + * + * struct bpf_mprog_entry *entry, *entry_new; + * int ret; + * + * // bpf_mprog user-side lock + * // fetch active @entry from attach location + * [...] + * ret = bpf_mprog_attach(entry, &entry_new, [...]); + * if (!ret) { + * if (entry != entry_new) { + * // swap @entry to @entry_new at attach location + * // ensure there are no inflight users of @entry: + * synchronize_rcu(); + * } + * bpf_mprog_commit(entry); + * } else { + * // error path, bail out, propagate @ret + * } + * // bpf_mprog user-side unlock + * + * Detach case: + * + * struct bpf_mprog_entry *entry, *entry_new; + * int ret; + * + * // bpf_mprog user-side lock + * // fetch active @entry from attach location + * [...] + * ret = bpf_mprog_detach(entry, &entry_new, [...]); + * if (!ret) { + * // all (*) marked is optional and depends on the use-case + * // whether bpf_mprog_bundle should be freed or not + * if (!bpf_mprog_total(entry_new)) (*) + * entry_new = NULL (*) + * // swap @entry to @entry_new at attach location + * // ensure there are no inflight users of @entry: + * synchronize_rcu(); + * bpf_mprog_commit(entry); + * if (!entry_new) (*) + * // free bpf_mprog_bundle (*) + * } else { + * // error path, bail out, propagate @ret + * } + * // bpf_mprog user-side unlock + * + * Query case: + * + * struct bpf_mprog_entry *entry; + * int ret; + * + * // bpf_mprog user-side lock + * // fetch active @entry from attach location + * [...] + * ret = bpf_mprog_query(attr, uattr, entry); + * // bpf_mprog user-side unlock + * + * Data/fast path: + * + * struct bpf_mprog_entry *entry; + * struct bpf_mprog_fp *fp; + * struct bpf_prog *prog; + * int ret = [...]; + * + * rcu_read_lock(); + * // fetch active @entry from attach location + * [...] + * bpf_mprog_foreach_prog(entry, fp, prog) { + * ret = bpf_prog_run(prog, [...]); + * // process @ret from program + * } + * [...] + * rcu_read_unlock(); + * + * bpf_mprog locking considerations: + * + * bpf_mprog_{attach,detach,query}() must be protected by an external lock + * (like RTNL in case of tcx). + * + * bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx + * the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets + * updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of + * the bpf_mprog_bundle. + * + * Fast path accesses the active bpf_mprog_entry within RCU critical section + * (in case of tcx it runs in NAPI which provides RCU protection there, + * other users might need explicit rcu_read_lock()). The bpf_mprog_commit() + * assumes that for the old bpf_mprog_entry there are no inflight users + * anymore. + * + * The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for + * the replacement case where we don't swap the bpf_mprog_entry. + */ + +#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \ + for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\ + ({ \ + t.prog = READ_ONCE(fp->prog); \ + t.link = cp->link; \ + t.prog; \ + }); \ + fp++, cp++) + +#define bpf_mprog_foreach_prog(entry, fp, p) \ + for (fp = &entry->fp_items[0]; \ + (p = READ_ONCE(fp->prog)); \ + fp++) + +#define BPF_MPROG_MAX 64 + +struct bpf_mprog_fp { + struct bpf_prog *prog; +}; + +struct bpf_mprog_cp { + struct bpf_link *link; +}; + +struct bpf_mprog_entry { + struct bpf_mprog_fp fp_items[BPF_MPROG_MAX]; + struct bpf_mprog_bundle *parent; +}; + +struct bpf_mprog_bundle { + struct bpf_mprog_entry a; + struct bpf_mprog_entry b; + struct bpf_mprog_cp cp_items[BPF_MPROG_MAX]; + struct bpf_prog *ref; + atomic64_t revision; + u32 count; +}; + +struct bpf_tuple { + struct bpf_prog *prog; + struct bpf_link *link; +}; + +static inline struct bpf_mprog_entry * +bpf_mprog_peer(const struct bpf_mprog_entry *entry) +{ + if (entry == &entry->parent->a) + return &entry->parent->b; + else + return &entry->parent->a; +} + +static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle) +{ + BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64)); + BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) != + ARRAY_SIZE(bundle->cp_items)); + + memset(bundle, 0, sizeof(*bundle)); + atomic64_set(&bundle->revision, 1); + bundle->a.parent = bundle; + bundle->b.parent = bundle; +} + +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry) +{ + entry->parent->count++; +} + +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry) +{ + entry->parent->count--; +} + +static inline int bpf_mprog_max(void) +{ + return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1; +} + +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry) +{ + int total = entry->parent->count; + + WARN_ON_ONCE(total > bpf_mprog_max()); + return total; +} + +static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry, + struct bpf_prog *prog) +{ + const struct bpf_mprog_fp *fp; + const struct bpf_prog *tmp; + + bpf_mprog_foreach_prog(entry, fp, tmp) { + if (tmp == prog) + return true; + } + return false; +} + +static inline void bpf_mprog_mark_for_release(struct bpf_mprog_entry *entry, + struct bpf_tuple *tuple) +{ + WARN_ON_ONCE(entry->parent->ref); + if (!tuple->link) + entry->parent->ref = tuple->prog; +} + +static inline void bpf_mprog_complete_release(struct bpf_mprog_entry *entry) +{ + /* In the non-link case prog deletions can only drop the reference + * to the prog after the bpf_mprog_entry got swapped and the + * bpf_mprog ensured that there are no inflight users anymore. + * + * Paired with bpf_mprog_mark_for_release(). + */ + if (entry->parent->ref) { + bpf_prog_put(entry->parent->ref); + entry->parent->ref = NULL; + } +} + +static inline void bpf_mprog_revision_new(struct bpf_mprog_entry *entry) +{ + atomic64_inc(&entry->parent->revision); +} + +static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry) +{ + bpf_mprog_complete_release(entry); + bpf_mprog_revision_new(entry); +} + +static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry) +{ + return atomic64_read(&entry->parent->revision); +} + +static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst, + struct bpf_mprog_entry *src) +{ + memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items)); +} + +static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx) +{ + int total = bpf_mprog_total(entry); + + memmove(entry->fp_items + idx + 1, + entry->fp_items + idx, + (total - idx) * sizeof(struct bpf_mprog_fp)); + + memmove(entry->parent->cp_items + idx + 1, + entry->parent->cp_items + idx, + (total - idx) * sizeof(struct bpf_mprog_cp)); +} + +static inline void bpf_mprog_entry_shrink(struct bpf_mprog_entry *entry, int idx) +{ + /* Total array size is needed in this case to enure the NULL + * entry is copied at the end. + */ + int total = ARRAY_SIZE(entry->fp_items); + + memmove(entry->fp_items + idx, + entry->fp_items + idx + 1, + (total - idx - 1) * sizeof(struct bpf_mprog_fp)); + + memmove(entry->parent->cp_items + idx, + entry->parent->cp_items + idx + 1, + (total - idx - 1) * sizeof(struct bpf_mprog_cp)); +} + +static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx, + struct bpf_mprog_fp **fp, + struct bpf_mprog_cp **cp) +{ + *fp = &entry->fp_items[idx]; + *cp = &entry->parent->cp_items[idx]; +} + +static inline void bpf_mprog_write(struct bpf_mprog_fp *fp, + struct bpf_mprog_cp *cp, + struct bpf_tuple *tuple) +{ + WRITE_ONCE(fp->prog, tuple->prog); + cp->link = tuple->link; +} + +int bpf_mprog_attach(struct bpf_mprog_entry *entry, + struct bpf_mprog_entry **entry_new, + struct bpf_prog *prog_new, struct bpf_link *link, + struct bpf_prog *prog_old, + u32 flags, u32 id_or_fd, u64 revision); + +int bpf_mprog_detach(struct bpf_mprog_entry *entry, + struct bpf_mprog_entry **entry_new, + struct bpf_prog *prog, struct bpf_link *link, + u32 flags, u32 id_or_fd, u64 revision); + +int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, + struct bpf_mprog_entry *entry); + +#endif /* __BPF_MPROG_H */ -- cgit v1.2.3 From e420bed025071a623d2720a92bc2245c84757ecb Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 19 Jul 2023 16:08:52 +0200 Subject: bpf: Add fd-based tcx multi-prog infra with link support This work refactors and adds a lightweight extension ("tcx") to the tc BPF ingress and egress data path side for allowing BPF program management based on fds via bpf() syscall through the newly added generic multi-prog API. The main goal behind this work which we also presented at LPC [0] last year and a recent update at LSF/MM/BPF this year [3] is to support long-awaited BPF link functionality for tc BPF programs, which allows for a model of safe ownership and program detachment. Given the rise in tc BPF users in cloud native environments, this becomes necessary to avoid hard to debug incidents either through stale leftover programs or 3rd party applications accidentally stepping on each others toes. As a recap, a BPF link represents the attachment of a BPF program to a BPF hook point. The BPF link holds a single reference to keep BPF program alive. Moreover, hook points do not reference a BPF link, only the application's fd or pinning does. A BPF link holds meta-data specific to attachment and implements operations for link creation, (atomic) BPF program update, detachment and introspection. The motivation for BPF links for tc BPF programs is multi-fold, for example: - From Meta: "It's especially important for applications that are deployed fleet-wide and that don't "control" hosts they are deployed to. If such application crashes and no one notices and does anything about that, BPF program will keep running draining resources or even just, say, dropping packets. We at FB had outages due to such permanent BPF attachment semantics. With fd-based BPF link we are getting a framework, which allows safe, auto-detachable behavior by default, unless application explicitly opts in by pinning the BPF link." [1] - From Cilium-side the tc BPF programs we attach to host-facing veth devices and phys devices build the core datapath for Kubernetes Pods, and they implement forwarding, load-balancing, policy, EDT-management, etc, within BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently experienced hard-to-debug issues in a user's staging environment where another Kubernetes application using tc BPF attached to the same prio/handle of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath it. The goal is to establish a clear/safe ownership model via links which cannot accidentally be overridden. [0,2] BPF links for tc can co-exist with non-link attachments, and the semantics are in line also with XDP links: BPF links cannot replace other BPF links, BPF links cannot replace non-BPF links, non-BPF links cannot replace BPF links and lastly only non-BPF links can replace non-BPF links. In case of Cilium, this would solve mentioned issue of safe ownership model as 3rd party applications would not be able to accidentally wipe Cilium programs, even if they are not BPF link aware. Earlier attempts [4] have tried to integrate BPF links into core tc machinery to solve cls_bpf, which has been intrusive to the generic tc kernel API with extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could be wiped from the qdisc also. Locking a tc BPF program in place this way, is getting into layering hacks given the two object models are vastly different. We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF attach API, so that the BPF link implementation blends in naturally similar to other link types which are fd-based and without the need for changing core tc internal APIs. BPF programs for tc can then be successively migrated from classic cls_bpf to the new tc BPF link without needing to change the program's source code, just the BPF loader mechanics for attaching is sufficient. For the current tc framework, there is no change in behavior with this change and neither does this change touch on tc core kernel APIs. The gist of this patch is that the ingress and egress hook have a lightweight, qdisc-less extension for BPF to attach its tc BPF programs, in other words, a minimal entry point for tc BPF. The name tcx has been suggested from discussion of earlier revisions of this work as a good fit, and to more easily differ between the classic cls_bpf attachment and the fd-based one. For the ingress and egress tcx points, the device holds a cache-friendly array with program pointers which is separated from control plane (slow-path) data. Earlier versions of this work used priority to determine ordering and expression of dependencies similar as with classic tc, but it was challenged that for something more future-proof a better user experience is required. Hence this resulted in the design and development of the generic attach/detach/query API for multi-progs. See prior patch with its discussion on the API design. tcx is the first user and later we plan to integrate also others, for example, one candidate is multi-prog support for XDP which would benefit and have the same 'look and feel' from API perspective. The goal with tcx is to have maximum compatibility to existing tc BPF programs, so they don't need to be rewritten specifically. Compatibility to call into classic tcf_classify() is also provided in order to allow successive migration or both to cleanly co-exist where needed given its all one logical tc layer and the tcx plus classic tc cls/act build one logical overall processing pipeline. tcx supports the simplified return codes TCX_NEXT which is non-terminating (go to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT. The fd-based API is behind a static key, so that when unused the code is also not entered. The struct tcx_entry's program array is currently static, but could be made dynamic if necessary at a point in future. The a/b pair swap design has been chosen so that for detachment there are no allocations which otherwise could fail. The work has been tested with tc-testing selftest suite which all passes, as well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB. Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews of this work. [0] https://lpc.events/event/16/contributions/1353/ [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com Signed-off-by: Daniel Borkmann Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mprog.h | 9 +++++++++ include/linux/netdevice.h | 15 ++++++--------- include/linux/skbuff.h | 4 ++-- 3 files changed, 17 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h index 6feefec43422..2b429488f840 100644 --- a/include/linux/bpf_mprog.h +++ b/include/linux/bpf_mprog.h @@ -315,4 +315,13 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry, int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, struct bpf_mprog_entry *entry); +static inline bool bpf_mprog_supported(enum bpf_prog_type type) +{ + switch (type) { + case BPF_PROG_TYPE_SCHED_CLS: + return true; + default: + return false; + } +} #endif /* __BPF_MPROG_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b12477ea4032..3800d0479698 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1930,8 +1930,7 @@ enum netdev_ml_priv_type { * * @rx_handler: handler for received packets * @rx_handler_data: XXX: need comments on this one - * @miniq_ingress: ingress/clsact qdisc specific data for - * ingress processing + * @tcx_ingress: BPF & clsact qdisc specific data for ingress processing * @ingress_queue: XXX: need comments on this one * @nf_hooks_ingress: netfilter hooks executed for ingress packets * @broadcast: hw bcast address @@ -1952,8 +1951,7 @@ enum netdev_ml_priv_type { * @xps_maps: all CPUs/RXQs maps for XPS device * * @xps_maps: XXX: need comments on this one - * @miniq_egress: clsact qdisc specific data for - * egress processing + * @tcx_egress: BPF & clsact qdisc specific data for egress processing * @nf_hooks_egress: netfilter hooks executed for egress packets * @qdisc_hash: qdisc hash table * @watchdog_timeo: Represents the timeout that is used by @@ -2253,9 +2251,8 @@ struct net_device { unsigned int xdp_zc_max_segs; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; - -#ifdef CONFIG_NET_CLS_ACT - struct mini_Qdisc __rcu *miniq_ingress; +#ifdef CONFIG_NET_XGRESS + struct bpf_mprog_entry __rcu *tcx_ingress; #endif struct netdev_queue __rcu *ingress_queue; #ifdef CONFIG_NETFILTER_INGRESS @@ -2283,8 +2280,8 @@ struct net_device { #ifdef CONFIG_XPS struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX]; #endif -#ifdef CONFIG_NET_CLS_ACT - struct mini_Qdisc __rcu *miniq_egress; +#ifdef CONFIG_NET_XGRESS + struct bpf_mprog_entry __rcu *tcx_egress; #endif #ifdef CONFIG_NETFILTER_EGRESS struct nf_hook_entries __rcu *nf_hooks_egress; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 91ed66952580..ed83f1c5fc1f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -944,7 +944,7 @@ struct sk_buff { __u8 __mono_tc_offset[0]; /* public: */ __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ -#ifdef CONFIG_NET_CLS_ACT +#ifdef CONFIG_NET_XGRESS __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ __u8 tc_skip_classify:1; #endif @@ -993,7 +993,7 @@ struct sk_buff { __u8 csum_not_inet:1; #endif -#ifdef CONFIG_NET_SCHED +#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS) __u16 tc_index; /* traffic control index */ #endif -- cgit v1.2.3 From 6f5a630d7c57cd79b1f526a95e757311e32d41e5 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Tue, 18 Jul 2023 16:40:21 -0700 Subject: bpf, net: Introduce skb_pointer_if_linear(). Network drivers always call skb_header_pointer() with non-null buffer. Remove !buffer check to prevent accidental misuse of skb_header_pointer(). Introduce skb_pointer_if_linear() instead. Reported-by: Jakub Kicinski Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230718234021.43640-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/skbuff.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ed83f1c5fc1f..faaba050f843 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4023,7 +4023,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset; - if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) + if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL; return buffer; @@ -4036,6 +4036,14 @@ skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) skb_headlen(skb), buffer); } +static inline void * __must_check +skb_pointer_if_linear(const struct sk_buff *skb, int offset, int len) +{ + if (likely(skb_headlen(skb) - offset >= len)) + return skb->data + offset; + return NULL; +} + /** * skb_needs_linearize - check if we need to linearize a given skb * depending on the given device features. -- cgit v1.2.3 From 6716f4d39c17febf7aa4fa5f5923da67a8d10e85 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 16 May 2023 03:41:37 -0700 Subject: rcu: Update synchronize_rcu_mult() comment for call_rcu_hurry() Those who have worked with RCU for some time will naturally think in terms of the long-standing call_rcu() API rather than the much newer call_rcu_hurry() API. But it is call_rcu_hurry() that you should normally pass to synchronize_rcu_mult(). This commit therefore updates the header comment to point this out. Signed-off-by: Paul E. McKenney Reviewed-by: Joel Fernandes (Google) --- include/linux/rcupdate_wait.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h index 699b938358bf..5e0f74f2f8ca 100644 --- a/include/linux/rcupdate_wait.h +++ b/include/linux/rcupdate_wait.h @@ -42,6 +42,11 @@ do { \ * call_srcu() function, with this wrapper supplying the pointer to the * corresponding srcu_struct. * + * Note that call_rcu_hurry() should be used instead of call_rcu() + * because in kernels built with CONFIG_RCU_LAZY=y the delay between the + * invocation of call_rcu() and that of the corresponding RCU callback + * can be multiple seconds. + * * The first argument tells Tiny RCU's _wait_rcu_gp() not to * bother waiting for RCU. The reason for this is because anywhere * synchronize_rcu_mult() can be called is automatically already a full -- cgit v1.2.3 From d40befed9a581740f6ceb0e5998ec2f59bfbc559 Mon Sep 17 00:00:00 2001 From: Kamel Bouhara Date: Tue, 20 Jun 2023 08:26:57 +0200 Subject: power: reset: at91-reset: add sysfs interface to the power on reason Introduce a list of generic reset sources and use them to export the power on reason through sysfs. Update the ABI documentation to describe this new interface. Signed-off-by: Kamel Bouhara Acked-by: Nicolas Ferre [Miquel Raynal: Follow-up on Kamel's work, 4 years later] Signed-off-by: Miquel Raynal Signed-off-by: Sebastian Reichel --- include/linux/power/power_on_reason.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 include/linux/power/power_on_reason.h (limited to 'include/linux') diff --git a/include/linux/power/power_on_reason.h b/include/linux/power/power_on_reason.h new file mode 100644 index 000000000000..95a1ec0c403c --- /dev/null +++ b/include/linux/power/power_on_reason.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Author: Kamel Bouhra + */ + +#ifndef POWER_ON_REASON_H +#define POWER_ON_REASON_H + +#define POWER_ON_REASON_REGULAR "regular power-up" +#define POWER_ON_REASON_RTC "RTC wakeup" +#define POWER_ON_REASON_WATCHDOG "watchdog timeout" +#define POWER_ON_REASON_SOFTWARE "software reset" +#define POWER_ON_REASON_RST_BTN "reset button action" +#define POWER_ON_REASON_CPU_CLK_FAIL "CPU clock failure" +#define POWER_ON_REASON_XTAL_FAIL "crystal oscillator failure" +#define POWER_ON_REASON_BROWN_OUT "brown-out reset" +#define POWER_ON_REASON_UNKNOWN "unknown reason" + +#endif /* POWER_ON_REASON_H */ -- cgit v1.2.3 From b4f78ff746ec5274fffa92fa2a4dc531360b5016 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Fri, 14 Jul 2023 22:56:14 +0200 Subject: pwm: Use a consistent name for pwm_chip pointers in the core MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Most variables of type struct pwm_chip * are named "chip", there are only three outliers called "pc". Change these three to "chip", too, for consistency. Signed-off-by: Uwe Kleine-König Signed-off-by: Thierry Reding --- include/linux/pwm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 04ae1d9073a7..d2f9f690a9c1 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -298,7 +298,7 @@ struct pwm_chip { int base; unsigned int npwm; - struct pwm_device * (*of_xlate)(struct pwm_chip *pc, + struct pwm_device * (*of_xlate)(struct pwm_chip *chip, const struct of_phandle_args *args); unsigned int of_pwm_n_cells; @@ -395,9 +395,9 @@ struct pwm_device *pwm_request_from_chip(struct pwm_chip *chip, unsigned int index, const char *label); -struct pwm_device *of_pwm_xlate_with_flags(struct pwm_chip *pc, +struct pwm_device *of_pwm_xlate_with_flags(struct pwm_chip *chip, const struct of_phandle_args *args); -struct pwm_device *of_pwm_single_xlate(struct pwm_chip *pc, +struct pwm_device *of_pwm_single_xlate(struct pwm_chip *chip, const struct of_phandle_args *args); struct pwm_device *pwm_get(struct device *dev, const char *con_id); -- cgit v1.2.3 From c04cf9e14f109ebcc425c1efd2c01294c52a4d62 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 23 Jun 2023 08:49:55 -0500 Subject: crypto: ccp - Add support for fetching a nonce for dynamic boost control Dynamic Boost Control is a feature offered on AMD client platforms that allows software to request and set power or frequency limits. Only software that has authenticated with the PSP can retrieve or set these limits. Create a character device and ioctl for fetching the nonce. This ioctl supports optionally passing authentication information which will influence how many calls the nonce is valid for. Acked-by: Tom Lendacky Signed-off-by: Mario Limonciello Signed-off-by: Herbert Xu --- include/linux/psp-platform-access.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/psp-platform-access.h b/include/linux/psp-platform-access.h index 75da8f5f7ad8..53b4a1df5180 100644 --- a/include/linux/psp-platform-access.h +++ b/include/linux/psp-platform-access.h @@ -8,6 +8,7 @@ enum psp_platform_access_msg { PSP_CMD_NONE = 0x0, PSP_I2C_REQ_BUS_CMD = 0x64, + PSP_DYNAMIC_BOOST_GET_NONCE, }; struct psp_req_buffer_hdr { -- cgit v1.2.3 From d9408716d2126439fbc46f6c40e72792069b8411 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 23 Jun 2023 08:49:56 -0500 Subject: crypto: ccp - Add support for setting user ID for dynamic boost control As part of the authentication flow for Dynamic Boost Control, the calling software will need to send a uid used in all of its future communications. Add support for another IOCTL call to let userspace software set this up. Acked-by: Tom Lendacky Signed-off-by: Mario Limonciello Signed-off-by: Herbert Xu --- include/linux/psp-platform-access.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/psp-platform-access.h b/include/linux/psp-platform-access.h index 53b4a1df5180..18b9e0f0cb03 100644 --- a/include/linux/psp-platform-access.h +++ b/include/linux/psp-platform-access.h @@ -9,6 +9,7 @@ enum psp_platform_access_msg { PSP_CMD_NONE = 0x0, PSP_I2C_REQ_BUS_CMD = 0x64, PSP_DYNAMIC_BOOST_GET_NONCE, + PSP_DYNAMIC_BOOST_SET_UID, }; struct psp_req_buffer_hdr { -- cgit v1.2.3 From e2cfe05e9277b5a7abbbc186fec1ad37348dd956 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 23 Jun 2023 08:49:57 -0500 Subject: crypto: ccp - Add support for getting and setting DBC parameters After software has authenticated a dynamic boost control request, it can fetch and set supported parameters using a selection of messages. Add support for these messages and export the ability to do this to userspace. Acked-by: Tom Lendacky Signed-off-by: Mario Limonciello Signed-off-by: Herbert Xu --- include/linux/psp-platform-access.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/psp-platform-access.h b/include/linux/psp-platform-access.h index 18b9e0f0cb03..c1dc87fc536b 100644 --- a/include/linux/psp-platform-access.h +++ b/include/linux/psp-platform-access.h @@ -10,6 +10,8 @@ enum psp_platform_access_msg { PSP_I2C_REQ_BUS_CMD = 0x64, PSP_DYNAMIC_BOOST_GET_NONCE, PSP_DYNAMIC_BOOST_SET_UID, + PSP_DYNAMIC_BOOST_GET_PARAMETER, + PSP_DYNAMIC_BOOST_SET_PARAMETER, }; struct psp_req_buffer_hdr { -- cgit v1.2.3 From eba2e4c2faef618b6e33dfdba918c76727f891b5 Mon Sep 17 00:00:00 2001 From: Stefan Eichenberger Date: Wed, 19 Jul 2023 08:42:56 +0200 Subject: net: phy: c45: add a separate function to read BASE-T1 abilities Add a separate function to read the BASE-T1 abilities. Some PHYs do not indicate the availability of the extended BASE-T1 ability register, so this function must be called separately. Signed-off-by: Stefan Eichenberger Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 11c1e91563d4..b254848a9c99 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1826,6 +1826,7 @@ int genphy_c45_an_config_aneg(struct phy_device *phydev); int genphy_c45_an_disable_aneg(struct phy_device *phydev); int genphy_c45_read_mdix(struct phy_device *phydev); int genphy_c45_pma_read_abilities(struct phy_device *phydev); +int genphy_c45_pma_baset1_read_abilities(struct phy_device *phydev); int genphy_c45_read_eee_abilities(struct phy_device *phydev); int genphy_c45_pma_baset1_read_master_slave(struct phy_device *phydev); int genphy_c45_read_status(struct phy_device *phydev); -- cgit v1.2.3 From 00f11ac71708d2e5e434aa2ef9249f95b5e7e313 Mon Sep 17 00:00:00 2001 From: Stefan Eichenberger Date: Wed, 19 Jul 2023 08:42:58 +0200 Subject: net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2110 PHY Add a driver for the Marvell 88Q2110. This driver allows to detect the link, switch between 100BASE-T1 and 1000BASE-T1 and switch between master and slave mode. Autonegotiation supported by the PHY does not yet work. Signed-off-by: Stefan Eichenberger Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- include/linux/marvell_phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h index 0f06c2287b52..9b54c4f0677f 100644 --- a/include/linux/marvell_phy.h +++ b/include/linux/marvell_phy.h @@ -25,6 +25,7 @@ #define MARVELL_PHY_ID_88X3310 0x002b09a0 #define MARVELL_PHY_ID_88E2110 0x002b09b0 #define MARVELL_PHY_ID_88X2222 0x01410f10 +#define MARVELL_PHY_ID_88Q2110 0x002b0980 /* Marvel 88E1111 in Finisar SFP module with modified PHY ID */ #define MARVELL_PHY_ID_88E1111_FINISAR 0x01ff0cc0 -- cgit v1.2.3 From 9e70a5e109a4a23367810de09be826c52d27ee2f Mon Sep 17 00:00:00 2001 From: John Ogness Date: Mon, 17 Jul 2023 21:52:06 +0206 Subject: printk: Add per-console suspended state Currently the global @console_suspended is used to determine if consoles are in a suspended state. Its primary purpose is to allow usage of the console_lock when suspended without causing console printing. It is synchronized by the console_lock. Rather than relying on the console_lock to determine suspended state, make it an official per-console state that is set within console->flags. This allows the state to be queried via SRCU. Remove @console_suspended. Console printing will still be avoided when suspended because console_is_usable() returns false when the new suspended flag is set for that console. Signed-off-by: John Ogness Reviewed-by: Sergey Senozhatsky Reviewed-by: Petr Mladek Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20230717194607.145135-7-john.ogness@linutronix.de --- include/linux/console.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/console.h b/include/linux/console.h index d3195664baa5..7de11c763eb3 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -154,6 +154,8 @@ static inline int con_debug_leave(void) * receiving the printk spam for obvious reasons. * @CON_EXTENDED: The console supports the extended output format of * /dev/kmesg which requires a larger output buffer. + * @CON_SUSPENDED: Indicates if a console is suspended. If true, the + * printing callbacks must not be called. */ enum cons_flags { CON_PRINTBUFFER = BIT(0), @@ -163,6 +165,7 @@ enum cons_flags { CON_ANYTIME = BIT(4), CON_BRL = BIT(5), CON_EXTENDED = BIT(6), + CON_SUSPENDED = BIT(7), }; /** -- cgit v1.2.3 From d99ff463ecf651437e9e4abe68f331dfb6b5bd9d Mon Sep 17 00:00:00 2001 From: Jean-Baptiste Maneyrol Date: Tue, 6 Jun 2023 16:21:45 +0000 Subject: iio: move inv_icm42600 timestamp module in common Create new inv_sensors common modules and move inv_icm42600 timestamp module inside. This module will be used by IMUs and also in the future by other chips. Modify inv_icm42600 driver to use timestamp module and do some headers cleanup. Signed-off-by: Jean-Baptiste Maneyrol Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230606162147.79667-3-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron --- include/linux/iio/common/inv_icm42600_timestamp.h | 79 +++++++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 include/linux/iio/common/inv_icm42600_timestamp.h (limited to 'include/linux') diff --git a/include/linux/iio/common/inv_icm42600_timestamp.h b/include/linux/iio/common/inv_icm42600_timestamp.h new file mode 100644 index 000000000000..b808a6da15e5 --- /dev/null +++ b/include/linux/iio/common/inv_icm42600_timestamp.h @@ -0,0 +1,79 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2020 Invensense, Inc. + */ + +#ifndef INV_ICM42600_TIMESTAMP_H_ +#define INV_ICM42600_TIMESTAMP_H_ + +/** + * struct inv_icm42600_timestamp_interval - timestamps interval + * @lo: interval lower bound + * @up: interval upper bound + */ +struct inv_icm42600_timestamp_interval { + int64_t lo; + int64_t up; +}; + +/** + * struct inv_icm42600_timestamp_acc - accumulator for computing an estimation + * @val: current estimation of the value, the mean of all values + * @idx: current index of the next free place in values table + * @values: table of all measured values, use for computing the mean + */ +struct inv_icm42600_timestamp_acc { + uint32_t val; + size_t idx; + uint32_t values[32]; +}; + +/** + * struct inv_icm42600_timestamp - timestamp management states + * @it: interrupts interval timestamps + * @timestamp: store last timestamp for computing next data timestamp + * @mult: current internal period multiplier + * @new_mult: new set internal period multiplier (not yet effective) + * @period: measured current period of the sensor + * @chip_period: accumulator for computing internal chip period + */ +struct inv_icm42600_timestamp { + struct inv_icm42600_timestamp_interval it; + int64_t timestamp; + uint32_t mult; + uint32_t new_mult; + uint32_t period; + struct inv_icm42600_timestamp_acc chip_period; +}; + +void inv_icm42600_timestamp_init(struct inv_icm42600_timestamp *ts, + uint32_t period); + +int inv_icm42600_timestamp_update_odr(struct inv_icm42600_timestamp *ts, + uint32_t period, bool fifo); + +void inv_icm42600_timestamp_interrupt(struct inv_icm42600_timestamp *ts, + uint32_t fifo_period, size_t fifo_nb, + size_t sensor_nb, int64_t timestamp); + +static inline int64_t +inv_icm42600_timestamp_pop(struct inv_icm42600_timestamp *ts) +{ + ts->timestamp += ts->period; + return ts->timestamp; +} + +void inv_icm42600_timestamp_apply_odr(struct inv_icm42600_timestamp *ts, + uint32_t fifo_period, size_t fifo_nb, + unsigned int fifo_no); + +static inline void +inv_icm42600_timestamp_reset(struct inv_icm42600_timestamp *ts) +{ + const struct inv_icm42600_timestamp_interval interval_init = {0LL, 0LL}; + + ts->it = interval_init; + ts->timestamp = 0; +} + +#endif -- cgit v1.2.3 From 0ecc363ccea71eda6a2cceade120489259bcdb33 Mon Sep 17 00:00:00 2001 From: Jean-Baptiste Maneyrol Date: Tue, 6 Jun 2023 16:21:46 +0000 Subject: iio: make invensense timestamp module generic Rename common module to inv_sensors_timestamp, add configuration at init (chip internal clock, acceptable jitter, ...) and update inv_icm42600 driver integration. Signed-off-by: Jean-Baptiste Maneyrol Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230606162147.79667-4-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron --- include/linux/iio/common/inv_icm42600_timestamp.h | 79 ------------------- include/linux/iio/common/inv_sensors_timestamp.h | 95 +++++++++++++++++++++++ 2 files changed, 95 insertions(+), 79 deletions(-) delete mode 100644 include/linux/iio/common/inv_icm42600_timestamp.h create mode 100644 include/linux/iio/common/inv_sensors_timestamp.h (limited to 'include/linux') diff --git a/include/linux/iio/common/inv_icm42600_timestamp.h b/include/linux/iio/common/inv_icm42600_timestamp.h deleted file mode 100644 index b808a6da15e5..000000000000 --- a/include/linux/iio/common/inv_icm42600_timestamp.h +++ /dev/null @@ -1,79 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Copyright (C) 2020 Invensense, Inc. - */ - -#ifndef INV_ICM42600_TIMESTAMP_H_ -#define INV_ICM42600_TIMESTAMP_H_ - -/** - * struct inv_icm42600_timestamp_interval - timestamps interval - * @lo: interval lower bound - * @up: interval upper bound - */ -struct inv_icm42600_timestamp_interval { - int64_t lo; - int64_t up; -}; - -/** - * struct inv_icm42600_timestamp_acc - accumulator for computing an estimation - * @val: current estimation of the value, the mean of all values - * @idx: current index of the next free place in values table - * @values: table of all measured values, use for computing the mean - */ -struct inv_icm42600_timestamp_acc { - uint32_t val; - size_t idx; - uint32_t values[32]; -}; - -/** - * struct inv_icm42600_timestamp - timestamp management states - * @it: interrupts interval timestamps - * @timestamp: store last timestamp for computing next data timestamp - * @mult: current internal period multiplier - * @new_mult: new set internal period multiplier (not yet effective) - * @period: measured current period of the sensor - * @chip_period: accumulator for computing internal chip period - */ -struct inv_icm42600_timestamp { - struct inv_icm42600_timestamp_interval it; - int64_t timestamp; - uint32_t mult; - uint32_t new_mult; - uint32_t period; - struct inv_icm42600_timestamp_acc chip_period; -}; - -void inv_icm42600_timestamp_init(struct inv_icm42600_timestamp *ts, - uint32_t period); - -int inv_icm42600_timestamp_update_odr(struct inv_icm42600_timestamp *ts, - uint32_t period, bool fifo); - -void inv_icm42600_timestamp_interrupt(struct inv_icm42600_timestamp *ts, - uint32_t fifo_period, size_t fifo_nb, - size_t sensor_nb, int64_t timestamp); - -static inline int64_t -inv_icm42600_timestamp_pop(struct inv_icm42600_timestamp *ts) -{ - ts->timestamp += ts->period; - return ts->timestamp; -} - -void inv_icm42600_timestamp_apply_odr(struct inv_icm42600_timestamp *ts, - uint32_t fifo_period, size_t fifo_nb, - unsigned int fifo_no); - -static inline void -inv_icm42600_timestamp_reset(struct inv_icm42600_timestamp *ts) -{ - const struct inv_icm42600_timestamp_interval interval_init = {0LL, 0LL}; - - ts->it = interval_init; - ts->timestamp = 0; -} - -#endif diff --git a/include/linux/iio/common/inv_sensors_timestamp.h b/include/linux/iio/common/inv_sensors_timestamp.h new file mode 100644 index 000000000000..a47d304d1ba7 --- /dev/null +++ b/include/linux/iio/common/inv_sensors_timestamp.h @@ -0,0 +1,95 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2020 Invensense, Inc. + */ + +#ifndef INV_SENSORS_TIMESTAMP_H_ +#define INV_SENSORS_TIMESTAMP_H_ + +/** + * struct inv_sensors_timestamp_chip - chip internal properties + * @clock_period: internal clock period in ns + * @jitter: acceptable jitter in per-mille + * @init_period: chip initial period at reset in ns + */ +struct inv_sensors_timestamp_chip { + uint32_t clock_period; + uint32_t jitter; + uint32_t init_period; +}; + +/** + * struct inv_sensors_timestamp_interval - timestamps interval + * @lo: interval lower bound + * @up: interval upper bound + */ +struct inv_sensors_timestamp_interval { + int64_t lo; + int64_t up; +}; + +/** + * struct inv_sensors_timestamp_acc - accumulator for computing an estimation + * @val: current estimation of the value, the mean of all values + * @idx: current index of the next free place in values table + * @values: table of all measured values, use for computing the mean + */ +struct inv_sensors_timestamp_acc { + uint32_t val; + size_t idx; + uint32_t values[32]; +}; + +/** + * struct inv_sensors_timestamp - timestamp management states + * @chip: chip internal characteristics + * @min_period: minimal acceptable clock period + * @max_period: maximal acceptable clock period + * @it: interrupts interval timestamps + * @timestamp: store last timestamp for computing next data timestamp + * @mult: current internal period multiplier + * @new_mult: new set internal period multiplier (not yet effective) + * @period: measured current period of the sensor + * @chip_period: accumulator for computing internal chip period + */ +struct inv_sensors_timestamp { + struct inv_sensors_timestamp_chip chip; + uint32_t min_period; + uint32_t max_period; + struct inv_sensors_timestamp_interval it; + int64_t timestamp; + uint32_t mult; + uint32_t new_mult; + uint32_t period; + struct inv_sensors_timestamp_acc chip_period; +}; + +void inv_sensors_timestamp_init(struct inv_sensors_timestamp *ts, + const struct inv_sensors_timestamp_chip *chip); + +int inv_sensors_timestamp_update_odr(struct inv_sensors_timestamp *ts, + uint32_t period, bool fifo); + +void inv_sensors_timestamp_interrupt(struct inv_sensors_timestamp *ts, + uint32_t fifo_period, size_t fifo_nb, + size_t sensor_nb, int64_t timestamp); + +static inline int64_t inv_sensors_timestamp_pop(struct inv_sensors_timestamp *ts) +{ + ts->timestamp += ts->period; + return ts->timestamp; +} + +void inv_sensors_timestamp_apply_odr(struct inv_sensors_timestamp *ts, + uint32_t fifo_period, size_t fifo_nb, + unsigned int fifo_no); + +static inline void inv_sensors_timestamp_reset(struct inv_sensors_timestamp *ts) +{ + const struct inv_sensors_timestamp_interval interval_init = {0LL, 0LL}; + + ts->it = interval_init; + ts->timestamp = 0; +} + +#endif -- cgit v1.2.3 From 3641c90c4e369c8d0af5483e879174400a152cf8 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Thu, 20 Jul 2023 17:55:12 +0800 Subject: blk-mq: delete dead struct blk_mq_hw_ctx->queued field This counter is not used anywhere, so delete it. Signed-off-by: Chengming Zhou Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/20230720095512.1403123-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index b96e00499f9e..495ca198775f 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -397,8 +397,6 @@ struct blk_mq_hw_ctx { */ struct blk_mq_tags *sched_tags; - /** @queued: Number of queued requests. */ - unsigned long queued; /** @run: Number of dispatched requests. */ unsigned long run; -- cgit v1.2.3 From 70f360dd7042cb843635ece9d28335a4addff9eb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:57 +0000 Subject: tcp: annotate data-races around fastopenq.max_qlen This field can be read locklessly. Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index b4c08ac86983..91a37c99ba66 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -513,7 +513,7 @@ static inline void fastopen_queue_tune(struct sock *sk, int backlog) struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue; int somaxconn = READ_ONCE(sock_net(sk)->core.sysctl_somaxconn); - queue->fastopenq.max_qlen = min_t(unsigned int, backlog, somaxconn); + WRITE_ONCE(queue->fastopenq.max_qlen, min_t(unsigned int, backlog, somaxconn)); } static inline void tcp_move_syn(struct tcp_sock *tp, -- cgit v1.2.3 From dc52cd2eff4ac924a795efcef27f8fd58a5260bb Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Thu, 20 Jul 2023 08:22:41 -0400 Subject: fs: dlm: fix F_CANCELLK to cancel pending request This patch fixes the current handling of F_CANCELLK by not just doing a unlock as we need to try to cancel a lock at first. A unlock makes sense on a non-blocking lock request but if it's a blocking lock request we need to cancel the request until it's not granted yet. This patch is fixing this behaviour by first try to cancel a lock request and if it's failed it's unlocking the lock which seems to be granted. Note: currently the nfs locking handling was disabled by commit 40595cdc93ed ("nfs: block notification on fs with its own ->lock"). However DLM was never being updated regarding to this change. Future patches will try to fix lockd lock requests for DLM. This patch is currently assuming the upstream DLM lockd handling is correct. Signed-off-by: Alexander Aring Signed-off-by: David Teigland --- include/linux/dlm_plock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dlm_plock.h b/include/linux/dlm_plock.h index e6d76e8715a6..15fc856d198c 100644 --- a/include/linux/dlm_plock.h +++ b/include/linux/dlm_plock.h @@ -11,6 +11,8 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file, int cmd, struct file_lock *fl); int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file, struct file_lock *fl); +int dlm_posix_cancel(dlm_lockspace_t *lockspace, u64 number, struct file *file, + struct file_lock *fl); int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file, struct file_lock *fl); #endif -- cgit v1.2.3 From 754833b3194c30dad5af0145e25192a8e29521ab Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 21 Jul 2023 13:34:54 +0530 Subject: OPP: Rearrange entries in pm_opp.h Rearrange the helper function declarations / definitions to keep them in order of freq, level and then bw. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index dc1fb5890792..3821f50b9b89 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -121,17 +121,19 @@ unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev); struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, unsigned long freq, bool available); + struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq); +struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, + unsigned long *freq); + struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, unsigned int level); + struct dev_pm_opp *dev_pm_opp_find_level_ceil(struct device *dev, unsigned int *level); -struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, - unsigned long *freq); - struct dev_pm_opp *dev_pm_opp_find_bw_ceil(struct device *dev, unsigned int *bw, int index); @@ -247,32 +249,32 @@ static inline unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev) return 0; } -static inline struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, - unsigned int level) +static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, + unsigned long freq, bool available) { return ERR_PTR(-EOPNOTSUPP); } -static inline struct dev_pm_opp *dev_pm_opp_find_level_ceil(struct device *dev, - unsigned int *level) +static inline struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, + unsigned long *freq) { return ERR_PTR(-EOPNOTSUPP); } -static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, - unsigned long freq, bool available) +static inline struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, + unsigned long *freq) { return ERR_PTR(-EOPNOTSUPP); } -static inline struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, - unsigned long *freq) +static inline struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, + unsigned int level) { return ERR_PTR(-EOPNOTSUPP); } -static inline struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, - unsigned long *freq) +static inline struct dev_pm_opp *dev_pm_opp_find_level_ceil(struct device *dev, + unsigned int *level) { return ERR_PTR(-EOPNOTSUPP); } -- cgit v1.2.3 From 142e17c1c2b48e3fb4f024e62ab6dee18f268694 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Thu, 20 Jul 2023 11:10:52 +0530 Subject: OPP: Introduce dev_pm_opp_find_freq_{ceil/floor}_indexed() APIs In the case of devices with multiple clocks, drivers need to specify the clock index for the OPP framework to find the OPP corresponding to the floor/ceil of the supplied frequency. So let's introduce the two new APIs accepting the clock index as an argument. These APIs use the exising _find_key_ceil() helper by supplying the clock index to it. Signed-off-by: Manivannan Sadhasivam [ Viresh: Rearranged definitions in pm_opp.h ] Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 3821f50b9b89..2617f2c51f29 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -125,9 +125,15 @@ struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq); +struct dev_pm_opp *dev_pm_opp_find_freq_floor_indexed(struct device *dev, + unsigned long *freq, u32 index); + struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, unsigned long *freq); +struct dev_pm_opp *dev_pm_opp_find_freq_ceil_indexed(struct device *dev, + unsigned long *freq, u32 index); + struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, unsigned int level); @@ -261,12 +267,24 @@ static inline struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, return ERR_PTR(-EOPNOTSUPP); } +static inline struct dev_pm_opp * +dev_pm_opp_find_freq_floor_indexed(struct device *dev, unsigned long *freq, u32 index) +{ + return ERR_PTR(-EOPNOTSUPP); +} + static inline struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, unsigned long *freq) { return ERR_PTR(-EOPNOTSUPP); } +static inline struct dev_pm_opp * +dev_pm_opp_find_freq_ceil_indexed(struct device *dev, unsigned long *freq, u32 index) +{ + return ERR_PTR(-EOPNOTSUPP); +} + static inline struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, unsigned int level) { -- cgit v1.2.3 From ca22eca6e2ad7eaed1c791628ef7cb4c739e3da6 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Fri, 21 Jul 2023 20:28:40 +0800 Subject: cred: remove unsued extern declaration change_create_files_as() Since commit 3a3b7ce93369 ("CRED: Allow kernel services to override LSM settings for task actions") this is never used, so can be removed. Signed-off-by: YueHaibing Fixes: 3a3b7ce93369 ("CRED: Allow kernel services to override LSM settings for task actions") [PM: subject tweak, fixes tag] Signed-off-by: Paul Moore --- include/linux/cred.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cred.h b/include/linux/cred.h index 9ed9232af934..f923528d5cc4 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -164,7 +164,6 @@ extern void abort_creds(struct cred *); extern const struct cred *override_creds(const struct cred *); extern void revert_creds(const struct cred *); extern struct cred *prepare_kernel_cred(struct task_struct *); -extern int change_create_files_as(struct cred *, struct inode *); extern int set_security_override(struct cred *, u32); extern int set_security_override_from_ctx(struct cred *, const char *); extern int set_create_files_as(struct cred *, struct inode *); -- cgit v1.2.3 From 4b3ee3ff2dd66b45dd5ec374a95af06eb26d35ac Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Fri, 14 Jul 2023 19:41:36 +0800 Subject: crypto: hisilicon/qm - stop function and write data to memory When the system is shut down, the process is killed, but the accelerator device does not stop executing the tasks. If the accelerator device still accesses the memory and writes back data to the memory after the memory is reclaimed by the system, an NFE error may occur. Therefore, before the system is shut down, the driver needs to stop the device and write data back to the memory. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index a7d54d4d41fd..39fbfb4be944 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -104,7 +104,7 @@ enum qm_stop_reason { QM_NORMAL, QM_SOFT_RESET, - QM_FLR, + QM_DOWN, }; enum qm_state { -- cgit v1.2.3 From a3377386b56420d78a4c0a931a40f9a25c3ca2bd Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:16 -0700 Subject: netlink: Reverse the patch which removed filtering To use filtering at the connector & cn_proc layers, we need to enable filtering in the netlink layer. This reverses the patch which removed netlink filtering - commit ID for that patch: 549017aa1bb7 (netlink: remove netlink_broadcast_filtered). Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Acked-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netlink.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 9eec3f4f5351..3a6563681b50 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -227,6 +227,11 @@ bool netlink_strict_get_check(struct sk_buff *skb); int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 portid, int nonblock); int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 portid, __u32 group, gfp_t allocation); +int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, + __u32 portid, __u32 group, gfp_t allocation, + int (*filter)(struct sock *dsk, + struct sk_buff *skb, void *data), + void *filter_data); int netlink_set_err(struct sock *ssk, __u32 portid, __u32 group, int code); int netlink_register_notifier(struct notifier_block *nb); int netlink_unregister_notifier(struct notifier_block *nb); -- cgit v1.2.3 From a4c9a56e6a2cdeeab7caef1f496b7bfefd95b50e Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:17 -0700 Subject: netlink: Add new netlink_release function A new function netlink_release is added in netlink_sock to store the protocol's release function. This is called when the socket is deleted. This can be supplied by the protocol via the release function in netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR protocol, so it can free it's data when socket is deleted. Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Acked-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 3a6563681b50..75d7de34c908 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -50,6 +50,7 @@ struct netlink_kernel_cfg { struct mutex *cb_mutex; int (*bind)(struct net *net, int group); void (*unbind)(struct net *net, int group); + void (*release) (struct sock *sk, unsigned long *groups); }; struct sock *__netlink_kernel_create(struct net *net, int unit, -- cgit v1.2.3 From 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:18 -0700 Subject: connector/cn_proc: Add filtering to fix some bugs The current proc connector code has the foll. bugs - if there are more than one listeners for the proc connector messages, and one of them deregisters for listening using PROC_CN_MCAST_IGNORE, they will still get all proc connector messages, as long as there is another listener. Another issue is if one client calls PROC_CN_MCAST_LISTEN, and another one calls PROC_CN_MCAST_IGNORE, then both will end up not getting any messages. This patch adds filtering and drops packet if client has sent PROC_CN_MCAST_IGNORE. This data is stored in the client socket's sk_user_data. In addition, we only increment or decrement proc_event_num_listeners once per client. This fixes the above issues. cn_release is the release function added for NETLINK_CONNECTOR. It uses the newly added netlink_release function added to netlink_sock. It will free sk_user_data. Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Signed-off-by: David S. Miller --- include/linux/connector.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/connector.h b/include/linux/connector.h index 487350bb19c3..cec2d99ae902 100644 --- a/include/linux/connector.h +++ b/include/linux/connector.h @@ -90,13 +90,19 @@ void cn_del_callback(const struct cb_id *id); * If @group is not zero, then message will be delivered * to the specified group. * @gfp_mask: GFP mask. + * @filter: Filter function to be used at netlink layer. + * @filter_data:Filter data to be supplied to the filter function * * It can be safely called from softirq context, but may silently * fail under strong memory pressure. * * If there are no listeners for given group %-ESRCH can be returned. */ -int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 group, gfp_t gfp_mask); +int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, + u32 group, gfp_t gfp_mask, + int (*filter)(struct sock *dsk, struct sk_buff *skb, + void *data), + void *filter_data); /** * cn_netlink_send - Sends message to the specified groups. -- cgit v1.2.3 From 1671bcfd76fdc0b9e65153cf759153083755fe4c Mon Sep 17 00:00:00 2001 From: Patrick Rohr Date: Wed, 19 Jul 2023 07:52:13 -0700 Subject: net: add sysctl accept_ra_min_rtr_lft MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr Cc: Maciej Żenczykowski Cc: Lorenzo Colitti Signed-off-by: David S. Miller --- include/linux/ipv6.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 839247a4f48e..ed3d110c2eb5 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -33,6 +33,7 @@ struct ipv6_devconf { __s32 accept_ra_defrtr; __u32 ra_defrtr_metric; __s32 accept_ra_min_hop_limit; + __s32 accept_ra_min_rtr_lft; __s32 accept_ra_pinfo; __s32 ignore_routes_with_linkdown; #ifdef CONFIG_IPV6_ROUTER_PREF -- cgit v1.2.3 From 5e1cd3e97e8673d7e3d61e3060cbae83c6e65757 Mon Sep 17 00:00:00 2001 From: Waqar Hameed Date: Wed, 19 Jul 2023 09:51:17 +0200 Subject: iio: Add event enums for running period and count There are devices (such as Murata IRS-D200 PIR proximity sensor) that check the data signal with a running period. I.e. for a specified time, they count the number of conditions that have occurred, and then signal if that is more than a specified amount. `IIO_EV_INFO_PERIOD` resets when the condition no longer is true and is therefore not suitable for these devices. Add a new `iio_event_info` `IIO_EV_INFO_RUNNING_PERIOD` that can be used as a running period. Also add a new `IIO_EV_INFO_RUNNING_COUNT` that can be used to specify the number of conditions that must occur during this running period. Signed-off-by: Waqar Hameed Link: https://lore.kernel.org/r/ee4a801ae9b9c4716c7bd23d8f79f232351df8bd.1689753076.git.waqar.hameed@axis.com Signed-off-by: Jonathan Cameron --- include/linux/iio/types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/types.h b/include/linux/iio/types.h index 82faa98c719a..117bde7d6ad7 100644 --- a/include/linux/iio/types.h +++ b/include/linux/iio/types.h @@ -19,6 +19,8 @@ enum iio_event_info { IIO_EV_INFO_TIMEOUT, IIO_EV_INFO_RESET_TIMEOUT, IIO_EV_INFO_TAP2_MIN_DELAY, + IIO_EV_INFO_RUNNING_PERIOD, + IIO_EV_INFO_RUNNING_COUNT, }; #define IIO_VAL_INT 1 -- cgit v1.2.3 From 52e4e2878236823ac3ba4d6033d81e8c389745fd Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 20 Jul 2023 22:37:12 +0800 Subject: extcon: Remove unused inline functions commit 830ae442202e ("extcon: Remove the deprecated extcon functions") left behind this. Signed-off-by: YueHaibing Signed-off-by: Chanwoo Choi --- include/linux/extcon.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/extcon.h b/include/linux/extcon.h index 3c45c3846fe9..e596a0abcb27 100644 --- a/include/linux/extcon.h +++ b/include/linux/extcon.h @@ -328,16 +328,4 @@ struct extcon_specific_cable_nb { struct extcon_dev *edev; unsigned long previous_value; }; - -static inline int extcon_register_interest(struct extcon_specific_cable_nb *obj, - const char *extcon_name, const char *cable_name, - struct notifier_block *nb) -{ - return -EINVAL; -} - -static inline int extcon_unregister_interest(struct extcon_specific_cable_nb *obj) -{ - return -EINVAL; -} #endif /* __LINUX_EXTCON_H__ */ -- cgit v1.2.3 From 80ddce5f2dbd0e83eadc9f9d373439180d599fe5 Mon Sep 17 00:00:00 2001 From: Ahmad Fatoum Date: Sat, 8 Jul 2023 13:27:19 +0200 Subject: thermal: core: constify params in thermal_zone_device_register Since commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure"), thermal_zone_device_register() allocates a copy of the tzp argument and callers need not explicitly manage its lifetime. This means the function no longer cares about the parameter being mutable, so constify it. No functional change. Signed-off-by: Ahmad Fatoum Acked-by: Daniel Lezcano Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 87837094d549..dee66ade89a0 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -301,14 +301,14 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp); #ifdef CONFIG_THERMAL struct thermal_zone_device *thermal_zone_device_register(const char *, int, int, void *, struct thermal_zone_device_ops *, - struct thermal_zone_params *, int, int); + const struct thermal_zone_params *, int, int); void thermal_zone_device_unregister(struct thermal_zone_device *); struct thermal_zone_device * thermal_zone_device_register_with_trips(const char *, struct thermal_trip *, int, int, void *, struct thermal_zone_device_ops *, - struct thermal_zone_params *, int, int); + const struct thermal_zone_params *, int, int); void *thermal_zone_device_priv(struct thermal_zone_device *tzd); const char *thermal_zone_device_type(struct thermal_zone_device *tzd); @@ -348,7 +348,7 @@ void thermal_zone_device_critical(struct thermal_zone_device *tz); static inline struct thermal_zone_device *thermal_zone_device_register( const char *type, int trips, int mask, void *devdata, struct thermal_zone_device_ops *ops, - struct thermal_zone_params *tzp, + const struct thermal_zone_params *tzp, int passive_delay, int polling_delay) { return ERR_PTR(-ENODEV); } static inline void thermal_zone_device_unregister( -- cgit v1.2.3 From e7b915219baa4b2bf30afe1dbaca7a24ff627ad2 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 13 Jul 2023 16:57:40 +0200 Subject: PM: sleep: wakeirq: drop unused enable helpers Drop the wake-irq enable and disable helpers which have not been used since commit bed570307ed7 ("PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend"). Note that these functions are essentially just leftovers from the first iteration of the wake-irq implementation where device drivers were supposed to call these functions themselves instead of PM core (as is also indicated by the bogus kernel doc comments). Signed-off-by: Johan Hovold Reviewed-by: Tony Lindgren Signed-off-by: Rafael J. Wysocki --- include/linux/pm_wakeirq.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_wakeirq.h b/include/linux/pm_wakeirq.h index dd42d16945d0..d9642c6cf852 100644 --- a/include/linux/pm_wakeirq.h +++ b/include/linux/pm_wakeirq.h @@ -10,8 +10,6 @@ extern int dev_pm_set_wake_irq(struct device *dev, int irq); extern int dev_pm_set_dedicated_wake_irq(struct device *dev, int irq); extern int dev_pm_set_dedicated_wake_irq_reverse(struct device *dev, int irq); extern void dev_pm_clear_wake_irq(struct device *dev); -extern void dev_pm_enable_wake_irq(struct device *dev); -extern void dev_pm_disable_wake_irq(struct device *dev); #else /* !CONFIG_PM */ @@ -34,13 +32,5 @@ static inline void dev_pm_clear_wake_irq(struct device *dev) { } -static inline void dev_pm_enable_wake_irq(struct device *dev) -{ -} - -static inline void dev_pm_disable_wake_irq(struct device *dev) -{ -} - #endif /* CONFIG_PM */ #endif /* _LINUX_PM_WAKEIRQ_H */ -- cgit v1.2.3 From 5f756d03e2c7db63c1df7148d7b1739f29ff1532 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Thu, 20 Jul 2023 11:10:53 +0530 Subject: OPP: Introduce dev_pm_opp_get_freq_indexed() API In the case of devices with multiple clocks, drivers need to specify the frequency index for the OPP framework to get the specific frequency within the required OPP. So let's introduce the dev_pm_opp_get_freq_indexed() API accepting the frequency index as an argument. Signed-off-by: Manivannan Sadhasivam [ Viresh: Fixed potential access to NULL opp pointer ] Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 2617f2c51f29..a13a1705df57 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -105,6 +105,8 @@ unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp); unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp); +unsigned long dev_pm_opp_get_freq_indexed(struct dev_pm_opp *opp, u32 index); + unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp); unsigned int dev_pm_opp_get_required_pstate(struct dev_pm_opp *opp, @@ -213,6 +215,11 @@ static inline unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp) return 0; } +static inline unsigned long dev_pm_opp_get_freq_indexed(struct dev_pm_opp *opp, u32 index) +{ + return 0; +} + static inline unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp) { return 0; -- cgit v1.2.3 From a5893928bb179d67ca1d44a8f66c990480ba541d Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 21 Jul 2023 13:41:20 +0530 Subject: OPP: Add dev_pm_opp_find_freq_exact_indexed() The indexed version of the API is added for other floor and ceil, add the same for exact as well for completeness. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index a13a1705df57..23e4e4eaaa42 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -124,6 +124,10 @@ struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, unsigned long freq, bool available); +struct dev_pm_opp * +dev_pm_opp_find_freq_exact_indexed(struct device *dev, unsigned long freq, + u32 index, bool available); + struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq); @@ -268,6 +272,13 @@ static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, return ERR_PTR(-EOPNOTSUPP); } +static inline struct dev_pm_opp * +dev_pm_opp_find_freq_exact_indexed(struct device *dev, unsigned long freq, + u32 index, bool available) +{ + return ERR_PTR(-EOPNOTSUPP); +} + static inline struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq) { -- cgit v1.2.3 From 746de8255076c9924ffa51baad9822adddccb94e Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 21 Jul 2023 14:55:07 +0530 Subject: OPP: Reuse dev_pm_opp_get_freq_indexed() Reuse dev_pm_opp_get_freq_indexed() from dev_pm_opp_get_freq(). Signed-off-by: Viresh Kumar Acked-by: Manivannan Sadhasivam --- include/linux/pm_opp.h | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 23e4e4eaaa42..91f87d7e807c 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -103,8 +103,6 @@ int dev_pm_opp_get_supplies(struct dev_pm_opp *opp, struct dev_pm_opp_supply *su unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp); -unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp); - unsigned long dev_pm_opp_get_freq_indexed(struct dev_pm_opp *opp, u32 index); unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp); @@ -214,11 +212,6 @@ static inline unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp) return 0; } -static inline unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp) -{ - return 0; -} - static inline unsigned long dev_pm_opp_get_freq_indexed(struct dev_pm_opp *opp, u32 index) { return 0; @@ -669,4 +662,9 @@ static inline void dev_pm_opp_put_prop_name(int token) dev_pm_opp_clear_config(token); } +static inline unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp) +{ + return dev_pm_opp_get_freq_indexed(opp, 0); +} + #endif /* __LINUX_OPP_H__ */ -- cgit v1.2.3 From e359147f01606e17d74df96cfeb0027f540f5e97 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 5 Jul 2023 15:01:49 -0400 Subject: linux: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Message-Id: <20230705190309.579783-82-jlayton@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs_stack.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h index 54210a42c30d..010d39d0dc1c 100644 --- a/include/linux/fs_stack.h +++ b/include/linux/fs_stack.h @@ -24,7 +24,7 @@ static inline void fsstack_copy_attr_times(struct inode *dest, { dest->i_atime = src->i_atime; dest->i_mtime = src->i_mtime; - dest->i_ctime = src->i_ctime; + inode_set_ctime_to_ts(dest, inode_get_ctime(src)); } #endif /* _LINUX_FS_STACK_H */ -- cgit v1.2.3 From 13bc24457850583a2e7203ded05b7209ab4bc5ef Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 5 Jul 2023 14:58:12 -0400 Subject: fs: rename i_ctime field to __i_ctime Now that everything in-tree is converted to use the accessor functions, rename the i_ctime field in the inode to discourage direct access. Signed-off-by: Jeff Layton Reviewed-by: Damien Le Moal Reviewed-by: Jan Kara Message-Id: <20230705185812.579118-4-jlayton@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 42755cb7d55b..61f27011fd04 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -642,7 +642,7 @@ struct inode { loff_t i_size; struct timespec64 i_atime; struct timespec64 i_mtime; - struct timespec64 i_ctime; + struct timespec64 __i_ctime; /* use inode_*_ctime accessors! */ spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; u8 i_blkbits; @@ -1485,7 +1485,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode); */ static inline struct timespec64 inode_get_ctime(const struct inode *inode) { - return inode->i_ctime; + return inode->__i_ctime; } /** @@ -1498,7 +1498,7 @@ static inline struct timespec64 inode_get_ctime(const struct inode *inode) static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode, struct timespec64 ts) { - inode->i_ctime = ts; + inode->__i_ctime = ts; return ts; } -- cgit v1.2.3 From f5f80e32de12fad2813d37270e8364a03e6d3ef0 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 20 Jul 2023 11:09:01 +0000 Subject: ipv6: remove hard coded limitation on ipv6_pinfo IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit f5d547676ca0 ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet Cc: Chao Wu Cc: Wei Wang Cc: Coco Li Cc: YiFei Zhu Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/ipv6.h | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index ed3d110c2eb5..0295b47c10a3 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -200,14 +200,7 @@ struct inet6_cork { u8 tclass; }; -/** - * struct ipv6_pinfo - ipv6 private area - * - * In the struct sock hierarchy (tcp6_sock, upd6_sock, etc) - * this _must_ be the last member, so that inet6_sk_generic - * is able to calculate its offset from the base struct sock - * by using the struct proto->slab_obj_size member. -acme - */ +/* struct ipv6_pinfo - ipv6 private area */ struct ipv6_pinfo { struct in6_addr saddr; struct in6_pktinfo sticky_pktinfo; @@ -307,19 +300,19 @@ struct raw6_sock { __u32 offset; /* checksum offset */ struct icmp6_filter filter; __u32 ip6mr_table; - /* ipv6_pinfo has to be the last member of raw6_sock, see inet6_sk_generic */ + struct ipv6_pinfo inet6; }; struct udp6_sock { struct udp_sock udp; - /* ipv6_pinfo has to be the last member of udp6_sock, see inet6_sk_generic */ + struct ipv6_pinfo inet6; }; struct tcp6_sock { struct tcp_sock tcp; - /* ipv6_pinfo has to be the last member of tcp6_sock, see inet6_sk_generic */ + struct ipv6_pinfo inet6; }; -- cgit v1.2.3 From 481461f5109919babbb393d6f68002936b8e2493 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 16 Jul 2023 19:15:54 +0900 Subject: linux/export.h: make independent of CONFIG_MODULES Currently, all files with EXPORT_SYMBOL() are rebuilt when CONFIG_MODULES is flipped due to depending on CONFIG_MODULES. Now that modpost can make a final decision about export symbols, does not need to make EXPORT_SYMBOL() no-op. Instead, modpost can skip emitting KSYMTAB when CONFIG_MODULES is unset. This commit will reduce the number of recompilation when CONFIG_MODULES is toggled. Signed-off-by: Masahiro Yamada --- include/linux/export.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/export.h b/include/linux/export.h index beed8387e0a4..9911508a9604 100644 --- a/include/linux/export.h +++ b/include/linux/export.h @@ -50,7 +50,7 @@ extern struct module __this_module; __EXPORT_SYMBOL_REF(sym) ASM_NL \ .previous -#if !defined(CONFIG_MODULES) || defined(__DISABLE_EXPORTS) +#if defined(__DISABLE_EXPORTS) /* * Allow symbol exports to be disabled completely so that C code may @@ -75,7 +75,7 @@ extern struct module __this_module; __ADDRESSABLE(sym) \ asm(__stringify(___EXPORT_SYMBOL(sym, license, ns))) -#endif /* CONFIG_MODULES */ +#endif #ifdef DEFAULT_SYMBOL_NAMESPACE #define _EXPORT_SYMBOL(sym, license) __EXPORT_SYMBOL(sym, license, __stringify(DEFAULT_SYMBOL_NAMESPACE)) -- cgit v1.2.3 From ff09f6fd297293175eaa0ed492495e36b3eb1a8e Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Fri, 21 Jul 2023 08:01:48 -0700 Subject: modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols Trying to restrict the '$'-prefix change to RISC-V caused some fallout, so let's just treat all those symbols as special. Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too") Link: https://lore.kernel.org/all/20230712015747.77263-1-wangkefeng.wang@huawei.com/ Signed-off-by: Palmer Dabbelt Reviewed-by: Masahiro Yamada Signed-off-by: Luis Chamberlain --- include/linux/module_symbol.h | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/module_symbol.h b/include/linux/module_symbol.h index 5b799942b243..1269543d0634 100644 --- a/include/linux/module_symbol.h +++ b/include/linux/module_symbol.h @@ -3,25 +3,13 @@ #define _LINUX_MODULE_SYMBOL_H /* This ignores the intensely annoying "mapping symbols" found in ELF files. */ -static inline int is_mapping_symbol(const char *str, int is_riscv) +static inline int is_mapping_symbol(const char *str) { if (str[0] == '.' && str[1] == 'L') return true; if (str[0] == 'L' && str[1] == '0') return true; - /* - * RISC-V defines various special symbols that start with "$".  The - * mapping symbols, which exist to differentiate between incompatible - * instruction encodings when disassembling, show up all over the place - * and are generally not meant to be treated like other symbols.  So - * just ignore any of the special symbols. - */ - if (is_riscv) - return str[0] == '$'; - - return str[0] == '$' && - (str[1] == 'a' || str[1] == 'd' || str[1] == 't' || str[1] == 'x') - && (str[2] == '\0' || str[2] == '.'); + return str[0] == '$'; } #endif /* _LINUX_MODULE_SYMBOL_H */ -- cgit v1.2.3 From 1b0306981e0f1e5a9ad3d08b9c88bfe21b68e3b9 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 9 Jul 2023 18:17:33 -0400 Subject: iov_iter: Add copy_folio_from_iter_atomic() Add a folio wrapper around copy_page_from_iter_atomic(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Darrick J. Wong --- include/linux/uio.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index ff81e5ccaef2..42bce38a8e87 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -163,7 +163,7 @@ static inline size_t iov_length(const struct iovec *iov, unsigned long nr_segs) return ret; } -size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, +size_t copy_page_from_iter_atomic(struct page *page, size_t offset, size_t bytes, struct iov_iter *i); void iov_iter_advance(struct iov_iter *i, size_t bytes); void iov_iter_revert(struct iov_iter *i, size_t bytes); @@ -184,6 +184,13 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset, { return copy_page_to_iter(&folio->page, offset, bytes, i); } + +static inline size_t copy_folio_from_iter_atomic(struct folio *folio, + size_t offset, size_t bytes, struct iov_iter *i) +{ + return copy_page_from_iter_atomic(&folio->page, offset, bytes, i); +} + size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t bytes, struct iov_iter *i); -- cgit v1.2.3 From ffc143db63eeea7c8a27deb3c56d090a220a1ace Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 26 May 2023 16:43:23 -0400 Subject: filemap: Add fgf_t typedef Similarly to gfp_t, define fgf_t as its own type to prevent various misuses and confusion. Leave the flags as FGP_* for now to reduce the size of this patch; they will be converted to FGF_* later. Move the documentation to the definition of the type insted of burying it in the __filemap_get_folio() documentation. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Reviewed-by: Kent Overstreet --- include/linux/pagemap.h | 48 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 37 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 716953ee1ebd..911201fc41fc 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -501,22 +501,48 @@ pgoff_t page_cache_next_miss(struct address_space *mapping, pgoff_t page_cache_prev_miss(struct address_space *mapping, pgoff_t index, unsigned long max_scan); -#define FGP_ACCESSED 0x00000001 -#define FGP_LOCK 0x00000002 -#define FGP_CREAT 0x00000004 -#define FGP_WRITE 0x00000008 -#define FGP_NOFS 0x00000010 -#define FGP_NOWAIT 0x00000020 -#define FGP_FOR_MMAP 0x00000040 -#define FGP_STABLE 0x00000080 +/** + * typedef fgf_t - Flags for getting folios from the page cache. + * + * Most users of the page cache will not need to use these flags; + * there are convenience functions such as filemap_get_folio() and + * filemap_lock_folio(). For users which need more control over exactly + * what is done with the folios, these flags to __filemap_get_folio() + * are available. + * + * * %FGP_ACCESSED - The folio will be marked accessed. + * * %FGP_LOCK - The folio is returned locked. + * * %FGP_CREAT - If no folio is present then a new folio is allocated, + * added to the page cache and the VM's LRU list. The folio is + * returned locked. + * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the + * folio is already in cache. If the folio was allocated, unlock it + * before returning so the caller can do the same dance. + * * %FGP_WRITE - The folio will be written to by the caller. + * * %FGP_NOFS - __GFP_FS will get cleared in gfp. + * * %FGP_NOWAIT - Don't block on the folio lock. + * * %FGP_STABLE - Wait for the folio to be stable (finished writeback) + * * %FGP_WRITEBEGIN - The flags to use in a filesystem write_begin() + * implementation. + */ +typedef unsigned int __bitwise fgf_t; + +#define FGP_ACCESSED ((__force fgf_t)0x00000001) +#define FGP_LOCK ((__force fgf_t)0x00000002) +#define FGP_CREAT ((__force fgf_t)0x00000004) +#define FGP_WRITE ((__force fgf_t)0x00000008) +#define FGP_NOFS ((__force fgf_t)0x00000010) +#define FGP_NOWAIT ((__force fgf_t)0x00000020) +#define FGP_FOR_MMAP ((__force fgf_t)0x00000040) +#define FGP_STABLE ((__force fgf_t)0x00000080) #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE) void *filemap_get_entry(struct address_space *mapping, pgoff_t index); struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp); + fgf_t fgp_flags, gfp_t gfp); struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index, - int fgp_flags, gfp_t gfp); + fgf_t fgp_flags, gfp_t gfp); /** * filemap_get_folio - Find and get a folio. @@ -590,7 +616,7 @@ static inline struct page *find_get_page(struct address_space *mapping, } static inline struct page *find_get_page_flags(struct address_space *mapping, - pgoff_t offset, int fgp_flags) + pgoff_t offset, fgf_t fgp_flags) { return pagecache_get_page(mapping, offset, fgp_flags, 0); } -- cgit v1.2.3 From 4f66170119107f1452d2438ba4606e105e9e3afe Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 19 May 2023 16:10:37 -0400 Subject: filemap: Allow __filemap_get_folio to allocate large folios Allow callers of __filemap_get_folio() to specify a preferred folio order in the FGP flags. This is only honoured in the FGP_CREATE path; if there is already a folio in the page cache that covers the index, we will return it, no matter what its order is. No create-around is attempted; we will only create folios which start at the specified index. Unmodified callers will continue to allocate order 0 folios. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- include/linux/pagemap.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 911201fc41fc..d87840acbfb2 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -470,6 +470,19 @@ static inline void *detach_page_private(struct page *page) return folio_detach_private(page_folio(page)); } +/* + * There are some parts of the kernel which assume that PMD entries + * are exactly HPAGE_PMD_ORDER. Those should be fixed, but until then, + * limit the maximum allocation order to PMD size. I'm not aware of any + * assumptions about maximum order if THP are disabled, but 8 seems like + * a good order (that's 1MB if you're using 4kB pages) + */ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#define MAX_PAGECACHE_ORDER HPAGE_PMD_ORDER +#else +#define MAX_PAGECACHE_ORDER 8 +#endif + #ifdef CONFIG_NUMA struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order); #else @@ -535,9 +548,30 @@ typedef unsigned int __bitwise fgf_t; #define FGP_NOWAIT ((__force fgf_t)0x00000020) #define FGP_FOR_MMAP ((__force fgf_t)0x00000040) #define FGP_STABLE ((__force fgf_t)0x00000080) +#define FGF_GET_ORDER(fgf) (((__force unsigned)fgf) >> 26) /* top 6 bits */ #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE) +/** + * fgf_set_order - Encode a length in the fgf_t flags. + * @size: The suggested size of the folio to create. + * + * The caller of __filemap_get_folio() can use this to suggest a preferred + * size for the folio that is created. If there is already a folio at + * the index, it will be returned, no matter what its size. If a folio + * is freshly created, it may be of a different size than requested + * due to alignment constraints, memory pressure, or the presence of + * other folios at nearby indices. + */ +static inline fgf_t fgf_set_order(size_t size) +{ + unsigned int shift = ilog2(size); + + if (shift <= PAGE_SHIFT) + return 0; + return (__force fgf_t)((shift - PAGE_SHIFT) << 26); +} + void *filemap_get_entry(struct address_space *mapping, pgoff_t index); struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index, fgf_t fgp_flags, gfp_t gfp); -- cgit v1.2.3 From d6bb59a9444d218dc60dee3b45fd93c0d4f5e123 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 19 May 2023 16:18:05 -0400 Subject: iomap: Create large folios in the buffered write path Use the size of the write as a hint for the size of the folio to create. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- include/linux/iomap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e2b836c2e119..80facb9c9e5b 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -261,7 +261,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode, int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops); void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); -struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos); +struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len); bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, -- cgit v1.2.3 From a097627dcaddb90f55c1780da348bf9b56f6b4bd Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Fri, 21 Jul 2023 16:58:08 +0200 Subject: net: add missing net_device::xdp_zc_max_segs description Cited commit under 'Fixes' tag introduced new member to struct net_device without providing description of it - fix it. Reported-by: Stephen Rothwell Closes: https://lore.kernel.org/all/20230720141613.61488b9e@canb.auug.org.au/ Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags") Signed-off-by: Maciej Fijalkowski Reviewed-by: Simon Horman Tested-by: Simon Horman # build-tested Link: https://lore.kernel.org/r/20230721145808.596298-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3800d0479698..11652e464f5d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2043,6 +2043,8 @@ enum netdev_ml_priv_type { * receive offload (GRO) * @gro_ipv4_max_size: Maximum size of aggregated packet in generic * receive offload (GRO), for IPv4. + * @xdp_zc_max_segs: Maximum number of segments supported by AF_XDP + * zero copy driver * * @dev_addr_shadow: Copy of @dev_addr to catch direct writes. * @linkwatch_dev_tracker: refcount tracker used by linkwatch. -- cgit v1.2.3 From 4ce02c67972211be488408c275c8fbf19faf29b3 Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" Date: Mon, 10 Jul 2023 14:12:43 -0700 Subject: iomap: Add per-block dirty state tracking to improve performance When filesystem blocksize is less than folio size (either with mapping_large_folio_support() or with blocksize < pagesize) and when the folio is uptodate in pagecache, then even a byte write can cause an entire folio to be written to disk during writeback. This happens because we currently don't have a mechanism to track per-block dirty state within struct iomap_folio_state. We currently only track uptodate state. This patch implements support for tracking per-block dirty state in iomap_folio_state->state bitmap. This should help improve the filesystem write performance and help reduce write amplification. Performance testing of below fio workload reveals ~16x performance improvement using nvme with XFS (4k blocksize) on Power (64K pagesize) FIO reported write bw scores improved from around ~28 MBps to ~452 MBps. 1. [global] ioengine=psync rw=randwrite overwrite=1 pre_read=1 direct=0 bs=4k size=1G dir=./ numjobs=8 fdatasync=1 runtime=60 iodepth=64 group_reporting=1 [fio-run] 2. Also our internal performance team reported that this patch improves their database workload performance by around ~83% (with XFS on Power) Reported-by: Aravinda Herle Reported-by: Brian Foster Signed-off-by: Ritesh Harjani (IBM) Reviewed-by: Darrick J. Wong --- include/linux/iomap.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 80facb9c9e5b..fdc6e64f49d6 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -264,6 +264,7 @@ bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len); bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); +bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio); int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops); int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, -- cgit v1.2.3 From c1ed39ec116272935528ca9b348b8ee79b0791da Mon Sep 17 00:00:00 2001 From: Winston Wen Date: Mon, 24 Jul 2023 10:10:56 +0800 Subject: fs/nls: make load_nls() take a const parameter load_nls() take a char * parameter, use it to find nls module in list or construct the module name to load it. This change make load_nls() take a const parameter, so we don't need do some cast like this: ses->local_nls = load_nls((char *)ctx->local_nls->charset); Suggested-by: Stephen Rothwell Signed-off-by: Winston Wen Reviewed-by: Paulo Alcantara Reviewed-by: Christian Brauner Signed-off-by: Steve French --- include/linux/nls.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nls.h b/include/linux/nls.h index 499e486b3722..e0bf8367b274 100644 --- a/include/linux/nls.h +++ b/include/linux/nls.h @@ -47,7 +47,7 @@ enum utf16_endian { /* nls_base.c */ extern int __register_nls(struct nls_table *, struct module *); extern int unregister_nls(struct nls_table *); -extern struct nls_table *load_nls(char *); +extern struct nls_table *load_nls(const char *charset); extern void unload_nls(struct nls_table *); extern struct nls_table *load_nls_default(void); #define register_nls(nls) __register_nls((nls), THIS_MODULE) -- cgit v1.2.3 From 269cb04b601dd8c35bbee180a9800335b93111fb Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Fri, 14 Jul 2023 16:14:07 +0800 Subject: regulator: Use bitfield values for range selectors Right now the regulator helpers expect raw register values for the range selectors. This is different from the voltage selectors, which are normalized as bitfield values. This leads to a bit of confusion. Also, raw values are harder to copy from datasheets or match up with them, as datasheets will typically have bitfield values. Make the helpers expect bitfield values, and convert existing users. The field in regulator_desc is renamed to |linear_range_selectors_bitfield|. This is intended to cause drivers added in the same merge window and out-of-tree drivers using the incorrect variable and values to break, preventing incorrect values being used on actual hardware and potentially producing magic smoke. Also include bitops.h explicitly for ffs(), and reorder the header include statements. While at it, also replace module.h with export.h, since the only use is EXPORT_SYMBOL_GPL. Signed-off-by: Chen-Yu Tsai Link: https://lore.kernel.org/r/20230714081408.274567-1-wenst@chromium.org Signed-off-by: Mark Brown --- include/linux/regulator/driver.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h index c6ef7d68eb9a..4b7eceb3828b 100644 --- a/include/linux/regulator/driver.h +++ b/include/linux/regulator/driver.h @@ -292,11 +292,12 @@ enum regulator_type { * @ramp_delay: Time to settle down after voltage change (unit: uV/us) * @min_dropout_uV: The minimum dropout voltage this regulator can handle * @linear_ranges: A constant table of possible voltage ranges. - * @linear_range_selectors: A constant table of voltage range selectors. - * If pickable ranges are used each range must - * have corresponding selector here. + * @linear_range_selectors_bitfield: A constant table of voltage range + * selectors as bitfield values. If + * pickable ranges are used each range + * must have corresponding selector here. * @n_linear_ranges: Number of entries in the @linear_ranges (and in - * linear_range_selectors if used) table(s). + * linear_range_selectors_bitfield if used) table(s). * @volt_table: Voltage mapping table (if table based mapping) * @curr_table: Current limit mapping table (if table based mapping) * @@ -384,7 +385,7 @@ struct regulator_desc { int min_dropout_uV; const struct linear_range *linear_ranges; - const unsigned int *linear_range_selectors; + const unsigned int *linear_range_selectors_bitfield; int n_linear_ranges; -- cgit v1.2.3 From 4d72c3bb60dd9d5ea180f157bac72b4458112282 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 22 Jul 2023 21:32:59 +0100 Subject: net: phylink: strip out pre-March 2020 legacy code Strip out all the pre-March 2020 legacy code from phylink now that the last user of it is gone. Reviewed-by: Daniel Golle Tested-by: Daniel Golle Tested-by: Frank Wunderlich Signed-off-by: Russell King (Oracle) Signed-off-by: Paolo Abeni --- include/linux/phylink.h | 45 ++++++--------------------------------------- 1 file changed, 6 insertions(+), 39 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 9e861c8316d0..789c516c6b4a 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -201,8 +201,6 @@ enum phylink_op_type { * struct phylink_config - PHYLINK configuration structure * @dev: a pointer to a struct device associated with the MAC * @type: operation type of PHYLINK instance - * @legacy_pre_march2020: driver has not been updated for March 2020 updates - * (See commit 7cceb599d15d ("net: phylink: avoid mac_config calls") * @poll_fixed_state: if true, starts link_poll, * if MAC link is at %MLO_AN_FIXED mode. * @mac_managed_pm: if true, indicate the MAC driver is responsible for PHY PM. @@ -216,7 +214,6 @@ enum phylink_op_type { struct phylink_config { struct device *dev; enum phylink_op_type type; - bool legacy_pre_march2020; bool poll_fixed_state; bool mac_managed_pm; bool ovr_an_inband; @@ -230,7 +227,6 @@ struct phylink_config { * struct phylink_mac_ops - MAC operations structure. * @validate: Validate and update the link configuration. * @mac_select_pcs: Select a PCS for the interface mode. - * @mac_pcs_get_state: Read the current link state from the hardware. * @mac_prepare: prepare for a major reconfiguration of the interface. * @mac_config: configure the MAC for the selected mode and state. * @mac_finish: finish a major reconfiguration of the interface. @@ -245,8 +241,6 @@ struct phylink_mac_ops { struct phylink_link_state *state); struct phylink_pcs *(*mac_select_pcs)(struct phylink_config *config, phy_interface_t interface); - void (*mac_pcs_get_state)(struct phylink_config *config, - struct phylink_link_state *state); int (*mac_prepare)(struct phylink_config *config, unsigned int mode, phy_interface_t iface); void (*mac_config)(struct phylink_config *config, unsigned int mode, @@ -312,25 +306,6 @@ void validate(struct phylink_config *config, unsigned long *supported, struct phylink_pcs *mac_select_pcs(struct phylink_config *config, phy_interface_t interface); -/** - * mac_pcs_get_state() - Read the current inband link state from the hardware - * @config: a pointer to a &struct phylink_config. - * @state: a pointer to a &struct phylink_link_state. - * - * Read the current inband link state from the MAC PCS, reporting the - * current speed in @state->speed, duplex mode in @state->duplex, pause - * mode in @state->pause using the %MLO_PAUSE_RX and %MLO_PAUSE_TX bits, - * negotiation completion state in @state->an_complete, and link up state - * in @state->link. If possible, @state->lp_advertising should also be - * populated. - * - * Note: This is a legacy method. This function will not be called unless - * legacy_pre_march2020 is set in &struct phylink_config and there is no - * PCS attached. - */ -void mac_pcs_get_state(struct phylink_config *config, - struct phylink_link_state *state); - /** * mac_prepare() - prepare to change the PHY interface mode * @config: a pointer to a &struct phylink_config. @@ -367,17 +342,9 @@ int mac_prepare(struct phylink_config *config, unsigned int mode, * guaranteed to be correct, and so any mac_config() implementation must * never reference these fields. * - * Note: For legacy March 2020 drivers (drivers with legacy_pre_march2020 set - * in their &phylnk_config and which don't have a PCS), this function will be - * called on each link up event, and to also change the in-band advert. For - * non-legacy drivers, it will only be called to reconfigure the MAC for a - * "major" change in e.g. interface mode. It will not be called for changes - * in speed, duplex or pause modes or to change the in-band advertisement. - * In any case, it is strongly preferred that speed, duplex and pause settings - * are handled in the mac_link_up() method and not in this method. - * - * (this requires a rewrite - please refer to mac_link_up() for situations - * where the PCS and MAC are not tightly integrated.) + * This will only be called to reconfigure the MAC for a "major" change in + * e.g. interface mode. It will not be called for changes in speed, duplex + * or pause modes or to change the in-band advertisement. * * In all negotiation modes, as defined by @mode, @state->pause indicates the * pause settings which should be applied as follows. If %MLO_PAUSE_AN is not @@ -409,7 +376,7 @@ int mac_prepare(struct phylink_config *config, unsigned int mode, * 1000base-X or Cisco SGMII mode depending on the @state->interface * mode). In both cases, link state management (whether the link * is up or not) is performed by the MAC, and reported via the - * mac_pcs_get_state() callback. Changes in link state must be made + * pcs_get_state() callback. Changes in link state must be made * by calling phylink_mac_change(). * * Interface mode specific details are mentioned below. @@ -601,8 +568,8 @@ void pcs_disable(struct phylink_pcs *pcs); * in @state->link. If possible, @state->lp_advertising should also be * populated. * - * When present, this overrides mac_pcs_get_state() in &struct - * phylink_mac_ops. + * When present, this overrides pcs_get_state() in &struct + * phylink_pcs_ops. */ void pcs_get_state(struct phylink_pcs *pcs, struct phylink_link_state *state); -- cgit v1.2.3 From 57266281271add0132bea0b83ef0d6f32704c402 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 19 Jul 2023 12:26:53 +0300 Subject: net/mlx5: Add relevant capabilities bits to support NAT-T Provide an ability to check if flow steering supports UDP encapsulation and decapsulation of IPsec ESP packets. Signed-off-by: Leon Romanovsky Signed-off-by: Paolo Abeni --- include/linux/mlx5/mlx5_ifc.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 33344a71c3e3..b3ad6b9852ec 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -464,10 +464,10 @@ struct mlx5_ifc_flow_table_prop_layout_bits { u8 reformat_add_esp_trasport[0x1]; u8 reformat_l2_to_l3_esp_tunnel[0x1]; - u8 reserved_at_42[0x1]; + u8 reformat_add_esp_transport_over_udp[0x1]; u8 reformat_del_esp_trasport[0x1]; u8 reformat_l3_esp_tunnel_to_l2[0x1]; - u8 reserved_at_45[0x1]; + u8 reformat_del_esp_transport_over_udp[0x1]; u8 execute_aso[0x1]; u8 reserved_at_47[0x19]; @@ -6665,9 +6665,12 @@ enum mlx5_reformat_ctx_type { MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL = 0x4, MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV4 = 0x5, MLX5_REFORMAT_TYPE_L2_TO_L3_ESP_TUNNEL = 0x6, + MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV4 = 0x7, MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT = 0x8, MLX5_REFORMAT_TYPE_L3_ESP_TUNNEL_TO_L2 = 0x9, + MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT_OVER_UDP = 0xa, MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 = 0xb, + MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV6 = 0xc, MLX5_REFORMAT_TYPE_INSERT_HDR = 0xf, MLX5_REFORMAT_TYPE_REMOVE_HDR = 0x10, MLX5_REFORMAT_TYPE_ADD_MACSEC = 0x11, -- cgit v1.2.3 From 86b0a96c2952fa07b782b37f6ec783ace63a01a6 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 03:55:36 -0700 Subject: iommufd: Add iommufd_ctx_has_group() This adds the helper to check if any device within the given iommu_group has been bound with the iommufd_ctx. This is helpful for the checking on device ownership for the devices which have not been bound but cannot be bound to any other iommufd_ctx as the iommu_group has been bound. Reviewed-by: Jason Gunthorpe Tested-by: Yanting Jiang Tested-by: Terrence Xu Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718105542.4138-5-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/iommufd.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 1129a36a74c4..f241bafa03da 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -16,6 +16,7 @@ struct page; struct iommufd_ctx; struct iommufd_access; struct file; +struct iommu_group; struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, struct device *dev, u32 *id); @@ -50,6 +51,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx); #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_ctx *iommufd_ctx_from_file(struct file *file); void iommufd_ctx_put(struct iommufd_ctx *ictx); +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group); int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova, unsigned long length, struct page **out_pages, -- cgit v1.2.3 From 78d3df457ae5eb53ef1f295a8a704691abea1b1d Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 03:55:37 -0700 Subject: iommufd: Add helper to retrieve iommufd_ctx and devid This is needed by the vfio-pci driver to report affected devices in the hot-reset for a given device. Reviewed-by: Jason Gunthorpe Tested-by: Yanting Jiang Tested-by: Terrence Xu Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718105542.4138-6-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/iommufd.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index f241bafa03da..68defed9ea48 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -25,6 +25,9 @@ void iommufd_device_unbind(struct iommufd_device *idev); int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id); void iommufd_device_detach(struct iommufd_device *idev); +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev); +u32 iommufd_device_to_id(struct iommufd_device *idev); + struct iommufd_access_ops { u8 needs_pin_pages : 1; void (*unmap)(void *data, unsigned long iova, unsigned long length); -- cgit v1.2.3 From af949759bad27934b6f242e8a3b1c394e09fb4a3 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 03:55:38 -0700 Subject: vfio: Mark cdev usage in vfio_device This can be used to differentiate whether to report group_id or devid in the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no cdev path yet, so the vfio_device_cdev_opened() helper always returns false. Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Tested-by: Yanting Jiang Tested-by: Terrence Xu Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718105542.4138-7-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 2c137ea94a3e..2a45853773a6 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -139,6 +139,11 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id); ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL) #endif +static inline bool vfio_device_cdev_opened(struct vfio_device *device) +{ + return false; +} + /** * struct vfio_migration_ops - VFIO bus device driver migration callbacks * -- cgit v1.2.3 From a80e1de93275fbfba0617e6bbea8522ea5329eb5 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 03:55:39 -0700 Subject: vfio: Add helper to search vfio_device in a dev_set There are drivers that need to search vfio_device within a given dev_set. e.g. vfio-pci. So add a helper. vfio_pci_is_device_in_set() now returns -EBUSY in commit a882c16a2b7e ("vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set") where it was trying to preserve the return of vfio_pci_try_zap_and_vma_lock_cb(). However, it makes more sense to return -ENODEV. Suggested-by: Alex Williamson Reviewed-by: Jason Gunthorpe Tested-by: Yanting Jiang Tested-by: Terrence Xu Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718105542.4138-8-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 2a45853773a6..ee120d2d530b 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -244,6 +244,9 @@ void vfio_unregister_group_dev(struct vfio_device *device); int vfio_assign_device_set(struct vfio_device *device, void *set_id); unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set); +struct vfio_device * +vfio_find_device_in_devset(struct vfio_device_set *dev_set, + struct device *dev); int vfio_mig_get_next_state(struct vfio_device *device, enum vfio_device_mig_state cur_fsm, -- cgit v1.2.3 From 9062ff405b49769c04f00373de2c9cefab91b600 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 03:55:40 -0700 Subject: vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx of the cdev device to check the ownership of the other affected devices. When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate the values returned are IOMMUFD devids rather than group IDs as used when accessing vfio devices through the conventional vfio group interface. Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if all of the devices affected by the hot-reset are owned by either virtue of being directly bound to the same iommufd context as the calling device, or implicitly owned via a shared IOMMU group. Suggested-by: Jason Gunthorpe Suggested-by: Alex Williamson Reviewed-by: Jason Gunthorpe Tested-by: Yanting Jiang Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718105542.4138-9-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ee120d2d530b..7079911edfb1 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -114,6 +114,8 @@ struct vfio_device_ops { }; #if IS_ENABLED(CONFIG_IOMMUFD) +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev); +int vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx); int vfio_iommufd_physical_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx, u32 *out_device_id); void vfio_iommufd_physical_unbind(struct vfio_device *vdev); @@ -123,6 +125,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev, void vfio_iommufd_emulated_unbind(struct vfio_device *vdev); int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id); #else +static inline struct iommufd_ctx * +vfio_iommufd_device_ictx(struct vfio_device *vdev) +{ + return NULL; +} + +static inline int +vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx) +{ + return VFIO_PCI_DEVID_NOT_OWNED; +} + #define vfio_iommufd_physical_bind \ ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \ u32 *out_device_id)) NULL) -- cgit v1.2.3 From b1a59be8a2b64d00409dc7c9d523572ed32bcff8 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:27 -0700 Subject: vfio: Refine vfio file kAPIs for KVM This prepares for making the below kAPIs to accept both group file and device file instead of only vfio group file. bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); Reviewed-by: Kevin Tian Reviewed-by: Eric Auger Reviewed-by: Jason Gunthorpe Tested-by: Terrence Xu Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-3-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 7079911edfb1..06a5221949c5 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -272,6 +272,7 @@ int vfio_mig_get_next_state(struct vfio_device *device, */ struct iommu_group *vfio_file_iommu_group(struct file *file); bool vfio_file_is_group(struct file *file); +bool vfio_file_is_valid(struct file *file); bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); bool vfio_file_has_dev(struct file *file, struct vfio_device *device); -- cgit v1.2.3 From 9048c7341c4df9cae04c154a8b0f556dbe913358 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:38 -0700 Subject: vfio-iommufd: Add detach_ioas support for physical VFIO devices This prepares for adding DETACH ioctl for physical VFIO devices. Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Tested-by: Terrence Xu Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-14-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 06a5221949c5..f2f02273ece1 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -73,7 +73,9 @@ struct vfio_device { * @bind_iommufd: Called when binding the device to an iommufd * @unbind_iommufd: Opposite of bind_iommufd * @attach_ioas: Called when attaching device to an IOAS/HWPT managed by the - * bound iommufd. Undo in unbind_iommufd. + * bound iommufd. Undo in unbind_iommufd if @detach_ioas is not + * called. + * @detach_ioas: Opposite of attach_ioas * @open_device: Called when the first file descriptor is opened for this device * @close_device: Opposite of open_device * @read: Perform read(2) on device file descriptor @@ -97,6 +99,7 @@ struct vfio_device_ops { struct iommufd_ctx *ictx, u32 *out_device_id); void (*unbind_iommufd)(struct vfio_device *vdev); int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id); + void (*detach_ioas)(struct vfio_device *vdev); int (*open_device)(struct vfio_device *vdev); void (*close_device)(struct vfio_device *vdev); ssize_t (*read)(struct vfio_device *vdev, char __user *buf, @@ -120,6 +123,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx, u32 *out_device_id); void vfio_iommufd_physical_unbind(struct vfio_device *vdev); int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id); +void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev); int vfio_iommufd_emulated_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx, u32 *out_device_id); void vfio_iommufd_emulated_unbind(struct vfio_device *vdev); @@ -144,6 +148,8 @@ vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx) ((void (*)(struct vfio_device *vdev)) NULL) #define vfio_iommufd_physical_attach_ioas \ ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL) +#define vfio_iommufd_physical_detach_ioas \ + ((void (*)(struct vfio_device *vdev)) NULL) #define vfio_iommufd_emulated_bind \ ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \ u32 *out_device_id)) NULL) -- cgit v1.2.3 From e23a6217f3bb4f6f205d4517782ad49e3533fc1c Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Tue, 18 Jul 2023 06:55:39 -0700 Subject: iommufd/device: Add iommufd_access_detach() API Previously, the detach routine is only done by the destroy(). And it was called by vfio_iommufd_emulated_unbind() when the device runs close(), so all the mappings in iopt were cleaned in that setup, when the call trace reaches this detach() routine. Now, there's a need of a detach uAPI, meaning that it does not only need a new iommufd_access_detach() API, but also requires access->ops->unmap() call as a cleanup. So add one. However, leaving that unprotected can introduce some potential of a race condition during the pin_/unpin_pages() call, where access->ioas->iopt is getting referenced. So, add an ioas_lock to protect the context of iopt referencings. Also, to allow the iommufd_access_unpin_pages() callback to happen via this unmap() call, add an ioas_unpin pointer, so the unpin routine won't be affected by the "access->ioas = NULL" trick. Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Tested-by: Terrence Xu Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Zhenzhong Duan Signed-off-by: Nicolin Chen Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-15-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/iommufd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 68defed9ea48..3a3216cb9482 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -48,6 +48,7 @@ iommufd_access_create(struct iommufd_ctx *ictx, const struct iommufd_access_ops *ops, void *data, u32 *id); void iommufd_access_destroy(struct iommufd_access *access); int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id); +void iommufd_access_detach(struct iommufd_access *access); void iommufd_ctx_get(struct iommufd_ctx *ictx); -- cgit v1.2.3 From 8cfa71860233652f8566bcdf55e77aefe0017b4a Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:40 -0700 Subject: vfio-iommufd: Add detach_ioas support for emulated VFIO devices This prepares for adding DETACH ioctl for emulated VFIO devices. Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Tested-by: Terrence Xu Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-16-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index f2f02273ece1..24091a7c7bdb 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -128,6 +128,7 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx, u32 *out_device_id); void vfio_iommufd_emulated_unbind(struct vfio_device *vdev); int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id); +void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev); #else static inline struct iommufd_ctx * vfio_iommufd_device_ictx(struct vfio_device *vdev) @@ -157,6 +158,8 @@ vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx) ((void (*)(struct vfio_device *vdev)) NULL) #define vfio_iommufd_emulated_attach_ioas \ ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL) +#define vfio_iommufd_emulated_detach_ioas \ + ((void (*)(struct vfio_device *vdev)) NULL) #endif static inline bool vfio_device_cdev_opened(struct vfio_device *device) -- cgit v1.2.3 From 8b6f173a4ce47ef0606124710315560c64f2344e Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:43 -0700 Subject: vfio: Add cdev for vfio_device This adds cdev support for vfio_device. It allows the user to directly open a vfio device w/o using the legacy container/group interface, as a prerequisite for supporting new iommu features like nested translation and etc. The device fd opened in this manner doesn't have the capability to access the device as the fops open() doesn't open the device until the successful VFIO_DEVICE_BIND_IOMMUFD ioctl which will be added in a later patch. With this patch, devices registered to vfio core would have both the legacy group and the new device interfaces created. - group interface : /dev/vfio/$groupID - device interface: /dev/vfio/devices/vfioX - normal device ("X" is a unique number across vfio devices) For a given device, the user can identify the matching vfioX by searching the vfio-dev folder under the sysfs path of the device. Take PCI device (0000:6a:01.0) as an example, /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfioX implies the matching vfioX under /dev/vfio/devices/, and vfio-dev/vfioX/dev contains the major:minor number of the matching /dev/vfio/devices/vfioX. The user can get device fd by opening the /dev/vfio/devices/vfioX. The vfio_device cdev logic in this patch: *) __vfio_register_dev() path ends up doing cdev_device_add() for each vfio_device if VFIO_DEVICE_CDEV configured. *) vfio_unregister_group_dev() path does cdev_device_del(); cdev interface does not support noiommu devices, so VFIO only creates the legacy group interface for the physical devices that do not have IOMMU. noiommu users should use the legacy group interface. Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Tested-by: Terrence Xu Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-19-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 24091a7c7bdb..e0069f26488d 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -51,6 +52,9 @@ struct vfio_device { /* Members below here are private, not for driver use */ unsigned int index; struct device device; /* device.kref covers object life circle */ +#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) + struct cdev cdev; +#endif refcount_t refcount; /* user count on registered device*/ unsigned int open_count; struct completion comp; -- cgit v1.2.3 From 1c9dc07487cb0f246075b2d3b305bba91156d376 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:45 -0700 Subject: iommufd: Add iommufd_ctx_from_fd() It's common to get a reference to the iommufd context from a given file descriptor. So adds an API for it. Existing users of this API are compiled only when IOMMUFD is enabled, so no need to have a stub for the IOMMUFD disabled case. Tested-by: Yanting Jiang Reviewed-by: Jason Gunthorpe Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-21-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/iommufd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 3a3216cb9482..9657c58813dc 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -54,6 +54,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx); #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_ctx *iommufd_ctx_from_file(struct file *file); +struct iommufd_ctx *iommufd_ctx_from_fd(int fd); void iommufd_ctx_put(struct iommufd_ctx *ictx); bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group); -- cgit v1.2.3 From 5fcc26969a164e6a3154bb68badd6712716eb954 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:47 -0700 Subject: vfio: Add VFIO_DEVICE_BIND_IOMMUFD This adds ioctl for userspace to bind device cdev fd to iommufd. VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA control provided by the iommufd. open_device op is called after bind_iommufd op. Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Terrence Xu Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20230718135551.6592-23-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e0069f26488d..d6228c839c44 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -64,8 +64,9 @@ struct vfio_device { void (*put_kvm)(struct kvm *kvm); #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_device *iommufd_device; - bool iommufd_attached; + u8 iommufd_attached:1; #endif + u8 cdev_opened:1; }; /** @@ -168,7 +169,7 @@ vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx) static inline bool vfio_device_cdev_opened(struct vfio_device *device) { - return false; + return device->cdev_opened; } /** -- cgit v1.2.3 From c1cce6d079b875396c9a7c6838fc5b024758e540 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Tue, 18 Jul 2023 06:55:50 -0700 Subject: vfio: Compile vfio_group infrastructure optionally vfio_group is not needed for vfio device cdev, so with vfio device cdev introduced, the vfio_group infrastructures can be compiled out if only cdev is needed. Reviewed-by: Jason Gunthorpe Tested-by: Nicolin Chen Tested-by: Matthew Rosato Tested-by: Yanting Jiang Tested-by: Shameer Kolothum Tested-by: Terrence Xu Tested-by: Zhenzhong Duan Signed-off-by: Yi Liu Link: https://lore.kernel.org/r/20230718135551.6592-26-yi.l.liu@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index d6228c839c44..5a1dee983f17 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -43,7 +43,11 @@ struct vfio_device { */ const struct vfio_migration_ops *mig_ops; const struct vfio_log_ops *log_ops; +#if IS_ENABLED(CONFIG_VFIO_GROUP) struct vfio_group *group; + struct list_head group_next; + struct list_head iommu_entry; +#endif struct vfio_device_set *dev_set; struct list_head dev_set_list; unsigned int migration_flags; @@ -58,8 +62,6 @@ struct vfio_device { refcount_t refcount; /* user count on registered device*/ unsigned int open_count; struct completion comp; - struct list_head group_next; - struct list_head iommu_entry; struct iommufd_access *iommufd_access; void (*put_kvm)(struct kvm *kvm); #if IS_ENABLED(CONFIG_IOMMUFD) @@ -284,12 +286,29 @@ int vfio_mig_get_next_state(struct vfio_device *device, /* * External user API */ +#if IS_ENABLED(CONFIG_VFIO_GROUP) struct iommu_group *vfio_file_iommu_group(struct file *file); bool vfio_file_is_group(struct file *file); +bool vfio_file_has_dev(struct file *file, struct vfio_device *device); +#else +static inline struct iommu_group *vfio_file_iommu_group(struct file *file) +{ + return NULL; +} + +static inline bool vfio_file_is_group(struct file *file) +{ + return false; +} + +static inline bool vfio_file_has_dev(struct file *file, struct vfio_device *device) +{ + return false; +} +#endif bool vfio_file_is_valid(struct file *file); bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); -bool vfio_file_has_dev(struct file *file, struct vfio_device *device); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) -- cgit v1.2.3 From bcb48185eddf72d5e2a9f745aaec030778e3ea35 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 12 Jul 2023 10:18:03 +0200 Subject: tty: sysrq: switch sysrq handlers from int to u8 The passed parameter to sysrq handlers is a key (a character). So change the type from 'int' to 'u8'. Let it specifically be 'u8' for two reasons: * unsigned: unsigned values come from the upper layers (devices) and the tty layer assumes unsigned on most places, and * 8-bit: as that what's supposed to be one day in all the layers built on the top of tty. (Currently, we use mostly 'unsigned char' and somewhere still only 'char'. (But that also translates to the former thanks to -funsigned-char.)) Signed-off-by: Jiri Slaby (SUSE) Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Huacai Chen Cc: WANG Xuerui Cc: Thomas Bogendoerfer Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: "David S. Miller" Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: Jason Wessel Cc: Daniel Thompson Cc: Douglas Anderson Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Pavel Machek Cc: "Paul E. McKenney" Cc: Frederic Weisbecker Cc: Neeraj Upadhyay Cc: Joel Fernandes Cc: Josh Triplett Cc: Boqun Feng Cc: Steven Rostedt Cc: Mathieu Desnoyers Cc: Lai Jiangshan Cc: Zqiang Acked-by: Thomas Zimmermann # DRM Acked-by: WANG Xuerui # loongarch Acked-by: Paul E. McKenney Acked-by: Daniel Thompson Link: https://lore.kernel.org/r/20230712081811.29004-3-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/sysrq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sysrq.h b/include/linux/sysrq.h index 3a582ec7a2f1..bb8d07814b0e 100644 --- a/include/linux/sysrq.h +++ b/include/linux/sysrq.h @@ -30,7 +30,7 @@ #define SYSRQ_ENABLE_RTNICE 0x0100 struct sysrq_key_op { - void (* const handler)(int); + void (* const handler)(u8); const char * const help_msg; const char * const action_msg; const int enable_mask; -- cgit v1.2.3 From 8ac20a03da56d56a54b01dd0b62254826a84474d Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 12 Jul 2023 10:18:04 +0200 Subject: tty: sysrq: switch the rest of keys to u8 Propagate u8 more from the bottom to the interface, so that sysrq callers (usually drivers) see that u8 is expected. Signed-off-by: Jiri Slaby (SUSE) Link: https://lore.kernel.org/r/20230712081811.29004-4-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/sysrq.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysrq.h b/include/linux/sysrq.h index bb8d07814b0e..bdca467ebb77 100644 --- a/include/linux/sysrq.h +++ b/include/linux/sysrq.h @@ -43,10 +43,10 @@ struct sysrq_key_op { * are available -- else NULL's). */ -void handle_sysrq(int key); -void __handle_sysrq(int key, bool check_mask); -int register_sysrq_key(int key, const struct sysrq_key_op *op); -int unregister_sysrq_key(int key, const struct sysrq_key_op *op); +void handle_sysrq(u8 key); +void __handle_sysrq(u8 key, bool check_mask); +int register_sysrq_key(u8 key, const struct sysrq_key_op *op); +int unregister_sysrq_key(u8 key, const struct sysrq_key_op *op); extern const struct sysrq_key_op *__sysrq_reboot_op; int sysrq_toggle_support(int enable_mask); @@ -54,20 +54,20 @@ int sysrq_mask(void); #else -static inline void handle_sysrq(int key) +static inline void handle_sysrq(u8 key) { } -static inline void __handle_sysrq(int key, bool check_mask) +static inline void __handle_sysrq(u8 key, bool check_mask) { } -static inline int register_sysrq_key(int key, const struct sysrq_key_op *op) +static inline int register_sysrq_key(u8 key, const struct sysrq_key_op *op) { return -EINVAL; } -static inline int unregister_sysrq_key(int key, const struct sysrq_key_op *op) +static inline int unregister_sysrq_key(u8 key, const struct sysrq_key_op *op) { return -EINVAL; } -- cgit v1.2.3 From 12ae2359eb2f1b880863b266a2e72f65f043eacd Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 12 Jul 2023 10:18:06 +0200 Subject: serial: convert uart sysrq handling to u8 Propagate u8 from the sysrq code further up to serial's uart_handle_sysrq_char() and friends. Signed-off-by: Jiri Slaby (SUSE) Link: https://lore.kernel.org/r/20230712081811.29004-6-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 6d58c57acdaa..14dd85ee849e 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -569,7 +569,7 @@ struct uart_port { struct serial_port_device *port_dev; /* serial core port device */ unsigned long sysrq; /* sysrq timeout */ - unsigned int sysrq_ch; /* char for sysrq */ + u8 sysrq_ch; /* char for sysrq */ unsigned char has_sysrq; unsigned char sysrq_seq; /* index in sysrq_toggle_seq */ @@ -910,9 +910,9 @@ void uart_xchar_out(struct uart_port *uport, int offset); #ifdef CONFIG_MAGIC_SYSRQ_SERIAL #define SYSRQ_TIMEOUT (HZ * 5) -bool uart_try_toggle_sysrq(struct uart_port *port, unsigned int ch); +bool uart_try_toggle_sysrq(struct uart_port *port, u8 ch); -static inline int uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) +static inline int uart_handle_sysrq_char(struct uart_port *port, u8 ch) { if (!port->sysrq) return 0; @@ -931,7 +931,7 @@ static inline int uart_handle_sysrq_char(struct uart_port *port, unsigned int ch return 0; } -static inline int uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) +static inline int uart_prepare_sysrq_char(struct uart_port *port, u8 ch) { if (!port->sysrq) return 0; @@ -952,7 +952,7 @@ static inline int uart_prepare_sysrq_char(struct uart_port *port, unsigned int c static inline void uart_unlock_and_check_sysrq(struct uart_port *port) { - int sysrq_ch; + u8 sysrq_ch; if (!port->has_sysrq) { spin_unlock(&port->lock); @@ -971,7 +971,7 @@ static inline void uart_unlock_and_check_sysrq(struct uart_port *port) static inline void uart_unlock_and_check_sysrq_irqrestore(struct uart_port *port, unsigned long flags) { - int sysrq_ch; + u8 sysrq_ch; if (!port->has_sysrq) { spin_unlock_irqrestore(&port->lock, flags); @@ -987,11 +987,11 @@ static inline void uart_unlock_and_check_sysrq_irqrestore(struct uart_port *port handle_sysrq(sysrq_ch); } #else /* CONFIG_MAGIC_SYSRQ_SERIAL */ -static inline int uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) +static inline int uart_handle_sysrq_char(struct uart_port *port, u8 ch) { return 0; } -static inline int uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) +static inline int uart_prepare_sysrq_char(struct uart_port *port, u8 ch) { return 0; } -- cgit v1.2.3 From df007fa02560435f7ffbc62147cc0084f8241899 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 12 Jul 2023 10:18:07 +0200 Subject: serial: make uart_insert_char() accept u8s Both the character and flag are 8-bit values. So switch from unsigned ints to u8s. The drivers will be cleaned up in the next round. Signed-off-by: Jiri Slaby (SUSE) Link: https://lore.kernel.org/r/20230712081811.29004-7-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 14dd85ee849e..105d2cdc0126 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -903,7 +903,7 @@ void uart_handle_dcd_change(struct uart_port *uport, bool active); void uart_handle_cts_change(struct uart_port *uport, bool active); void uart_insert_char(struct uart_port *port, unsigned int status, - unsigned int overrun, unsigned int ch, unsigned int flag); + unsigned int overrun, u8 ch, u8 flag); void uart_xchar_out(struct uart_port *uport, int offset); -- cgit v1.2.3 From a08799cf17c22375752abfad3b4a2b34b3acb287 Mon Sep 17 00:00:00 2001 From: Stanley Chang Date: Tue, 25 Jul 2023 11:31:52 +0800 Subject: usb: phy: add usb phy notify port status API In Realtek SoC, the parameter of usb phy is designed to can dynamic tuning base on port status. Therefore, add a notify callback of phy driver when usb port status change. The Realtek phy driver is designed to dynamically adjust disconnection level and calibrate phy parameters. When the device connected bit changes and when the disconnected bit changes, do port status change notification: Check if portstatus is USB_PORT_STAT_CONNECTION and portchange is USB_PORT_STAT_C_CONNECTION. 1. The device is connected, the driver lowers the disconnection level and calibrates the phy parameters. 2. The device disconnects, the driver increases the disconnect level and calibrates the phy parameters. When controller to notify connect that device is already ready. If we adjust the disconnection level in notify_connect, the disconnect may have been triggered at this stage. So we need to change that as early as possible. The status change of connection is before port reset. Therefore, we add an api to notify phy the port status changes. In this stage, the device is not port enable, and it will not trigger disconnection. Signed-off-by: Stanley Chang Link: https://lore.kernel.org/r/20230725033318.8361-1-stanley_chang@realtek.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/phy.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/phy.h b/include/linux/usb/phy.h index e4de6bc1f69b..b513749582d7 100644 --- a/include/linux/usb/phy.h +++ b/include/linux/usb/phy.h @@ -144,6 +144,10 @@ struct usb_phy { */ int (*set_wakeup)(struct usb_phy *x, bool enabled); + /* notify phy port status change */ + int (*notify_port_status)(struct usb_phy *x, int port, + u16 portstatus, u16 portchange); + /* notify phy connect status change */ int (*notify_connect)(struct usb_phy *x, enum usb_device_speed speed); @@ -316,6 +320,15 @@ usb_phy_set_wakeup(struct usb_phy *x, bool enabled) return 0; } +static inline int +usb_phy_notify_port_status(struct usb_phy *x, int port, u16 portstatus, u16 portchange) +{ + if (x && x->notify_port_status) + return x->notify_port_status(x, port, portstatus, portchange); + else + return 0; +} + static inline int usb_phy_notify_connect(struct usb_phy *x, enum usb_device_speed speed) { -- cgit v1.2.3 From 2303fae130640874823ea1bc7ec65c3cd074a7eb Mon Sep 17 00:00:00 2001 From: Peter Seiderer Date: Mon, 24 Jul 2023 18:22:55 +0200 Subject: net: skbuff: remove unused HAVE_HW_TIME_STAMP feature define Remove unused HAVE_HW_TIME_STAMP feature define (introduced by commit ac45f602ee3d ("net: infrastructure for hardware time stamping"). Signed-off-by: Peter Seiderer Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index faaba050f843..16a49ba534e4 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -441,8 +441,6 @@ static inline bool skb_frag_must_loop(struct page *p) copied += p_len, p++, p_off = 0, \ p_len = min_t(u32, f_len - copied, PAGE_SIZE)) \ -#define HAVE_HW_TIME_STAMP - /** * struct skb_shared_hwtstamps - hardware time stamps * @hwtstamp: hardware time stamp transformed into duration -- cgit v1.2.3 From 5c6e623f1b8ebca39eeefba4b18d574eb5acf0bd Mon Sep 17 00:00:00 2001 From: Ravi Bangoria Date: Tue, 25 Jul 2023 20:32:05 +0530 Subject: perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_NA is used to set default values. Signed-off-by: Ravi Bangoria Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20230725150206.184-3-ravi.bangoria@amd.com --- include/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2166a69e3bf2..dd92b4f5d370 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1194,7 +1194,8 @@ struct perf_sample_data { PERF_MEM_S(LVL, NA) |\ PERF_MEM_S(SNOOP, NA) |\ PERF_MEM_S(LOCK, NA) |\ - PERF_MEM_S(TLB, NA)) + PERF_MEM_S(TLB, NA) |\ + PERF_MEM_S(LVLNUM, NA)) static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr, u64 period) -- cgit v1.2.3 From 0cb52ad7bbb27bc6700412b055c743d5ae501b29 Mon Sep 17 00:00:00 2001 From: James Clark Date: Mon, 24 Jul 2023 14:44:59 +0100 Subject: perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability Since commit bd2756811766 ("perf: Rewrite core context handling") the relationship between perf_event_context and PMUs has changed so that the error scenario that PERF_PMU_CAP_HETEROGENEOUS_CPUS originally silenced no longer exists. Remove the capability to avoid confusion that it actually influences any perf core behavior and shift down the following capability bits to fill in the unused space. This change should be a no-op. Signed-off-by: James Clark Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Anshuman Khandual Acked-by: Ian Rogers Link: https://lore.kernel.org/r/20230724134500.970496-5-james.clark@arm.com --- include/linux/perf_event.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index dd92b4f5d370..9b1cf3c19c5f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -288,10 +288,9 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_EXTENDED_REGS 0x0008 #define PERF_PMU_CAP_EXCLUSIVE 0x0010 #define PERF_PMU_CAP_ITRACE 0x0020 -#define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x0040 -#define PERF_PMU_CAP_NO_EXCLUDE 0x0080 -#define PERF_PMU_CAP_AUX_OUTPUT 0x0100 -#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0200 +#define PERF_PMU_CAP_NO_EXCLUDE 0x0040 +#define PERF_PMU_CAP_AUX_OUTPUT 0x0080 +#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 struct perf_output_handle; -- cgit v1.2.3 From 62af03223785c11a0916df6a854ef4785d2350a5 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 25 Jul 2023 21:50:38 +0800 Subject: perf: Remove unused extern declaration arch_perf_get_page_size() commit 8af26be06272 ("perf/core: Fix arch_perf_get_page_size()") left behind this. Signed-off-by: YueHaibing Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20230725135038.25060-1-yuehaibing@huawei.com --- include/linux/perf_event.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 9b1cf3c19c5f..e83f13ce4a9f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1860,10 +1860,6 @@ extern void arch_perf_update_userpage(struct perf_event *event, struct perf_event_mmap_page *userpg, u64 now); -#ifdef CONFIG_MMU -extern __weak u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr); -#endif - /* * Snapshot branch stack on software events. * -- cgit v1.2.3 From fa1ffdb9e2937d04657c33a146aa0d2cd39abba0 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Mon, 17 Jul 2023 15:12:12 -0300 Subject: iommufd/selftest: Test iommufd_device_replace() Allow the selftest to call the function on the mock idev, add some tests to exercise it. Link: https://lore.kernel.org/r/16-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian Tested-by: Nicolin Chen Signed-off-by: Nicolin Chen Signed-off-by: Yi Liu Signed-off-by: Jason Gunthorpe --- include/linux/iommufd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 9657c58813dc..0ac60256b659 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -23,6 +23,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, void iommufd_device_unbind(struct iommufd_device *idev); int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id); +int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id); void iommufd_device_detach(struct iommufd_device *idev); struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev); -- cgit v1.2.3 From 5a1c7097472fcde5745654e3a59f55140903d9cc Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 10 Jul 2023 11:54:57 +0530 Subject: coresight: etm4x: Drop pid argument from etm4_probe() Coresight device pid can be retrieved from its iomem base address, which is stored in 'struct etm4x_drvdata'. This drops pid argument from etm4_probe() and 'struct etm4_init_arg'. Instead etm4_check_arch_features() derives the coresight device pid with a new helper coresight_get_pid(), right before it is consumed in etm4_hisi_match_pid(). Cc: Mathieu Poirier Cc: Suzuki K Poulose Cc: Mike Leach Cc: Leo Yan Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20230710062500.45147-4-anshuman.khandual@arm.com --- include/linux/coresight.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/coresight.h b/include/linux/coresight.h index bf70987240e4..255cfd283883 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -386,6 +386,18 @@ static inline u32 csdev_access_relaxed_read32(struct csdev_access *csa, return csa->read(offset, true, false); } +#define CORESIGHT_PIDRn(i) (0xFE0 + ((i) * 4)) + +static inline u32 coresight_get_pid(struct csdev_access *csa) +{ + u32 i, pid = 0; + + for (i = 0; i < 4; i++) + pid |= csdev_access_relaxed_read32(csa, CORESIGHT_PIDRn(i)) << (i * 8); + + return pid; +} + static inline u64 csdev_access_relaxed_read_pair(struct csdev_access *csa, u32 lo_offset, u32 hi_offset) { -- cgit v1.2.3 From 73d779a03a76ac3fe26832cba3c9ad04194af595 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 10 Jul 2023 11:54:58 +0530 Subject: coresight: etm4x: Change etm4_platform_driver driver for MMIO devices Add support for handling MMIO based devices via platform driver. We need to make sure that : 1) The APB clock, if present is enabled at probe and via runtime_pm ops 2) Use the ETM4x architecture or CoreSight architecture registers to identify a device as CoreSight ETM4x, instead of relying a white list of "Peripheral IDs" The driver doesn't get to handle the devices yet, until we wire the ACPI changes to move the devices to be handled via platform driver than the etm4_amba driver. Cc: Mathieu Poirier Cc: Suzuki K Poulose Cc: Mike Leach Cc: Leo Yan Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Sudeep Holla Signed-off-by: Anshuman Khandual Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20230710062500.45147-5-anshuman.khandual@arm.com --- include/linux/coresight.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) (limited to 'include/linux') diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 255cfd283883..a269fffaf991 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -6,6 +6,8 @@ #ifndef _LINUX_CORESIGHT_H #define _LINUX_CORESIGHT_H +#include +#include #include #include #include @@ -386,6 +388,51 @@ static inline u32 csdev_access_relaxed_read32(struct csdev_access *csa, return csa->read(offset, true, false); } +#define CORESIGHT_CIDRn(i) (0xFF0 + ((i) * 4)) + +static inline u32 coresight_get_cid(void __iomem *base) +{ + u32 i, cid = 0; + + for (i = 0; i < 4; i++) + cid |= readl(base + CORESIGHT_CIDRn(i)) << (i * 8); + + return cid; +} + +static inline bool is_coresight_device(void __iomem *base) +{ + u32 cid = coresight_get_cid(base); + + return cid == CORESIGHT_CID; +} + +/* + * Attempt to find and enable "APB clock" for the given device + * + * Returns: + * + * clk - Clock is found and enabled + * NULL - clock is not found + * ERROR - Clock is found but failed to enable + */ +static inline struct clk *coresight_get_enable_apb_pclk(struct device *dev) +{ + struct clk *pclk; + int ret; + + pclk = clk_get(dev, "apb_pclk"); + if (IS_ERR(pclk)) + return NULL; + + ret = clk_prepare_enable(pclk); + if (ret) { + clk_put(pclk); + return ERR_PTR(ret); + } + return pclk; +} + #define CORESIGHT_PIDRn(i) (0xFE0 + ((i) * 4)) static inline u32 coresight_get_pid(struct csdev_access *csa) -- cgit v1.2.3 From 2582c531203d224f3d18cf582deb65383a05c038 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Wed, 22 Mar 2023 18:00:32 +1100 Subject: of: fix htmldocs build warnings Fix these htmldoc build warnings: include/linux/of.h:115: warning: cannot understand function prototype: 'const struct kobj_type of_node_ktype; ' include/linux/of.h:118: warning: Excess function parameter 'phandle_name' description in 'of_node_init' Reported-by: Stephen Rothwell Reported-by: Randy Dunlap Fixes: d9194e009efe ("of: dynamic: add lifecycle docbook info to node creation functions") Signed-off-by: Stephen Rothwell Reviewed-by: Randy Dunlap Link: https://lore.kernel.org/r/20230322180032.1badd132@canb.auug.org.au Signed-off-by: Rob Herring --- include/linux/of.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index 6ecde0515677..7656cc637746 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -96,10 +96,12 @@ struct of_reconfig_data { struct property *old_prop; }; +extern const struct kobj_type of_node_ktype; +extern const struct fwnode_operations of_fwnode_ops; + /** * of_node_init - initialize a devicetree node * @node: Pointer to device node that has been created by kzalloc() - * @phandle_name: Name of property holding a phandle value * * On return the device_node refcount is set to one. Use of_node_put() * on @node when done to free the memory allocated for it. If the node @@ -107,9 +109,6 @@ struct of_reconfig_data { * whether to free the memory will be done by node->release(), which is * of_node_release(). */ -/* initialize a node */ -extern const struct kobj_type of_node_ktype; -extern const struct fwnode_operations of_fwnode_ops; static inline void of_node_init(struct device_node *node) { #if defined(CONFIG_OF_KOBJ) -- cgit v1.2.3 From 8b305ee2a91c3c4c89cb82ea940265b247eb0a13 Mon Sep 17 00:00:00 2001 From: Tristram Ha Date: Tue, 25 Jul 2023 16:54:30 -0700 Subject: net: phy: smsc: add WoL support to LAN8740/LAN8742 PHYs Microchip LAN8740/LAN8742 PHYs support basic unicast, broadcast, and Magic Packet WoL. They have one pattern filter matching up to 128 bytes of frame data, which can be used to implement ARP or multicast WoL. ARP WoL matches any ARP frame with broadcast address. Multicast WoL matches any multicast frame. Signed-off-by: Tristram Ha Reviewed-by: Florian Fainelli Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/1690329270-2873-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski --- include/linux/smscphy.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include/linux') diff --git a/include/linux/smscphy.h b/include/linux/smscphy.h index e1c88627755a..1a6a851d2cf8 100644 --- a/include/linux/smscphy.h +++ b/include/linux/smscphy.h @@ -38,4 +38,38 @@ int smsc_phy_set_tunable(struct phy_device *phydev, struct ethtool_tunable *tuna, const void *data); int smsc_phy_probe(struct phy_device *phydev); +#define MII_LAN874X_PHY_MMD_WOL_WUCSR 0x8010 +#define MII_LAN874X_PHY_MMD_WOL_WUF_CFGA 0x8011 +#define MII_LAN874X_PHY_MMD_WOL_WUF_CFGB 0x8012 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK0 0x8021 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK1 0x8022 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK2 0x8023 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK3 0x8024 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK4 0x8025 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK5 0x8026 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK6 0x8027 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK7 0x8028 +#define MII_LAN874X_PHY_MMD_WOL_RX_ADDRA 0x8061 +#define MII_LAN874X_PHY_MMD_WOL_RX_ADDRB 0x8062 +#define MII_LAN874X_PHY_MMD_WOL_RX_ADDRC 0x8063 +#define MII_LAN874X_PHY_MMD_MCFGR 0x8064 + +#define MII_LAN874X_PHY_PME1_SET (2 << 13) +#define MII_LAN874X_PHY_PME2_SET (2 << 11) +#define MII_LAN874X_PHY_PME_SELF_CLEAR BIT(9) +#define MII_LAN874X_PHY_WOL_PFDA_FR BIT(7) +#define MII_LAN874X_PHY_WOL_WUFR BIT(6) +#define MII_LAN874X_PHY_WOL_MPR BIT(5) +#define MII_LAN874X_PHY_WOL_BCAST_FR BIT(4) +#define MII_LAN874X_PHY_WOL_PFDAEN BIT(3) +#define MII_LAN874X_PHY_WOL_WUEN BIT(2) +#define MII_LAN874X_PHY_WOL_MPEN BIT(1) +#define MII_LAN874X_PHY_WOL_BCSTEN BIT(0) + +#define MII_LAN874X_PHY_WOL_FILTER_EN BIT(15) +#define MII_LAN874X_PHY_WOL_FILTER_MCASTTEN BIT(9) +#define MII_LAN874X_PHY_WOL_FILTER_BCSTEN BIT(8) + +#define MII_LAN874X_PHY_PME_SELF_CLEAR_DELAY 0x1000 /* 81 milliseconds */ + #endif /* __LINUX_SMSCPHY_H__ */ -- cgit v1.2.3 From fb3bd914b3ec28f5fb697ac55c4846ac2d542855 Mon Sep 17 00:00:00 2001 From: "Borislav Petkov (AMD)" Date: Wed, 28 Jun 2023 11:02:39 +0200 Subject: x86/srso: Add a Speculative RAS Overflow mitigation Add a mitigation for the speculative return address stack overflow vulnerability found on AMD processors. The mitigation works by ensuring all RET instructions speculate to a controlled location, similar to how speculation is controlled in the retpoline sequence. To accomplish this, the __x86_return_thunk forces the CPU to mispredict every function return using a 'safe return' sequence. To ensure the safety of this mitigation, the kernel must ensure that the safe return sequence is itself free from attacker interference. In Zen3 and Zen4, this is accomplished by creating a BTB alias between the untraining function srso_untrain_ret_alias() and the safe return function srso_safe_ret_alias() which results in evicting a potentially poisoned BTB entry and using that safe one for all function returns. In older Zen1 and Zen2, this is accomplished using a reinterpretation technique similar to Retbleed one: srso_untrain_ret() and srso_safe_ret(). Signed-off-by: Borislav Petkov (AMD) --- include/linux/cpu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 6e6e57ec69e8..23ac87be1ff1 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -70,6 +70,8 @@ extern ssize_t cpu_show_mmio_stale_data(struct device *dev, char *buf); extern ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_spec_rstack_overflow(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit v1.2.3 From 09da082b07bbae1c11d9560c8502800039aebcea Mon Sep 17 00:00:00 2001 From: Alexey Gladkov Date: Tue, 11 Jul 2023 18:16:04 +0200 Subject: fs: Add fchmodat2() On the userspace side fchmodat(3) is implemented as a wrapper function which implements the POSIX-specified interface. This interface differs from the underlying kernel system call, which does not have a flags argument. Most implementations require procfs [1][2]. There doesn't appear to be a good userspace workaround for this issue but the implementation in the kernel is pretty straight-forward. The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag, unlike existing fchmodat. [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35 [2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28 Co-developed-by: Palmer Dabbelt Signed-off-by: Palmer Dabbelt Signed-off-by: Alexey Gladkov Acked-by: Arnd Bergmann Message-Id: [brauner: pre reviews, do flag conversion in do_fchmodat() directly] Signed-off-by: Christian Brauner --- include/linux/syscalls.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 584f404bf868..5690aef7b3a8 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -440,6 +440,8 @@ asmlinkage long sys_chroot(const char __user *filename); asmlinkage long sys_fchmod(unsigned int fd, umode_t mode); asmlinkage long sys_fchmodat(int dfd, const char __user *filename, umode_t mode); +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename, + umode_t mode, unsigned int flags); asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group, int flag); asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group); -- cgit v1.2.3 From 09eadda27ca4afc3f560efea265bbe7a93ef786d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sun, 19 Mar 2023 22:29:28 +0100 Subject: backlight: corgi_lcd: fix missing prototype The corgi_lcd_limit_intensity() function is called from platform and defined in a driver, but the driver does not see the declaration: drivers/video/backlight/corgi_lcd.c:434:6: error: no previous prototype for 'corgi_lcd_limit_intensity' [-Werror=missing-prototypes] 434 | void corgi_lcd_limit_intensity(int limit) Move the prototype into a header that can be included from both sides to shut up the warning. Reviewed-by: Daniel Thompson Signed-off-by: Arnd Bergmann --- include/linux/spi/corgi_lcd.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/corgi_lcd.h b/include/linux/spi/corgi_lcd.h index 0b857616919c..fc6c1515dc54 100644 --- a/include/linux/spi/corgi_lcd.h +++ b/include/linux/spi/corgi_lcd.h @@ -15,4 +15,6 @@ struct corgi_lcd_platform_data { void (*kick_battery)(void); }; +void corgi_lcd_limit_intensity(int limit); + #endif /* __LINUX_SPI_CORGI_LCD_H */ -- cgit v1.2.3 From 71c8f9cf2623d0db79665f876b95afcdd8214aec Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Jul 2023 21:00:25 +0200 Subject: mtd: spi-nor: avoid holes in struct spi_mem_op gcc gets confused when -ftrivial-auto-var-init=pattern is used on sparse bit fields such as 'struct spi_mem_op', which caused the previous false positive warning about an uninitialized variable: drivers/mtd/spi-nor/spansion.c: error: 'op' is used uninitialized [-Werror=uninitialized] In fact, the variable is fully initialized and gcc does not see it being used, so the warning is entirely bogus. The problem appears to be a misoptimization in the initialization of single bit fields when the rest of the bytes are not initialized. A previous workaround added another initialization, which ended up shutting up the warning in spansion.c, though it apparently still happens in other files as reported by Peter Foley in the gcc bugzilla. The workaround of adding a fake initialization seems particularly bad because it would set values that can never be correct but prevent the compiler from warning about actually missing initializations. Revert the broken workaround and instead pad the structure to only have bitfields that add up to full bytes, which should avoid this behavior in all drivers. I also filed a new bug against gcc with what I found, so this can hopefully be addressed in future gcc releases. At the moment, only gcc-12 and gcc-13 are affected. Cc: Peter Foley Cc: Pedro Falcato Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110743 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108402 Link: https://godbolt.org/z/efMMsG1Kx Fixes: 420c4495b5e56 ("mtd: spi-nor: spansion: make sure local struct does not contain garbage") Signed-off-by: Arnd Bergmann Acked-by: Mark Brown Acked-by: Tudor Ambarus Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20230719190045.4007391-1-arnd@kernel.org --- include/linux/spi/spi-mem.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h index 8e984d75f5b6..6b0a7dc48a4b 100644 --- a/include/linux/spi/spi-mem.h +++ b/include/linux/spi/spi-mem.h @@ -101,6 +101,7 @@ struct spi_mem_op { u8 nbytes; u8 buswidth; u8 dtr : 1; + u8 __pad : 7; u16 opcode; } cmd; @@ -108,6 +109,7 @@ struct spi_mem_op { u8 nbytes; u8 buswidth; u8 dtr : 1; + u8 __pad : 7; u64 val; } addr; @@ -115,12 +117,14 @@ struct spi_mem_op { u8 nbytes; u8 buswidth; u8 dtr : 1; + u8 __pad : 7; } dummy; struct { u8 buswidth; u8 dtr : 1; u8 ecc : 1; + u8 __pad : 6; enum spi_mem_data_dir dir; unsigned int nbytes; union { -- cgit v1.2.3 From 630fdd592912614a72d00026fdadad72d9ef62eb Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 26 Jul 2023 14:59:57 -0700 Subject: seq_file: seq_show_option_n() is used for precise sizes When seq_show_option_n() is used, it is for non-string memory that happens to be printable bytes. As such, we must use memcpy() to copy the bytes and then explicitly NUL-terminate the result. Cc: Andy Shevchenko Cc: Andrew Morton Cc: Al Viro Cc: Muchun Song Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230726215957.never.619-kees@kernel.org Signed-off-by: Kees Cook --- include/linux/seq_file.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index bd023dd38ae6..386ab580b839 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -249,18 +249,19 @@ static inline void seq_show_option(struct seq_file *m, const char *name, /** * seq_show_option_n - display mount options with appropriate escapes - * where @value must be a specific length. + * where @value must be a specific length (i.e. + * not NUL-terminated). * @m: the seq_file handle * @name: the mount option name * @value: the mount option name's value, cannot be NULL - * @length: the length of @value to display + * @length: the exact length of @value to display, must be constant expression * * This is a macro since this uses "length" to define the size of the * stack buffer. */ #define seq_show_option_n(m, name, value, length) { \ char val_buf[length + 1]; \ - strncpy(val_buf, value, length); \ + memcpy(val_buf, value, length); \ val_buf[length] = '\0'; \ seq_show_option(m, name, val_buf); \ } -- cgit v1.2.3 From 88d162b479815f5d6b6a4ff5fdb07aec9dc6280c Mon Sep 17 00:00:00 2001 From: Roi Dayan Date: Thu, 4 May 2023 12:14:00 +0300 Subject: net/mlx5: Devcom, Infrastructure changes Update devcom infrastructure to be more generic, without depending on max supported ports definition or a device guid, and also more encapsulated so callers don't need to pass the register devcom component id per event call. Signed-off-by: Eli Cohen Signed-off-by: Roi Dayan Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 25d0528f9219..56dd3dfe2304 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -501,7 +501,7 @@ struct mlx5_events; struct mlx5_mpfs; struct mlx5_eswitch; struct mlx5_lag; -struct mlx5_devcom; +struct mlx5_devcom_dev; struct mlx5_fw_reset; struct mlx5_eq_table; struct mlx5_irq_table; @@ -618,7 +618,7 @@ struct mlx5_priv { struct mlx5_core_sriov sriov; struct mlx5_lag *lag; u32 flags; - struct mlx5_devcom *devcom; + struct mlx5_devcom_dev *devc; struct mlx5_fw_reset *fw_reset; struct mlx5_core_roce roce; struct mlx5_fc_stats fc_stats; -- cgit v1.2.3 From 58db72869a9f8e01910844ca145efc2ea91bbbf9 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Wed, 18 Jan 2023 16:52:17 +0200 Subject: net/mlx5: Re-organize mlx5_cmd struct Downstream patch will split mlx5_cmd_init() to probe and reload routines. As a preparation, organize mlx5_cmd struct so that any field that will be used in the reload routine are grouped at new nested struct. Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 56dd3dfe2304..39c5f4087c39 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -287,18 +287,23 @@ struct mlx5_cmd_stats { struct mlx5_cmd { struct mlx5_nb nb; + /* members which needs to be queried or reinitialized each reload */ + struct { + u16 cmdif_rev; + u8 log_sz; + u8 log_stride; + int max_reg_cmds; + unsigned long bitmask; + struct semaphore sem; + struct semaphore pages_sem; + struct semaphore throttle_sem; + } vars; enum mlx5_cmdif_state state; void *cmd_alloc_buf; dma_addr_t alloc_dma; int alloc_size; void *cmd_buf; dma_addr_t dma; - u16 cmdif_rev; - u8 log_sz; - u8 log_stride; - int max_reg_cmds; - int events; - u32 __iomem *vector; /* protect command queue allocations */ @@ -308,12 +313,8 @@ struct mlx5_cmd { */ spinlock_t token_lock; u8 token; - unsigned long bitmask; char wq_name[MLX5_CMD_WQ_MAX_NAME]; struct workqueue_struct *wq; - struct semaphore sem; - struct semaphore pages_sem; - struct semaphore throttle_sem; int mode; u16 allowed_opcode; struct mlx5_cmd_work_ent *ent_arr[MLX5_MAX_COMMANDS]; -- cgit v1.2.3 From b90ebfc018b087ba1e4981b298b58733236ff296 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Thu, 19 Jan 2023 09:10:50 +0200 Subject: net/mlx5: Allocate command stats with xarray Command stats is an array with more than 2K entries, which amounts to ~180KB. This is way more than actually needed, as only ~190 entries are being used. Therefore, replace the array with xarray. Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 39c5f4087c39..f21703fb75fd 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -322,7 +322,7 @@ struct mlx5_cmd { struct mlx5_cmd_debug dbg; struct cmd_msg_cache cache[MLX5_NUM_COMMAND_CACHES]; int checksum_disabled; - struct mlx5_cmd_stats stats[MLX5_CMD_OP_MAX]; + struct xarray stats; }; struct mlx5_cmd_mailbox { -- cgit v1.2.3 From b1f02b95758d05b799731d939e76a0bd6da312db Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Sat, 22 Jul 2023 00:51:07 +0200 Subject: mm: fix memory ordering for mm_lock_seq and vm_lock_seq mm->mm_lock_seq effectively functions as a read/write lock; therefore it must be used with acquire/release semantics. A specific example is the interaction between userfaultfd_register() and lock_vma_under_rcu(). userfaultfd_register() does the following from the point where it changes a VMA's flags to the point where concurrent readers are permitted again (in a simple scenario where only a single private VMA is accessed and no merging/splitting is involved): userfaultfd_register userfaultfd_set_vm_flags vm_flags_reset vma_start_write down_write(&vma->vm_lock->lock) vma->vm_lock_seq = mm_lock_seq [marks VMA as busy] up_write(&vma->vm_lock->lock) vm_flags_init [sets VM_UFFD_* in __vm_flags] vma->vm_userfaultfd_ctx.ctx = ctx mmap_write_unlock vma_end_write_all WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA] There are no memory barriers in between the __vm_flags update and the mm->mm_lock_seq update that unlocks the VMA, so the unlock can be reordered to above the `vm_flags_init()` call, which means from the perspective of a concurrent reader, a VMA can be marked as a userfaultfd VMA while it is not VMA-locked. That's bad, we definitely need a store-release for the unlock operation. The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly fine because all accesses to vma->vm_lock_seq that matter are always protected by the VMA lock. There is a racy read in vma_start_read() though that can tolerate false-positives, so we should be using WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN). On the other side, lock_vma_under_rcu() works as follows in the relevant region for locking and userfaultfd check: lock_vma_under_rcu vma_start_read vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout] down_read_trylock(&vma->vm_lock->lock) vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check] userfaultfd_armed checks vma->vm_flags & __VM_UFFD_FLAGS Here, the interesting aspect is how far down the mm->mm_lock_seq read can be reordered - if this read is reordered down below the vma->vm_flags access, this could cause lock_vma_under_rcu() to partly operate on information that was read while the VMA was supposed to be locked. To prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we need to read it with a load-acquire. Some of the comment wording is based on suggestions by Suren. BACKPORT WARNING: One of the functions changed by this patch (which I've written against Linus' tree) is vma_try_start_write(), but this function no longer exists in mm/mm-everything. I don't know whether the merged version of this patch will be ordered before or after the patch that removes vma_try_start_write(). If you're backporting this patch to a tree with vma_try_start_write(), make sure this patch changes that function. Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn Reviewed-by: Suren Baghdasaryan Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 29 +++++++++++++++++++++++------ include/linux/mm_types.h | 28 ++++++++++++++++++++++++++++ include/linux/mmap_lock.h | 10 ++++++++-- 3 files changed, 59 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 2dd73e4f3d8e..406ab9ea818f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -641,8 +641,14 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {} */ static inline bool vma_start_read(struct vm_area_struct *vma) { - /* Check before locking. A race might cause false locked result. */ - if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) + /* + * Check before locking. A race might cause false locked result. + * We can use READ_ONCE() for the mm_lock_seq here, and don't need + * ACQUIRE semantics, because this is just a lockless check whose result + * we don't rely on for anything - the mm_lock_seq read against which we + * need ordering is below. + */ + if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq)) return false; if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0)) @@ -653,8 +659,13 @@ static inline bool vma_start_read(struct vm_area_struct *vma) * False unlocked result is impossible because we modify and check * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq * modification invalidates all existing locks. + * + * We must use ACQUIRE semantics for the mm_lock_seq so that if we are + * racing with vma_end_write_all(), we only start reading from the VMA + * after it has been unlocked. + * This pairs with RELEASE semantics in vma_end_write_all(). */ - if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) { + if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) { up_read(&vma->vm_lock->lock); return false; } @@ -676,7 +687,7 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq) * current task is holding mmap_write_lock, both vma->vm_lock_seq and * mm->mm_lock_seq can't be concurrently modified. */ - *mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq); + *mm_lock_seq = vma->vm_mm->mm_lock_seq; return (vma->vm_lock_seq == *mm_lock_seq); } @@ -688,7 +699,13 @@ static inline void vma_start_write(struct vm_area_struct *vma) return; down_write(&vma->vm_lock->lock); - vma->vm_lock_seq = mm_lock_seq; + /* + * We should use WRITE_ONCE() here because we can have concurrent reads + * from the early lockless pessimistic check in vma_start_read(). + * We don't really care about the correctness of that early check, but + * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy. + */ + WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); up_write(&vma->vm_lock->lock); } @@ -702,7 +719,7 @@ static inline bool vma_try_start_write(struct vm_area_struct *vma) if (!down_write_trylock(&vma->vm_lock->lock)) return false; - vma->vm_lock_seq = mm_lock_seq; + WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); up_write(&vma->vm_lock->lock); return true; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index de10fc797c8e..5e74ce4a28cd 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -514,6 +514,20 @@ struct vm_area_struct { }; #ifdef CONFIG_PER_VMA_LOCK + /* + * Can only be written (using WRITE_ONCE()) while holding both: + * - mmap_lock (in write mode) + * - vm_lock->lock (in write mode) + * Can be read reliably while holding one of: + * - mmap_lock (in read or write mode) + * - vm_lock->lock (in read or write mode) + * Can be read unreliably (using READ_ONCE()) for pessimistic bailout + * while holding nothing (except RCU to keep the VMA struct allocated). + * + * This sequence counter is explicitly allowed to overflow; sequence + * counter reuse can only lead to occasional unnecessary use of the + * slowpath. + */ int vm_lock_seq; struct vma_lock *vm_lock; @@ -679,6 +693,20 @@ struct mm_struct { * by mmlist_lock */ #ifdef CONFIG_PER_VMA_LOCK + /* + * This field has lock-like semantics, meaning it is sometimes + * accessed with ACQUIRE/RELEASE semantics. + * Roughly speaking, incrementing the sequence number is + * equivalent to releasing locks on VMAs; reading the sequence + * number can be part of taking a read lock on a VMA. + * + * Can be modified under write mmap_lock using RELEASE + * semantics. + * Can be read with no other protection when holding write + * mmap_lock. + * Can be read with ACQUIRE semantics if not holding write + * mmap_lock. + */ int mm_lock_seq; #endif diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index aab8f1b28d26..e05e167dbd16 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -76,8 +76,14 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm) static inline void vma_end_write_all(struct mm_struct *mm) { mmap_assert_write_locked(mm); - /* No races during update due to exclusive mmap_lock being held */ - WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1); + /* + * Nobody can concurrently modify mm->mm_lock_seq due to exclusive + * mmap_lock being held. + * We need RELEASE semantics here to ensure that preceding stores into + * the VMA take effect before we unlock it with this store. + * Pairs with ACQUIRE semantics in vma_start_read(). + */ + smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1); } #else static inline void vma_end_write_all(struct mm_struct *mm) {} -- cgit v1.2.3 From d0358c1a37db4c46b9a1cd6c1b36e5a24ff970f9 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 26 Jul 2023 22:37:15 +0800 Subject: net: Remove unused declaration dev_restart() This is not used, so can remove it. Signed-off-by: YueHaibing Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230726143715.24700-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 11652e464f5d..32a4cdf37dd4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3130,8 +3130,6 @@ struct net_device *netdev_get_by_name(struct net *net, const char *name, netdevice_tracker *tracker, gfp_t gfp); struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex); struct net_device *dev_get_by_napi_id(unsigned int napi_id); -int dev_restart(struct net_device *dev); - static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev, unsigned short type, -- cgit v1.2.3 From 1f9a1ea821ff25353a0e80d971e7958cd55b47a3 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 27 Jul 2023 18:11:56 -0700 Subject: bpf: Support new sign-extension load insns Add interpreter/jit support for new sign-extension load insns which adds a new mode (BPF_MEMSX). Also add verifier support to recognize these insns and to do proper verification with new insns. In verifier, besides to deduce proper bounds for the dst_reg, probed memory access is also properly handled. Acked-by: Eduard Zingerman Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index f69114083ec7..a93242b5516b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -69,6 +69,9 @@ struct ctl_table_header; /* unused opcode to mark special load instruction. Same as BPF_ABS */ #define BPF_PROBE_MEM 0x20 +/* unused opcode to mark special ldsx instruction. Same as BPF_IND */ +#define BPF_PROBE_MEMSX 0x40 + /* unused opcode to mark call to interpreter with arguments */ #define BPF_CALL_ARGS 0xe0 -- cgit v1.2.3 From 7058e3a31ee4b9240cccab5bc13c1afbfa3d16a0 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 27 Jul 2023 18:12:25 -0700 Subject: bpf: Fix jit blinding with new sdiv/smov insns Handle new insns properly in bpf_jit_blind_insn() function. Acked-by: Eduard Zingerman Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index a93242b5516b..f5eabe3fa5e8 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -93,22 +93,28 @@ struct ctl_table_header; /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */ -#define BPF_ALU64_REG(OP, DST, SRC) \ +#define BPF_ALU64_REG_OFF(OP, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ - .off = 0, \ + .off = OFF, \ .imm = 0 }) -#define BPF_ALU32_REG(OP, DST, SRC) \ +#define BPF_ALU64_REG(OP, DST, SRC) \ + BPF_ALU64_REG_OFF(OP, DST, SRC, 0) + +#define BPF_ALU32_REG_OFF(OP, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ - .off = 0, \ + .off = OFF, \ .imm = 0 }) +#define BPF_ALU32_REG(OP, DST, SRC) \ + BPF_ALU32_REG_OFF(OP, DST, SRC, 0) + /* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */ #define BPF_ALU64_IMM(OP, DST, IMM) \ -- cgit v1.2.3 From d5d9bca2219d78c652d340079945f0f2071e1219 Mon Sep 17 00:00:00 2001 From: Guru Das Srinagesh Date: Thu, 27 Jul 2023 17:42:49 -0700 Subject: firmware: qcom_scm: Add missing extern specifier Commit 3a99f121fe0b ("firmware: qcom: scm: Introduce pas_metadata context") left out the `extern` specifier for the API it introduced, so add it. Signed-off-by: Guru Das Srinagesh Link: https://lore.kernel.org/r/bce25c8e215f7cfc7b0780d6965d09f5efe1cc5f.1690503893.git.quic_gurus@quicinc.com Signed-off-by: Bjorn Andersson --- include/linux/firmware/qcom/qcom_scm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/firmware/qcom/qcom_scm.h b/include/linux/firmware/qcom/qcom_scm.h index 250ea4efb7cb..0c091a3f6d49 100644 --- a/include/linux/firmware/qcom/qcom_scm.h +++ b/include/linux/firmware/qcom/qcom_scm.h @@ -75,7 +75,7 @@ struct qcom_scm_pas_metadata { extern int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, struct qcom_scm_pas_metadata *ctx); -void qcom_scm_pas_metadata_release(struct qcom_scm_pas_metadata *ctx); +extern void qcom_scm_pas_metadata_release(struct qcom_scm_pas_metadata *ctx); extern int qcom_scm_pas_mem_setup(u32 peripheral, phys_addr_t addr, phys_addr_t size); extern int qcom_scm_pas_auth_and_reset(u32 peripheral); -- cgit v1.2.3 From d928d14be6514af9da37009c2091270c5c714366 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Tue, 25 Jul 2023 16:04:25 -0500 Subject: net: stmmac: Make ptp_clk_freq_config variable type explicit The priv variable is _always_ of type (struct stmmac_priv *), so let's stop using (void *) since it isn't abstracting anything. Reviewed-by: Simon Horman Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20230725211853.895832-3-ahalaney@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index ef67dba775d0..3d0702510224 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -76,6 +76,8 @@ | DMA_AXI_BLEN_32 | DMA_AXI_BLEN_64 \ | DMA_AXI_BLEN_128 | DMA_AXI_BLEN_256) +struct stmmac_priv; + /* Platfrom data for platform device structure's platform_data field */ struct stmmac_mdio_bus_data { @@ -258,7 +260,7 @@ struct plat_stmmacenet_data { int (*serdes_powerup)(struct net_device *ndev, void *priv); void (*serdes_powerdown)(struct net_device *ndev, void *priv); void (*speed_mode_2500)(struct net_device *ndev, void *priv); - void (*ptp_clk_freq_config)(void *priv); + void (*ptp_clk_freq_config)(struct stmmac_priv *priv); int (*init)(struct platform_device *pdev, void *priv); void (*exit)(struct platform_device *pdev, void *priv); struct mac_device_info *(*setup)(void *priv); -- cgit v1.2.3 From 2e3df4a3b3178006d530f4c4d0e91f3d96cddb3c Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Fri, 2 Jun 2023 09:36:38 +0200 Subject: can: rx-offload: rename rx_offload_get_echo_skb() -> can_rx_offload_get_echo_skb_queue_timestamp() Rename the rx_offload_get_echo_skb() function to can_rx_offload_get_echo_skb_queue_timestamp(), since it inserts the echo skb into the rx-offload queue sorted by timestamp. This is a preparation for adding can_rx_offload_get_echo_skb_queue_tail(), which adds the echo skb to the end of the queue. This is intended for devices that do not support timestamps. Link: https://lore.kernel.org/all/20230718-gs_usb-rx-offload-v2-1-716e542d14d5@pengutronix.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/rx-offload.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/can/rx-offload.h b/include/linux/can/rx-offload.h index c205c51d79c9..e3b4199732c6 100644 --- a/include/linux/can/rx-offload.h +++ b/include/linux/can/rx-offload.h @@ -44,9 +44,9 @@ int can_rx_offload_irq_offload_timestamp(struct can_rx_offload *offload, int can_rx_offload_irq_offload_fifo(struct can_rx_offload *offload); int can_rx_offload_queue_timestamp(struct can_rx_offload *offload, struct sk_buff *skb, u32 timestamp); -unsigned int can_rx_offload_get_echo_skb(struct can_rx_offload *offload, - unsigned int idx, u32 timestamp, - unsigned int *frame_len_ptr); +unsigned int can_rx_offload_get_echo_skb_queue_timestamp(struct can_rx_offload *offload, + unsigned int idx, u32 timestamp, + unsigned int *frame_len_ptr); int can_rx_offload_queue_tail(struct can_rx_offload *offload, struct sk_buff *skb); void can_rx_offload_irq_finish(struct can_rx_offload *offload); -- cgit v1.2.3 From 8e0e2950c9ef48f7f40a7175048744ec2390b16e Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Mon, 3 Jul 2023 18:18:19 +0200 Subject: can: rx-offload: add can_rx_offload_get_echo_skb_queue_tail() Add can_rx_offload_get_echo_skb_queue_tail(). This function addds the echo skb at the end of rx-offload the queue. This is intended for devices without timestamp support. Link: https://lore.kernel.org/all/20230718-gs_usb-rx-offload-v2-2-716e542d14d5@pengutronix.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/rx-offload.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/can/rx-offload.h b/include/linux/can/rx-offload.h index e3b4199732c6..d29bb4521947 100644 --- a/include/linux/can/rx-offload.h +++ b/include/linux/can/rx-offload.h @@ -3,7 +3,7 @@ * linux/can/rx-offload.h * * Copyright (c) 2014 David Jander, Protonic Holland - * Copyright (c) 2014-2017 Pengutronix, Marc Kleine-Budde + * Copyright (c) 2014-2017, 2023 Pengutronix, Marc Kleine-Budde */ #ifndef _CAN_RX_OFFLOAD_H @@ -49,6 +49,9 @@ unsigned int can_rx_offload_get_echo_skb_queue_timestamp(struct can_rx_offload * unsigned int *frame_len_ptr); int can_rx_offload_queue_tail(struct can_rx_offload *offload, struct sk_buff *skb); +unsigned int can_rx_offload_get_echo_skb_queue_tail(struct can_rx_offload *offload, + unsigned int idx, + unsigned int *frame_len_ptr); void can_rx_offload_irq_finish(struct can_rx_offload *offload); void can_rx_offload_threaded_irq_finish(struct can_rx_offload *offload); void can_rx_offload_del(struct can_rx_offload *offload); -- cgit v1.2.3 From 3f9169196be55590a794b52f49637561ddd1ba4f Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 5 Jul 2023 16:51:35 +0200 Subject: cpu/SMT: Move SMT prototypes into cpu_smt.h In order to export the cpuhp_smt_control enum as part of the interface between generic and architecture code, the architecture code needs to include asm/topology.h. But that leads to circular header dependencies. So split the enum and related declarations into a separate header. [ ldufour: Reworded the commit's description ] Signed-off-by: Michael Ellerman Signed-off-by: Laurent Dufour Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-3-ldufour@linux.ibm.com --- include/linux/cpu.h | 25 +------------------------ include/linux/cpu_smt.h | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 24 deletions(-) create mode 100644 include/linux/cpu_smt.h (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 6e6e57ec69e8..6b326a9e8191 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -18,6 +18,7 @@ #include #include #include +#include struct device; struct device_node; @@ -204,30 +205,6 @@ void cpuhp_report_idle_dead(void); static inline void cpuhp_report_idle_dead(void) { } #endif /* #ifdef CONFIG_HOTPLUG_CPU */ -enum cpuhp_smt_control { - CPU_SMT_ENABLED, - CPU_SMT_DISABLED, - CPU_SMT_FORCE_DISABLED, - CPU_SMT_NOT_SUPPORTED, - CPU_SMT_NOT_IMPLEMENTED, -}; - -#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) -extern enum cpuhp_smt_control cpu_smt_control; -extern void cpu_smt_disable(bool force); -extern void cpu_smt_check_topology(void); -extern bool cpu_smt_possible(void); -extern int cpuhp_smt_enable(void); -extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); -#else -# define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED) -static inline void cpu_smt_disable(bool force) { } -static inline void cpu_smt_check_topology(void) { } -static inline bool cpu_smt_possible(void) { return false; } -static inline int cpuhp_smt_enable(void) { return 0; } -static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } -#endif - extern bool cpu_mitigations_off(void); extern bool cpu_mitigations_auto_nosmt(void); diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h new file mode 100644 index 000000000000..722c2e306fef --- /dev/null +++ b/include/linux/cpu_smt.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CPU_SMT_H_ +#define _LINUX_CPU_SMT_H_ + +enum cpuhp_smt_control { + CPU_SMT_ENABLED, + CPU_SMT_DISABLED, + CPU_SMT_FORCE_DISABLED, + CPU_SMT_NOT_SUPPORTED, + CPU_SMT_NOT_IMPLEMENTED, +}; + +#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) +extern enum cpuhp_smt_control cpu_smt_control; +extern void cpu_smt_disable(bool force); +extern void cpu_smt_check_topology(void); +extern bool cpu_smt_possible(void); +extern int cpuhp_smt_enable(void); +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); +#else +# define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED) +static inline void cpu_smt_disable(bool force) { } +static inline void cpu_smt_check_topology(void) { } +static inline bool cpu_smt_possible(void) { return false; } +static inline int cpuhp_smt_enable(void) { return 0; } +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } +#endif + +#endif /* _LINUX_CPU_SMT_H_ */ -- cgit v1.2.3 From 447ae4ac41130a7f127c2581a5e816bb0800b560 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 5 Jul 2023 16:51:37 +0200 Subject: cpu/SMT: Store the current/max number of threads Some architectures allow partial SMT states at boot time, ie. when not all SMT threads are brought online. To support that the SMT code needs to know the maximum number of SMT threads, and also the currently configured number. The architecture code knows the max number of threads, so have the architecture code pass that value to cpu_smt_set_num_threads(). Note that although topology_max_smt_threads() exists, it is not configured early enough to be used here. As architecture, like PowerPC, allows the threads number to be set through the kernel command line, also pass that value. [ ldufour: Slightly reword the commit message ] [ ldufour: Rename cpu_smt_check_topology and add a num_threads argument ] Signed-off-by: Michael Ellerman Signed-off-by: Laurent Dufour Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-5-ldufour@linux.ibm.com --- include/linux/cpu_smt.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpu_smt.h b/include/linux/cpu_smt.h index 722c2e306fef..0c1664294b57 100644 --- a/include/linux/cpu_smt.h +++ b/include/linux/cpu_smt.h @@ -12,15 +12,19 @@ enum cpuhp_smt_control { #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) extern enum cpuhp_smt_control cpu_smt_control; +extern unsigned int cpu_smt_num_threads; extern void cpu_smt_disable(bool force); -extern void cpu_smt_check_topology(void); +extern void cpu_smt_set_num_threads(unsigned int num_threads, + unsigned int max_threads); extern bool cpu_smt_possible(void); extern int cpuhp_smt_enable(void); extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); #else # define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED) +# define cpu_smt_num_threads 1 static inline void cpu_smt_disable(bool force) { } -static inline void cpu_smt_check_topology(void) { } +static inline void cpu_smt_set_num_threads(unsigned int num_threads, + unsigned int max_threads) { } static inline bool cpu_smt_possible(void) { return false; } static inline int cpuhp_smt_enable(void) { return 0; } static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } -- cgit v1.2.3 From 81e4fc67415607fbd969ff518cc5cbd75e895738 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 3 Jul 2023 21:52:11 +0300 Subject: lib/string_choices: Add str_write_read() helper Add an inversed variant of str_read_write(), i.e. str_write_read(). Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230703185222.50554-2-andriy.shevchenko@linux.intel.com Signed-off-by: Benjamin Tissoires --- include/linux/string_choices.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/string_choices.h b/include/linux/string_choices.h index 48120222b9b2..3c1091941eb8 100644 --- a/include/linux/string_choices.h +++ b/include/linux/string_choices.h @@ -30,6 +30,7 @@ static inline const char *str_read_write(bool v) { return v ? "read" : "write"; } +#define str_write_read(v) str_read_write(!(v)) static inline const char *str_on_off(bool v) { -- cgit v1.2.3 From 70c16123d865046899c36e3655b2d611f38f51a3 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Thu, 27 Jul 2023 23:33:27 -0700 Subject: iommufd: Add iommufd_access_replace() API Taking advantage of the new iommufd_access_change_ioas_id helper, add an iommufd_access_replace() API for the VFIO emulated pathway to use. Link: https://lore.kernel.org/r/a3267b924fd5f45e0d3a1dd13a9237e923563862.1690523699.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe Reviewed-by: Kevin Tian Signed-off-by: Nicolin Chen Reviewed-by: Jason Gunthorpe Signed-off-by: Jason Gunthorpe --- include/linux/iommufd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h index 0ac60256b659..ffc3a949f837 100644 --- a/include/linux/iommufd.h +++ b/include/linux/iommufd.h @@ -49,6 +49,7 @@ iommufd_access_create(struct iommufd_ctx *ictx, const struct iommufd_access_ops *ops, void *data, u32 *id); void iommufd_access_destroy(struct iommufd_access *access); int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id); +int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id); void iommufd_access_detach(struct iommufd_access *access); void iommufd_ctx_get(struct iommufd_ctx *ictx); -- cgit v1.2.3 From 84e00d9bd4e472bd9b145ed40dbd132dd7a15462 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jul 2023 11:55:30 -0700 Subject: net: convert some netlink netdev iterators to depend on the xarray Reap the benefits of easier iteration thanks to the xarray. Convert just the genetlink ones, those are easier to test. Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 32a4cdf37dd4..84c36a7f873f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3016,6 +3016,9 @@ extern rwlock_t dev_base_lock; /* Device list lock */ if (netdev_master_upper_dev_get_rcu(slave) == (bond)) #define net_device_entry(lh) list_entry(lh, struct net_device, dev_list) +#define for_each_netdev_dump(net, d, ifindex) \ + xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex)) + static inline struct net_device *next_net_device(struct net_device *dev) { struct list_head *lh; -- cgit v1.2.3 From 5027d54a9c30bc7ec808360378e2b4753f053f25 Mon Sep 17 00:00:00 2001 From: Patrick Rohr Date: Wed, 26 Jul 2023 16:07:01 -0700 Subject: net: change accept_ra_min_rtr_lft to affect all RA lifetimes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti Signed-off-by: Patrick Rohr Reviewed-by: Maciej Żenczykowski Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com Signed-off-by: Jakub Kicinski --- include/linux/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 0295b47c10a3..5883551b1ee8 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -33,7 +33,7 @@ struct ipv6_devconf { __s32 accept_ra_defrtr; __u32 ra_defrtr_metric; __s32 accept_ra_min_hop_limit; - __s32 accept_ra_min_rtr_lft; + __s32 accept_ra_min_lft; __s32 accept_ra_pinfo; __s32 ignore_routes_with_linkdown; #ifdef CONFIG_IPV6_ROUTER_PREF -- cgit v1.2.3 From 9abddac583d68e16258d5e0b95dc1b3ca1886173 Mon Sep 17 00:00:00 2001 From: Daniel Xu Date: Fri, 21 Jul 2023 14:22:45 -0600 Subject: netfilter: defrag: Add glue hooks for enabling/disabling defrag We want to be able to enable/disable IP packet defrag from core bpf/netfilter code. In other words, execute code from core that could possibly be built as a module. To help avoid symbol resolution errors, use glue hooks that the modules will register callbacks with during module init. Signed-off-by: Daniel Xu Reviewed-by: Florian Westphal Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov --- include/linux/netfilter.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index d4fed4c508ca..d68644b7c299 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -481,6 +482,15 @@ struct nfnl_ct_hook { }; extern const struct nfnl_ct_hook __rcu *nfnl_ct_hook; +struct nf_defrag_hook { + struct module *owner; + int (*enable)(struct net *net); + void (*disable)(struct net *net); +}; + +extern const struct nf_defrag_hook __rcu *nf_defrag_v4_hook; +extern const struct nf_defrag_hook __rcu *nf_defrag_v6_hook; + /* * nf_skb_duplicated - TEE target has sent a packet * -- cgit v1.2.3 From 800959e697de0c55613a2ce40bca52520a421c9f Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 25 Jul 2023 21:48:08 +0800 Subject: ftrace: Remove unused extern declarations commit 6a9c981b1e96 ("ftrace: Remove unused function ftrace_arch_read_dyn_info()") left ftrace_arch_read_dyn_info() extern declaration. And commit 1d74f2a0f64b ("ftrace: remove ftrace_ip_converted()") leave ftrace_ip_converted() declaration. Link: https://lore.kernel.org/linux-trace-kernel/20230725134808.9716-1-yuehaibing@huawei.com Cc: Cc: Signed-off-by: YueHaibing Signed-off-by: Steven Rostedt (Google) --- include/linux/ftrace.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index ce156c7704ee..aad9cf8876b5 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -684,7 +684,6 @@ void __init ftrace_set_early_filter(struct ftrace_ops *ops, char *buf, int enable); /* defined in arch */ -extern int ftrace_ip_converted(unsigned long ip); extern int ftrace_dyn_arch_init(void); extern void ftrace_replace_code(int enable); extern int ftrace_update_ftrace_func(ftrace_func_t func); @@ -859,9 +858,6 @@ static inline int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_a } #endif -/* May be defined in arch */ -extern int ftrace_arch_read_dyn_info(char *buf, int size); - extern int skip_trace(unsigned long ip); extern void ftrace_module_init(struct module *mod); extern void ftrace_module_enable(struct module *mod); -- cgit v1.2.3 From 687fe7dfb736b03ab820d172ea5dbfc1ec447135 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Sun, 23 Jul 2023 22:30:18 -0700 Subject: Input: tca6416-keypad - always expect proper IRQ number in i2c client Remove option having i2c client contain raw gpio number instead of proper IRQ number. There are no users of this facility in mainline and it will allow cleaning up the driver code with regard to wakeup handling, etc. Link: https://lore.kernel.org/r/20230724053024.352054-1-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov --- include/linux/tca6416_keypad.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tca6416_keypad.h b/include/linux/tca6416_keypad.h index b0d36a9934cc..5cf6f6f82aa7 100644 --- a/include/linux/tca6416_keypad.h +++ b/include/linux/tca6416_keypad.h @@ -25,7 +25,6 @@ struct tca6416_keys_platform_data { unsigned int rep:1; /* enable input subsystem auto repeat */ uint16_t pinmask; uint16_t invert; - int irq_is_gpio; int use_polling; /* use polling if Interrupt is not connected*/ }; #endif -- cgit v1.2.3 From 3029ad91335353a70feb42acd24d580d70ab258b Mon Sep 17 00:00:00 2001 From: Jiaqing Zhao Date: Mon, 24 Jul 2023 08:39:31 +0000 Subject: can: ems_pci: move ASIX AX99100 ids to pci_ids.h Move PCI Vendor and Device ID of ASIX AX99100 PCIe to Multi I/O Controller to pci_ids.h for its serial and parallel port driver support in subsequent patches. Signed-off-by: Jiaqing Zhao Reviewed-by: Andy Shevchenko Acked-by: Bjorn Helgaas Acked-by: Marc Kleine-Budde Link: https://lore.kernel.org/r/20230724083933.3173513-3-jiaqing.zhao@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/pci_ids.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 2dc75df1437f..16608ce4fd0f 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -1760,6 +1760,10 @@ #define PCI_SUBDEVICE_ID_AT_2700FX 0x2701 #define PCI_SUBDEVICE_ID_AT_2701FX 0x2703 +#define PCI_VENDOR_ID_ASIX 0x125b +#define PCI_DEVICE_ID_ASIX_AX99100 0x9100 +#define PCI_DEVICE_ID_ASIX_AX99100_LB 0x9110 + #define PCI_VENDOR_ID_ESS 0x125d #define PCI_DEVICE_ID_ESS_ESS1968 0x1968 #define PCI_DEVICE_ID_ESS_ESS1978 0x1978 -- cgit v1.2.3 From c1504e5102384762bd06b068e9928b624a8a88c6 Mon Sep 17 00:00:00 2001 From: Ajay Kaher Date: Fri, 28 Jul 2023 23:50:46 +0530 Subject: eventfs: Implement eventfs dir creation functions Add eventfs_file structure which will hold the properties of the eventfs files and directories. Add following functions to create the directories in eventfs: eventfs_create_events_dir() will create the top level "events" directory within the tracefs file system. eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the given name of the subsystem. eventfs_add_dir() creates an eventfs_file descriptor with the given name of the directory and attached to a eventfs_file of a subsystem. Add tracefs_inode structure to hold the inodes, flags and pointers to private data used by eventfs. Link: https://lkml.kernel.org/r/1690568452-46553-5-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher Co-developed-by: Steven Rostedt (VMware) Signed-off-by: Steven Rostedt (VMware) Tested-by: Ching-lin Yu Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com Signed-off-by: Steven Rostedt (Google) --- include/linux/tracefs.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 99912445974c..432e5e6f7901 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -21,6 +21,17 @@ struct file_operations; #ifdef CONFIG_TRACING +struct eventfs_file; + +struct dentry *eventfs_create_events_dir(const char *name, + struct dentry *parent); + +struct eventfs_file *eventfs_add_subsystem_dir(const char *name, + struct dentry *parent); + +struct eventfs_file *eventfs_add_dir(const char *name, + struct eventfs_file *ef_parent); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); -- cgit v1.2.3 From 88f349b4a83aab6a0fabd4230a1b6796f26bc546 Mon Sep 17 00:00:00 2001 From: Ajay Kaher Date: Fri, 28 Jul 2023 23:50:47 +0530 Subject: eventfs: Implement eventfs file add functions Add the following functions to add files to evenfs: eventfs_add_events_file() to add the data needed to create a specific file located at the top level events directory. The dentry/inode will be created when the events directory is scanned. eventfs_add_file() to add the data needed for files within the directories below the top level events directory. The dentry/inode of the file will be created when the directory that the file is in is scanned. Link: https://lkml.kernel.org/r/1690568452-46553-6-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher Co-developed-by: Steven Rostedt (VMware) Signed-off-by: Steven Rostedt (VMware) Tested-by: Ching-lin Yu Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com Signed-off-by: Steven Rostedt (Google) --- include/linux/tracefs.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 432e5e6f7901..54c9cbd0389b 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, struct eventfs_file *eventfs_add_dir(const char *name, struct eventfs_file *ef_parent); +int eventfs_add_file(const char *name, umode_t mode, + struct eventfs_file *ef_parent, void *data, + const struct file_operations *fops); + +int eventfs_add_events_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); -- cgit v1.2.3 From 5bdcd5f5331a276a3ddf1fba8605986d0d15298a Mon Sep 17 00:00:00 2001 From: Ajay Kaher Date: Fri, 28 Jul 2023 23:50:50 +0530 Subject: eventfs: Implement removal of meta data from eventfs When events are removed from tracefs, the eventfs must be aware of this. The eventfs_remove() removes the meta data from eventfs so that it will no longer create the files associated with that event. When an instance is removed from tracefs, eventfs_remove_events_dir() will remove and clean up the entire "events" directory. The helper function eventfs_remove_rec() is used to clean up and free the associated data from eventfs for both of the added functions. SRCU is used to protect the lists of meta data stored in the eventfs. The eventfs_mutex is used to protect the content of the items in the list. As lookups may be happening as deletions of events are made, the freeing of dentry/inodes and relative information is done after the SRCU grace period has passed. Link: https://lkml.kernel.org/r/1690568452-46553-9-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher Co-developed-by: Steven Rostedt (VMware) Signed-off-by: Steven Rostedt (VMware) Tested-by: Ching-lin Yu Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) --- include/linux/tracefs.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 54c9cbd0389b..009072792fa3 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -40,6 +40,10 @@ int eventfs_add_events_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); +void eventfs_remove(struct eventfs_file *ef); + +void eventfs_remove_events_dir(struct dentry *dentry); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); -- cgit v1.2.3 From b8af77951941e943434a76ef40fdc372bf95030e Mon Sep 17 00:00:00 2001 From: "xingtong.wu" Date: Mon, 31 Jul 2023 15:14:24 +0800 Subject: platform/x86/siemens: simatic-ipc: add new models BX-56A/BX-59A This adds support for the Siemens Simatic IPC models BX-56A/BX-59A, led/watchdog/battery on these models are same, actual drivers for models will be sent in separate patches. Signed-off-by: xingtong.wu Link: https://lore.kernel.org/r/20230731071424.4663-2-xingtong_wu@163.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/simatic-ipc-base.h | 1 + include/linux/platform_data/x86/simatic-ipc.h | 2 ++ 2 files changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc-base.h b/include/linux/platform_data/x86/simatic-ipc-base.h index 4ca21065c862..2d7f7120ba6b 100644 --- a/include/linux/platform_data/x86/simatic-ipc-base.h +++ b/include/linux/platform_data/x86/simatic-ipc-base.h @@ -22,6 +22,7 @@ #define SIMATIC_IPC_DEVICE_227G 5 #define SIMATIC_IPC_DEVICE_BX_21A 6 #define SIMATIC_IPC_DEVICE_BX_39A 7 +#define SIMATIC_IPC_DEVICE_BX_59A 8 struct simatic_ipc_platform { u8 devmode; diff --git a/include/linux/platform_data/x86/simatic-ipc.h b/include/linux/platform_data/x86/simatic-ipc.h index f2eafa43a605..8d8b3b919674 100644 --- a/include/linux/platform_data/x86/simatic-ipc.h +++ b/include/linux/platform_data/x86/simatic-ipc.h @@ -36,6 +36,8 @@ enum simatic_ipc_station_ids { SIMATIC_IPC_IPCBX_39A = 0x00001001, SIMATIC_IPC_IPCPX_39A = 0x00001002, SIMATIC_IPC_IPCBX_21A = 0x00001101, + SIMATIC_IPC_IPCBX_56A = 0x00001201, + SIMATIC_IPC_IPCBX_59A = 0x00001202, }; static inline u32 simatic_ipc_get_station_id(u8 *data, int max_len) -- cgit v1.2.3 From 59bbe86bb212b618ec2b50434f54bb4cfc704f44 Mon Sep 17 00:00:00 2001 From: Praveen Talari Date: Fri, 14 Jul 2023 09:52:02 +0530 Subject: soc: qcom: geni-se: Add SPI Device mode support for GENI based QuPv3 Add device mode supported registers and masks. Signed-off-by: Praveen Talari Reviewed-by: Vijaya Krishna Nivarthi Link: https://lore.kernel.org/r/20230714042203.14251-2-quic_ptalari@quicinc.com Signed-off-by: Mark Brown --- include/linux/soc/qcom/geni-se.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/geni-se.h b/include/linux/soc/qcom/geni-se.h index 821a19135bb6..29e06905bc1f 100644 --- a/include/linux/soc/qcom/geni-se.h +++ b/include/linux/soc/qcom/geni-se.h @@ -35,6 +35,7 @@ enum geni_se_protocol_type { GENI_SE_UART, GENI_SE_I2C, GENI_SE_I3C, + GENI_SE_SPI_SLAVE, }; struct geni_wrapper; @@ -73,12 +74,14 @@ struct geni_se { /* Common SE registers */ #define GENI_FORCE_DEFAULT_REG 0x20 +#define GENI_OUTPUT_CTRL 0x24 #define SE_GENI_STATUS 0x40 #define GENI_SER_M_CLK_CFG 0x48 #define GENI_SER_S_CLK_CFG 0x4c #define GENI_IF_DISABLE_RO 0x64 #define GENI_FW_REVISION_RO 0x68 #define SE_GENI_CLK_SEL 0x7c +#define SE_GENI_CFG_SEQ_START 0x84 #define SE_GENI_DMA_MODE_EN 0x258 #define SE_GENI_M_CMD0 0x600 #define SE_GENI_M_CMD_CTRL_REG 0x604 @@ -111,6 +114,9 @@ struct geni_se { /* GENI_FORCE_DEFAULT_REG fields */ #define FORCE_DEFAULT BIT(0) +/* GENI_OUTPUT_CTRL fields */ +#define GENI_IO_MUX_0_EN BIT(0) + /* GENI_STATUS fields */ #define M_GENI_CMD_ACTIVE BIT(0) #define S_GENI_CMD_ACTIVE BIT(12) @@ -130,6 +136,9 @@ struct geni_se { /* GENI_CLK_SEL fields */ #define CLK_SEL_MSK GENMASK(2, 0) +/* SE_GENI_CFG_SEQ_START fields */ +#define START_TRIGGER BIT(0) + /* SE_GENI_DMA_MODE_EN */ #define GENI_DMA_MODE_EN BIT(0) -- cgit v1.2.3 From 3d6f126b15d9fd8435455fffc912d976973a7a09 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 27 Jul 2023 14:25:42 +0200 Subject: dma-mapping: move arch_dma_set_mask() declaration to header This function has a __weak definition and an override that is only used on freescale powerpc chips. The powerpc definition however does not see the declaration that is in a .c file: arch/powerpc/kernel/dma-mask.c:7:6: error: no previous prototype for 'arch_dma_set_mask' [-Werror=missing-prototypes] Move it into the linux/dma-map-ops.h header where the other arch_dma_* functions are declared. Signed-off-by: Arnd Bergmann Signed-off-by: Christoph Hellwig --- include/linux/dma-map-ops.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 9bf19b5bf755..bb5e06fd359d 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -343,6 +343,12 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, void arch_dma_free(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs); +#ifdef CONFIG_ARCH_HAS_DMA_SET_MASK +void arch_dma_set_mask(struct device *dev, u64 mask); +#else +#define arch_dma_set_mask(dev, mask) do { } while (0) +#endif + #ifdef CONFIG_MMU /* * Page protection so that devices that can't snoop CPU caches can use the -- cgit v1.2.3 From 22e4a348f87c59df2c02f1efb7ba9a56b622c7b8 Mon Sep 17 00:00:00 2001 From: Yajun Deng Date: Fri, 12 May 2023 17:42:10 +0800 Subject: dma-contiguous: support per-numa CMA for all architectures In the commit b7176c261cdb ("dma-contiguous: provide the ability to reserve per-numa CMA"), Barry adds DMA_PERNUMA_CMA for ARM64. But this feature is architecture independent, so support per-numa CMA for all architectures, and enable it by default if NUMA. Signed-off-by: Yajun Deng Tested-by: Yicong Yang Signed-off-by: Christoph Hellwig --- include/linux/dma-map-ops.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index bb5e06fd359d..f2fc203fb8a1 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -169,12 +169,6 @@ static inline void dma_free_contiguous(struct device *dev, struct page *page, } #endif /* CONFIG_DMA_CMA*/ -#ifdef CONFIG_DMA_PERNUMA_CMA -void dma_pernuma_cma_reserve(void); -#else -static inline void dma_pernuma_cma_reserve(void) { } -#endif /* CONFIG_DMA_PERNUMA_CMA */ - #ifdef CONFIG_DMA_DECLARE_COHERENT int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr, dma_addr_t device_addr, size_t size); -- cgit v1.2.3 From 27152bceea1df27ffebb12ac9cd9adbf2c4c3f35 Mon Sep 17 00:00:00 2001 From: Ajay Kaher Date: Fri, 28 Jul 2023 23:50:51 +0530 Subject: eventfs: Move tracing/events to eventfs Up until now, /sys/kernel/tracing/events was no different than any other part of tracefs. The files and directories within the events directory was created when the tracefs was mounted, and also created for the instances in /sys/kernel/tracing/instances//events. Most of these files and directories will never be referenced. Since there are thousands of these files and directories they spend their time wasting precious memory resources. Move the "events" directory to the new eventfs. The eventfs will take the meta data of the events that they represent and store that. When the files in the events directory are referenced, the dentry and inodes to represent them are then created. When the files are no longer referenced, they are freed. This saves the precious memory resources that were wasted on these seldom referenced dentries and inodes. Running the following: ~# cat /proc/meminfo /proc/slabinfo > before.out ~# mkdir /sys/kernel/tracing/instances/foo ~# cat /proc/meminfo /proc/slabinfo > after.out to test the changes produces the following deltas: Before this change: Before after deltas for meminfo: MemFree: -32260 MemAvailable: -21496 KReclaimable: 21528 Slab: 22440 SReclaimable: 21528 SUnreclaim: 912 VmallocUsed: 16 Before after deltas for slabinfo: : [ * = ] tracefs_inode_cache: 14472 [* 1184 = 17134848] buffer_head: 24 [* 168 = 4032] hmem_inode_cache: 28 [* 1480 = 41440] dentry: 14450 [* 312 = 4508400] lsm_inode_cache: 14453 [* 32 = 462496] vma_lock: 11 [* 152 = 1672] vm_area_struct: 2 [* 184 = 368] trace_event_file: 1748 [* 88 = 153824] kmalloc-256: 1072 [* 256 = 274432] kmalloc-64: 2842 [* 64 = 181888] Total slab additions in size: 22,763,400 bytes With this change: Before after deltas for meminfo: MemFree: -12600 MemAvailable: -12580 Cached: 24 Active: 12 Inactive: 68 Inactive(anon): 48 Active(file): 12 Inactive(file): 20 Dirty: -4 AnonPages: 68 KReclaimable: 12 Slab: 1856 SReclaimable: 12 SUnreclaim: 1844 KernelStack: 16 PageTables: 36 VmallocUsed: 16 Before after deltas for slabinfo: : [ * = ] tracefs_inode_cache: 108 [* 1184 = 127872] buffer_head: 24 [* 168 = 4032] hmem_inode_cache: 18 [* 1480 = 26640] dentry: 127 [* 312 = 39624] lsm_inode_cache: 152 [* 32 = 4864] vma_lock: 67 [* 152 = 10184] vm_area_struct: -12 [* 184 = -2208] trace_event_file: 1764 [* 96 = 169344] kmalloc-96: 14322 [* 96 = 1374912] kmalloc-64: 2814 [* 64 = 180096] kmalloc-32: 1103 [* 32 = 35296] kmalloc-16: 2308 [* 16 = 36928] kmalloc-8: 12800 [* 8 = 102400] Total slab additions in size: 2,109,984 bytes Which is a savings of 20,653,416 bytes (20 MB) per tracing instance. Link: https://lkml.kernel.org/r/1690568452-46553-10-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher Co-developed-by: Steven Rostedt (VMware) Signed-off-by: Steven Rostedt (VMware) Tested-by: Ching-lin Yu Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 3930e676436c..c17623c78029 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -638,6 +638,7 @@ struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter; + struct eventfs_file *ef; struct dentry *dir; struct trace_array *tr; struct trace_subsystem_dir *system; -- cgit v1.2.3 From 079082c60affefeb9d2bd4176a4f2b390a9ccfda Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 28 Jul 2023 23:47:17 +0200 Subject: tcx: Fix splat during dev unregister During unregister_netdevice_many_notify(), the ordering of our concerned function calls is like this: unregister_netdevice_many_notify dev_shutdown qdisc_put clsact_destroy tcx_uninstall The syzbot reproducer triggered a case that the qdisc refcnt is not zero during dev_shutdown(). tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active) because the miniq is still active and the entry should not be freed. The latter assumed that qdisc destruction happens before tcx teardown. This fix is to avoid tcx_uninstall() doing tcx_entry_free() when the miniq is still alive and let the clsact_destroy() do the free later, so that we do not assume any specific ordering for either of them. If still active, tcx_uninstall() does clear the entry when flushing out the prog/link. clsact_destroy() will then notice the "!tcx_entry_is_active()" and then does the tcx_entry_free() eventually. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com Reported-by: Leon Romanovsky Signed-off-by: Martin KaFai Lau Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Tested-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com Tested-by: Leon Romanovsky Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net Signed-off-by: Jakub Kicinski --- include/linux/bpf_mprog.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h index 2b429488f840..929225f7b095 100644 --- a/include/linux/bpf_mprog.h +++ b/include/linux/bpf_mprog.h @@ -256,6 +256,22 @@ static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst, memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items)); } +static inline void bpf_mprog_entry_clear(struct bpf_mprog_entry *dst) +{ + memset(dst->fp_items, 0, sizeof(dst->fp_items)); +} + +static inline void bpf_mprog_clear_all(struct bpf_mprog_entry *entry, + struct bpf_mprog_entry **entry_new) +{ + struct bpf_mprog_entry *peer; + + peer = bpf_mprog_peer(entry); + bpf_mprog_entry_clear(peer); + peer->parent->count = 0; + *entry_new = peer; +} + static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx) { int total = bpf_mprog_total(entry); -- cgit v1.2.3 From e302f8d9f799af57a61a7456451c28f2647e9751 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Mon, 31 Jul 2023 16:37:44 -0500 Subject: ASoC: SOF: imx: remove error checks on NULL ipc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit imx_dsp_get_data() can return NULL, but the value is not checked before being usd, leading to static analysis warnings. sound/soc/sof/imx/imx8.c:85:32: error: dereference of NULL ‘0’ [CWE-476] [-Werror=analyzer-null-dereference] 85 | spin_lock_irqsave(&priv->sdev->ipc_lock, flags); | ~~~~^~~~~~ It appears this is not really a possible problem, so remove those checks. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Reviewed-by: Daniel Baluta Reviewed-by: Yaochun Hung Link: https://lore.kernel.org/r/20230731213748.440285-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown --- include/linux/firmware/imx/dsp.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/imx/dsp.h b/include/linux/firmware/imx/dsp.h index 4f7895a3b73c..1f176a2683fe 100644 --- a/include/linux/firmware/imx/dsp.h +++ b/include/linux/firmware/imx/dsp.h @@ -37,17 +37,11 @@ struct imx_dsp_ipc { static inline void imx_dsp_set_data(struct imx_dsp_ipc *ipc, void *data) { - if (!ipc) - return; - ipc->private_data = data; } static inline void *imx_dsp_get_data(struct imx_dsp_ipc *ipc) { - if (!ipc) - return NULL; - return ipc->private_data; } -- cgit v1.2.3 From 8cf5286216dcfb942f0e4d7c23ebe06c2ebc1bed Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Mon, 31 Jul 2023 16:37:45 -0500 Subject: ASoC: SOF: mediatek: remove error checks on NULL ipc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit mtk_adsp_ipc_get_data() can return NULL, but the value is not checked before being used, leading to static analysis warnings. sound/soc/sof/mediatek/mt8195/mt8195.c:90:32: error: dereference of NULL ‘0’ [CWE-476] [-Werror=analyzer-null-dereference] 90 | spin_lock_irqsave(&priv->sdev->ipc_lock, flags); | ~~~~^~~~~~ It appears this is not really a possible problem, so remove those checks. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Reviewed-by: Daniel Baluta Reviewed-by: Yaochun Hung Link: https://lore.kernel.org/r/20230731213748.440285-6-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown --- include/linux/firmware/mediatek/mtk-adsp-ipc.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/mediatek/mtk-adsp-ipc.h b/include/linux/firmware/mediatek/mtk-adsp-ipc.h index 28fd313340b8..5b1d16fa3f56 100644 --- a/include/linux/firmware/mediatek/mtk-adsp-ipc.h +++ b/include/linux/firmware/mediatek/mtk-adsp-ipc.h @@ -46,17 +46,11 @@ struct mtk_adsp_ipc { static inline void mtk_adsp_ipc_set_data(struct mtk_adsp_ipc *ipc, void *data) { - if (!ipc) - return; - ipc->private_data = data; } static inline void *mtk_adsp_ipc_get_data(struct mtk_adsp_ipc *ipc) { - if (!ipc) - return NULL; - return ipc->private_data; } -- cgit v1.2.3 From bb29a33c4b4da9c11e021b9a257ae2944ccaff01 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Mon, 31 Jul 2023 16:32:38 -0500 Subject: ASoC: soc-acpi: move link_slaves_found() Move existing function in common library to make sure the code can be reused by other SoC vendors. No functionality change outside of the move and added prefix. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Reviewed-by: Bard Liao Link: https://lore.kernel.org/r/20230731213242.434594-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown --- include/linux/soundwire/sdw.h | 5 +++++ include/linux/soundwire/sdw_intel.h | 7 +------ 2 files changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index f523ceabd059..f248f9a6cd55 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -482,6 +482,11 @@ struct sdw_slave_id { __u8 sdw_version:4; }; +struct sdw_extended_slave_id { + int link_id; + struct sdw_slave_id id; +}; + /* * Helper macros to extract the MIPI-defined IDs * diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 11fc88fb0d78..fa67fad4ef51 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -264,11 +264,6 @@ struct sdw_intel_link_dev; */ #define SDW_INTEL_CLK_STOP_BUS_RESET BIT(3) -struct sdw_intel_slave_id { - int link_id; - struct sdw_slave_id id; -}; - struct hdac_bus; /** @@ -298,7 +293,7 @@ struct sdw_intel_ctx { int num_slaves; acpi_handle handle; struct sdw_intel_link_dev **ldev; - struct sdw_intel_slave_id *ids; + struct sdw_extended_slave_id *ids; struct list_head link_list; struct mutex shim_lock; /* lock for access to shared SHIM registers */ u32 shim_mask; -- cgit v1.2.3 From 83c35180abfdfb22f3d7703b0c85ad2d442ed2c5 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Tue, 25 Jul 2023 08:42:10 +0300 Subject: serial: core: Controller id cannot be negative The controller id cannot be negative. Let's fix the ctrl_id in preparation for adding port_id to fix the device name. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reported-by: Andy Shevchenko Reviewed-by: Andy Shevchenko Signed-off-by: Tony Lindgren Link: https://lore.kernel.org/r/20230725054216.45696-2-tony@atomide.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 6d58c57acdaa..201813d888df 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -459,7 +459,7 @@ struct uart_port { struct serial_rs485 *rs485); int (*iso7816_config)(struct uart_port *, struct serial_iso7816 *iso7816); - int ctrl_id; /* optional serial core controller id */ + unsigned int ctrl_id; /* optional serial core controller id */ unsigned int irq; /* irq number */ unsigned long irqflags; /* irq flags */ unsigned int uartclk; /* base uart clock */ -- cgit v1.2.3 From d962de6ae51f9b76ad736220077cda83084090b1 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Tue, 25 Jul 2023 08:42:11 +0300 Subject: serial: core: Fix serial core port id to not use port->line The serial core port id should be serial core controller specific port instance, which is not always the port->line index. For example, 8250 driver maps a number of legacy ports, and when a hardware specific device driver takes over, we typically have one driver instance for each port. Let's instead add port->port_id to keep track serial ports mapped to each serial core controller instance. Currently this is only a cosmetic issue for the serial core port device names. The issue can be noticed looking at /sys/bus/serial-base/devices for example though. Let's fix the issue to avoid port addressing issues later on. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reviewed-by: Andy Shevchenko Signed-off-by: Tony Lindgren Link: https://lore.kernel.org/r/20230725054216.45696-3-tony@atomide.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 201813d888df..a156d2ed8d9e 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -460,6 +460,7 @@ struct uart_port { int (*iso7816_config)(struct uart_port *, struct serial_iso7816 *iso7816); unsigned int ctrl_id; /* optional serial core controller id */ + unsigned int port_id; /* optional serial core port id */ unsigned int irq; /* irq number */ unsigned long irqflags; /* irq flags */ unsigned int uartclk; /* base uart clock */ -- cgit v1.2.3 From 16e95a62eed18864aecac404f1e4eed764c363f2 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Tue, 25 Jul 2023 13:39:12 +0800 Subject: powercap: intel_rapl: Fix a sparse warning in TPMI interface Depends on the interface used, the RAPL registers can be either MSR indexes or memory mapped IO addresses. Current RAPL common code uses u64 to save both MSR and memory mapped IO registers. With this, when handling register address with an __iomem annotation, it triggers a sparse warning like below: sparse warnings: (new ones prefixed by >>) >> drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected unsigned long long [usertype] *tpmi_rapl_regs @@ got void [noderef] __iomem * @@ drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: expected unsigned long long [usertype] *tpmi_rapl_regs drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: got void [noderef] __iomem * Fix the problem by using a union to save the registers instead. Suggested-by: David Laight Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202307031405.dy3druuy-lkp@intel.com/ Tested-by: Wang Wendy Signed-off-by: Zhang Rui [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki --- include/linux/intel_rapl.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/intel_rapl.h b/include/linux/intel_rapl.h index e6936cb25047..33f21bd85dbf 100644 --- a/include/linux/intel_rapl.h +++ b/include/linux/intel_rapl.h @@ -100,10 +100,16 @@ struct rapl_package; #define RAPL_DOMAIN_NAME_LENGTH 16 +union rapl_reg { + void __iomem *mmio; + u32 msr; + u64 val; +}; + struct rapl_domain { char name[RAPL_DOMAIN_NAME_LENGTH]; enum rapl_domain_type id; - u64 regs[RAPL_DOMAIN_REG_MAX]; + union rapl_reg regs[RAPL_DOMAIN_REG_MAX]; struct powercap_zone power_zone; struct rapl_domain_data rdd; struct rapl_power_limit rpl[NR_POWER_LIMITS]; @@ -116,7 +122,7 @@ struct rapl_domain { }; struct reg_action { - u64 reg; + union rapl_reg reg; u64 mask; u64 value; int err; @@ -143,8 +149,8 @@ struct rapl_if_priv { enum rapl_if_type type; struct powercap_control_type *control_type; enum cpuhp_state pcap_rapl_online; - u64 reg_unit; - u64 regs[RAPL_DOMAIN_MAX][RAPL_DOMAIN_REG_MAX]; + union rapl_reg reg_unit; + union rapl_reg regs[RAPL_DOMAIN_MAX][RAPL_DOMAIN_REG_MAX]; int limits[RAPL_DOMAIN_MAX]; int (*read_raw)(int id, struct reg_action *ra); int (*write_raw)(int id, struct reg_action *ra); -- cgit v1.2.3 From 05ee774122bd4a2f298668d6d5fc9e7b685a5e31 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:23:57 +0200 Subject: swiotlb: make io_tlb_default_mem local to swiotlb.c SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow to make changes to the implementation without modifying non-swiotlb code. To avoid breaking existing users, provide helper functions for the few required fields. As a bonus, using a helper function to initialize struct device allows to get rid of an #ifdef in driver core. Signed-off-by: Petr Tesarik Reviewed-by: Greg Kroah-Hartman Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 4e52cd5e0bdc..2d453b3e7771 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -110,7 +110,6 @@ struct io_tlb_mem { atomic_long_t used_hiwater; #endif }; -extern struct io_tlb_mem io_tlb_default_mem; static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { @@ -128,13 +127,22 @@ static inline bool is_swiotlb_force_bounce(struct device *dev) void swiotlb_init(bool addressing_limited, unsigned int flags); void __init swiotlb_exit(void); +void swiotlb_dev_init(struct device *dev); size_t swiotlb_max_mapping_size(struct device *dev); +bool is_swiotlb_allocated(void); bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); +phys_addr_t default_swiotlb_base(void); +phys_addr_t default_swiotlb_limit(void); #else static inline void swiotlb_init(bool addressing_limited, unsigned int flags) { } + +static inline void swiotlb_dev_init(struct device *dev) +{ +} + static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { return false; @@ -151,6 +159,11 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev) return SIZE_MAX; } +static inline bool is_swiotlb_allocated(void) +{ + return false; +} + static inline bool is_swiotlb_active(struct device *dev) { return false; @@ -159,6 +172,16 @@ static inline bool is_swiotlb_active(struct device *dev) static inline void swiotlb_adjust_size(unsigned long size) { } + +static inline phys_addr_t default_swiotlb_base(void) +{ + return 0; +} + +static inline phys_addr_t default_swiotlb_limit(void) +{ + return 0; +} #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); -- cgit v1.2.3 From fea18777a78e845c6031128b40aaa2cf2cb46874 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:23:58 +0200 Subject: swiotlb: add documentation and rename swiotlb_do_find_slots() Add some kernel-doc comments and move the existing documentation of struct io_tlb_slot to its correct location. The latter was forgotten in commit 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c"). Use the opportunity to give swiotlb_do_find_slots() a more descriptive name and make it clear how it differs from swiotlb_find_slots(). Signed-off-by: Petr Tesarik Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 2d453b3e7771..31625ae507ea 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -76,10 +76,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. - * @list: The free list describing the number of free entries available - * from each index. - * @orig_addr: The original address corresponding to a mapped entry. - * @alloc_size: Size of the allocated buffer. * @debugfs: The dentry to debugfs. * @late_alloc: %true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced @@ -111,6 +107,17 @@ struct io_tlb_mem { #endif }; +/** + * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb + * @dev: Device which has mapped the buffer. + * @paddr: Physical address within the DMA buffer. + * + * Check if @paddr points into a bounce buffer. + * + * Return: + * * %true if @paddr points into a bounce buffer + * * %false otherwise + */ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { struct io_tlb_mem *mem = dev->dma_io_tlb_mem; -- cgit v1.2.3 From 158dbe9c9a3d36da824139e5304169326550c6c9 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:23:59 +0200 Subject: swiotlb: separate memory pool data from other allocator data Carve out memory pool specific fields from struct io_tlb_mem. The original struct now contains shared data for the whole allocator, while the new struct io_tlb_pool contains data that is specific to one memory pool of (potentially) many. Signed-off-by: Petr Tesarik Reviewed-by: Greg Kroah-Hartman Signed-off-by: Christoph Hellwig --- include/linux/device.h | 2 +- include/linux/swiotlb.h | 45 ++++++++++++++++++++++++++++----------------- 2 files changed, 29 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index bbaeabd04b0d..d9754a68ba95 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -625,7 +625,7 @@ struct device_physical_location { * @dma_pools: Dma pools (if dma'ble device). * @dma_mem: Internal for coherent mem override. * @cma_area: Contiguous memory area for dma allocations - * @dma_io_tlb_mem: Pointer to the swiotlb pool used. Not for driver use. + * @dma_io_tlb_mem: Software IO TLB allocator. Not for driver use. * @archdata: For arch-specific additions. * @of_node: Associated device tree node. * @fwnode: Associated device node supplied by platform firmware. diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 31625ae507ea..247f0ab8795a 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, #ifdef CONFIG_SWIOTLB /** - * struct io_tlb_mem - IO TLB Memory Pool Descriptor - * + * struct io_tlb_pool - IO TLB memory pool descriptor * @start: The start address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. @@ -73,15 +72,34 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool * may be remapped in the memory encrypted case and store virtual * address for bounce buffer operation. - * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and - * @end. For default swiotlb, this is command line adjustable via - * setup_io_tlb_npages. + * @nslabs: The number of IO TLB slots between @start and @end. For the + * default swiotlb, this can be adjusted with a boot parameter, + * see setup_io_tlb_npages(). + * @late_alloc: %true if allocated using the page allocator. + * @nareas: Number of areas in the pool. + * @area_nslabs: Number of slots in each area. + * @areas: Array of memory area descriptors. + * @slots: Array of slot descriptors. + */ +struct io_tlb_pool { + phys_addr_t start; + phys_addr_t end; + void *vaddr; + unsigned long nslabs; + bool late_alloc; + unsigned int nareas; + unsigned int area_nslabs; + struct io_tlb_area *areas; + struct io_tlb_slot *slots; +}; + +/** + * struct io_tlb_mem - Software IO TLB allocator + * @defpool: Default (initial) IO TLB memory pool descriptor. + * @nslabs: Total number of IO TLB slabs in all pools. * @debugfs: The dentry to debugfs. - * @late_alloc: %true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation - * @nareas: The area number in the pool. - * @area_nslabs: The slot number in the area. * @total_used: The total number of slots in the pool that are currently used * across all areas. Used only for calculating used_hiwater in * debugfs. @@ -89,18 +107,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, * in debugfs. */ struct io_tlb_mem { - phys_addr_t start; - phys_addr_t end; - void *vaddr; + struct io_tlb_pool defpool; unsigned long nslabs; struct dentry *debugfs; - bool late_alloc; bool force_bounce; bool for_alloc; - unsigned int nareas; - unsigned int area_nslabs; - struct io_tlb_area *areas; - struct io_tlb_slot *slots; #ifdef CONFIG_DEBUG_FS atomic_long_t total_used; atomic_long_t used_hiwater; @@ -122,7 +133,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { struct io_tlb_mem *mem = dev->dma_io_tlb_mem; - return mem && paddr >= mem->start && paddr < mem->end; + return mem && paddr >= mem->defpool.start && paddr < mem->defpool.end; } static inline bool is_swiotlb_force_bounce(struct device *dev) -- cgit v1.2.3 From 62708b2ba4055cad43a95754e939566b56dde5b6 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:24:00 +0200 Subject: swiotlb: add a flag whether SWIOTLB is allowed to grow Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable dynamic allocation of additional bounce buffers. If this option is set, mark the default SWIOTLB as able to grow and restricted DMA pools as unable. However, if the address of the default memory pool is explicitly queried, make the default SWIOTLB also unable to grow. This is currently used to set up PCI BAR movable regions on some Octeon MIPS boards which may not be able to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup() for more details. If a remap function is specified, it must be also called on any dynamically allocated pools, but there are some issues: - The remap function may block, so it should not be called from an atomic context. - There is no corresponding unremap() function if the memory pool is freed. - The only in-tree implementation (xen_swiotlb_fixup) requires that the number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE. Keep it simple for now and disable growing the SWIOTLB if a remap function was specified. Signed-off-by: Petr Tesarik Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 247f0ab8795a..57be2a0a9fbf 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -100,6 +100,7 @@ struct io_tlb_pool { * @debugfs: The dentry to debugfs. * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @can_grow: %true if more pools can be allocated dynamically. * @total_used: The total number of slots in the pool that are currently used * across all areas. Used only for calculating used_hiwater in * debugfs. @@ -112,6 +113,9 @@ struct io_tlb_mem { struct dentry *debugfs; bool force_bounce; bool for_alloc; +#ifdef CONFIG_SWIOTLB_DYNAMIC + bool can_grow; +#endif #ifdef CONFIG_DEBUG_FS atomic_long_t total_used; atomic_long_t used_hiwater; -- cgit v1.2.3 From 79636caad3618e2b38457f6e298c9b31ba82b489 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:24:01 +0200 Subject: swiotlb: if swiotlb is full, fall back to a transient memory pool Try to allocate a transient memory pool if no suitable slots can be found and the respective SWIOTLB is allowed to grow. The transient pool is just enough big for this one bounce buffer. It is inserted into a per-device list of transient memory pools, and it is freed again when the bounce buffer is unmapped. Transient memory pools are kept in an RCU list. A memory barrier is required after adding a new entry, because any address within a transient buffer must be immediately recognized as belonging to the SWIOTLB, even if it is passed to another CPU. Deletion does not require any synchronization beyond RCU ordering guarantees. After a buffer is unmapped, its physical addresses may no longer be passed to the DMA API, so the memory range of the corresponding stale entry in the RCU list never matches. If the memory range gets allocated again, then it happens only after a RCU quiescent state. Since bounce buffers can now be allocated from different pools, add a parameter to swiotlb_alloc_pool() to let the caller know which memory pool is used. Add swiotlb_find_pool() to find the memory pool corresponding to an address. This function is now also used by is_swiotlb_buffer(), because a simple boundary check is no longer sufficient. The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(), simplified and enhanced to use coherent memory pools if needed. Note that this is not the most efficient way to provide a bounce buffer, but when a DMA buffer can't be mapped, something may (and will) actually break. At that point it is better to make an allocation, even if it may be an expensive operation. Signed-off-by: Petr Tesarik Reviewed-by: Greg Kroah-Hartman Signed-off-by: Christoph Hellwig --- include/linux/device.h | 6 ++++++ include/linux/dma-mapping.h | 2 ++ include/linux/swiotlb.h | 29 ++++++++++++++++++++++++++++- 3 files changed, 36 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index d9754a68ba95..5fd89c9d005c 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -626,6 +626,8 @@ struct device_physical_location { * @dma_mem: Internal for coherent mem override. * @cma_area: Contiguous memory area for dma allocations * @dma_io_tlb_mem: Software IO TLB allocator. Not for driver use. + * @dma_io_tlb_pools: List of transient swiotlb memory pools. + * @dma_io_tlb_lock: Protects changes to the list of active pools. * @archdata: For arch-specific additions. * @of_node: Associated device tree node. * @fwnode: Associated device node supplied by platform firmware. @@ -731,6 +733,10 @@ struct device { #endif #ifdef CONFIG_SWIOTLB struct io_tlb_mem *dma_io_tlb_mem; +#endif +#ifdef CONFIG_SWIOTLB_DYNAMIC + struct list_head dma_io_tlb_pools; + spinlock_t dma_io_tlb_lock; #endif /* arch specific additions */ struct dev_archdata archdata; diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index e13050eb9777..f0ccca16a0ac 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -418,6 +418,8 @@ static inline void dma_sync_sgtable_for_device(struct device *dev, #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0) #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0) +bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size); + static inline void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp) { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 57be2a0a9fbf..66867d2188ba 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -80,6 +80,9 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, * @area_nslabs: Number of slots in each area. * @areas: Array of memory area descriptors. * @slots: Array of slot descriptors. + * @node: Member of the IO TLB memory pool list. + * @rcu: RCU head for swiotlb_dyn_free(). + * @transient: %true if transient memory pool. */ struct io_tlb_pool { phys_addr_t start; @@ -91,6 +94,11 @@ struct io_tlb_pool { unsigned int area_nslabs; struct io_tlb_area *areas; struct io_tlb_slot *slots; +#ifdef CONFIG_SWIOTLB_DYNAMIC + struct list_head node; + struct rcu_head rcu; + bool transient; +#endif }; /** @@ -122,6 +130,20 @@ struct io_tlb_mem { #endif }; +#ifdef CONFIG_SWIOTLB_DYNAMIC + +struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr); + +#else + +static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev, + phys_addr_t paddr) +{ + return &dev->dma_io_tlb_mem->defpool; +} + +#endif + /** * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb * @dev: Device which has mapped the buffer. @@ -137,7 +159,12 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { struct io_tlb_mem *mem = dev->dma_io_tlb_mem; - return mem && paddr >= mem->defpool.start && paddr < mem->defpool.end; + if (!mem) + return false; + + if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) + return swiotlb_find_pool(dev, paddr); + return paddr >= mem->defpool.start && paddr < mem->defpool.end; } static inline bool is_swiotlb_force_bounce(struct device *dev) -- cgit v1.2.3 From ad96ce3252dbab773cb343220662df3d84dd8e80 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:24:02 +0200 Subject: swiotlb: determine potential physical address limit The value returned by default_swiotlb_limit() should be constant, because it is used to decide whether DMA can be used. To allow allocating memory pools on the fly, use the maximum possible physical address rather than the highest address used by the default pool. For swiotlb_init_remap(), this is either an arch-specific limit used by memblock_alloc_low(), or the highest directly mapped physical address if the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the highest address is determined by the GFP flags. Signed-off-by: Petr Tesarik Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 66867d2188ba..9825fa14abe4 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -109,6 +109,7 @@ struct io_tlb_pool { * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation * @can_grow: %true if more pools can be allocated dynamically. + * @phys_limit: Maximum allowed physical address. * @total_used: The total number of slots in the pool that are currently used * across all areas. Used only for calculating used_hiwater in * debugfs. @@ -123,6 +124,7 @@ struct io_tlb_mem { bool for_alloc; #ifdef CONFIG_SWIOTLB_DYNAMIC bool can_grow; + u64 phys_limit; #endif #ifdef CONFIG_DEBUG_FS atomic_long_t total_used; -- cgit v1.2.3 From 1aaa736815eb04f4dae3f0b3e977b2a0677a4cfb Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:24:03 +0200 Subject: swiotlb: allocate a new memory pool when existing pools are full When swiotlb_find_slots() cannot find suitable slots, schedule the allocation of a new memory pool. It is not possible to allocate the pool immediately, because this code may run in interrupt context, which is not suitable for large memory allocations. This means that the memory pool will be available too late for the currently requested mapping, but the stress on the software IO TLB allocator is likely to continue, and subsequent allocations will benefit from the additional pool eventually. Keep all memory pools for an allocator in an RCU list to avoid locking on the read side. For modifications, add a new spinlock to struct io_tlb_mem. The spinlock also protects updates to the total number of slabs (nslabs in struct io_tlb_mem), but not reads of the value. Readers may therefore encounter a stale value, but this is not an issue: - swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero value. This is ensured by the existence of the default memory pool, allocated at boot. - The exact value is used only for non-critical purposes (debugfs, kernel messages). Signed-off-by: Petr Tesarik Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 9825fa14abe4..8371c92a0271 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -8,6 +8,7 @@ #include #include #include +#include struct device; struct page; @@ -104,12 +105,16 @@ struct io_tlb_pool { /** * struct io_tlb_mem - Software IO TLB allocator * @defpool: Default (initial) IO TLB memory pool descriptor. + * @pool: IO TLB memory pool descriptor (if not dynamic). * @nslabs: Total number of IO TLB slabs in all pools. * @debugfs: The dentry to debugfs. * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation * @can_grow: %true if more pools can be allocated dynamically. * @phys_limit: Maximum allowed physical address. + * @lock: Lock to synchronize changes to the list. + * @pools: List of IO TLB memory pool descriptors (if dynamic). + * @dyn_alloc: Dynamic IO TLB pool allocation work. * @total_used: The total number of slots in the pool that are currently used * across all areas. Used only for calculating used_hiwater in * debugfs. @@ -125,6 +130,9 @@ struct io_tlb_mem { #ifdef CONFIG_SWIOTLB_DYNAMIC bool can_grow; u64 phys_limit; + spinlock_t lock; + struct list_head pools; + struct work_struct dyn_alloc; #endif #ifdef CONFIG_DEBUG_FS atomic_long_t total_used; -- cgit v1.2.3 From 1395706a14904f2593debecf20f827e72d7392a7 Mon Sep 17 00:00:00 2001 From: Petr Tesarik Date: Tue, 1 Aug 2023 08:24:04 +0200 Subject: swiotlb: search the software IO TLB only if the device makes use of it Skip searching the software IO TLB if a device has never used it, making sure these devices are not affected by the introduction of multiple IO TLB memory pools. Additional memory barrier is required to ensure that the new value of the flag is visible to other CPUs after mapping a new bounce buffer. For efficiency, the flag check should be inlined, and then the memory barrier must be moved to is_swiotlb_buffer(). However, it can replace the existing barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer() first to verify that the buffer address belongs to the software IO TLB. Signed-off-by: Petr Tesarik Signed-off-by: Christoph Hellwig --- include/linux/device.h | 2 ++ include/linux/swiotlb.h | 7 ++++++- 2 files changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index 5fd89c9d005c..6fc808d22bfd 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -628,6 +628,7 @@ struct device_physical_location { * @dma_io_tlb_mem: Software IO TLB allocator. Not for driver use. * @dma_io_tlb_pools: List of transient swiotlb memory pools. * @dma_io_tlb_lock: Protects changes to the list of active pools. + * @dma_uses_io_tlb: %true if device has used the software IO TLB. * @archdata: For arch-specific additions. * @of_node: Associated device tree node. * @fwnode: Associated device node supplied by platform firmware. @@ -737,6 +738,7 @@ struct device { #ifdef CONFIG_SWIOTLB_DYNAMIC struct list_head dma_io_tlb_pools; spinlock_t dma_io_tlb_lock; + bool dma_uses_io_tlb; #endif /* arch specific additions */ struct dev_archdata archdata; diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 8371c92a0271..b4536626f8ff 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -172,8 +172,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) if (!mem) return false; - if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) + if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) { + /* Pairs with smp_wmb() in swiotlb_find_slots() and + * swiotlb_dyn_alloc(), which modify the RCU lists. + */ + smp_rmb(); return swiotlb_find_pool(dev, paddr); + } return paddr >= mem->defpool.start && paddr < mem->defpool.end; } -- cgit v1.2.3 From a488bc16225efa7e3d25977ac1472c3d4cf0e958 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 25 Jul 2023 21:55:28 +0800 Subject: fanotify: Remove unused extern declaration fsnotify_get_conn_fsid() This is never used, so can remove it. Signed-off-by: YueHaibing Signed-off-by: Jan Kara Message-Id: <20230725135528.25996-1-yuehaibing@huawei.com> --- include/linux/fsnotify_backend.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index d7d96c806bff..c0892d75ce33 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -760,9 +760,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, /* Find mark belonging to given group in the list of marks */ extern struct fsnotify_mark *fsnotify_find_mark(fsnotify_connp_t *connp, struct fsnotify_group *group); -/* Get cached fsid of filesystem containing object */ -extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, - __kernel_fsid_t *fsid); /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, -- cgit v1.2.3 From 9cf3516c29e6dba95d36d7d86c03b06495107859 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Sat, 8 Jul 2023 09:55:27 -0600 Subject: fs: add IOCB flags related to passing back dio completions Async dio completions generally happen from hard/soft IRQ context, which means that users like iomap may need to defer some of the completion handling to a workqueue. This is less efficient than having the original issuer handle it, like we do for sync IO, and it adds latency to the completions. Add IOCB_DIO_CALLER_COMP, which the issuer can set if it is able to safely punt these completions to a safe context. If the dio handler is aware of this flag, assign a callback handler in kiocb->dio_complete and associated data io kiocb->private. The issuer will then call this handler with that data from task context. No functional changes in this patch. Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Dave Chinner Signed-off-by: Jens Axboe --- include/linux/fs.h | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..1e6dbe309d52 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -338,6 +338,20 @@ enum rw_hint { #define IOCB_NOIO (1 << 20) /* can use bio alloc cache */ #define IOCB_ALLOC_CACHE (1 << 21) +/* + * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the + * iocb completion can be passed back to the owner for execution from a safe + * context rather than needing to be punted through a workqueue. If this + * flag is set, the bio completion handling may set iocb->dio_complete to a + * handler function and iocb->private to context information for that handler. + * The issuer should call the handler with that context information from task + * context to complete the processing of the iocb. Note that while this + * provides a task context for the dio_complete() callback, it should only be + * used on the completion side for non-IO generating completions. It's fine to + * call blocking functions from this callback, but they should not wait for + * unrelated IO (like cache flushing, new IO generation, etc). + */ +#define IOCB_DIO_CALLER_COMP (1 << 22) /* for use in trace events */ #define TRACE_IOCB_STRINGS \ @@ -351,7 +365,8 @@ enum rw_hint { { IOCB_WRITE, "WRITE" }, \ { IOCB_WAITQ, "WAITQ" }, \ { IOCB_NOIO, "NOIO" }, \ - { IOCB_ALLOC_CACHE, "ALLOC_CACHE" } + { IOCB_ALLOC_CACHE, "ALLOC_CACHE" }, \ + { IOCB_DIO_CALLER_COMP, "CALLER_COMP" } struct kiocb { struct file *ki_filp; @@ -360,7 +375,23 @@ struct kiocb { void *private; int ki_flags; u16 ki_ioprio; /* See linux/ioprio.h */ - struct wait_page_queue *ki_waitq; /* for async buffered IO */ + union { + /* + * Only used for async buffered reads, where it denotes the + * page waitqueue associated with completing the read. Valid + * IFF IOCB_WAITQ is set. + */ + struct wait_page_queue *ki_waitq; + /* + * Can be used for O_DIRECT IO, where the completion handling + * is punted back to the issuer of the IO. May only be set + * if IOCB_DIO_CALLER_COMP is set by the issuer, and the issuer + * must then check for presence of this handler when ki_complete + * is invoked. The data passed in to this handler must be + * assigned to ->private when dio_complete is assigned. + */ + ssize_t (*dio_complete)(void *data); + }; }; static inline bool is_sync_kiocb(struct kiocb *kiocb) -- cgit v1.2.3 From 63b93099359eca37c11e4d5db5ea2c3a375f6026 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Sat, 29 Jul 2023 23:17:46 +0300 Subject: ata: libata: fix parameter type of ata_deadline() ata_deadline() passes its 'unsigned long timeout_msecs' parameter verbatim to msecs_to_jiffies() which takes just 'unsigned int' -- eliminate unneeded implicit cast... Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 820f7a3a2749..d9edb3d62a42 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1876,7 +1876,7 @@ static inline int ata_check_ready(u8 status) } static inline unsigned long ata_deadline(unsigned long from_jiffies, - unsigned long timeout_msecs) + unsigned int timeout_msecs) { return from_jiffies + msecs_to_jiffies(timeout_msecs); } -- cgit v1.2.3 From 84abed36d7de7145d393f9d5d05b36717e0cb49d Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Sat, 29 Jul 2023 23:17:47 +0300 Subject: ata: libata-core: fix parameter types of ata_wait_register() ata_wait_register() passes its 'unsigned long {interval|timeout}' params verbatim to ata_{msleep|deadline}() that just take 'unsigned int' param for the time intervals in ms -- eliminate unneeded implicit casts... Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index d9edb3d62a42..4772f64af734 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1116,7 +1116,7 @@ static inline void ata_sas_port_resume(struct ata_port *ap) extern int ata_ratelimit(void); extern void ata_msleep(struct ata_port *ap, unsigned int msecs); extern u32 ata_wait_register(struct ata_port *ap, void __iomem *reg, u32 mask, - u32 val, unsigned long interval, unsigned long timeout); + u32 val, unsigned int interval, unsigned int timeout); extern int atapi_cmd_type(u8 opcode); extern unsigned int ata_pack_xfermask(unsigned int pio_mask, unsigned int mwdma_mask, -- cgit v1.2.3 From d14d41cc5aaef138face9d5a145b460e2b63697a Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Sat, 29 Jul 2023 23:17:49 +0300 Subject: ata: fix debounce timings type sata_deb_timing_{hotplug|long|normal}[] store 'unsigned long' debounce timeouts in ms, while sata_link_debounce() eventually uses those timeouts by calling ata_{deadline|msleep}( which take just 'unsigned int'. Change the debounce timeout table element's type to 'unsigned int' -- all these timeouts happily fit into 'unsigned int'... Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 4772f64af734..8d510fb00591 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1166,11 +1166,11 @@ extern void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port * * SATA specific code - drivers/ata/libata-sata.c */ #ifdef CONFIG_SATA_HOST -extern const unsigned long sata_deb_timing_normal[]; -extern const unsigned long sata_deb_timing_hotplug[]; -extern const unsigned long sata_deb_timing_long[]; +extern const unsigned int sata_deb_timing_normal[]; +extern const unsigned int sata_deb_timing_hotplug[]; +extern const unsigned int sata_deb_timing_long[]; -static inline const unsigned long * +static inline const unsigned int * sata_ehc_deb_timing(struct ata_eh_context *ehc) { if (ehc->i.flags & ATA_EHI_HOTPLUGGED) @@ -1185,14 +1185,14 @@ extern int sata_scr_write(struct ata_link *link, int reg, u32 val); extern int sata_scr_write_flush(struct ata_link *link, int reg, u32 val); extern int sata_set_spd(struct ata_link *link); extern int sata_link_hardreset(struct ata_link *link, - const unsigned long *timing, unsigned long deadline, + const unsigned int *timing, unsigned long deadline, bool *online, int (*check_ready)(struct ata_link *)); -extern int sata_link_resume(struct ata_link *link, const unsigned long *params, +extern int sata_link_resume(struct ata_link *link, const unsigned int *params, unsigned long deadline); extern int ata_eh_read_sense_success_ncq_log(struct ata_link *link); extern void ata_eh_analyze_ncq_error(struct ata_link *link); #else -static inline const unsigned long * +static inline const unsigned int * sata_ehc_deb_timing(struct ata_eh_context *ehc) { return NULL; @@ -1212,7 +1212,7 @@ static inline int sata_scr_write_flush(struct ata_link *link, int reg, u32 val) } static inline int sata_set_spd(struct ata_link *link) { return -EOPNOTSUPP; } static inline int sata_link_hardreset(struct ata_link *link, - const unsigned long *timing, + const unsigned int *timing, unsigned long deadline, bool *online, int (*check_ready)(struct ata_link *)) @@ -1222,7 +1222,7 @@ static inline int sata_link_hardreset(struct ata_link *link, return -EOPNOTSUPP; } static inline int sata_link_resume(struct ata_link *link, - const unsigned long *params, + const unsigned int *params, unsigned long deadline) { return -EOPNOTSUPP; @@ -1234,7 +1234,7 @@ static inline int ata_eh_read_sense_success_ncq_log(struct ata_link *link) static inline void ata_eh_analyze_ncq_error(struct ata_link *link) { } #endif extern int sata_link_debounce(struct ata_link *link, - const unsigned long *params, unsigned long deadline); + const unsigned int *params, unsigned long deadline); extern int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy, bool spm_wakeup); extern int ata_slave_link_init(struct ata_port *ap); -- cgit v1.2.3 From ff8072d589dcff7c1f0345a6ec98b5fc1e9ee2a1 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 31 Jul 2023 16:34:12 +0200 Subject: ata: libata: remove references to non-existing error_handler() With commit 65a15d6560df ("scsi: ipr: Remove SATA support") all libata drivers now have the error_handler() callback provided, so we can stop checking for non-existing error_handler callback. Signed-off-by: Hannes Reinecke [niklas: fixed review comments, rebased, solved conflicts during rebase, fixed bug that unconditionally dumped all QCs, removed the now unused function ata_dump_status(), removed the now unreachable failure paths in atapi_qc_complete(), removed the non-EH function to request ATAPI sense] Signed-off-by: Niklas Cassel Reviewed-by: John Garry Reviewed-by: Jason Yan Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 8d510fb00591..d61e5465076e 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1785,7 +1785,7 @@ static inline struct ata_queued_cmd *ata_qc_from_tag(struct ata_port *ap, { struct ata_queued_cmd *qc = __ata_qc_from_tag(ap, tag); - if (unlikely(!qc) || !ap->ops->error_handler) + if (unlikely(!qc)) return qc; if ((qc->flags & (ATA_QCFLAG_ACTIVE | -- cgit v1.2.3 From 43aa43351bb551a7732c1b9f6e8ebb9a6f30b063 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 31 Jul 2023 16:34:13 +0200 Subject: ata,scsi: remove ata_sas_port_{start,stop} callbacks Callbacks are empty now, so remove them. Also, remove the call to ap->ops->port_start() in ata_sas_port_init(), as this would otherwise cause a NULL pointer dereference, now when the callback is gone. Signed-off-by: Hannes Reinecke [niklas: remove the call to ap->ops->port_start() in ata_sas_port_init()] Signed-off-by: Niklas Cassel Reviewed-by: Jason Yan Reviewed-by: John Garry Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index d61e5465076e..f8f8b558a8be 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1244,10 +1244,8 @@ extern struct ata_port *ata_sas_port_alloc(struct ata_host *, extern void ata_sas_async_probe(struct ata_port *ap); extern int ata_sas_sync_probe(struct ata_port *ap); extern int ata_sas_port_init(struct ata_port *); -extern int ata_sas_port_start(struct ata_port *ap); extern int ata_sas_tport_add(struct device *parent, struct ata_port *ap); extern void ata_sas_tport_delete(struct ata_port *ap); -extern void ata_sas_port_stop(struct ata_port *ap); extern int ata_sas_slave_configure(struct scsi_device *, struct ata_port *); extern int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap); extern void ata_tf_to_fis(const struct ata_taskfile *tf, -- cgit v1.2.3 From 6c2fe21e08c28bbdb53ada668edea7df1aa9fb1e Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 31 Jul 2023 16:34:14 +0200 Subject: ata,scsi: remove ata_sas_port_destroy() Is now a wrapper around kfree(), so call it directly. Signed-off-by: Hannes Reinecke Signed-off-by: Niklas Cassel Reviewed-by: John Garry Reviewed-by: Jason Yan Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index f8f8b558a8be..1d1fd16a1492 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1238,7 +1238,6 @@ extern int sata_link_debounce(struct ata_link *link, extern int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy, bool spm_wakeup); extern int ata_slave_link_init(struct ata_port *ap); -extern void ata_sas_port_destroy(struct ata_port *); extern struct ata_port *ata_sas_port_alloc(struct ata_host *, struct ata_port_info *, struct Scsi_Host *); extern void ata_sas_async_probe(struct ata_port *ap); -- cgit v1.2.3 From 8ac161ea2b3709b789c192de7da31a58aec9d7ee Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 31 Jul 2023 16:34:15 +0200 Subject: ata: libata-sata: remove ata_sas_sync_probe() Unused. Signed-off-by: Hannes Reinecke Signed-off-by: Niklas Cassel Reviewed-by: Jason Yan Reviewed-by: John Garry Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 1d1fd16a1492..d08dde8c8ad2 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1241,7 +1241,6 @@ extern int ata_slave_link_init(struct ata_port *ap); extern struct ata_port *ata_sas_port_alloc(struct ata_host *, struct ata_port_info *, struct Scsi_Host *); extern void ata_sas_async_probe(struct ata_port *ap); -extern int ata_sas_sync_probe(struct ata_port *ap); extern int ata_sas_port_init(struct ata_port *); extern int ata_sas_tport_add(struct device *parent, struct ata_port *ap); extern void ata_sas_tport_delete(struct ata_port *ap); -- cgit v1.2.3 From a76f1b637ce92c2b8d5da912826079010b11f138 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 31 Jul 2023 16:34:17 +0200 Subject: ata,scsi: cleanup __ata_port_probe() Rename __ata_port_probe() to ata_port_probe() and drop the wrapper ata_sas_async_probe(). Signed-off-by: Hannes Reinecke Signed-off-by: Niklas Cassel Reviewed-by: Jason Yan Reviewed-by: John Garry Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index d08dde8c8ad2..7468a330fc77 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1240,7 +1240,7 @@ extern int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy, extern int ata_slave_link_init(struct ata_port *ap); extern struct ata_port *ata_sas_port_alloc(struct ata_host *, struct ata_port_info *, struct Scsi_Host *); -extern void ata_sas_async_probe(struct ata_port *ap); +extern void ata_port_probe(struct ata_port *ap); extern int ata_sas_port_init(struct ata_port *); extern int ata_sas_tport_add(struct device *parent, struct ata_port *ap); extern void ata_sas_tport_delete(struct ata_port *ap); -- cgit v1.2.3 From 541528170a5cb1342c378dfd46b04dfe024dbc7a Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Mon, 31 Jul 2023 16:34:18 +0200 Subject: ata,scsi: remove ata_sas_port_init() ata_sas_port_init() now only contains a single initialization. Move this single initialization to ata_sas_port_alloc(), since: 1) ata_sas_port_alloc() already initializes some of the struct members. 2) ata_sas_port_alloc() is only used by libsas. Suggested-by: John Garry Signed-off-by: Niklas Cassel Reviewed-by: John Garry Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 7468a330fc77..0980992c54c2 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1241,7 +1241,6 @@ extern int ata_slave_link_init(struct ata_port *ap); extern struct ata_port *ata_sas_port_alloc(struct ata_host *, struct ata_port_info *, struct Scsi_Host *); extern void ata_port_probe(struct ata_port *ap); -extern int ata_sas_port_init(struct ata_port *); extern int ata_sas_tport_add(struct device *parent, struct ata_port *ap); extern void ata_sas_tport_delete(struct ata_port *ap); extern int ata_sas_slave_configure(struct scsi_device *, struct ata_port *); -- cgit v1.2.3 From 89329c7384ef56c407269157e30e781f55c3c4d2 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Mon, 31 Jul 2023 16:34:20 +0200 Subject: ata: libata-core: remove ata_bus_probe() Remove ata_bus_probe() as it is unused. Also, remove references to ata_bus_probe and port_disable in Documentation/driver-api/libata.rst, as neither exist anymore. Signed-off-by: Niklas Cassel Reviewed-by: John Garry Reviewed-by: Jason Yan Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 0980992c54c2..e176d832467d 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -344,7 +344,6 @@ enum { ATA_LINK_RESUME_TRIES = 5, /* how hard are we gonna try to probe/recover devices */ - ATA_PROBE_MAX_TRIES = 3, ATA_EH_DEV_TRIES = 3, ATA_EH_PMP_TRIES = 5, ATA_EH_PMP_LINK_TRIES = 3, -- cgit v1.2.3 From 6b4f165e0858016f7bb5f360966882a819519f07 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Mon, 31 Jul 2023 16:34:21 +0200 Subject: ata: libata: remove deprecated EH callbacks Now that all libata drivers have migrated to use the error_handler callback, remove the deprecated phy_reset and eng_timeout callbacks. Also remove references to non-existent functions sata_phy_reset and ata_qc_timeout from Documentation/driver-api/libata.rst. Signed-off-by: Niklas Cassel Reviewed-by: John Garry Reviewed-by: Sergey Shtylyov Reviewed-by: Jason Yan Reviewed-by: Martin K. Petersen Signed-off-by: Damien Le Moal --- include/linux/libata.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index e176d832467d..52d58b13e5ee 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -975,12 +975,6 @@ struct ata_port_operations { ssize_t (*transmit_led_message)(struct ata_port *ap, u32 state, ssize_t size); - /* - * Obsolete - */ - void (*phy_reset)(struct ata_port *ap); - void (*eng_timeout)(struct ata_port *ap); - /* * ->inherits must be the last field and all the preceding * fields must be pointers. -- cgit v1.2.3 From b65413768abd27a55af74945aec58127a52b30a8 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 11 Jul 2023 10:50:58 +0900 Subject: x86/kprobes: Prohibit probing on compiler generated CFI checking code MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prohibit probing on the compiler generated CFI typeid checking code because it is used for decoding typeid when CFI error happens. The compiler generates the following instruction sequence for indirect call checks on x86;   movl -, %r10d ; 6 bytes addl -4(%reg), %r10d ; 4 bytes je .Ltmp1 ; 2 bytes ud2 ; <- regs->ip And handle_cfi_failure() decodes these instructions (movl and addl) for the typeid and the target address. Thus if we put a kprobe on those instructions, the decode will fail and report a wrong typeid and target address. Signed-off-by: Masami Hiramatsu (Google) Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/168904025785.116016.12766408611437534723.stgit@devnote2 --- include/linux/cfi.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 5e134f4ce8b7..3552ec82b725 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -19,11 +19,13 @@ static inline enum bug_trap_type report_cfi_failure_noaddr(struct pt_regs *regs, { return report_cfi_failure(regs, addr, NULL, 0); } +#endif /* CONFIG_CFI_CLANG */ #ifdef CONFIG_ARCH_USES_CFI_TRAPS bool is_cfi_trap(unsigned long addr); +#else +static inline bool is_cfi_trap(unsigned long addr) { return false; } #endif -#endif /* CONFIG_CFI_CLANG */ #ifdef CONFIG_MODULES #ifdef CONFIG_ARCH_USES_CFI_TRAPS -- cgit v1.2.3 From 2ba39cc46bfe463cb9673bf62a04c4c21942f1f2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 1 Aug 2023 19:21:57 +0200 Subject: fs: rename and move block_page_mkwrite_return block_page_mkwrite_return is neither block nor mkwrite specific, and should not be under CONFIG_BLOCK. Move it to mm.h and rename it to vmf_fs_error. Signed-off-by: Christoph Hellwig Reviewed-by: Luis Chamberlain Reviewed-by: Hannes Reinecke Reviewed-by: Johannes Thumshirn Reviewed-by: Christian Brauner Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/buffer_head.h | 12 ------------ include/linux/mm.h | 18 ++++++++++++++++++ 2 files changed, 18 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 6cb3e9af78c9..7002a9ff63a3 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -291,18 +291,6 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size); int block_commit_write(struct page *page, unsigned from, unsigned to); int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, get_block_t get_block); -/* Convert errno to return value from ->page_mkwrite() call */ -static inline vm_fault_t block_page_mkwrite_return(int err) -{ - if (err == 0) - return VM_FAULT_LOCKED; - if (err == -EFAULT || err == -EAGAIN) - return VM_FAULT_NOPAGE; - if (err == -ENOMEM) - return VM_FAULT_OOM; - /* -ENOSPC, -EDQUOT, -EIO ... */ - return VM_FAULT_SIGBUS; -} sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *); int block_truncate_page(struct address_space *, loff_t, get_block_t *); diff --git a/include/linux/mm.h b/include/linux/mm.h index 2dd73e4f3d8e..75777eae1c9c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3386,6 +3386,24 @@ static inline vm_fault_t vmf_error(int err) return VM_FAULT_SIGBUS; } +/* + * Convert errno to return value for ->page_mkwrite() calls. + * + * This should eventually be merged with vmf_error() above, but will need a + * careful audit of all vmf_error() callers. + */ +static inline vm_fault_t vmf_fs_error(int err) +{ + if (err == 0) + return VM_FAULT_LOCKED; + if (err == -EFAULT || err == -EAGAIN) + return VM_FAULT_NOPAGE; + if (err == -ENOMEM) + return VM_FAULT_OOM; + /* -ENOSPC, -EDQUOT, -EIO ... */ + return VM_FAULT_SIGBUS; +} + struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags); -- cgit v1.2.3 From 925c86a19bacf8ce10eb666328fb3fa5aff7b951 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 1 Aug 2023 19:22:01 +0200 Subject: fs: add CONFIG_BUFFER_HEAD Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig Reviewed-by: Luis Chamberlain Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/buffer_head.h | 32 ++++++++++++++++---------------- include/linux/iomap.h | 4 ++++ 2 files changed, 20 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 7002a9ff63a3..c89ef50d5112 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -16,8 +16,6 @@ #include #include -#ifdef CONFIG_BLOCK - enum bh_state_bits { BH_Uptodate, /* Contains valid data */ BH_Dirty, /* Is dirty */ @@ -198,7 +196,6 @@ void set_bh_page(struct buffer_head *bh, struct page *page, unsigned long offset); void folio_set_bh(struct buffer_head *bh, struct folio *folio, unsigned long offset); -bool try_to_free_buffers(struct folio *); struct buffer_head *folio_alloc_buffers(struct folio *folio, unsigned long size, bool retry); struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, @@ -213,10 +210,6 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate); /* Things to do with buffers at mapping->private_list */ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode); -int inode_has_buffers(struct inode *); -void invalidate_inode_buffers(struct inode *); -int remove_inode_buffers(struct inode *inode); -int sync_mapping_buffers(struct address_space *mapping); int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end, bool datasync); int generic_buffers_fsync(struct file *file, loff_t start, loff_t end, @@ -240,9 +233,6 @@ void __bforget(struct buffer_head *); void __breadahead(struct block_device *, sector_t block, unsigned int size); struct buffer_head *__bread_gfp(struct block_device *, sector_t block, unsigned size, gfp_t gfp); -void invalidate_bh_lrus(void); -void invalidate_bh_lrus_cpu(void); -bool has_bh_in_lru(int cpu, void *dummy); struct buffer_head *alloc_buffer_head(gfp_t gfp_flags); void free_buffer_head(struct buffer_head * bh); void unlock_buffer(struct buffer_head *bh); @@ -258,8 +248,6 @@ int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait); void __bh_read_batch(int nr, struct buffer_head *bhs[], blk_opf_t op_flags, bool force_lock); -extern int buffer_heads_over_limit; - /* * Generic address_space_operations implementations for buffer_head-backed * address_spaces. @@ -304,8 +292,6 @@ extern int buffer_migrate_folio_norefs(struct address_space *, #define buffer_migrate_folio_norefs NULL #endif -void buffer_init(void); - /* * inline definitions */ @@ -465,7 +451,20 @@ __bread(struct block_device *bdev, sector_t block, unsigned size) bool block_dirty_folio(struct address_space *mapping, struct folio *folio); -#else /* CONFIG_BLOCK */ +#ifdef CONFIG_BUFFER_HEAD + +void buffer_init(void); +bool try_to_free_buffers(struct folio *folio); +int inode_has_buffers(struct inode *inode); +void invalidate_inode_buffers(struct inode *inode); +int remove_inode_buffers(struct inode *inode); +int sync_mapping_buffers(struct address_space *mapping); +void invalidate_bh_lrus(void); +void invalidate_bh_lrus_cpu(void); +bool has_bh_in_lru(int cpu, void *dummy); +extern int buffer_heads_over_limit; + +#else /* CONFIG_BUFFER_HEAD */ static inline void buffer_init(void) {} static inline bool try_to_free_buffers(struct folio *folio) { return true; } @@ -473,9 +472,10 @@ static inline int inode_has_buffers(struct inode *inode) { return 0; } static inline void invalidate_inode_buffers(struct inode *inode) {} static inline int remove_inode_buffers(struct inode *inode) { return 1; } static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; } +static inline void invalidate_bh_lrus(void) {} static inline void invalidate_bh_lrus_cpu(void) {} static inline bool has_bh_in_lru(int cpu, void *dummy) { return false; } #define buffer_heads_over_limit 0 -#endif /* CONFIG_BLOCK */ +#endif /* CONFIG_BUFFER_HEAD */ #endif /* _LINUX_BUFFER_HEAD_H */ diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e2b836c2e119..54f50d34fd9d 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -58,7 +58,11 @@ struct vm_fault; #define IOMAP_F_DIRTY (1U << 1) #define IOMAP_F_SHARED (1U << 2) #define IOMAP_F_MERGED (1U << 3) +#ifdef CONFIG_BUFFER_HEAD #define IOMAP_F_BUFFER_HEAD (1U << 4) +#else +#define IOMAP_F_BUFFER_HEAD 0 +#endif /* CONFIG_BUFFER_HEAD */ #define IOMAP_F_XATTR (1U << 5) /* -- cgit v1.2.3 From 6a5a148aaf14747570cc634f9cdfcb0393f5617f Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 1 Aug 2023 13:13:58 +0200 Subject: bpf: fix bpf_probe_read_kernel prototype mismatch bpf_probe_read_kernel() has a __weak definition in core.c and another definition with an incompatible prototype in kernel/trace/bpf_trace.c, when CONFIG_BPF_EVENTS is enabled. Since the two are incompatible, there cannot be a shared declaration in a header file, but the lack of a prototype causes a W=1 warning: kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes] On 32-bit architectures, the local prototype u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr) passes arguments in other registers as the one in bpf_trace.c BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr) which uses 64-bit arguments in pairs of registers. As both versions of the function are fairly simple and only really differ in one line, just move them into a header file as an inline function that does not add any overhead for the bpf_trace.c callers and actually avoids a function call for the other one. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.net/ Signed-off-by: Arnd Bergmann Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index ceaa8c23287f..abe75063630b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2661,6 +2661,18 @@ static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr) } #endif /* CONFIG_BPF_SYSCALL */ +static __always_inline int +bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr) +{ + int ret = -EFAULT; + + if (IS_ENABLED(CONFIG_BPF_EVENTS)) + ret = copy_from_kernel_nofault(dst, unsafe_ptr, size); + if (unlikely(ret < 0)) + memset(dst, 0, size); + return ret; +} + void __bpf_free_used_btfs(struct bpf_prog_aux *aux, struct btf_mod_pair *used_btfs, u32 len); -- cgit v1.2.3 From c35559f94ebc3e3bc82e56e07161bb5986cd9761 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:11:00 -0700 Subject: x86/shstk: Introduce map_shadow_stack syscall When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other threads could write to the shadow stack during the writable window. The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a couple of downsides: 1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK. 2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to. 3. It stood out from the rest of the madvise flags, as more of direct action than a hint at future desired behavior. So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and setup new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to setup its own shadow stacks using the WRSS instruction. Towards this provide a flag so that stacks can be optionally setup securely for the common case of ucontext without enabling WRSS. Or potentially have the kernel set up the shadow stack in some new way. The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN); Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-35-rick.p.edgecombe%40intel.com --- include/linux/syscalls.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 03e3d0121d5e..7f6dc0988197 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -953,6 +953,7 @@ asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long l asmlinkage long sys_cachestat(unsigned int fd, struct cachestat_range __user *cstat_range, struct cachestat __user *cstat, unsigned int flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls -- cgit v1.2.3 From 0ee44885fe9cf19eb3870947c8f3c275017e48a7 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Mon, 12 Jun 2023 17:11:02 -0700 Subject: x86: Expose thread features in /proc/$PID/status Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. [Switched to CET, added to commit log] Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Link: https://lore.kernel.org/all/20230613001108.3040476-37-rick.p.edgecombe%40intel.com --- include/linux/proc_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 253f2676d93a..de407e7c3b55 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -159,6 +159,7 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, #endif /* CONFIG_PROC_PID_ARCH_STATUS */ void arch_report_meminfo(struct seq_file *m); +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task); #else /* CONFIG_PROC_FS */ -- cgit v1.2.3 From 87f0df7828899c552bcdde639c045983d5aeeed9 Mon Sep 17 00:00:00 2001 From: Rick Edgecombe Date: Thu, 6 Jul 2023 16:32:48 -0700 Subject: x86/shstk: Move arch detail comment out of core mm The comment around VM_SHADOW_STACK in mm.h refers to a lot of x86 specific details that don't belong in a cross arch file. Remove these out of core mm, and just leave the non-arch details. Since the comment includes some useful details that would be good to retain in the source somewhere, put the arch specifics parts in arch/x86/shstk.c near alloc_shstk(), where memory of this type is allocated. Include a reference to the existence of the x86 details near the VM_SHADOW_STACK definition mm.h. Signed-off-by: Rick Edgecombe Signed-off-by: Dave Hansen Reviewed-by: Mark Brown Link: https://lore.kernel.org/all/20230706233248.445713-1-rick.p.edgecombe%40intel.com --- include/linux/mm.h | 32 ++++++-------------------------- 1 file changed, 6 insertions(+), 26 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 97eddc83d19c..8c0350c1134a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -343,33 +343,13 @@ extern unsigned int kobjsize(const void *objp); #ifdef CONFIG_X86_USER_SHADOW_STACK /* - * This flag should not be set with VM_SHARED because of lack of support - * core mm. It will also get a guard page. This helps userspace protect - * itself from attacks. The reasoning is as follows: + * VM_SHADOW_STACK should not be set with VM_SHARED because of lack of + * support core mm. * - * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The - * INCSSP instruction can increment the shadow stack pointer. It is the - * shadow stack analog of an instruction like: - * - * addq $0x80, %rsp - * - * However, there is one important difference between an ADD on %rsp - * and INCSSP. In addition to modifying SSP, INCSSP also reads from the - * memory of the first and last elements that were "popped". It can be - * thought of as acting like this: - * - * READ_ONCE(ssp); // read+discard top element on stack - * ssp += nr_to_pop * 8; // move the shadow stack - * READ_ONCE(ssp-8); // read+discard last popped stack element - * - * The maximum distance INCSSP can move the SSP is 2040 bytes, before - * it would read the memory. Therefore a single page gap will be enough - * to prevent any operation from shifting the SSP to an adjacent stack, - * since it would have to land in the gap at least once, causing a - * fault. - * - * Prevent using INCSSP to move the SSP between shadow stacks by - * having a PAGE_SIZE guard gap. + * These VMAs will get a single end guard page. This helps userspace protect + * itself from attacks. A single page is enough for current shadow stack archs + * (x86). See the comments near alloc_shstk() in arch/x86/kernel/shstk.c + * for more details on the guard size. */ # define VM_SHADOW_STACK VM_HIGH_ARCH_5 #else -- cgit v1.2.3 From 6ad0f2f91ad14ba0a3c2990c054fd6fbe8100429 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 25 Jul 2023 22:21:08 +0800 Subject: Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer() Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function vmbus_ontimer()") this is not used anymore, so can remove it. Signed-off-by: YueHaibing Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20230725142108.27280-1-yuehaibing@huawei.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index bfbc37ce223b..3ac3974b3c78 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1239,9 +1239,6 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel *channel, u32 *buffer_actual_len, u64 *requestid); - -extern void vmbus_ontimer(unsigned long data); - /* Base driver object */ struct hv_driver { const char *name; -- cgit v1.2.3 From 1762f132d54200ffa008e86f9f6c96ab4ee3fb71 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 31 Jul 2023 14:28:16 +0300 Subject: net/mlx5e: Support IPsec packet offload for RX in switchdev mode As decryption must be done first, add new prio for IPsec offload in FDB, and put it just lower than BYPASS prio and higher than TC prio. Three levels are added for RX. The first one is for ip xfrm policy. SA table is created in the second level for ip xfrm state. The status table is created in the last to check the decryption result. If success, packets continue with the next process, or dropped otherwise. For now, the set of reg c1 is removed for swtichdev mode, and the datapath process will be added in the next patch. Signed-off-by: Jianbo Liu Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/c91063554cf643fb50b99cf093e8a9bf11729de5.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 2cb404c7ea13..6b1fa94f69c8 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -109,6 +109,7 @@ enum mlx5_flow_namespace_type { enum { FDB_BYPASS_PATH, + FDB_CRYPTO_INGRESS, FDB_TC_OFFLOAD, FDB_FT_OFFLOAD, FDB_TC_MISS, -- cgit v1.2.3 From 91bafc638ed4128eaca074fe7e88a5444db14325 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 31 Jul 2023 14:28:17 +0300 Subject: net/mlx5e: Handle IPsec offload for RX datapath in switchdev mode Reuse tun opts bits in reg c1, to pass IPsec obj id to datapath. As this is only for RX SA and there are only 11 bits, xarray is used to map IPsec obj id to an index, which is between 1 and 0x7ff, and replace obj id to write to reg c1. Signed-off-by: Jianbo Liu Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/43d60fbcc9cd672a97d7e2a2f7fe6a3d9e9a776d.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/eswitch.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h index e2701ed0200e..950d2431a53c 100644 --- a/include/linux/mlx5/eswitch.h +++ b/include/linux/mlx5/eswitch.h @@ -144,6 +144,9 @@ u32 mlx5_eswitch_get_vport_metadata_for_set(struct mlx5_eswitch *esw, GENMASK(31 - ESW_TUN_ID_BITS - ESW_RESERVED_BITS, \ ESW_TUN_OPTS_OFFSET + 1) +/* reuse tun_opts for the mapped ipsec obj id when tun_id is 0 (invalid) */ +#define ESW_IPSEC_RX_MAPPED_ID_MASK GENMASK(ESW_TUN_OPTS_BITS - 1, 0) + u8 mlx5_eswitch_mode(const struct mlx5_core_dev *dev); u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev); struct mlx5_core_dev *mlx5_eswitch_get_core_dev(struct mlx5_eswitch *esw); -- cgit v1.2.3 From c6c2bf5db4ea14b316af1fd03cc6c5c61f751f79 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 31 Jul 2023 14:28:19 +0300 Subject: net/mlx5e: Support IPsec packet offload for TX in switchdev mode The IPsec encryption is done at the last, so add new prio for IPsec offload in FDB, and put it just lower than the slow path prio and higher than the per-vport prio. Three levels are added for TX. The first one is for ip xfrm policy. The sa table is created in the second level for ip xfrm state. The status table is created at the last to count the number of packets encrypted. The rules, which forward packets to uplink, are changed to forward them to IPsec TX tables first. These rules are restored after those tables are destroyed, which is done immediately when there is no reference to them, just as what does in legacy mode. The support for slow path is added here, by refreshing uplink's channels. But, the handling for TC fast path, which is more complicated, will be added later. Besides, reg c4 is used instead to match reqid. Signed-off-by: Jianbo Liu Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 6b1fa94f69c8..c302ec34255b 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -115,6 +115,7 @@ enum { FDB_TC_MISS, FDB_BR_OFFLOAD, FDB_SLOW_PATH, + FDB_CRYPTO_EGRESS, FDB_PER_VPORT, }; -- cgit v1.2.3 From c8e350e62fc51f3fda28f166fc402f4fb539f528 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 31 Jul 2023 14:28:24 +0300 Subject: net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev For IPsec packet offload mode, the order of TC offload and IPsec offload on the same netdevice is not aligned with the order in the non-offload software. For example, for RX, the software performs TC first and then IPsec transformation, but the implementation for offload does that in the opposite way. To resolve the difference for now, either IPsec offload or TC offload, not both, is allowed for a specific interface. Signed-off-by: Jianbo Liu Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/8e2e5e3b0984d785066e8663aaf97b3ba1bb873f.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index f21703fb75fd..fa70c25423b2 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -806,6 +806,8 @@ struct mlx5_core_dev { u32 vsc_addr; struct mlx5_hv_vhca *hv_vhca; struct mlx5_thermal *thermal; + u64 num_block_tc; + u64 num_block_ipsec; }; struct mlx5_db { -- cgit v1.2.3 From 66f7223039c0cef81221e9779520479895995815 Mon Sep 17 00:00:00 2001 From: Maxim Georgiev Date: Tue, 1 Aug 2023 17:28:13 +0300 Subject: net: add NDOs for configuring hardware timestamping Current hardware timestamping API for NICs requires implementing .ndo_eth_ioctl() for SIOCGHWTSTAMP and SIOCSHWTSTAMP. That API has some boilerplate such as request parameter translation between user and kernel address spaces, handling possible translation failures correctly, etc. Since it is the same all across the board, it would be desirable to handle it through generic code. Here we introduce .ndo_hwtstamp_get() and .ndo_hwtstamp_set(), which implement that boilerplate and allow drivers to just act upon requests. Suggested-by: Jakub Kicinski Signed-off-by: Maxim Georgiev Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Tested-by: Horatiu Vultur Link: https://lore.kernel.org/r/20230801142824.1772134-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/net_tstamp.h | 8 ++++++++ include/linux/netdevice.h | 16 ++++++++++++++++ 2 files changed, 24 insertions(+) (limited to 'include/linux') diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h index fd67f3cc0c4b..7c59824f43f5 100644 --- a/include/linux/net_tstamp.h +++ b/include/linux/net_tstamp.h @@ -30,4 +30,12 @@ static inline void hwtstamp_config_to_kernel(struct kernel_hwtstamp_config *kern kernel_cfg->rx_filter = cfg->rx_filter; } +static inline void hwtstamp_config_from_kernel(struct hwtstamp_config *cfg, + const struct kernel_hwtstamp_config *kernel_cfg) +{ + cfg->flags = kernel_cfg->flags; + cfg->tx_type = kernel_cfg->tx_type; + cfg->rx_filter = kernel_cfg->rx_filter; +} + #endif /* _LINUX_NET_TIMESTAMPING_H_ */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 84c36a7f873f..08a0b8d45dc9 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -57,6 +57,7 @@ struct netpoll_info; struct device; struct ethtool_ops; +struct kernel_hwtstamp_config; struct phy_device; struct dsa_port; struct ip_tunnel_parm; @@ -1418,6 +1419,16 @@ struct netdev_net_notifier { * Get hardware timestamp based on normal/adjustable time or free running * cycle counter. This function is required if physical clock supports a * free running cycle counter. + * + * int (*ndo_hwtstamp_get)(struct net_device *dev, + * struct kernel_hwtstamp_config *kernel_config); + * Get the currently configured hardware timestamping parameters for the + * NIC device. + * + * int (*ndo_hwtstamp_set)(struct net_device *dev, + * struct kernel_hwtstamp_config *kernel_config, + * struct netlink_ext_ack *extack); + * Change the hardware timestamping parameters for NIC device. */ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1652,6 +1663,11 @@ struct net_device_ops { ktime_t (*ndo_get_tstamp)(struct net_device *dev, const struct skb_shared_hwtstamps *hwtstamps, bool cycles); + int (*ndo_hwtstamp_get)(struct net_device *dev, + struct kernel_hwtstamp_config *kernel_config); + int (*ndo_hwtstamp_set)(struct net_device *dev, + struct kernel_hwtstamp_config *kernel_config, + struct netlink_ext_ack *extack); }; struct xdp_metadata_ops { -- cgit v1.2.3 From e47d01fea663b1fe58d0a493efc9ed667f70242e Mon Sep 17 00:00:00 2001 From: Maxim Georgiev Date: Tue, 1 Aug 2023 17:28:14 +0300 Subject: net: add hwtstamping helpers for stackable net devices The stackable net devices with hwtstamping support (vlan, macvlan, bonding) only pass the hwtstamping ops to the lower (real) device. These drivers are the first that need to be converted to the new timestamping API, because if they aren't prepared to handle that, then no real device driver cannot be converted to the new API either. After studying what vlan_dev_ioctl(), macvlan_eth_ioctl() and bond_eth_ioctl() have in common, here we propose two generic implementations of ndo_hwtstamp_get() and ndo_hwtstamp_set() which can be called by those 3 drivers, with "dev" being their lower device. These helpers cover both cases, when the lower driver is converted to the new API or unconverted. We need some hacks in case of an unconverted driver, namely to stuff some pointers in struct kernel_hwtstamp_config which shouldn't have been there (since the new API isn't supposed to need it). These will be removed when all drivers will have been converted to the new API. Signed-off-by: Maxim Georgiev Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Link: https://lore.kernel.org/r/20230801142824.1772134-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/net_tstamp.h | 6 ++++++ include/linux/netdevice.h | 5 +++++ 2 files changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h index 7c59824f43f5..03e922814851 100644 --- a/include/linux/net_tstamp.h +++ b/include/linux/net_tstamp.h @@ -11,6 +11,10 @@ * @flags: see struct hwtstamp_config * @tx_type: see struct hwtstamp_config * @rx_filter: see struct hwtstamp_config + * @ifr: pointer to ifreq structure from the original ioctl request, to pass to + * a legacy implementation of a lower driver + * @copied_to_user: request was passed to a legacy implementation which already + * copied the ioctl request back to user space * * Prefer using this structure for in-kernel processing of hardware * timestamping configuration, over the inextensible struct hwtstamp_config @@ -20,6 +24,8 @@ struct kernel_hwtstamp_config { int flags; int tx_type; int rx_filter; + struct ifreq *ifr; + bool copied_to_user; }; static inline void hwtstamp_config_to_kernel(struct kernel_hwtstamp_config *kernel_cfg, diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 08a0b8d45dc9..23e335f245cf 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3950,6 +3950,11 @@ int put_user_ifreq(struct ifreq *ifr, void __user *arg); int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr, void __user *data, bool *need_copyout); int dev_ifconf(struct net *net, struct ifconf __user *ifc); +int generic_hwtstamp_get_lower(struct net_device *dev, + struct kernel_hwtstamp_config *kernel_cfg); +int generic_hwtstamp_set_lower(struct net_device *dev, + struct kernel_hwtstamp_config *kernel_cfg, + struct netlink_ext_ack *extack); int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *userdata); unsigned int dev_get_flags(const struct net_device *); int __dev_change_flags(struct net_device *dev, unsigned int flags, -- cgit v1.2.3 From 60495b6622ca67f5180343b89bd932d28d23f63a Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 1 Aug 2023 17:28:23 +0300 Subject: net: phy: provide phylib stubs for hardware timestamping operations net/core/dev_ioctl.c (built-in code) will want to call phy_mii_ioctl() for hardware timestamping purposes. This is not directly possible, because phy_mii_ioctl() is a symbol provided under CONFIG_PHYLIB. Do something similar to what was done in DSA in commit 5a17818682cf ("net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub"), and arrange some indirect calls to phy_mii_ioctl() through a stub structure containing function pointers, that's provided by phylib as built-in even when CONFIG_PHYLIB=m, and which phy_init() populates at runtime (module insertion). Note: maybe the ownership of the ethtool_phy_ops singleton is backwards, and the methods exposed by that should be later merged into phylib_stubs. Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20230801142824.1772134-12-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 7 +++++ include/linux/phylib_stubs.h | 68 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) create mode 100644 include/linux/phylib_stubs.h (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index b254848a9c99..ba08b0e60279 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -298,6 +298,7 @@ static inline const char *phy_modes(phy_interface_t interface) #define MII_BUS_ID_SIZE 61 struct device; +struct kernel_hwtstamp_config; struct phylink; struct sfp_bus; struct sfp_upstream_ops; @@ -1955,6 +1956,12 @@ int phy_ethtool_set_plca_cfg(struct phy_device *phydev, int phy_ethtool_get_plca_status(struct phy_device *phydev, struct phy_plca_status *plca_st); +int __phy_hwtstamp_get(struct phy_device *phydev, + struct kernel_hwtstamp_config *config); +int __phy_hwtstamp_set(struct phy_device *phydev, + struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack); + static inline int phy_package_read(struct phy_device *phydev, u32 regnum) { struct phy_package_shared *shared = phydev->shared; diff --git a/include/linux/phylib_stubs.h b/include/linux/phylib_stubs.h new file mode 100644 index 000000000000..1279f48c8a70 --- /dev/null +++ b/include/linux/phylib_stubs.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Stubs for the Network PHY library + */ + +#include + +struct kernel_hwtstamp_config; +struct netlink_ext_ack; +struct phy_device; + +#if IS_ENABLED(CONFIG_PHYLIB) + +extern const struct phylib_stubs *phylib_stubs; + +struct phylib_stubs { + int (*hwtstamp_get)(struct phy_device *phydev, + struct kernel_hwtstamp_config *config); + int (*hwtstamp_set)(struct phy_device *phydev, + struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack); +}; + +static inline int phy_hwtstamp_get(struct phy_device *phydev, + struct kernel_hwtstamp_config *config) +{ + /* phylib_register_stubs() and phylib_unregister_stubs() + * also run under rtnl_lock(). + */ + ASSERT_RTNL(); + + if (!phylib_stubs) + return -EOPNOTSUPP; + + return phylib_stubs->hwtstamp_get(phydev, config); +} + +static inline int phy_hwtstamp_set(struct phy_device *phydev, + struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack) +{ + /* phylib_register_stubs() and phylib_unregister_stubs() + * also run under rtnl_lock(). + */ + ASSERT_RTNL(); + + if (!phylib_stubs) + return -EOPNOTSUPP; + + return phylib_stubs->hwtstamp_set(phydev, config, extack); +} + +#else + +static inline int phy_hwtstamp_get(struct phy_device *phydev, + struct kernel_hwtstamp_config *config) +{ + return -EOPNOTSUPP; +} + +static inline int phy_hwtstamp_set(struct phy_device *phydev, + struct kernel_hwtstamp_config *config, + struct netlink_ext_ack *extack) +{ + return -EOPNOTSUPP; +} + +#endif -- cgit v1.2.3 From fd770e856e226f80fe6e1dc9d1861bcb135cdf0b Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 1 Aug 2023 17:28:24 +0300 Subject: net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers It is desirable that the new .ndo_hwtstamp_set() API gives more uniformity, less overhead and future flexibility w.r.t. the PHY timestamping behavior. Currently there are some drivers which allow PHY timestamping through the procedure mentioned in Documentation/networking/timestamping.rst. They don't do anything locally if phy_has_hwtstamp() is set, except for lan966x which installs PTP packet traps. Centralize that behavior in a new dev_set_hwtstamp_phylib() code function, which calls either phy_mii_ioctl() for the phylib PHY, or .ndo_hwtstamp_set() of the netdev, based on a single policy (currently simplistic: phy_has_hwtstamp()). Any driver converted to .ndo_hwtstamp_set() will automatically opt into the centralized phylib timestamping policy. Unconverted drivers still get to choose whether they let the PHY handle timestamping or not. Netdev drivers with integrated PHY drivers that don't use phylib presumably don't set dev->phydev, and those will always see HWTSTAMP_SOURCE_NETDEV requests even when converted. The timestamping policy will remain 100% up to them. Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Tested-by: Horatiu Vultur Link: https://lore.kernel.org/r/20230801142824.1772134-13-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/net_tstamp.h | 16 ++++++++++++++++ include/linux/netdevice.h | 4 ++++ 2 files changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h index 03e922814851..eb01c37e71e0 100644 --- a/include/linux/net_tstamp.h +++ b/include/linux/net_tstamp.h @@ -5,6 +5,11 @@ #include +enum hwtstamp_source { + HWTSTAMP_SOURCE_NETDEV, + HWTSTAMP_SOURCE_PHYLIB, +}; + /** * struct kernel_hwtstamp_config - Kernel copy of struct hwtstamp_config * @@ -15,6 +20,8 @@ * a legacy implementation of a lower driver * @copied_to_user: request was passed to a legacy implementation which already * copied the ioctl request back to user space + * @source: indication whether timestamps should come from the netdev or from + * an attached phylib PHY * * Prefer using this structure for in-kernel processing of hardware * timestamping configuration, over the inextensible struct hwtstamp_config @@ -26,6 +33,7 @@ struct kernel_hwtstamp_config { int rx_filter; struct ifreq *ifr; bool copied_to_user; + enum hwtstamp_source source; }; static inline void hwtstamp_config_to_kernel(struct kernel_hwtstamp_config *kernel_cfg, @@ -44,4 +52,12 @@ static inline void hwtstamp_config_from_kernel(struct hwtstamp_config *cfg, cfg->rx_filter = kernel_cfg->rx_filter; } +static inline bool kernel_hwtstamp_config_changed(const struct kernel_hwtstamp_config *a, + const struct kernel_hwtstamp_config *b) +{ + return a->flags != b->flags || + a->tx_type != b->tx_type || + a->rx_filter != b->rx_filter; +} + #endif /* _LINUX_NET_TIMESTAMPING_H_ */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 23e335f245cf..85d594460c66 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1724,6 +1724,9 @@ struct xdp_metadata_ops { * @IFF_TX_SKB_NO_LINEAR: device/driver is capable of xmitting frames with * skb_headlen(skb) == 0 (data starts from frag0) * @IFF_CHANGE_PROTO_DOWN: device supports setting carrier via IFLA_PROTO_DOWN + * @IFF_SEE_ALL_HWTSTAMP_REQUESTS: device wants to see calls to + * ndo_hwtstamp_set() for all timestamp requests regardless of source, + * even if those aren't HWTSTAMP_SOURCE_NETDEV. */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1759,6 +1762,7 @@ enum netdev_priv_flags { IFF_NO_ADDRCONF = BIT_ULL(30), IFF_TX_SKB_NO_LINEAR = BIT_ULL(31), IFF_CHANGE_PROTO_DOWN = BIT_ULL(32), + IFF_SEE_ALL_HWTSTAMP_REQUESTS = BIT_ULL(33), }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN -- cgit v1.2.3 From f11e5bd159b08976db9e7a9eabbf0318dfe5429d Mon Sep 17 00:00:00 2001 From: Mateusz Kowalski Date: Tue, 1 Aug 2023 14:37:50 +0200 Subject: bonding: support balance-alb with openvswitch Commit d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") introduced a support for balance-alb mode for interfaces connected to the linux bridge by fixing missing matching of MAC entry in FDB. In our testing we discovered that it still does not work when the bond is connected to the OVS bridge as show in diagram below: eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac) | bond0.150(mac:eth0_mac) | ovs_bridge(ip:bridge_ip,mac:eth0_mac) This patch fixes it by checking not only if the device is a bridge but also if it is an openvswitch. Signed-off-by: Mateusz Kowalski Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/9fe7297c-609e-208b-c77b-3ceef6eb51a4@redhat.com Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 85d594460c66..4176a738177b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -5128,6 +5128,11 @@ static inline bool netif_is_ovs_port(const struct net_device *dev) return dev->priv_flags & IFF_OVS_DATAPATH; } +static inline bool netif_is_any_bridge_master(const struct net_device *dev) +{ + return netif_is_bridge_master(dev) || netif_is_ovs_master(dev); +} + static inline bool netif_is_any_bridge_port(const struct net_device *dev) { return netif_is_bridge_port(dev) || netif_is_ovs_port(dev); -- cgit v1.2.3 From 8a132ecb6bc38f9ed0d2927acbcabeb1443613b2 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 25 Jul 2023 22:03:27 +0800 Subject: efi: Remove unused extern declaration efi_lookup_mapped_addr() Since commit 50a0cb565246 ("x86/efi-bgrt: Fix kernel panic when mapping BGRT data") this extern declaration is not used anymore. Signed-off-by: YueHaibing Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index ab088c662e88..e9004358f7bd 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -726,7 +726,6 @@ static inline efi_status_t efi_query_variable_store(u32 attributes, return EFI_SUCCESS; } #endif -extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr); extern int __init __efi_memmap_init(struct efi_memory_map_data *data); extern int __init efi_memmap_init_early(struct efi_memory_map_data *data); -- cgit v1.2.3 From 49e47a5b6145d86c30022fe0e949bbb24bae28ba Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 2 Aug 2023 18:02:29 -0700 Subject: net: move struct netdev_rx_queue out of netdevice.h struct netdev_rx_queue is touched in only a few places and having it defined in netdevice.h brings in the dependency on xdp.h, because struct xdp_rxq_info gets embedded in struct netdev_rx_queue. In prep for removal of xdp.h from netdevice.h move all the netdev_rx_queue stuff to a new header. We could technically break the new header up to avoid the sysfs.h include but it's so rarely included it doesn't seem to be worth it at this point. Reviewed-by: Amritha Nambiar Signed-off-by: Jakub Kicinski Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org Signed-off-by: Martin KaFai Lau --- include/linux/netdevice.h | 44 -------------------------------------------- 1 file changed, 44 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3800d0479698..5563c8a210b5 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -782,32 +782,6 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id, #endif #endif /* CONFIG_RPS */ -/* This structure contains an instance of an RX queue. */ -struct netdev_rx_queue { - struct xdp_rxq_info xdp_rxq; -#ifdef CONFIG_RPS - struct rps_map __rcu *rps_map; - struct rps_dev_flow_table __rcu *rps_flow_table; -#endif - struct kobject kobj; - struct net_device *dev; - netdevice_tracker dev_tracker; - -#ifdef CONFIG_XDP_SOCKETS - struct xsk_buff_pool *pool; -#endif -} ____cacheline_aligned_in_smp; - -/* - * RX queue sysfs structures and functions. - */ -struct rx_queue_attribute { - struct attribute attr; - ssize_t (*show)(struct netdev_rx_queue *queue, char *buf); - ssize_t (*store)(struct netdev_rx_queue *queue, - const char *buf, size_t len); -}; - /* XPS map type and offset of the xps map within net_device->xps_maps[]. */ enum xps_map_type { XPS_CPUS = 0, @@ -3828,24 +3802,6 @@ static inline int netif_set_real_num_rx_queues(struct net_device *dev, int netif_set_real_num_queues(struct net_device *dev, unsigned int txq, unsigned int rxq); -static inline struct netdev_rx_queue * -__netif_get_rx_queue(struct net_device *dev, unsigned int rxq) -{ - return dev->_rx + rxq; -} - -#ifdef CONFIG_SYSFS -static inline unsigned int get_netdev_rx_queue_index( - struct netdev_rx_queue *queue) -{ - struct net_device *dev = queue->dev; - int index = queue - dev->_rx; - - BUG_ON(index >= dev->num_rx_queues); - return index; -} -#endif - int netif_get_num_default_rss_queues(void); void dev_kfree_skb_irq_reason(struct sk_buff *skb, enum skb_drop_reason reason); -- cgit v1.2.3 From 680ee0456a5712309db9ec2692e908ea1d6b1644 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 2 Aug 2023 18:02:30 -0700 Subject: net: invert the netdevice.h vs xdp.h dependency xdp.h is far more specific and is included in only 67 other files vs netdevice.h's 1538 include sites. Make xdp.h include netdevice.h, instead of the other way around. This decreases the incremental allmodconfig builds size when xdp.h is touched from 5947 to 662 objects. Move bpf_prog_run_xdp() to xdp.h, seems appropriate and filter.h is a mega-header in its own right so it's nice to avoid xdp.h getting included there as well. The only unfortunate part is that the typedef for xdp_features_t has to move to netdevice.h, since its embedded in struct netdevice. Signed-off-by: Jakub Kicinski Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org Signed-off-by: Martin KaFai Lau --- include/linux/filter.h | 17 ----------------- include/linux/netdevice.h | 11 ++++------- 2 files changed, 4 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index f5eabe3fa5e8..2d6fe30bad5f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -774,23 +774,6 @@ DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); u32 xdp_master_redirect(struct xdp_buff *xdp); -static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, - struct xdp_buff *xdp) -{ - /* Driver XDP hooks are invoked within a single NAPI poll cycle and thus - * under local_bh_disable(), which provides the needed RCU protection - * for accessing map entries. - */ - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); - - if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { - if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) - act = xdp_master_redirect(xdp); - } - - return act; -} - void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog); static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5563c8a210b5..d8ed85183fe4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -40,7 +40,6 @@ #include #endif #include -#include #include #include @@ -76,8 +75,12 @@ struct udp_tunnel_nic_info; struct udp_tunnel_nic; struct bpf_prog; struct xdp_buff; +struct xdp_frame; +struct xdp_metadata_ops; struct xdp_md; +typedef u32 xdp_features_t; + void synchronize_net(void); void netdev_set_default_ethtool_ops(struct net_device *dev, const struct ethtool_ops *ops); @@ -1628,12 +1631,6 @@ struct net_device_ops { bool cycles); }; -struct xdp_metadata_ops { - int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); - int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, - enum xdp_rss_hash_type *rss_type); -}; - /** * enum netdev_priv_flags - &struct net_device priv_flags * -- cgit v1.2.3 From 2abcc4b5a64a65a2d2287ba0be5c2871c1552416 Mon Sep 17 00:00:00 2001 From: James Morse Date: Tue, 1 Aug 2023 14:54:07 +0000 Subject: module: Expose module_init_layout_section() module_init_layout_section() choses whether the core module loader considers a section as init or not. This affects the placement of the exit section when module unloading is disabled. This code will never run, so it can be free()d once the module has been initialised. arm and arm64 need to count the number of PLTs they need before applying relocations based on the section name. The init PLTs are stored separately so they can be free()d. arm and arm64 both use within_module_init() to decide which list of PLTs to use when applying the relocation. Because within_module_init()'s behaviour changes when module unloading is disabled, both architecture would need to take this into account when counting the PLTs. Today neither architecture does this, meaning when module unloading is disabled there are insufficient PLTs in the init section to load some modules, resulting in warnings: | WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc | Modules linked in: crct10dif_common | CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208 | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : module_emit_plt_entry+0x184/0x1cc | lr : module_emit_plt_entry+0x94/0x1cc | sp : ffffffc0803bba60 [...] | Call trace: | module_emit_plt_entry+0x184/0x1cc | apply_relocate_add+0x2bc/0x8e4 | load_module+0xe34/0x1bd4 | init_module_from_file+0x84/0xc0 | __arm64_sys_finit_module+0x1b8/0x27c | invoke_syscall.constprop.0+0x5c/0x104 | do_el0_svc+0x58/0x160 | el0_svc+0x38/0x110 | el0t_64_sync_handler+0xc0/0xc4 | el0t_64_sync+0x190/0x194 Instead of duplicating module_init_layout_section()s logic, expose it. Reported-by: Adam Johnston Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: stable@vger.kernel.org Signed-off-by: James Morse Signed-off-by: Luis Chamberlain --- include/linux/moduleloader.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h index 03be088fb439..001b2ce83832 100644 --- a/include/linux/moduleloader.h +++ b/include/linux/moduleloader.h @@ -42,6 +42,11 @@ bool module_init_section(const char *name); */ bool module_exit_section(const char *name); +/* Describes whether within_module_init() will consider this an init section + * or not. This behaviour changes with CONFIG_MODULE_UNLOAD. + */ +bool module_init_layout_section(const char *sname); + /* * Apply the given relocation to the (simplified) ELF. Return -error * or 0. -- cgit v1.2.3 From efe47a18e43f59f063a82ccaa464a3b4844bb8a8 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Thu, 27 Jul 2023 13:04:28 +0300 Subject: bus: mhi: host: allow MHI client drivers to provide the firmware via a pointer Currently MHI loads the firmware image from the path provided by client devices. ath11k needs to support firmware image embedded along with meta data (named as firmware-2.bin). So allow the client driver to request the firmware file from user space on it's own and provide the firmware image data and size to MHI via a pointer struct mhi_controller::fw_data. This is an optional feature, if fw_data is NULL MHI load the firmware using the name from struct mhi_controller::fw_image string as before. Tested with ath11k and WCN6855 hw2.0. Signed-off-by: Kalle Valo Reviewed-by: Manivannan Sadhasivam Reviewed-by: Jeffrey Hugo Link: https://lore.kernel.org/r/20230727100430.3603551-2-kvalo@kernel.org [mani: wrapped commit message to 75 columns] Signed-off-by: Manivannan Sadhasivam --- include/linux/mhi.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi.h b/include/linux/mhi.h index f6de4b6ecfc7..039943ec4d4e 100644 --- a/include/linux/mhi.h +++ b/include/linux/mhi.h @@ -299,6 +299,10 @@ struct mhi_controller_config { * @iova_start: IOMMU starting address for data (required) * @iova_stop: IOMMU stop address for data (required) * @fw_image: Firmware image name for normal booting (optional) + * @fw_data: Firmware image data content for normal booting, used only + * if fw_image is NULL and fbc_download is true (optional) + * @fw_sz: Firmware image data size for normal booting, used only if fw_image + * is NULL and fbc_download is true (optional) * @edl_image: Firmware image name for emergency download mode (optional) * @rddm_size: RAM dump size that host should allocate for debugging purpose * @sbl_size: SBL image size downloaded through BHIe (optional) @@ -384,6 +388,8 @@ struct mhi_controller { dma_addr_t iova_start; dma_addr_t iova_stop; const char *fw_image; + const u8 *fw_data; + size_t fw_sz; const char *edl_image; size_t rddm_size; size_t sbl_size; -- cgit v1.2.3 From 7740bb882fdea16b2f3a4d3804827b910f44206c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 3 Aug 2023 07:14:26 +0000 Subject: net: vlan: update wrong comments vlan_insert_tag() and friends do not allocate a new skb. However they might allocate a new skb->head. Update their comments to better describe their behavior. Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/if_vlan.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 6ba71957851e..3028af87716e 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -408,7 +408,7 @@ static inline int __vlan_insert_tag(struct sk_buff *skb, * @mac_len: MAC header length including outer vlan headers * * Inserts the VLAN tag into @skb as part of the payload at offset mac_len - * Returns a VLAN tagged skb. If a new skb is created, @skb is freed. + * Returns a VLAN tagged skb. This might change skb->head. * * Following the skb_unshare() example, in case of error, the calling function * doesn't have to worry about freeing the original skb. @@ -437,7 +437,7 @@ static inline struct sk_buff *vlan_insert_inner_tag(struct sk_buff *skb, * @vlan_tci: VLAN TCI to insert * * Inserts the VLAN tag into @skb as part of the payload - * Returns a VLAN tagged skb. If a new skb is created, @skb is freed. + * Returns a VLAN tagged skb. This might change skb->head. * * Following the skb_unshare() example, in case of error, the calling function * doesn't have to worry about freeing the original skb. @@ -457,7 +457,7 @@ static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb, * @vlan_tci: VLAN TCI to insert * * Inserts the VLAN tag into @skb as part of the payload - * Returns a VLAN tagged skb. If a new skb is created, @skb is freed. + * Returns a VLAN tagged skb. This might change skb->head. * * Following the skb_unshare() example, in case of error, the calling function * doesn't have to worry about freeing the original skb. -- cgit v1.2.3 From a13b5340aa6892685adf655cd4bfa91a83d5a9d2 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Wed, 2 Aug 2023 10:01:01 -0500 Subject: PCI: add ArrowLake-S PCI ID for Intel HDAudio subsystem. Add part ID to common include file Signed-off-by: Pierre-Louis Bossart Reviewed-by: Ranjani Sridharan Reviewed-by: Bard Liao Link: https://lore.kernel.org/r/20230802150105.24604-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai --- include/linux/pci_ids.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 3066660cd39b..a6411aa4c331 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -3058,6 +3058,7 @@ #define PCI_DEVICE_ID_INTEL_HDA_RPL_S 0x7a50 #define PCI_DEVICE_ID_INTEL_HDA_ADL_S 0x7ad0 #define PCI_DEVICE_ID_INTEL_HDA_MTL 0x7e28 +#define PCI_DEVICE_ID_INTEL_HDA_ARL_S 0x7f50 #define PCI_DEVICE_ID_INTEL_SCH_LPC 0x8119 #define PCI_DEVICE_ID_INTEL_SCH_IDE 0x811a #define PCI_DEVICE_ID_INTEL_HDA_POULSBO 0x811b -- cgit v1.2.3 From a10b6a03e6377f8e658ee3c9f4bc0feae30f3df2 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Thu, 3 Aug 2023 15:56:53 +0200 Subject: serial: cpm_uart: Remove linux/fs_uart_pd.h linux/fs_uart_pd.h is not used anymore. Remove it. Signed-off-by: Christophe Leroy Link: https://lore.kernel.org/r/f2cb444fa2b5776c9c51b5e46ea85edab62d1524.1691068700.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman --- include/linux/fs_uart_pd.h | 71 ---------------------------------------------- 1 file changed, 71 deletions(-) delete mode 100644 include/linux/fs_uart_pd.h (limited to 'include/linux') diff --git a/include/linux/fs_uart_pd.h b/include/linux/fs_uart_pd.h deleted file mode 100644 index 36b61ff39277..000000000000 --- a/include/linux/fs_uart_pd.h +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Platform information definitions for the CPM Uart driver. - * - * 2006 (c) MontaVista Software, Inc. - * Vitaly Bordug - * - * This file is licensed under the terms of the GNU General Public License - * version 2. This program is licensed "as is" without any warranty of any - * kind, whether express or implied. - */ - -#ifndef FS_UART_PD_H -#define FS_UART_PD_H - -#include - -enum fs_uart_id { - fsid_smc1_uart, - fsid_smc2_uart, - fsid_scc1_uart, - fsid_scc2_uart, - fsid_scc3_uart, - fsid_scc4_uart, - fs_uart_nr, -}; - -static inline int fs_uart_id_scc2fsid(int id) -{ - return fsid_scc1_uart + id - 1; -} - -static inline int fs_uart_id_fsid2scc(int id) -{ - return id - fsid_scc1_uart + 1; -} - -static inline int fs_uart_id_smc2fsid(int id) -{ - return fsid_smc1_uart + id - 1; -} - -static inline int fs_uart_id_fsid2smc(int id) -{ - return id - fsid_smc1_uart + 1; -} - -struct fs_uart_platform_info { - void(*init_ioports)(struct fs_uart_platform_info *); - /* device specific information */ - int fs_no; /* controller index */ - char fs_type[4]; /* controller type */ - u32 uart_clk; - u8 tx_num_fifo; - u8 tx_buf_size; - u8 rx_num_fifo; - u8 rx_buf_size; - u8 brg; - u8 clk_rx; - u8 clk_tx; -}; - -static inline int fs_uart_get_id(struct fs_uart_platform_info *fpi) -{ - if(strstr(fpi->fs_type, "SMC")) - return fs_uart_id_smc2fsid(fpi->fs_no); - if(strstr(fpi->fs_type, "SCC")) - return fs_uart_id_scc2fsid(fpi->fs_no); - return fpi->fs_no; -} - -#endif -- cgit v1.2.3 From 31ed379b7cb2b5c1f2abc7255ff31c30bab26066 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= Date: Sun, 9 Jul 2023 23:18:00 +0200 Subject: dyndbg: add source filename to prefix MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Printing the line number without the file is of limited usefulness. Knowing the filename also makes it also easier to relate the logged information to the controlfile. Example: # modprobe test_dynamic_debug # echo 'file test_dynamic_debug.c =pfsl' > /proc/dynamic_debug/control # echo 1 > /sys/module/test_dynamic_debug/parameters/do_prints # dmesg | tail -2 [ 71.802212] do_cats:lib/test_dynamic_debug.c:103: test_dd: doing categories [ 71.802227] do_levels:lib/test_dynamic_debug.c:123: test_dd: doing levels Signed-off-by: Thomas Weißschuh Acked-by: Jim Cromie Acked-by: Jason Baron Reviewed-by: Luis Chamberlain Link: https://lore.kernel.org/r/20230709-dyndbg-filename-v2-3-fd83beef0925@weissschuh.net Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 061dd84d09f3..4fcbf4d4fd0a 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -37,10 +37,12 @@ struct _ddebug { #define _DPRINTK_FLAGS_INCL_FUNCNAME (1<<2) #define _DPRINTK_FLAGS_INCL_LINENO (1<<3) #define _DPRINTK_FLAGS_INCL_TID (1<<4) +#define _DPRINTK_FLAGS_INCL_SOURCENAME (1<<5) #define _DPRINTK_FLAGS_INCL_ANY \ (_DPRINTK_FLAGS_INCL_MODNAME | _DPRINTK_FLAGS_INCL_FUNCNAME |\ - _DPRINTK_FLAGS_INCL_LINENO | _DPRINTK_FLAGS_INCL_TID) + _DPRINTK_FLAGS_INCL_LINENO | _DPRINTK_FLAGS_INCL_TID |\ + _DPRINTK_FLAGS_INCL_SOURCENAME) #if defined DEBUG #define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINT -- cgit v1.2.3 From 8306d6f35dbde19fe2d56964159fa2b4c6343bc1 Mon Sep 17 00:00:00 2001 From: Zev Weiss Date: Fri, 23 Jun 2023 16:28:05 +0200 Subject: peci: Constify struct peci_controller_ops As with most ops structs, we never modify it at runtime, and keeping function pointers in read-only memory is generally a good thing security-wise. Signed-off-by: Zev Weiss Link: https://lore.kernel.org/r/20230327224315.11135-1-zev@bewilderbeest.net Reviewed-by: Iwona Winiarska Signed-off-by: Iwona Winiarska Link: https://lore.kernel.org/r/20230623142805.577612-1-iwona.winiarska@intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/peci.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/peci.h b/include/linux/peci.h index 06e6ef935297..9b3d36aff431 100644 --- a/include/linux/peci.h +++ b/include/linux/peci.h @@ -42,13 +42,13 @@ struct peci_controller_ops { */ struct peci_controller { struct device dev; - struct peci_controller_ops *ops; + const struct peci_controller_ops *ops; struct mutex bus_lock; /* held for the duration of xfer */ u8 id; }; struct peci_controller *devm_peci_controller_add(struct device *parent, - struct peci_controller_ops *ops); + const struct peci_controller_ops *ops); static inline struct peci_controller *to_peci_controller(void *d) { -- cgit v1.2.3 From 34949a31fb5ec5269cbf5e065370ae0e14d25223 Mon Sep 17 00:00:00 2001 From: Teh Wen Ping Date: Thu, 27 Jul 2023 14:29:06 -0500 Subject: firmware: stratix10-svc: Generic Mailbox Command Add generic mailbox command that can support SDM command. User can use this command to send SDM mailbox command. User have to specified an input file which contain the command data and an output file for SDM response to be copied over. Signed-off-by: Teh Wen Ping Signed-off-by: Kah Jing Lee Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20230727192907.982070-1-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-smc.h | 25 ++++++++++++++++++++++ .../linux/firmware/intel/stratix10-svc-client.h | 5 +++++ 2 files changed, 30 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h index a718f853d457..ee80ca4bb0d0 100644 --- a/include/linux/firmware/intel/stratix10-smc.h +++ b/include/linux/firmware/intel/stratix10-smc.h @@ -466,6 +466,31 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_FIRMWARE_VERSION \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FIRMWARE_VERSION) +/** + * SMC call protocol for Mailbox, starting FUNCID from 60 + * + * Call register usage: + * a0 INTEL_SIP_SMC_MBOX_SEND_CMD + * a1 mailbox command code + * a2 physical address that contain mailbox command data (not include header) + * a3 mailbox command data size in word + * a4 set to 0 for CASUAL, set to 1 for URGENT + * a5 physical address for secure firmware to put response data + * (not include header) + * a6 maximum size in word of physical address to store response data + * a7 not used + * + * Return status + * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_REJECTED or + * INTEL_SIP_SMC_STATUS_ERROR + * a1 mailbox error code + * a2 response data length in word + * a3 not used + */ +#define INTEL_SIP_SMC_FUNCID_MBOX_SEND_CMD 60 + #define INTEL_SIP_SMC_MBOX_SEND_CMD \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_MBOX_SEND_CMD) + /** * Request INTEL_SIP_SMC_SVC_VERSION * diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 0c16037fd08d..60ed82112680 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -118,6 +118,9 @@ struct stratix10_svc_chan; * @COMMAND_SMC_SVC_VERSION: Non-mailbox SMC SVC API Version, * return status is SVC_STATUS_OK * + * @COMMAND_MBOX_SEND_CMD: send generic mailbox command, return status is + * SVC_STATUS_OK or SVC_STATUS_ERROR + * * @COMMAND_RSU_DCMF_STATUS: query firmware for the DCMF status * return status is SVC_STATUS_OK or SVC_STATUS_ERROR * @@ -164,6 +167,8 @@ enum stratix10_svc_command_code { COMMAND_FCS_RANDOM_NUMBER_GEN, /* for general status poll */ COMMAND_POLL_SERVICE_STATUS = 40, + /* for generic mailbox send command */ + COMMAND_MBOX_SEND_CMD = 100, /* Non-mailbox SMC Call */ COMMAND_SMC_SVC_VERSION = 200, }; -- cgit v1.2.3 From 5cd474e57368f0957c343bb21e309cf82826b1ef Mon Sep 17 00:00:00 2001 From: D Scott Phillips Date: Mon, 26 Jun 2023 17:29:39 -0700 Subject: arm64: sdei: abort running SDEI handlers during crash Interrupts are blocked in SDEI context, per the SDEI spec: "The client interrupts cannot preempt the event handler." If we crashed in the SDEI handler-running context (as with ACPI's AGDI) then we need to clean up the SDEI state before proceeding to the crash kernel so that the crash kernel can have working interrupts. Track the active SDEI handler per-cpu so that we can COMPLETE_AND_RESUME the handler, discarding the interrupted context. Fixes: f5df26961853 ("arm64: kernel: Add arch-specific SDEI entry code and CPU masking") Signed-off-by: D Scott Phillips Cc: stable@vger.kernel.org Reviewed-by: James Morse Tested-by: Mihai Carabas Link: https://lore.kernel.org/r/20230627002939.2758-1-scott@os.amperecomputing.com Signed-off-by: Will Deacon --- include/linux/arm_sdei.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h index 14dc461b0e82..255701e1251b 100644 --- a/include/linux/arm_sdei.h +++ b/include/linux/arm_sdei.h @@ -47,10 +47,12 @@ int sdei_unregister_ghes(struct ghes *ghes); int sdei_mask_local_cpu(void); int sdei_unmask_local_cpu(void); void __init sdei_init(void); +void sdei_handler_abort(void); #else static inline int sdei_mask_local_cpu(void) { return 0; } static inline int sdei_unmask_local_cpu(void) { return 0; } static inline void sdei_init(void) { } +static inline void sdei_handler_abort(void) { } #endif /* CONFIG_ARM_SDE_INTERFACE */ -- cgit v1.2.3 From 73aca58b781e3e3d52bb7247b374fb5334a05001 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Mon, 17 Jul 2023 08:37:16 -0600 Subject: of: Move of_platform_register_reconfig_notifier() into DT core There's no reason the generic platform bus code needs to call of_platform_register_reconfig_notifier(). The notifier can be setup before the platform bus is. Let's move it into of_core_init() which is called just before platform_bus_init() instead to keep more of the DT bits in the DT code. Reviewed-by: Greg Kroah-Hartman Link: https://lore.kernel.org/r/20230717143718.1715773-1-robh@kernel.org Signed-off-by: Rob Herring --- include/linux/of_platform.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_platform.h b/include/linux/of_platform.h index d8045bcfc35e..fadfea575485 100644 --- a/include/linux/of_platform.h +++ b/include/linux/of_platform.h @@ -127,10 +127,4 @@ static inline int devm_of_platform_populate(struct device *dev) static inline void devm_of_platform_depopulate(struct device *dev) { } #endif -#if defined(CONFIG_OF_DYNAMIC) && defined(CONFIG_OF_ADDRESS) -extern void of_platform_register_reconfig_notifier(void); -#else -static inline void of_platform_register_reconfig_notifier(void) { } -#endif - #endif /* _LINUX_OF_PLATFORM_H */ -- cgit v1.2.3 From 8a60a041eada0fbfdc7b6b7a10fdf68ae6a840ce Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Thu, 3 Aug 2023 17:51:01 -0700 Subject: bpf: fix inconsistent return types of bpf_xdp_copy_buf(). Fix inconsistent return types in two implementations of bpf_xdp_copy_buf(). There are two implementations: one is an empty implementation whose return type does not match the actual implementation. Suggested-by: Alexei Starovoitov Signed-off-by: Kui-Feng Lee Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230804005101.1534505-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau --- include/linux/filter.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 2d6fe30bad5f..761af6b3cf2b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1572,10 +1572,9 @@ static inline void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) return NULL; } -static inline void *bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf, - unsigned long len, bool flush) +static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf, + unsigned long len, bool flush) { - return NULL; } #endif /* CONFIG_NET */ -- cgit v1.2.3 From 2746f13f6f1df7999001d6595b16f789ecc28ad1 Mon Sep 17 00:00:00 2001 From: Biju Das Date: Tue, 25 Jul 2023 18:51:40 +0100 Subject: clk: Fix undefined reference to `clk_rate_exclusive_{get,put}' The COMMON_CLK config is not enabled in some of the architectures. This causes below build issues: pwm-rz-mtu3.c:(.text+0x114): undefined reference to `clk_rate_exclusive_put' pwm-rz-mtu3.c:(.text+0x32c): undefined reference to `clk_rate_exclusive_get' Fix these issues by moving clk_rate_exclusive_{get,put} inside COMMON_CLK code block, as clk.c is enabled by COMMON_CLK. Fixes: 55e9b8b7b806 ("clk: add clk_rate_exclusive api") Reported-by: kernel test robot Closes: https://lore.kernel.org/all/202307251752.vLfmmhYm-lkp@intel.com/ Signed-off-by: Biju Das Link: https://lore.kernel.org/r/20230725175140.361479-1-biju.das.jz@bp.renesas.com Signed-off-by: Stephen Boyd --- include/linux/clk.h | 80 ++++++++++++++++++++++++++--------------------------- 1 file changed, 40 insertions(+), 40 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk.h b/include/linux/clk.h index 1ef013324237..06f1b292f8a0 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -183,6 +183,39 @@ int clk_get_scaled_duty_cycle(struct clk *clk, unsigned int scale); */ bool clk_is_match(const struct clk *p, const struct clk *q); +/** + * clk_rate_exclusive_get - get exclusivity over the rate control of a + * producer + * @clk: clock source + * + * This function allows drivers to get exclusive control over the rate of a + * provider. It prevents any other consumer to execute, even indirectly, + * opereation which could alter the rate of the provider or cause glitches + * + * If exlusivity is claimed more than once on clock, even by the same driver, + * the rate effectively gets locked as exclusivity can't be preempted. + * + * Must not be called from within atomic context. + * + * Returns success (0) or negative errno. + */ +int clk_rate_exclusive_get(struct clk *clk); + +/** + * clk_rate_exclusive_put - release exclusivity over the rate control of a + * producer + * @clk: clock source + * + * This function allows drivers to release the exclusivity it previously got + * from clk_rate_exclusive_get() + * + * The caller must balance the number of clk_rate_exclusive_get() and + * clk_rate_exclusive_put() calls. + * + * Must not be called from within atomic context. + */ +void clk_rate_exclusive_put(struct clk *clk); + #else static inline int clk_notifier_register(struct clk *clk, @@ -236,6 +269,13 @@ static inline bool clk_is_match(const struct clk *p, const struct clk *q) return p == q; } +static inline int clk_rate_exclusive_get(struct clk *clk) +{ + return 0; +} + +static inline void clk_rate_exclusive_put(struct clk *clk) {} + #endif #ifdef CONFIG_HAVE_CLK_PREPARE @@ -583,38 +623,6 @@ struct clk *devm_clk_get_optional_enabled(struct device *dev, const char *id); */ struct clk *devm_get_clk_from_child(struct device *dev, struct device_node *np, const char *con_id); -/** - * clk_rate_exclusive_get - get exclusivity over the rate control of a - * producer - * @clk: clock source - * - * This function allows drivers to get exclusive control over the rate of a - * provider. It prevents any other consumer to execute, even indirectly, - * opereation which could alter the rate of the provider or cause glitches - * - * If exlusivity is claimed more than once on clock, even by the same driver, - * the rate effectively gets locked as exclusivity can't be preempted. - * - * Must not be called from within atomic context. - * - * Returns success (0) or negative errno. - */ -int clk_rate_exclusive_get(struct clk *clk); - -/** - * clk_rate_exclusive_put - release exclusivity over the rate control of a - * producer - * @clk: clock source - * - * This function allows drivers to release the exclusivity it previously got - * from clk_rate_exclusive_get() - * - * The caller must balance the number of clk_rate_exclusive_get() and - * clk_rate_exclusive_put() calls. - * - * Must not be called from within atomic context. - */ -void clk_rate_exclusive_put(struct clk *clk); /** * clk_enable - inform the system when the clock source should be running. @@ -974,14 +982,6 @@ static inline void clk_bulk_put_all(int num_clks, struct clk_bulk_data *clks) {} static inline void devm_clk_put(struct device *dev, struct clk *clk) {} - -static inline int clk_rate_exclusive_get(struct clk *clk) -{ - return 0; -} - -static inline void clk_rate_exclusive_put(struct clk *clk) {} - static inline int clk_enable(struct clk *clk) { return 0; -- cgit v1.2.3 From 045ad46441a1cf2ff358af0126f8c57f337804fa Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 4 Aug 2023 17:39:07 +0300 Subject: lib/string_helpers: Add kstrdup_and_replace() helper Duplicate a NULL-terminated string and replace all occurrences of the old character with a new one. In other words, provide functionality of kstrdup() + strreplace(). Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230804143910.15504-2-andriy.shevchenko@linux.intel.com Signed-off-by: Stephen Boyd --- include/linux/string_helpers.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 789ab30045da..9d1f5bb74dd5 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -109,6 +109,8 @@ char *kstrdup_quotable(const char *src, gfp_t gfp); char *kstrdup_quotable_cmdline(struct task_struct *task, gfp_t gfp); char *kstrdup_quotable_file(struct file *file, gfp_t gfp); +char *kstrdup_and_replace(const char *src, char old, char new, gfp_t gfp); + char **kasprintf_strarray(gfp_t gfp, const char *prefix, size_t n); void kfree_strarray(char **array, size_t n); -- cgit v1.2.3 From f5992717b5826debc4aa58d4da6233563b139320 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 8 Jul 2023 12:13:45 +0200 Subject: kobject: Reorder fields in 'struct kobject' Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct kobject' from 256 to 244 bytes. This structure is often included in some other structures. So these other structures will also benefit from this 8 bytes saving. This is especially nice for structure like 'cma_kobject' or 'class_dir' that are now 256 bytes long. When they are kzalloc()'ed, 256 bytes are allocated, instead of 512. Signed-off-by: Christophe JAILLET Link: https://lore.kernel.org/r/6c7d1e3005dbec5483bdb9b7b60071175bf7bf70.1688811201.git.christophe.jaillet@wanadoo.fr Signed-off-by: Greg Kroah-Hartman --- include/linux/kobject.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kobject.h b/include/linux/kobject.h index c392c811d9ad..c30affcc43b4 100644 --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -69,14 +69,16 @@ struct kobject { const struct kobj_type *ktype; struct kernfs_node *sd; /* sysfs directory entry */ struct kref kref; -#ifdef CONFIG_DEBUG_KOBJECT_RELEASE - struct delayed_work release; -#endif + unsigned int state_initialized:1; unsigned int state_in_sysfs:1; unsigned int state_add_uevent_sent:1; unsigned int state_remove_uevent_sent:1; unsigned int uevent_suppress:1; + +#ifdef CONFIG_DEBUG_KOBJECT_RELEASE + struct delayed_work release; +#endif }; __printf(2, 3) int kobject_set_name(struct kobject *kobj, const char *name, ...); -- cgit v1.2.3 From 9e0cace7a6254070159ebd86497eadc29ea307ca Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 21 Jul 2023 16:13:09 +0300 Subject: driver core: Move dev_err_probe() to where it belogs dev_err_probe() belongs to the printing API, hence move the definition from device.h to dev_printk.h. There is no change to the callers at all, since: 1) implementation is located in the same core.c; 2) dev_printk.h is guaranteed to be included by device.h. Signed-off-by: Andy Shevchenko Reviewed-by: Andi Shyti Link: https://lore.kernel.org/r/20230721131309.16821-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dev_printk.h | 2 ++ include/linux/device.h | 2 -- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dev_printk.h b/include/linux/dev_printk.h index 8904063d4c9f..6bfe70decc9f 100644 --- a/include/linux/dev_printk.h +++ b/include/linux/dev_printk.h @@ -274,4 +274,6 @@ do { \ WARN_ONCE(condition, "%s %s: " format, \ dev_driver_string(dev), dev_name(dev), ## arg) +__printf(3, 4) int dev_err_probe(const struct device *dev, int err, const char *fmt, ...); + #endif /* _DEVICE_PRINTK_H_ */ diff --git a/include/linux/device.h b/include/linux/device.h index bbaeabd04b0d..731337ead54e 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1215,8 +1215,6 @@ void device_link_remove(void *consumer, struct device *supplier); void device_links_supplier_sync_state_pause(void); void device_links_supplier_sync_state_resume(void); -__printf(3, 4) int dev_err_probe(const struct device *dev, int err, const char *fmt, ...); - /* Create alias, so I can be autoloaded. */ #define MODULE_ALIAS_CHARDEV(major,minor) \ MODULE_ALIAS("char-major-" __stringify(major) "-" __stringify(minor)) -- cgit v1.2.3 From afdf5dd33a91bd2c3921e477191f2c1e738efd44 Mon Sep 17 00:00:00 2001 From: Ivan Orlov Date: Tue, 20 Jun 2023 20:31:42 +0200 Subject: HID: roccat: make all 'class' structures const Now that the driver core allows for struct class to be in read-only memory, making all 'class' structures to be declared at build time placing them into read-only memory, instead of having to be dynamically allocated at load time. Cc: Stefan Achatz Cc: Jiri Kosina Cc: Benjamin Tissoires Cc: linux-input@vger.kernel.org Suggested-by: Greg Kroah-Hartman Signed-off-by: Ivan Orlov Link: https://lore.kernel.org/r/20230620183141.681353-3-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/hid-roccat.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hid-roccat.h b/include/linux/hid-roccat.h index 3214fb0815fc..753654fff07f 100644 --- a/include/linux/hid-roccat.h +++ b/include/linux/hid-roccat.h @@ -16,7 +16,7 @@ #ifdef __KERNEL__ -int roccat_connect(struct class *klass, struct hid_device *hid, +int roccat_connect(const struct class *klass, struct hid_device *hid, int report_size); void roccat_disconnect(int minor); int roccat_report_event(int minor, u8 const *data); -- cgit v1.2.3 From 79038a99445f69c5d28494dd4f8c6f0509f65b2e Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 24 Jul 2023 14:18:16 +0200 Subject: kernfs: add stub helper for kernfs_generic_poll() In some randconfig builds, kernfs ends up being disabled, so there is no prototype for kernfs_generic_poll() In file included from kernel/sched/build_utility.c:97: kernel/sched/psi.c:1479:3: error: implicit declaration of function 'kernfs_generic_poll' is invalid in C99 [-Werror,-Wimplicit-function-declaration] kernfs_generic_poll(t->of, wait); ^ Add a stub helper for it, as we have it for other kernfs functions. Fixes: aff037078ecae ("sched/psi: use kernfs polling functions for PSI trigger polling") Fixes: 147e1a97c4a0b ("fs: kernfs: add poll file operation") Signed-off-by: Arnd Bergmann Reviewed-by: Chengming Zhou Link: https://lore.kernel.org/r/20230724121823.1357562-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 73f5c120def8..2a36f3218b51 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -550,6 +550,10 @@ static inline int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr) { return -ENOSYS; } +static inline __poll_t kernfs_generic_poll(struct kernfs_open_file *of, + struct poll_table_struct *pt) +{ return -ENOSYS; } + static inline void kernfs_notify(struct kernfs_node *kn) { } static inline int kernfs_xattr_get(struct kernfs_node *kn, const char *name, -- cgit v1.2.3 From e15e117bbbe18258a5ad506bbf6c58ff129c9576 Mon Sep 17 00:00:00 2001 From: Wang Jianjian Date: Wed, 2 Aug 2023 22:45:34 +0800 Subject: jbd2: remove unused t_handle_lock Since commit f7f497cb7024 ("jbd2: kill t_handle_lock transaction spinlock"), this lock has been no use. Fixes: f7f497cb7024 ("jbd2: kill t_handle_lock transaction spinlock") Signed-off-by: Wang Jianjian Reviewed-by: Ritesh Harjani (IBM) Link: https://lore.kernel.org/r/tencent_8477CBE568348A1862C64E393D587B342008@qq.com Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 44c298aa58d4..52772c826c86 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -630,11 +630,6 @@ struct transaction_s */ struct list_head t_inode_list; - /* - * Protects info related to handles - */ - spinlock_t t_handle_lock; - /* * Longest time some handle had to wait for running transaction */ -- cgit v1.2.3 From d58f2e15aa0c07f6f03ec71f64d7697ca43d04a1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Aug 2023 14:46:12 +0000 Subject: tcp: set TCP_USER_TIMEOUT locklessly icsk->icsk_user_timeout can be set locklessly, if all read sides use READ_ONCE(). Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller --- include/linux/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index d16abdb3541a..3c5efeeb024f 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -564,6 +564,6 @@ void __tcp_sock_set_nodelay(struct sock *sk, bool on); void tcp_sock_set_nodelay(struct sock *sk); void tcp_sock_set_quickack(struct sock *sk, int val); int tcp_sock_set_syncnt(struct sock *sk, int val); -void tcp_sock_set_user_timeout(struct sock *sk, u32 val); +int tcp_sock_set_user_timeout(struct sock *sk, int val); #endif /* _LINUX_TCP_H */ -- cgit v1.2.3 From 3e3271549670783be20e233a2b78a87a0b04c715 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 5 Aug 2023 12:25:01 -0700 Subject: vfs: get rid of old '->iterate' directory operation All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds Signed-off-by: Christian Brauner --- include/linux/fs.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..562f2623c9c9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1780,7 +1780,6 @@ struct file_operations { ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *, unsigned int flags); - int (*iterate) (struct file *, struct dir_context *); int (*iterate_shared) (struct file *, struct dir_context *); __poll_t (*poll) (struct file *, struct poll_table_struct *); long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); @@ -1817,6 +1816,13 @@ struct file_operations { unsigned int poll_flags); } __randomize_layout; +/* Wrap a directory iterator that needs exclusive inode access */ +int wrap_directory_iterator(struct file *, struct dir_context *, + int (*) (struct file *, struct dir_context *)); +#define WRAP_DIR_ITER(x) \ + static int shared_##x(struct file *file , struct dir_context *ctx) \ + { return wrap_directory_iterator(file, ctx, x); } + struct inode_operations { struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *); -- cgit v1.2.3 From 03ffa9af3a5ff88f9855b18584d00aca7eb43e6d Mon Sep 17 00:00:00 2001 From: Dhaval Shah Date: Mon, 31 Jul 2023 15:20:23 +0530 Subject: firmware: xilinx: Add support to get platform information Add function to get family code and sub family code from the idcode. This family code and sub family code helps to identify the platform. Family code of any platform is on bits 21 to 27 and Sub family code is on bits 19 and 20. Signed-off-by: Dhaval Shah Signed-off-by: Sai Krishna Potthuri Reviewed-by: Michal Simek Link: https://lore.kernel.org/r/20230731095026.3766675-2-sai.krishna.potthuri@amd.com Signed-off-by: Linus Walleij --- include/linux/firmware/xlnx-zynqmp.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 9dda7d9898ff..f0b341c1c847 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -34,6 +34,17 @@ /* PM API versions */ #define PM_API_VERSION_2 2 +#define ZYNQMP_FAMILY_CODE 0x23 +#define VERSAL_FAMILY_CODE 0x26 + +/* When all subfamily of platform need to support */ +#define ALL_SUB_FAMILY_CODE 0x00 +#define VERSAL_SUB_FAMILY_CODE 0x01 +#define VERSALNET_SUB_FAMILY_CODE 0x03 + +#define FAMILY_CODE_MASK GENMASK(27, 21) +#define SUB_FAMILY_CODE_MASK GENMASK(20, 19) + /* ATF only commands */ #define TF_A_PM_REGISTER_SGI 0xa04 #define PM_GET_TRUSTZONE_VERSION 0xa03 -- cgit v1.2.3 From aa5ed7b3fb3939ca6a3605c6fdc7399bb219ec23 Mon Sep 17 00:00:00 2001 From: Sai Krishna Potthuri Date: Mon, 31 Jul 2023 15:20:24 +0530 Subject: firmware: xilinx: Add version check for TRISTATE configuration Support for configuring TRISTATE parameter is added in ZYNQMP PMUFW(Xilinx ZynqMP Platform Management Firmware) Configuration Param Set version 2.0. If the requested configuration is TRISTATE and platform is ZYNQMP then check the version before requesting Xilinx firmware to set the configuration. Signed-off-by: Sai Krishna Potthuri Reviewed-by: Michal Simek Link: https://lore.kernel.org/r/20230731095026.3766675-3-sai.krishna.potthuri@amd.com Signed-off-by: Linus Walleij --- include/linux/firmware/xlnx-zynqmp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index f0b341c1c847..e8b12ec8b060 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -34,6 +34,8 @@ /* PM API versions */ #define PM_API_VERSION_2 2 +#define PM_PINCTRL_PARAM_SET_VERSION 2 + #define ZYNQMP_FAMILY_CODE 0x23 #define VERSAL_FAMILY_CODE 0x26 -- cgit v1.2.3 From 2326dee41c01c8d31574a62045fb1c5f242885f0 Mon Sep 17 00:00:00 2001 From: Marco Morandini Date: Mon, 17 Jul 2023 18:40:52 +0200 Subject: HID: Add introduction about HID for non-kernel programmers Add an introduction about HID meant for the casual programmer that is trying either to fix his device or to understand what is going wrong. Signed-off-by: Marco Morandini Co-authored-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index 39e21e3815ad..463d2e66b2c3 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -341,6 +341,29 @@ struct hid_item { */ #define MAX_USBHID_BOOT_QUIRKS 4 +/** + * DOC: HID quirks + * | @HID_QUIRK_NOTOUCH: + * | @HID_QUIRK_IGNORE: ignore this device + * | @HID_QUIRK_NOGET: + * | @HID_QUIRK_HIDDEV_FORCE: + * | @HID_QUIRK_BADPAD: + * | @HID_QUIRK_MULTI_INPUT: + * | @HID_QUIRK_HIDINPUT_FORCE: + * | @HID_QUIRK_ALWAYS_POLL: + * | @HID_QUIRK_INPUT_PER_APP: + * | @HID_QUIRK_X_INVERT: + * | @HID_QUIRK_Y_INVERT: + * | @HID_QUIRK_SKIP_OUTPUT_REPORTS: + * | @HID_QUIRK_SKIP_OUTPUT_REPORT_ID: + * | @HID_QUIRK_NO_OUTPUT_REPORTS_ON_INTR_EP: + * | @HID_QUIRK_HAVE_SPECIAL_DRIVER: + * | @HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE: + * | @HID_QUIRK_FULLSPEED_INTERVAL: + * | @HID_QUIRK_NO_INIT_REPORTS: + * | @HID_QUIRK_NO_IGNORE: + * | @HID_QUIRK_NO_INPUT_SYNC: + */ /* BIT(0) reserved for backward compatibility, was HID_QUIRK_INVERT */ #define HID_QUIRK_NOTOUCH BIT(1) #define HID_QUIRK_IGNORE BIT(2) -- cgit v1.2.3 From bcf847e4dbb8b1bc10a37533f079d582ee77cdf5 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 2 Aug 2023 21:32:01 +0800 Subject: iommu/amd: Remove unsued extern declaration amd_iommu_init_hardware() Commit 2c0ae1720c09 ("iommu/amd: Convert iommu initialization to state machine") left behind this. Signed-off-by: YueHaibing Reviewed-by: Suravee Suthikulpanit Link: https://lore.kernel.org/r/20230802133201.17512-1-yuehaibing@huawei.com Signed-off-by: Joerg Roedel --- include/linux/amd-iommu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h index 953e6f12fa1c..99a5201d9e62 100644 --- a/include/linux/amd-iommu.h +++ b/include/linux/amd-iommu.h @@ -32,7 +32,6 @@ struct task_struct; struct pci_dev; extern int amd_iommu_detect(void); -extern int amd_iommu_init_hardware(void); /** * amd_iommu_init_device() - Init device for use with IOMMUv2 driver -- cgit v1.2.3 From 33af38d85b5c9e8ffe26f9712319ed28c5ded64f Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Sat, 5 Aug 2023 19:04:06 +0800 Subject: cpu/hotplug: Remove unused function declaration cpu_set_state_online() Commit 5356297d12d9 ("cpu/hotplug: Remove cpu_report_state() and related unused cruft") removed function but leave the declaration. Signed-off-by: Yue Haibing Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20230805110406.45900-1-yuehaibing@huawei.com --- include/linux/cpu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 6b326a9e8191..ed56b2534500 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -191,7 +191,6 @@ void arch_cpu_finalize_init(void); static inline void arch_cpu_finalize_init(void) { } #endif -void cpu_set_state_online(int cpu); void play_idle_precise(u64 duration_ns, u64 latency_ns); static inline void play_idle(unsigned long duration_us) -- cgit v1.2.3 From f3147015fa0769cf1dcbfdb9040ad380cc4daeb5 Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Mon, 12 Jun 2023 11:58:14 +0300 Subject: net/mlx5: Add IRQ vector to CPU lookup function Currently, once driver load completes, IRQ requests were performed for all vectors. However, as we move to support dynamic creation of EQs, this will not be the case as some IRQs will not exist at this stage. Thus, in such case, use the default CPU to IRQ mapping which is the serial mapping based on IRQ vector index. Meaning, the n'th vector gets mapped to the n'th CPU. Introduce an API function mlx5_comp_vector_cpu() that takes an IRQ index and provides the corresponding CPU mapping. It utilizes the existing IRQ affinity if defined, or resorts to the default serialized CPU mapping otherwise. Signed-off-by: Maher Sanalla Reviewed-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index fa70c25423b2..e686722fa4ca 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1109,8 +1109,7 @@ int mlx5_alloc_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg, void mlx5_free_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg); unsigned int mlx5_comp_vectors_count(struct mlx5_core_dev *dev); -struct cpumask * -mlx5_comp_irq_get_affinity_mask(struct mlx5_core_dev *dev, int vector); +int mlx5_comp_vector_get_cpu(struct mlx5_core_dev *dev, int vector); unsigned int mlx5_core_reserved_gids_count(struct mlx5_core_dev *dev); int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index, u8 roce_version, u8 roce_l3_type, const u8 *gid, -- cgit v1.2.3 From 674dd4e2e04e7a62bfacf28129e0808f33395bdf Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Thu, 22 Jun 2023 18:52:44 +0300 Subject: net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max() To accurately represent its purpose, rename the function that retrieves the value of maximum vectors from mlx5_comp_vectors_count() to mlx5_comp_vectors_max(). Signed-off-by: Maher Sanalla Reviewed-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e686722fa4ca..43c4fd26c69a 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1108,7 +1108,7 @@ int mlx5_alloc_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg, bool map_wc, bool fast_path); void mlx5_free_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg); -unsigned int mlx5_comp_vectors_count(struct mlx5_core_dev *dev); +unsigned int mlx5_comp_vectors_max(struct mlx5_core_dev *dev); int mlx5_comp_vector_get_cpu(struct mlx5_core_dev *dev, int vector); unsigned int mlx5_core_reserved_gids_count(struct mlx5_core_dev *dev); int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index, -- cgit v1.2.3 From f14c1a14e63227a65faa68237687784a6dd2e922 Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Mon, 12 Jun 2023 10:13:50 +0300 Subject: net/mlx5: Allocate completion EQs dynamically This commit enables the dynamic allocation of EQs at runtime, allowing for more flexibility in managing completion EQs and reducing the memory overhead of driver load. Whenever a CQ is created for a given vector index, the driver will lookup to see if there is an already mapped completion EQ for that vector, if so, utilize it. Otherwise, allocate a new EQ on demand and then utilize it for the CQ completion events. Add a protection lock to the EQ table to protect from concurrent EQ creation attempts. While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an EQ on demand if no EQ is found for the given vector. Signed-off-by: Maher Sanalla Reviewed-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 43c4fd26c69a..3e1017d764b7 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1058,7 +1058,7 @@ void mlx5_unregister_debugfs(void); void mlx5_fill_page_frag_array_perm(struct mlx5_frag_buf *buf, __be64 *pas, u8 perm); void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas); -int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn); +int mlx5_comp_eqn_get(struct mlx5_core_dev *dev, u16 vecidx, int *eqn); int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); -- cgit v1.2.3 From 554b841d470338a3b1d6335b14ee1cd0c8f5d754 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Wed, 2 Aug 2023 07:25:33 -0500 Subject: tpm: Disable RNG for all AMD fTPMs The TPM RNG functionality is not necessary for entropy when the CPU already supports the RDRAND instruction. The TPM RNG functionality was previously disabled on a subset of AMD fTPM series, but reports continue to show problems on some systems causing stutter root caused to TPM RNG functionality. Expand disabling TPM RNG use for all AMD fTPMs whether they have versions that claim to have fixed or not. To accomplish this, move the detection into part of the TPM CRB registration and add a flag indicating that the TPM should opt-out of registration to hwrng. Cc: stable@vger.kernel.org # 6.1.y+ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs") Reported-by: daniil.stas@posteo.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719 Reported-by: bitlord0xff@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Signed-off-by: Mario Limonciello Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen --- include/linux/tpm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/tpm.h b/include/linux/tpm.h index 6a1e8f157255..4ee9d13749ad 100644 --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -283,6 +283,7 @@ enum tpm_chip_flags { TPM_CHIP_FLAG_FIRMWARE_POWER_MANAGED = BIT(6), TPM_CHIP_FLAG_FIRMWARE_UPGRADE = BIT(7), TPM_CHIP_FLAG_SUSPENDED = BIT(8), + TPM_CHIP_FLAG_HWRNG_DISABLED = BIT(9), }; #define to_tpm_chip(d) container_of(d, struct tpm_chip, dev) -- cgit v1.2.3 From 0437719c1a97791481c5fd59642494f2108701a8 Mon Sep 17 00:00:00 2001 From: Hao Jia Date: Mon, 7 Aug 2023 11:29:30 +0800 Subject: cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants The member variable bstat of the structure cgroup_rstat_cpu records the per-cpu time of the cgroup itself, but does not include the per-cpu time of its descendants. The per-cpu time including descendants is very useful for calculating the per-cpu usage of cgroups. Although we can indirectly obtain the total per-cpu time of the cgroup and its descendants by accumulating the per-cpu bstat of each descendant of the cgroup. But after a child cgroup is removed, we will lose its bstat information. This will cause the cumulative value to be non-monotonic, thus affecting the accuracy of cgroup per-cpu usage. So we add the subtree_bstat variable to record the total per-cpu time of this cgroup and its descendants, which is similar to "cpuacct.usage*" in cgroup v1. And this is also helpful for the migration from cgroup v1 to cgroup v2. After adding this variable, we can obtain the per-cpu time of cgroup and its descendants in user mode through eBPF/drgn, etc. And we are still trying to determine how to expose it in the cgroupfs interface. Suggested-by: Tejun Heo Signed-off-by: Hao Jia Signed-off-by: Tejun Heo --- include/linux/cgroup-defs.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 8a0d5466c7be..7a2862172f51 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -341,6 +341,20 @@ struct cgroup_rstat_cpu { */ struct cgroup_base_stat last_bstat; + /* + * This field is used to record the cumulative per-cpu time of + * the cgroup and its descendants. Currently it can be read via + * eBPF/drgn etc, and we are still trying to determine how to + * expose it in the cgroupfs interface. + */ + struct cgroup_base_stat subtree_bstat; + + /* + * Snapshots at the last reading. These are used to calculate the + * deltas to propagate to the per-cpu subtree_bstat. + */ + struct cgroup_base_stat last_subtree_bstat; + /* * Child cgroups with stat updates on this cpu since the last read * are linked on the parent's ->updated_children through -- cgit v1.2.3 From 8217ad0a435ff06d651d7298ea8ae8d72388179e Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 7 Aug 2023 18:27:15 +0200 Subject: decompress: Use 8 byte alignment The ZSTD decompressor requires malloc() allocations to be 8 byte aligned, so ensure that this the case. Signed-off-by: Ard Biesheuvel Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/r/20230807162720.545787-19-ardb@kernel.org --- include/linux/decompress/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h index 9192986b1a73..ac862422df15 100644 --- a/include/linux/decompress/mm.h +++ b/include/linux/decompress/mm.h @@ -48,7 +48,7 @@ MALLOC_VISIBLE void *malloc(int size) if (!malloc_ptr) malloc_ptr = free_mem_ptr; - malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */ + malloc_ptr = (malloc_ptr + 7) & ~7; /* Align */ p = (void *)malloc_ptr; malloc_ptr += size; -- cgit v1.2.3 From 26cfb838aa002a5c03319f7fce87e9313e794351 Mon Sep 17 00:00:00 2001 From: Johannes Zink Date: Tue, 1 Aug 2023 17:44:29 +0200 Subject: net: stmmac: correct MAC propagation delay The IEEE1588 Standard specifies that the timestamps of Packets must be captured when the PTP message timestamp point (leading edge of first octet after the start of frame delimiter) crosses the boundary between the node and the network. As the MAC latches the timestamp at an internal point, the captured timestamp must be corrected for the additional data transmission latency, as described in the publicly available datasheet [1]. This patch only corrects for the MAC-Internal delay, which can be read out from the MAC_Ingress_Timestamp_Latency register on DWMAC version 5, since the Phy framework currently does not support querying the Phy ingress and egress latency. The Closs Domain Crossing Circuits errors as indicated in [1] are already being accounted in the stmmac_get_tx_hwtstamp() function and are not corrected here. As the Latency varies for different link speeds and MII modes of operation, the correction value needs to be updated on each link state change. As the delay also causes a phase shift in the timestamp counter compared to the rest of the network, this correction will also reduce phase error when generating PPS outputs from the timestamp counter. Since the correction registers may be unavailable on some hardware and no feature bits are documented for dynamically detection of the MAC propagation delay readout, introduce a feature bit to explicitely enable MAC delay Correction in the gluecode driver. [1] i.MX8MP Reference Manual, rev.1 Section 11.7.2.5.3 "Timestamp correction" Signed-off-by: Johannes Zink Link: https://lore.kernel.org/r/20230719-stmmac_correct_mac_delay-v2-1-3366f38ee9a6@pengutronix.de Link: https://lore.kernel.org/r/20230719-stmmac_correct_mac_delay-v3-1-61e63427735e@pengutronix.de Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 3d0702510224..652404c03944 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -218,6 +218,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_INT_SNAPSHOT_EN BIT(9) #define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI BIT(10) #define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING BIT(11) +#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY BIT(12) struct plat_stmmacenet_data { int bus_id; -- cgit v1.2.3 From a9ca9f9ceff382b58b488248f0c0da9e157f5d06 Mon Sep 17 00:00:00 2001 From: Yunsheng Lin Date: Fri, 4 Aug 2023 20:05:24 +0200 Subject: page_pool: split types and declarations from page_pool.h Split types and pure function declarations from page_pool.h and add them in page_page/types.h, so that C sources can include page_pool.h and headers should generally only include page_pool/types.h as suggested by jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place. Signed-off-by: Yunsheng Lin Suggested-by: Jakub Kicinski Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com [Jakub: change microsoft/mana, fix kdoc paths in Documentation] Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 16a49ba534e4..888e3d7e74c1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -32,7 +32,7 @@ #include #include #include -#include +#include #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include #endif -- cgit v1.2.3 From 75eaf63ea7afeafd026ffef03bdc69e31f10829b Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 4 Aug 2023 20:05:25 +0200 Subject: net: skbuff: don't include to Currently, touching triggers a rebuild of more than half of the kernel. That's because it's included in . And each new include to page_pool/types.h adds more [useless] data for the toolchain to process per each source file from that pile. In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling"), Matteo included it to be able to call a couple of functions defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as long as page pool owns the page") one of the calls was removed, so only one was left. It's the call to page_pool_return_skb_page() in napi_frag_unref(). The function is external and doesn't have any dependencies. Having very niche page_pool_types.h included only for that looks like an overkill. As %PP_SIGNATURE is not local to page_pool.c (was only in the early submissions), nothing holds this function there. Teleport page_pool_return_skb_page() to skbuff.c, just next to the main consumer, skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't work with skbs at all and the former name tells nothing. The #if guards here are only to not compile and have it in the vmlinux when not needed -- both call sites are already guarded. Now, touching page_pool_types.h only triggers rebuilding of the drivers using it and a couple of core networking files. Suggested-by: Jakub Kicinski # make skbuff.h less heavy Suggested-by: Alexander Duyck # move to skbuff.c Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 888e3d7e74c1..aa57e2eca33b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -32,7 +32,6 @@ #include #include #include -#include #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include #endif @@ -3421,13 +3420,15 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f) __skb_frag_ref(&skb_shinfo(skb)->frags[f]); } +bool napi_pp_put_page(struct page *page, bool napi_safe); + static inline void napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) { struct page *page = skb_frag_page(frag); #ifdef CONFIG_PAGE_POOL - if (recycle && page_pool_return_skb_page(page, napi_safe)) + if (recycle && napi_pp_put_page(page, napi_safe)) return; #endif put_page(page); -- cgit v1.2.3 From ff4e538c8c3e675a15e1e49509c55951832e0451 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 4 Aug 2023 20:05:28 +0200 Subject: page_pool: add a lockdep check for recycling in hardirq Page pool use in hardirq is prohibited, add debug checks to catch misuses. IIRC we previously discussed using DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns that people will have DEBUG_NET enabled in perf testing. I don't think anyone enables lockdep in perf testing, so use lockdep to avoid pushback and arguing :) Acked-by: Jesper Dangaard Brouer Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-6-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski --- include/linux/lockdep.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 310f85903c91..dc2844b071c2 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -625,6 +625,12 @@ do { \ WARN_ON_ONCE(__lockdep_enabled && !this_cpu_read(hardirq_context)); \ } while (0) +#define lockdep_assert_no_hardirq() \ +do { \ + WARN_ON_ONCE(__lockdep_enabled && (this_cpu_read(hardirq_context) || \ + !this_cpu_read(hardirqs_enabled))); \ +} while (0) + #define lockdep_assert_preemption_enabled() \ do { \ WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \ @@ -659,6 +665,7 @@ do { \ # define lockdep_assert_irqs_enabled() do { } while (0) # define lockdep_assert_irqs_disabled() do { } while (0) # define lockdep_assert_in_irq() do { } while (0) +# define lockdep_assert_no_hardirq() do { } while (0) # define lockdep_assert_preemption_enabled() do { } while (0) # define lockdep_assert_preemption_disabled() do { } while (0) -- cgit v1.2.3 From a3c485a5d8d47af5d2d1a0e5c3b7a1ed223669f9 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 7 Aug 2023 10:59:54 +0200 Subject: bpf: Add support for bpf_get_func_ip helper for uprobe program Adding support for bpf_get_func_ip helper for uprobe program to return probed address for both uprobe and return uprobe. We discussed this in [1] and agreed that uprobe can have special use of bpf_get_func_ip helper that differs from kprobe. The kprobe bpf_get_func_ip returns: - address of the function if probe is attach on function entry for both kprobe and return kprobe - 0 if the probe is not attach on function entry The uprobe bpf_get_func_ip returns: - address of the probe for both uprobe and return uprobe The reason for this semantic change is that kernel can't really tell if the probe user space address is function entry. The uprobe program is actually kprobe type program attached as uprobe. One of the consequences of this design is that uprobes do not have its own set of helpers, but share them with kprobes. As we need different functionality for bpf_get_func_ip helper for uprobe, I'm adding the bool value to the bpf_trace_run_ctx, so the helper can detect that it's executed in uprobe context and call specific code. The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which is currently used only for executing bpf programs in uprobe. Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe to address that it's only used for uprobes and that it sets the run_ctx.is_uprobe as suggested by Yafang Shao. Suggested-by: Andrii Nakryiko Tested-by: Alan Maguire [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/ Signed-off-by: Jiri Olsa Tested-by: Viktor Malik Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index abe75063630b..db3fe5a61b05 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1819,6 +1819,7 @@ struct bpf_cg_run_ctx { struct bpf_trace_run_ctx { struct bpf_run_ctx run_ctx; u64 bpf_cookie; + bool is_uprobe; }; struct bpf_tramp_run_ctx { @@ -1867,6 +1868,8 @@ bpf_prog_run_array(const struct bpf_prog_array *array, if (unlikely(!array)) return ret; + run_ctx.is_uprobe = false; + migrate_disable(); old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); item = &array->items[0]; @@ -1891,8 +1894,8 @@ bpf_prog_run_array(const struct bpf_prog_array *array, * rcu-protected dynamically sized maps. */ static __always_inline u32 -bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu, - const void *ctx, bpf_prog_run_fn run_prog) +bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu, + const void *ctx, bpf_prog_run_fn run_prog) { const struct bpf_prog_array_item *item; const struct bpf_prog *prog; @@ -1906,6 +1909,8 @@ bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu, rcu_read_lock_trace(); migrate_disable(); + run_ctx.is_uprobe = true; + array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held()); if (unlikely(!array)) goto out; -- cgit v1.2.3 From 636b927eba5bc633753f8eb80f35e1d5be806e51 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:23 -1000 Subject: workqueue: Make unbound workqueues to use per-cpu pool_workqueues A pwq (pool_workqueue) represents an association between a workqueue and a worker_pool. When a work item is queued, the workqueue selects the pwq to use, which in turn determines the pool, and queues the work item to the pool through the pwq. pwq is also what implements the maximum concurrency limit - @max_active. As a per-cpu workqueue should be assocaited with a different worker_pool on each CPU, it always had per-cpu pwq's that are accessed through wq->cpu_pwq. However, unbound workqueues were sharing a pwq within each NUMA node by default. The sharing has several downsides: * Because @max_active is per-pwq, the meaning of @max_active changes depending on the machine configuration and whether workqueue NUMA locality support is enabled. * Makes per-cpu and unbound code deviate. * Gets in the way of making workqueue CPU locality awareness more flexible. This patch makes unbound workqueues use per-cpu pwq's the same way per-cpu workqueues do by making the following changes: * wq->numa_pwq_tbl[] is removed and unbound workqueues now use wq->cpu_pwq just like per-cpu workqueues. wq->cpu_pwq is now RCU protected for unbound workqueues. * numa_pwq_tbl_install() is renamed to install_unbound_pwq() and installs the specified pwq to the target CPU's wq->cpu_pwq. * apply_wqattrs_prepare() now always allocates a separate pwq for each CPU unless the workqueue is ordered. If ordered, all CPUs use wq->dfl_pwq. This makes the return value of wq_calc_node_cpumask() unnecessary. It now returns void. * @max_active now means the same thing for both per-cpu and unbound workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no longer used in workqueue implementation and will be removed later. * All unbound pwq operations which used to be per-numa-node are now per-cpu. For most unbound workqueue users, this shouldn't cause noticeable changes. Work item issue and completion will be a small bit faster, flush_workqueue() would become a bit more expensive, and the total concurrency limit would likely become higher. All @max_active==1 use cases are currently being audited for conversion into alloc_ordered_workqueue() and they shouldn't be affected once the audit and conversion is complete. One area where the behavior change may be more noticeable is workqueue_congested() as the reported congestion state is now per CPU instead of NUMA node. There are only two users of this interface - drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems are cc'd. Inputs on the behavior change would be very much appreciated. Signed-off-by: Tejun Heo Acked-by: Dennis Dalessandro Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Karsten Graul Cc: Wenjia Zhang Cc: Jan Karcher --- include/linux/workqueue.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 1a4223cbdb5f..52d1a6225b44 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -343,14 +343,10 @@ enum { __WQ_ORDERED_EXPLICIT = 1 << 19, /* internal: alloc_ordered_workqueue() */ WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */ - WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */ + WQ_UNBOUND_MAX_ACTIVE = WQ_MAX_ACTIVE, WQ_DFL_ACTIVE = WQ_MAX_ACTIVE / 2, }; -/* unbound wq's aren't per-cpu, scale max_active according to #cpus */ -#define WQ_UNBOUND_MAX_ACTIVE \ - max_t(int, WQ_MAX_ACTIVE, num_possible_cpus() * WQ_MAX_UNBOUND_PER_CPU) - /* * System-wide workqueues which are always present. * @@ -391,7 +387,7 @@ extern struct workqueue_struct *system_freezable_power_efficient_wq; * alloc_workqueue - allocate a workqueue * @fmt: printf format for the name of the workqueue * @flags: WQ_* flags - * @max_active: max in-flight work items, 0 for default + * @max_active: max in-flight work items per CPU, 0 for default * remaining args: args for @fmt * * Allocate a workqueue with the specified parameters. For detailed -- cgit v1.2.3 From af73f5c9febe5095ee492ae43e9898fca65ced70 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:23 -1000 Subject: workqueue: Rename workqueue_attrs->no_numa to ->ordered With the recent removal of NUMA related module param and sysfs knob, workqueue_attrs->no_numa is now only used to implement ordered workqueues. Let's rename the field so that it's less confusing especially with the planned CPU affinity awareness improvements. Just a rename. No functional changes. Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 52d1a6225b44..f0c10f491b15 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -142,13 +142,13 @@ struct workqueue_attrs { cpumask_var_t cpumask; /** - * @no_numa: disable NUMA affinity + * @ordered: work items must be executed one by one in queueing order * - * Unlike other fields, ``no_numa`` isn't a property of a worker_pool. It + * Unlike other fields, ``ordered`` isn't a property of a worker_pool. It * only modifies how :c:func:`apply_workqueue_attrs` select pools and thus * doesn't participate in pool hash calculations or equality comparisons. */ - bool no_numa; + bool ordered; }; static inline struct delayed_work *to_delayed_work(struct work_struct *work) -- cgit v1.2.3 From 2930155b2e27232c033970f2e110aaac4187cb9e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:24 -1000 Subject: workqueue: Initialize unbound CPU pods later in the boot During boot, to initialize unbound CPU pods, wq_pod_init() was called from workqueue_init(). This is early enough for NUMA nodes to be set up but before SMP is brought up and CPU topology information is populated. Workqueue is in the process of improving CPU locality for unbound workqueues and will need access to topology information during pod init. This adds a new init function workqueue_init_topology() which is called after CPU topology information is available and replaces wq_pod_init(). As unbound CPU pods are now initialized after workqueues are activated, we need to revisit the workqueues to apply the pod configuration. Workqueues which are created before workqueue_init_topology() are set up so that they always use the default worker pool. After pods are set up in workqueue_init_topology(), wq_update_pod() is called on all existing workqueues to update the pool associations accordingly. Note that wq_update_pod_attrs_buf allocation is moved to workqueue_init_early(). This isn't necessary right now but enables further generalization of pod handling in the future. This patch changes the initialization sequence but the end result should be the same. Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index f0c10f491b15..bab9fa3453ed 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -672,5 +672,6 @@ int workqueue_offline_cpu(unsigned int cpu); void __init workqueue_init_early(void); void __init workqueue_init(void); +void __init workqueue_init_topology(void); #endif -- cgit v1.2.3 From 84193c07105c62d206fb230b2f29002226628989 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:24 -1000 Subject: workqueue: Generalize unbound CPU pods While renamed to pod, the code still assumes that the pods are defined by NUMA boundaries. Let's generalize it: * workqueue_attrs->affn_scope is added. Each enum represents the type of boundaries that define the pods. There are currently two scopes - WQ_AFFN_NUMA and WQ_AFFN_SYSTEM. The former is the same behavior as before - one pod per NUMA node. The latter defines one global pod across the whole system. * struct wq_pod_type is added which describes how pods are configured for each affnity scope. For each pod, it lists the member CPUs and the preferred NUMA node for memory allocations. The reverse mapping from CPU to pod is also available. * wq_pod_enabled is dropped. Pod is now always enabled. The previously disabled behavior is now implemented through WQ_AFFN_SYSTEM. * get_unbound_pool() wants to determine the NUMA node to allocate memory from for the new pool. The variables are renamed from node to pod but the logic still assumes they're one and the same. Clearly distinguish them - walk the WQ_AFFN_NUMA pods to find the matching pod and then use the pod's NUMA node. * wq_calc_pod_cpumask() was taking @pod but assumed that it was the NUMA node. Take @cpu instead and determine the cpumask to use from the pod_type matching @attrs. * apply_wqattrs_prepare() is update to return ERR_PTR() on error instead of NULL so that it can indicate -EINVAL on invalid affinity scopes. This patch allows CPUs to be grouped into pods however desired per type. While this patch causes some internal behavior changes, nothing material should change for workqueue users. v2: Trigger WARN_ON_ONCE() in wqattrs_pod_type() if affn_scope is WQ_AFFN_NR_TYPES which indicates that the function is called with a worker_pool's attrs instead of a workqueue's. Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index bab9fa3453ed..180491ee6706 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -125,6 +125,15 @@ struct rcu_work { struct workqueue_struct *wq; }; +enum wq_affn_scope { + WQ_AFFN_NUMA, /* one pod per NUMA node */ + WQ_AFFN_SYSTEM, /* one pod across the whole system */ + + WQ_AFFN_NR_TYPES, + + WQ_AFFN_DFL = WQ_AFFN_NUMA, +}; + /** * struct workqueue_attrs - A struct for workqueue attributes. * @@ -141,12 +150,26 @@ struct workqueue_attrs { */ cpumask_var_t cpumask; + /* + * Below fields aren't properties of a worker_pool. They only modify how + * :c:func:`apply_workqueue_attrs` select pools and thus don't + * participate in pool hash calculations or equality comparisons. + */ + /** - * @ordered: work items must be executed one by one in queueing order + * @affn_scope: unbound CPU affinity scope * - * Unlike other fields, ``ordered`` isn't a property of a worker_pool. It - * only modifies how :c:func:`apply_workqueue_attrs` select pools and thus - * doesn't participate in pool hash calculations or equality comparisons. + * CPU pods are used to improve execution locality of unbound work + * items. There are multiple pod types, one for each wq_affn_scope, and + * every CPU in the system belongs to one pod in every pod type. CPUs + * that belong to the same pod share the worker pool. For example, + * selecting %WQ_AFFN_NUMA makes the workqueue use a separate worker + * pool for each NUMA node. + */ + enum wq_affn_scope affn_scope; + + /** + * @ordered: work items must be executed one by one in queueing order */ bool ordered; }; -- cgit v1.2.3 From 63c5484e74952f60f5810256bd69814d167b8d22 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:24 -1000 Subject: workqueue: Add multiple affinity scopes and interface to select them Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE the default. The code changes to actually add the additional scopes are trivial. Also add module parameter "workqueue.default_affinity_scope" to override the default scope and "affinity_scope" sysfs file to configure it per workqueue. wq_dump.py and documentations are updated accordingly. This enables significant flexibility in configuring how unbound workqueues behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu workqueue. On the other hand, "system" removes all locality boundaries. Many modern machines have multiple L3 caches often while being mostly uniform in terms of memory access. Thus, workqueue's previous behavior of spreading work items in each NUMA node had negative performance implications from unncessarily crossing L3 boundaries between issue and execution. However, picking a finer grained affinity scope also has a downside in that an issuer in one group can't utilize CPUs in other groups. While dependent on the specifics of workload, there's usually a noticeable penalty in crossing L3 boundaries, so let's default to CACHE. This issue will be further addressed and documented with examples in future patches. Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 180491ee6706..568cfbc24bc0 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -126,12 +126,15 @@ struct rcu_work { }; enum wq_affn_scope { + WQ_AFFN_CPU, /* one pod per CPU */ + WQ_AFFN_SMT, /* one pod poer SMT */ + WQ_AFFN_CACHE, /* one pod per LLC */ WQ_AFFN_NUMA, /* one pod per NUMA node */ WQ_AFFN_SYSTEM, /* one pod across the whole system */ WQ_AFFN_NR_TYPES, - WQ_AFFN_DFL = WQ_AFFN_NUMA, + WQ_AFFN_DFL = WQ_AFFN_CACHE, }; /** -- cgit v1.2.3 From 9546b29e4a6ad6ed7924dd7980975c8e675740a3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:25 -1000 Subject: workqueue: Add workqueue_attrs->__pod_cpumask workqueue_attrs has two uses: * to specify the required unouned workqueue properties by users * to match worker_pool's properties to workqueues by core code For example, if the user wants to restrict a workqueue to run only CPUs 0 and 2, and the two CPUs are on different affinity scopes, the workqueue's attrs->cpumask would contains CPUs 0 and 2, and the workqueue would be associated with two worker_pools, one with attrs->cpumask containing just CPU 0 and the other CPU 2. Workqueue wants to support non-strict affinity scopes where work items are started in their matching affinity scopes but the scheduler is free to migrate them outside the starting scopes, which can enable utilizing the whole machine while maintaining most of the locality benefits from affinity scopes. To enable that, worker_pools need to distinguish the strict affinity that it has to follow (because that's the restriction coming from the user) and the soft affinity that it wants to apply when dispatching work items. Note that two worker_pools with different soft dispatching requirements have to be separate; otherwise, for example, we'd be ping-ponging worker threads across NUMA boundaries constantly. This patch adds workqueue_attrs->__pod_cpumask. The new field is double underscored as it's only used internally to distinguish worker_pools. A worker_pool's ->cpumask is now always the same as the online subset of allowed CPUs of the associated workqueues, and ->__pod_cpumask is the pod's subset of that ->cpumask. Going back to the example above, both worker_pools would have ->cpumask containing both CPUs 0 and 2 but one's ->__pod_cpumask would contain 0 while the other's 2. * pool_allowed_cpus() is added. It returns the worker_pool's strict cpumask that the pool's workers must stay within. This is currently always ->__pod_cpumask as all boundaries are still strict. * As a workqueue_attrs can now track both the associated workqueues' cpumask and its per-pod subset, wq_calc_pod_cpumask() no longer needs an external out-argument. Drop @cpumask and instead store the result in ->__pod_cpumask. * The above also simplifies apply_wqattrs_prepare() as the same workqueue_attrs can be used to create all pods associated with a workqueue. tmp_attrs is dropped. * wq_update_pod() is updated to use wqattrs_equal() to test whether a pwq update is needed instead of only comparing ->cpumask so that ->__pod_cpumask is compared too. It can directly compare ->__pod_cpumaks but the code is easier to understand and more robust this way. The only user-visible behavior change is that two workqueues with different cpumasks no longer can share worker_pools even when their pod subsets coincide. Going back to the example, let's say there's another workqueue with cpumask 0, 2, 3, where 2 and 3 are in the same pod. It would be mapped to two worker_pools - one with CPU 0, the other with 2 and 3. The former has the same cpumask as the first pod of the earlier example and would have shared the same worker_pool but that's no longer the case after this patch. The worker_pools would have the same ->__pod_cpumask but their ->cpumask's wouldn't match. While this is necessary to support non-strict affinity scopes, there can be further optimizations to maintain sharing among strict affinity scopes. However, non-strict affinity scopes are going to be preferable for most use cases and we don't see very diverse mixture of unbound workqueue cpumasks anyway, so the additional overhead doesn't seem to justify the extra complexity. v2: - wq_update_pod() was incorrectly comparing target_attrs->__pod_cpumask to pool->attrs->cpumask instead of its ->__pod_cpumask. Fix it by using wqattrs_equal() for comparison instead. - Per-cpu worker pools weren't initializing ->__pod_cpumask which caused a subtle problem later on. Set it to cpumask_of(cpu) like ->cpumask. Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 568cfbc24bc0..fe53976e088e 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -150,9 +150,25 @@ struct workqueue_attrs { /** * @cpumask: allowed CPUs + * + * Work items in this workqueue are affine to these CPUs and not allowed + * to execute on other CPUs. A pool serving a workqueue must have the + * same @cpumask. */ cpumask_var_t cpumask; + /** + * @__pod_cpumask: internal attribute used to create per-pod pools + * + * Internal use only. + * + * Per-pod unbound worker pools are used to improve locality. Always a + * subset of ->cpumask. A workqueue can be associated with multiple + * worker pools with disjoint @__pod_cpumask's. Whether the enforcement + * of a pool's @__pod_cpumask is strict depends on @affn_strict. + */ + cpumask_var_t __pod_cpumask; + /* * Below fields aren't properties of a worker_pool. They only modify how * :c:func:`apply_workqueue_attrs` select pools and thus don't -- cgit v1.2.3 From 8639ecebc9b1796d7074751a350462f5e1c61cd4 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:25 -1000 Subject: workqueue: Implement non-strict affinity scope for unbound workqueues An unbound workqueue can be served by multiple worker_pools to improve locality. The segmentation is achieved by grouping CPUs into pods. By default, the cache boundaries according to cpus_share_cache() define the CPUs are grouped. Let's a workqueue is allowed to run on all CPUs and the system has two L3 caches. The workqueue would be mapped to two worker_pools each serving one L3 cache domains. While this improves locality, because the pod boundaries are strict, it limits the total bandwidth a given issuer can consume. For example, let's say there is a thread pinned to a CPU issuing enough work items to saturate the whole machine. With the machine segmented into two pods, no matter how many work items it issues, it can only use half of the CPUs on the system. While this limitation has existed for a very long time, it wasn't very pronounced because the affinity grouping used to be always by NUMA nodes. With cache boundaries as the default and support for even finer grained scopes (smt and cpu), it is now an a lot more pressing problem. This patch implements non-strict affinity scope where the pod boundaries aren't enforced strictly. Going back to the previous example, the workqueue would still be mapped to two worker_pools; however, the affinity enforcement would be soft. The workers in both pools would have their cpus_allowed set to the whole machine thus allowing the scheduler to migrate them anywhere on the machine. However, whenever an idle worker is woken up, the workqueue code asks the scheduler to bring back the task within the pod if the worker is outside. ie. work items start executing within its affinity scope but can be migrated outside as the scheduler sees fit. This removes the hard cap on utilization while maintaining the benefits of affinity scopes. After the earlier ->__pod_cpumask changes, the implementation is pretty simple. When non-strict which is the new default: * pool_allowed_cpus() returns @pool->attrs->cpumask instead of ->__pod_cpumask so that the workers are allowed to run on any CPU that the associated workqueues allow. * If the idle worker task's ->wake_cpu is outside the pod, kick_pool() sets the field to a CPU within the pod. This would be the first use of task_struct->wake_cpu outside scheduler proper, so it isn't clear whether this would be acceptable. However, other methods of migrating tasks are significantly more expensive and are likely prohibitively so if we want to do this on every work item. This needs discussion with scheduler folks. There is also a race window where setting ->wake_cpu wouldn't be effective as the target task is still on CPU. However, the window is pretty small and this being a best-effort optimization, it doesn't seem to warrant more complexity at the moment. While the non-strict cache affinity scopes seem to be the best option, the performance picture interacts with the affinity scope and is a bit complicated to fully discuss in this patch, so the behavior is made easily selectable through wqattrs and sysfs and the next patch will add documentation to discuss performance implications. v2: pool->attrs->affn_strict is set to true for per-cpu worker_pools. Signed-off-by: Tejun Heo Cc: Peter Zijlstra Cc: Linus Torvalds --- include/linux/workqueue.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index fe53976e088e..0c1cad38f9db 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -169,6 +169,17 @@ struct workqueue_attrs { */ cpumask_var_t __pod_cpumask; + /** + * @affn_strict: affinity scope is strict + * + * If clear, workqueue will make a best-effort attempt at starting the + * worker inside @__pod_cpumask but the scheduler is free to migrate it + * outside. + * + * If set, workers are only allowed to run inside @__pod_cpumask. + */ + bool affn_strict; + /* * Below fields aren't properties of a worker_pool. They only modify how * :c:func:`apply_workqueue_attrs` select pools and thus don't -- cgit v1.2.3 From 523a301e66afd1ea9856660bcf3cee3a7c84c6dd Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 7 Aug 2023 15:57:25 -1000 Subject: workqueue: Make default affinity_scope dynamically updatable While workqueue.default_affinity_scope is writable, it only affects workqueues which are created afterwards and isn't very useful. Instead, let's introduce explicit "default" scope and update the effective scope dynamically when workqueue.default_affinity_scope is changed. Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 0c1cad38f9db..1c1d06804d45 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -126,6 +126,7 @@ struct rcu_work { }; enum wq_affn_scope { + WQ_AFFN_DFL, /* use system default */ WQ_AFFN_CPU, /* one pod per CPU */ WQ_AFFN_SMT, /* one pod poer SMT */ WQ_AFFN_CACHE, /* one pod per LLC */ @@ -133,8 +134,6 @@ enum wq_affn_scope { WQ_AFFN_SYSTEM, /* one pod across the whole system */ WQ_AFFN_NR_TYPES, - - WQ_AFFN_DFL = WQ_AFFN_CACHE, }; /** -- cgit v1.2.3 From e0a99a839f04c90bf9f16919997c4b34f9c8f1f0 Mon Sep 17 00:00:00 2001 From: Anna-Maria Behnsen Date: Mon, 15 May 2023 18:20:38 +0200 Subject: Documentation: core-api/cpuhotplug: Fix state names Dynamic allocated hotplug states in documentation and the comment above cpuhp_state enum do not match the code. To not get confused by wrong documentation, change to proper state names. Signed-off-by: Anna-Maria Behnsen Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20230515162038.62703-1-anna-maria@linutronix.de --- include/linux/cpuhotplug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 25b6e6e6ba6b..06dda85f0424 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -48,7 +48,7 @@ * same section. * * If neither #1 nor #2 apply, please use the dynamic state space when - * setting up a state by using CPUHP_PREPARE_DYN or CPUHP_PREPARE_ONLINE + * setting up a state by using CPUHP_BP_PREPARE_DYN or CPUHP_AP_ONLINE_DYN * for the @state argument of the setup function. * * See Documentation/core-api/cpu_hotplug.rst for further information and -- cgit v1.2.3 From 29cfda963f899da403d6bc5a3abe19d2e0be0cf4 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 2 Aug 2023 21:09:57 +0800 Subject: netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush() Commit a23f89a99906 ("netfilter: conntrack: nf_ct_gre_keymap_flush() removal") leave this unused, remove it. Signed-off-by: Yue Haibing Signed-off-by: Florian Westphal --- include/linux/netfilter/nf_conntrack_proto_gre.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfilter/nf_conntrack_proto_gre.h b/include/linux/netfilter/nf_conntrack_proto_gre.h index f33aa6021364..34ce5d2f37a2 100644 --- a/include/linux/netfilter/nf_conntrack_proto_gre.h +++ b/include/linux/netfilter/nf_conntrack_proto_gre.h @@ -25,7 +25,6 @@ struct nf_ct_gre_keymap { int nf_ct_gre_keymap_add(struct nf_conn *ct, enum ip_conntrack_dir dir, struct nf_conntrack_tuple *t); -void nf_ct_gre_keymap_flush(struct net *net); /* delete keymap entries */ void nf_ct_gre_keymap_destroy(struct nf_conn *ct); -- cgit v1.2.3 From 61e9ab294b39e5e7c040884b65d06f52e06ac40f Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 7 Aug 2023 22:25:26 +0800 Subject: netfilter: h323: Remove unused function declarations Commit f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") declared but never implemented these. Signed-off-by: Yue Haibing Signed-off-by: Florian Westphal --- include/linux/netfilter/nf_conntrack_h323.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfilter/nf_conntrack_h323.h b/include/linux/netfilter/nf_conntrack_h323.h index 9e937f64a1ad..81286c499325 100644 --- a/include/linux/netfilter/nf_conntrack_h323.h +++ b/include/linux/netfilter/nf_conntrack_h323.h @@ -34,10 +34,6 @@ struct nf_ct_h323_master { int get_h225_addr(struct nf_conn *ct, unsigned char *data, TransportAddress *taddr, union nf_inet_addr *addr, __be16 *port); -void nf_conntrack_h245_expect(struct nf_conn *new, - struct nf_conntrack_expect *this); -void nf_conntrack_q931_expect(struct nf_conn *new, - struct nf_conntrack_expect *this); struct nfct_h323_nat_hooks { int (*set_h245_addr)(struct sk_buff *skb, unsigned int protoff, -- cgit v1.2.3 From 54e73cd52250adeba836cd3afef3658b48ae8dc9 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 1 Aug 2023 12:58:15 +0200 Subject: virtio: Remove PM #ifdef guards to fix i2c driver A cleanup in the virtio i2c caused a build failure: drivers/i2c/busses/i2c-virtio.c:270:10: error: 'struct virtio_driver' has no member named 'freeze' drivers/i2c/busses/i2c-virtio.c:271:10: error: 'struct virtio_driver' has no member named 'restore' Change the structure definition to allow this cleanup to be applied everywhere. Fixes: 73d546c76235b ("i2c: virtio: Remove #ifdef guards for PM related functions") Signed-off-by: Arnd Bergmann Reviewed-by: Paul Cercueil Reviewed-by: Andi Shyti Link: https://lore.kernel.org/r/20230801105846.3708252-1-arnd@kernel.org Signed-off-by: Andi Shyti --- include/linux/virtio.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index de6041deee37..7ed071b5ef07 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -184,10 +184,8 @@ struct virtio_driver { void (*scan)(struct virtio_device *dev); void (*remove)(struct virtio_device *dev); void (*config_changed)(struct virtio_device *dev); -#ifdef CONFIG_PM int (*freeze)(struct virtio_device *dev); int (*restore)(struct virtio_device *dev); -#endif }; static inline struct virtio_driver *drv_to_virtio(struct device_driver *drv) -- cgit v1.2.3 From 6672efbb685f7c9c9df005beb839e1942fd6b34e Mon Sep 17 00:00:00 2001 From: Khadija Kamran Date: Mon, 7 Aug 2023 11:59:29 +0500 Subject: lsm: constify the 'target' parameter in security_capget() Three LSMs register the implementations for the "capget" hook: AppArmor, SELinux, and the normal capability code. Looking at the function implementations we may observe that the first parameter "target" is not changing. Mark the first argument "target" of LSM hook security_capget() as "const" since it will not be changing in the LSM hook. cap_capget() LSM hook declaration exceeds the 80 characters per line limit. Split the function declaration to multiple lines to decrease the line length. Signed-off-by: Khadija Kamran Acked-by: John Johansen [PM: align the cap_capget() declaration, spelling fixes] Signed-off-by: Paul Moore --- include/linux/lsm_hook_defs.h | 2 +- include/linux/security.h | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 920d8f16fa99..540caa703573 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -36,7 +36,7 @@ LSM_HOOK(int, 0, binder_transfer_file, const struct cred *from, LSM_HOOK(int, 0, ptrace_access_check, struct task_struct *child, unsigned int mode) LSM_HOOK(int, 0, ptrace_traceme, struct task_struct *parent) -LSM_HOOK(int, 0, capget, struct task_struct *target, kernel_cap_t *effective, +LSM_HOOK(int, 0, capget, const struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted) LSM_HOOK(int, 0, capset, struct cred *new, const struct cred *old, const kernel_cap_t *effective, const kernel_cap_t *inheritable, diff --git a/include/linux/security.h b/include/linux/security.h index 32828502f09e..7665f56d920a 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -145,7 +145,8 @@ extern int cap_capable(const struct cred *cred, struct user_namespace *ns, extern int cap_settime(const struct timespec64 *ts, const struct timezone *tz); extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode); extern int cap_ptrace_traceme(struct task_struct *parent); -extern int cap_capget(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); +extern int cap_capget(const struct task_struct *target, kernel_cap_t *effective, + kernel_cap_t *inheritable, kernel_cap_t *permitted); extern int cap_capset(struct cred *new, const struct cred *old, const kernel_cap_t *effective, const kernel_cap_t *inheritable, @@ -271,7 +272,7 @@ int security_binder_transfer_file(const struct cred *from, const struct cred *to, struct file *file); int security_ptrace_access_check(struct task_struct *child, unsigned int mode); int security_ptrace_traceme(struct task_struct *parent); -int security_capget(struct task_struct *target, +int security_capget(const struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); @@ -553,7 +554,7 @@ static inline int security_ptrace_traceme(struct task_struct *parent) return cap_ptrace_traceme(parent); } -static inline int security_capget(struct task_struct *target, +static inline int security_capget(const struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted) -- cgit v1.2.3 From d74f714896fd6268882789ba28e52c9145951403 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 8 Aug 2023 11:03:28 -0600 Subject: block: get rid of unused plug->nowait flag This was introduced to add a plug based way of signaling nowait issues, but we have since moved on from that. Kill the old dead code, nobody is setting it anymore. Reviewed-by: Chaitanya Kulkarni Reviewed-by: Bart Van Assche Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ed44a997f629..87d94be7825a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -969,7 +969,6 @@ struct blk_plug { bool multiple_queues; bool has_elevator; - bool nowait; struct list_head cb_list; /* md requires an unplug callback */ }; -- cgit v1.2.3 From 26bbbef8ff4047f857ac2d7353eb19e46100ce18 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:13 +0200 Subject: net: fs_enet: Remove fs_get_id() fs_get_id() hasn't been used since commit b219108cbace ("fs_enet: Remove !CONFIG_PPC_CPM_NEW_BINDING code") Remove it. Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/7a53b88cc40302fcbea59554f5e7067e3594ad53.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index 77d783f71527..2351c3d9404d 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -151,15 +151,4 @@ struct fs_mii_fec_platform_info { u32 mii_speed; }; -static inline int fs_get_id(struct fs_platform_info *fpi) -{ - if(strstr(fpi->fs_type, "SCC")) - return fs_scc_index2id(fpi->fs_no); - if(strstr(fpi->fs_type, "FCC")) - return fs_fcc_index2id(fpi->fs_no); - if(strstr(fpi->fs_type, "FEC")) - return fs_fec_index2id(fpi->fs_no); - return fpi->fs_no; -} - #endif -- cgit v1.2.3 From caaf482e265415778de74e262a6f153dd2f18fa4 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:14 +0200 Subject: net: fs_enet: Remove unused fields in fs_platform_info struct Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.") many fields of fs_platform_info structure are not used anymore. Remove them. Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/2e584fcd75e21a0f7e7d5f942eebdc067b2f82f9.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 19 ------------------- 1 file changed, 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index 2351c3d9404d..a1905e41c167 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -111,33 +111,14 @@ struct fs_mii_bb_platform_info { }; struct fs_platform_info { - - void(*init_ioports)(struct fs_platform_info *); /* device specific information */ - int fs_no; /* controller index */ - char fs_type[4]; /* controller type */ - - u32 cp_page; /* CPM page */ - u32 cp_block; /* CPM sblock */ u32 cp_command; /* CPM page/sblock/mcn */ - u32 clk_trx; /* some stuff for pins & mux configuration*/ - u32 clk_rx; - u32 clk_tx; - u32 clk_route; - u32 clk_mask; - - u32 mem_offset; u32 dpram_offset; - u32 fcc_regs_c; - u32 device_flags; - struct device_node *phy_node; - const struct fs_mii_bus_info *bus_info; int rx_ring, tx_ring; /* number of buffers on rx */ - __u8 macaddr[ETH_ALEN]; /* mac address */ int rx_copybreak; /* limit we copy small frames */ int napi_weight; /* NAPI weight */ -- cgit v1.2.3 From 9359a48c65a3c2723d469ba11889c56387e4395a Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:15 +0200 Subject: net: fs_enet: Remove has_phy field in fs_platform_info struct Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.") has_phy field is never set. Remove dead code and remove the field. Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/bb5264e09e18f0ce8a0dbee399926a59f33cb248.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index a1905e41c167..2b351b676467 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -123,7 +123,6 @@ struct fs_platform_info { int napi_weight; /* NAPI weight */ int use_rmii; /* use RMII mode */ - int has_phy; /* if the network is phy container as well...*/ struct clk *clk_per; /* 'per' clock for register access */ }; -- cgit v1.2.3 From 7a76918371fe04667528142554a3f26a8879e8ab Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:17 +0200 Subject: net: fs_enet: Move struct fs_platform_info into fs_enet.h struct fs_platform_info is only used in fs_enet ethernet driver since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding."). Stale prototypes using fs_platform_info were left over in arch/powerpc/sysdev/fsl_soc.c but they are now removed by previous patch. Move struct fs_platform_info into fs_enet.h Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/f882d6b0b7075d0d8435310634ceaa2cc8e9938f.1691155347.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 16 ---------------- 1 file changed, 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index 2b351b676467..7c9897dab558 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -110,22 +110,6 @@ struct fs_mii_bb_platform_info { int irq[32]; /* irqs per phy's */ }; -struct fs_platform_info { - /* device specific information */ - u32 cp_command; /* CPM page/sblock/mcn */ - - u32 dpram_offset; - - struct device_node *phy_node; - - int rx_ring, tx_ring; /* number of buffers on rx */ - int rx_copybreak; /* limit we copy small frames */ - int napi_weight; /* NAPI weight */ - - int use_rmii; /* use RMII mode */ - - struct clk *clk_per; /* 'per' clock for register access */ -}; struct fs_mii_fec_platform_info { u32 irq[32]; u32 mii_speed; -- cgit v1.2.3 From 7149b38dc7cbc77548afd65e4bb4d4b11dca8670 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:19 +0200 Subject: net: fs_enet: Remove linux/fs_enet_pd.h linux/fs_enet_pd.h is not used anymore. Remove it. Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/5be102791c987792ad127b15543ee6715394cf67.1691155347.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 118 --------------------------------------------- 1 file changed, 118 deletions(-) delete mode 100644 include/linux/fs_enet_pd.h (limited to 'include/linux') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h deleted file mode 100644 index 7c9897dab558..000000000000 --- a/include/linux/fs_enet_pd.h +++ /dev/null @@ -1,118 +0,0 @@ -/* - * Platform information definitions for the - * universal Freescale Ethernet driver. - * - * Copyright (c) 2003 Intracom S.A. - * by Pantelis Antoniou - * - * 2005 (c) MontaVista Software, Inc. - * Vitaly Bordug - * - * This file is licensed under the terms of the GNU General Public License - * version 2. This program is licensed "as is" without any warranty of any - * kind, whether express or implied. - */ - -#ifndef FS_ENET_PD_H -#define FS_ENET_PD_H - -#include -#include -#include -#include -#include - -#define FS_ENET_NAME "fs_enet" - -enum fs_id { - fsid_fec1, - fsid_fec2, - fsid_fcc1, - fsid_fcc2, - fsid_fcc3, - fsid_scc1, - fsid_scc2, - fsid_scc3, - fsid_scc4, -}; - -#define FS_MAX_INDEX 9 - -static inline int fs_get_fec_index(enum fs_id id) -{ - if (id >= fsid_fec1 && id <= fsid_fec2) - return id - fsid_fec1; - return -1; -} - -static inline int fs_get_fcc_index(enum fs_id id) -{ - if (id >= fsid_fcc1 && id <= fsid_fcc3) - return id - fsid_fcc1; - return -1; -} - -static inline int fs_get_scc_index(enum fs_id id) -{ - if (id >= fsid_scc1 && id <= fsid_scc4) - return id - fsid_scc1; - return -1; -} - -static inline int fs_fec_index2id(int index) -{ - int id = fsid_fec1 + index - 1; - if (id >= fsid_fec1 && id <= fsid_fec2) - return id; - return FS_MAX_INDEX; - } - -static inline int fs_fcc_index2id(int index) -{ - int id = fsid_fcc1 + index - 1; - if (id >= fsid_fcc1 && id <= fsid_fcc3) - return id; - return FS_MAX_INDEX; -} - -static inline int fs_scc_index2id(int index) -{ - int id = fsid_scc1 + index - 1; - if (id >= fsid_scc1 && id <= fsid_scc4) - return id; - return FS_MAX_INDEX; -} - -enum fs_mii_method { - fsmii_fixed, - fsmii_fec, - fsmii_bitbang, -}; - -enum fs_ioport { - fsiop_porta, - fsiop_portb, - fsiop_portc, - fsiop_portd, - fsiop_porte, -}; - -struct fs_mii_bit { - u32 offset; - u8 bit; - u8 polarity; -}; -struct fs_mii_bb_platform_info { - struct fs_mii_bit mdio_dir; - struct fs_mii_bit mdio_dat; - struct fs_mii_bit mdc_dat; - int delay; /* delay in us */ - int irq[32]; /* irqs per phy's */ -}; - -struct fs_mii_fec_platform_info { - u32 irq[32]; - u32 mii_speed; -}; - -#endif -- cgit v1.2.3 From de3ecc4fd8bf201c5cd02dc49687fb1506cebb45 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Mon, 7 Aug 2023 09:25:54 +0800 Subject: team: change the init function in the team_option structure to void Because the init function in the team_option structure always returns 0, so change the init function to void and remove redundant code. Signed-off-by: Zhengchao Shao Reviewed-by: Hangbin Liu Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230807012556.3146071-4-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski --- include/linux/if_team.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/if_team.h b/include/linux/if_team.h index 8de6b6e67829..fc01c3cfe86d 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -162,7 +162,7 @@ struct team_option { bool per_port; unsigned int array_size; /* != 0 means the option is array */ enum team_option_type type; - int (*init)(struct team *team, struct team_option_inst_info *info); + void (*init)(struct team *team, struct team_option_inst_info *info); int (*getter)(struct team *team, struct team_gsetter_ctx *ctx); int (*setter)(struct team *team, struct team_gsetter_ctx *ctx); }; -- cgit v1.2.3 From c3b41f4c7b7ce573f379c0053e3c86e722562659 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Mon, 7 Aug 2023 09:25:55 +0800 Subject: team: change the getter function in the team_option structure to void Because the getter function in the team_option structure always returns 0, so change the getter function to void and remove redundant code. Signed-off-by: Zhengchao Shao Reviewed-by: Hangbin Liu Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230807012556.3146071-5-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski --- include/linux/if_team.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/if_team.h b/include/linux/if_team.h index fc01c3cfe86d..1b9b15a492fa 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -163,7 +163,7 @@ struct team_option { unsigned int array_size; /* != 0 means the option is array */ enum team_option_type type; void (*init)(struct team *team, struct team_option_inst_info *info); - int (*getter)(struct team *team, struct team_gsetter_ctx *ctx); + void (*getter)(struct team *team, struct team_gsetter_ctx *ctx); int (*setter)(struct team *team, struct team_gsetter_ctx *ctx); }; -- cgit v1.2.3 From 2adbb7637fd1fcec93f4680ddb5ddbbd1a91aefb Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 8 Aug 2023 22:57:41 +0800 Subject: bpf: btf: Remove two unused function declarations Commit db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record") removed the implementations but leave declarations. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230808145741.33292-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau --- include/linux/btf.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index cac9f304e27a..df64cc642074 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -204,8 +204,6 @@ u32 btf_nr_types(const struct btf *btf); bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, const struct btf_member *m, u32 expected_offset, u32 expected_size); -int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t); -int btf_find_timer(const struct btf *btf, const struct btf_type *t); struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t, u32 field_mask, u32 value_size); int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec); -- cgit v1.2.3 From 6a3207395563f724d91231ec0aa7c4d95bf9591d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 7 Aug 2023 12:26:25 +0100 Subject: fs, block: remove bdev->bd_super bdev->bd_super is unused now, remove it. Signed-off-by: Christoph Hellwig Reviewed-by: Christian Brauner Message-Id: <20230807112625.652089-5-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/blk_types.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 0bad62cca3d0..d5c5e59ddbd2 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -52,7 +52,6 @@ struct block_device { atomic_t bd_openers; spinlock_t bd_size_lock; /* for bd_inode->i_size updates */ struct inode * bd_inode; /* will die */ - struct super_block * bd_super; void * bd_claiming; void * bd_holder; const struct blk_holder_ops *bd_holder_ops; -- cgit v1.2.3 From 0d72b92883c651a11059d93335f33d65c6eb653b Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 7 Aug 2023 15:38:33 -0400 Subject: fs: pass the request_mask to generic_fillattr generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi Reviewed-by: Xiubo Li Reviewed-by: "Paulo Alcantara (SUSE)" Reviewed-by: Jan Kara Signed-off-by: Jeff Layton Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 61f27011fd04..85977cdeda94 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2917,7 +2917,7 @@ extern void page_put_link(void *); extern int page_symlink(struct inode *inode, const char *symname, int len); extern const struct inode_operations page_symlink_inode_operations; extern void kfree_link(void *); -void generic_fillattr(struct mnt_idmap *, struct inode *, struct kstat *); +void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *); void generic_fill_statx_attr(struct inode *inode, struct kstat *stat); extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int); extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int); -- cgit v1.2.3 From 541d4c798a598854fcce7326d947cbcbd35701d6 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 7 Aug 2023 15:38:34 -0400 Subject: fs: drop the timespec64 arg from generic_update_time In future patches we're going to change how the ctime is updated to keep track of when it has been queried. The way that the update_time operation works (and a lot of its callers) make this difficult, since they grab a timestamp early and then pass it down to eventually be copied into the inode. All of the existing update_time callers pass in the result of current_time() in some fashion. Drop the "time" parameter from generic_update_time, and rework it to fetch its own timestamp. This change means that an update_time could fetch a different timestamp than was seen in inode_needs_update_time. update_time is only ever called with one of two flag combinations: Either S_ATIME is set, or S_MTIME|S_CTIME|S_VERSION are set. With this change we now treat the flags argument as an indicator that some value needed to be updated when last checked, rather than an indication to update specific timestamps. Rework the logic for updating the timestamps and put it in a new inode_update_timestamps helper that other update_time routines can use. S_ATIME is as treated as we always have, but if any of the other three are set, then we attempt to update all three. Also, some callers of generic_update_time need to know what timestamps were actually updated. Change it to return an S_* flag mask to indicate that and rework the callers to expect it. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Message-Id: <20230807-mgctime-v7-3-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 85977cdeda94..bb3c2c4f871f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2343,7 +2343,8 @@ extern int current_umask(void); extern void ihold(struct inode * inode); extern void iput(struct inode *); -extern int generic_update_time(struct inode *, struct timespec64 *, int); +int inode_update_timestamps(struct inode *inode, int flags); +int generic_update_time(struct inode *, int); /* /sys/fs */ extern struct kobject *fs_kobj; -- cgit v1.2.3 From eafc474e202978ac735c551d5ee1eb8c02e2be54 Mon Sep 17 00:00:00 2001 From: Carlos Maiolino Date: Tue, 25 Jul 2023 16:45:07 +0200 Subject: shmem: prepare shmem quota infrastructure Add new shmem quota format, its quota_format_ops together with dquot_operations Signed-off-by: Lukas Czerner Signed-off-by: Carlos Maiolino Reviewed-by: Jan Kara Message-Id: <20230725144510.253763-5-cem@kernel.org> Signed-off-by: Christian Brauner --- include/linux/shmem_fs.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 9029abd29b1c..7abfaf70b58a 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -13,6 +13,10 @@ /* inode in-kernel data */ +#ifdef CONFIG_TMPFS_QUOTA +#define SHMEM_MAXQUOTAS 2 +#endif + struct shmem_inode_info { spinlock_t lock; unsigned int seals; /* shmem seals */ @@ -172,4 +176,12 @@ extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd, #endif /* CONFIG_SHMEM */ #endif /* CONFIG_USERFAULTFD */ +/* + * Used space is stored as unsigned 64-bit value in bytes but + * quota core supports only signed 64-bit values so use that + * as a limit + */ +#define SHMEM_QUOTA_MAX_SPC_LIMIT 0x7fffffffffffffffLL /* 2^63-1 */ +#define SHMEM_QUOTA_MAX_INO_LIMIT 0x7fffffffffffffffLL + #endif -- cgit v1.2.3 From e09764cff44b5d31c2ca5477444565e3080637d2 Mon Sep 17 00:00:00 2001 From: Carlos Maiolino Date: Tue, 25 Jul 2023 16:45:08 +0200 Subject: shmem: quota support Now the basic infra-structure is in place, enable quota support for tmpfs. This offers user and group quotas to tmpfs (project quotas will be added later). Also, as other filesystems, the tmpfs quota is not supported within user namespaces yet, so idmapping is not translated. Signed-off-by: Lukas Czerner Signed-off-by: Carlos Maiolino Reviewed-by: Jan Kara Message-Id: <20230725144510.253763-6-cem@kernel.org> Signed-off-by: Christian Brauner --- include/linux/shmem_fs.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 7abfaf70b58a..1a568a0f542f 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -31,6 +31,9 @@ struct shmem_inode_info { atomic_t stop_eviction; /* hold when working on inode */ struct timespec64 i_crtime; /* file creation time */ unsigned int fsflags; /* flags for FS_IOC_[SG]ETFLAGS */ +#ifdef CONFIG_TMPFS_QUOTA + struct dquot *i_dquot[MAXQUOTAS]; +#endif struct inode vfs_inode; }; @@ -184,4 +187,9 @@ extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd, #define SHMEM_QUOTA_MAX_SPC_LIMIT 0x7fffffffffffffffLL /* 2^63-1 */ #define SHMEM_QUOTA_MAX_INO_LIMIT 0x7fffffffffffffffLL +#ifdef CONFIG_TMPFS_QUOTA +extern const struct dquot_operations shmem_quota_operations; +extern struct quota_format_type shmem_quota_format; +#endif /* CONFIG_TMPFS_QUOTA */ + #endif -- cgit v1.2.3 From de4c0e7ca8b526a82ff7e5ee5533787bb6d01724 Mon Sep 17 00:00:00 2001 From: Lukas Czerner Date: Tue, 25 Jul 2023 16:45:09 +0200 Subject: shmem: Add default quota limit mount options Allow system administrator to set default global quota limits at tmpfs mount time. Signed-off-by: Lukas Czerner Signed-off-by: Carlos Maiolino Reviewed-by: Jan Kara Message-Id: <20230725144510.253763-7-cem@kernel.org> Signed-off-by: Christian Brauner --- include/linux/shmem_fs.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 1a568a0f542f..c0058f3bba70 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -42,6 +42,13 @@ struct shmem_inode_info { (FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL | FS_NOATIME_FL) #define SHMEM_FL_INHERITED (FS_NODUMP_FL | FS_NOATIME_FL) +struct shmem_quota_limits { + qsize_t usrquota_bhardlimit; /* Default user quota block hard limit */ + qsize_t usrquota_ihardlimit; /* Default user quota inode hard limit */ + qsize_t grpquota_bhardlimit; /* Default group quota block hard limit */ + qsize_t grpquota_ihardlimit; /* Default group quota inode hard limit */ +}; + struct shmem_sb_info { unsigned long max_blocks; /* How many blocks are allowed */ struct percpu_counter used_blocks; /* How many are allocated */ @@ -60,6 +67,7 @@ struct shmem_sb_info { spinlock_t shrinklist_lock; /* Protects shrinklist */ struct list_head shrinklist; /* List of shinkable inodes */ unsigned long shrinklist_len; /* Length of shrinklist */ + struct shmem_quota_limits qlimits; /* Default quota limits */ }; static inline struct shmem_inode_info *SHMEM_I(struct inode *inode) -- cgit v1.2.3 From 6faddda69f623d38bb097640689901a4b5ff881a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 30 Jun 2023 13:48:49 -0400 Subject: libfs: Add directory operations for stable offsets Create a vector of directory operations in fs/libfs.c that handles directory seeks and readdir via stable offsets instead of the current cursor-based mechanism. For the moment these are unused. Signed-off-by: Chuck Lever Message-Id: <168814732984.530310.11190772066786107220.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner --- include/linux/fs.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..59a4129ce14c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1770,6 +1770,7 @@ struct dir_context { struct iov_iter; struct io_uring_cmd; +struct offset_ctx; struct file_operations { struct module *owner; @@ -1857,6 +1858,7 @@ struct inode_operations { int (*fileattr_set)(struct mnt_idmap *idmap, struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); + struct offset_ctx *(*get_offset_ctx)(struct inode *inode); } ____cacheline_aligned; static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, @@ -2971,6 +2973,22 @@ extern ssize_t simple_read_from_buffer(void __user *to, size_t count, extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos, const void __user *from, size_t count); +struct offset_ctx { + struct xarray xa; + u32 next_offset; +}; + +void simple_offset_init(struct offset_ctx *octx); +int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry); +void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry); +int simple_offset_rename_exchange(struct inode *old_dir, + struct dentry *old_dentry, + struct inode *new_dir, + struct dentry *new_dentry); +void simple_offset_destroy(struct offset_ctx *octx); + +extern const struct file_operations simple_offset_dir_operations; + extern int __generic_file_fsync(struct file *, loff_t, loff_t, int); extern int generic_file_fsync(struct file *, loff_t, loff_t, int); -- cgit v1.2.3 From a2e459555c5f9da3e619b7e47a63f98574dc75f1 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 30 Jun 2023 13:49:03 -0400 Subject: shmem: stable directory offsets The current cursor-based directory offset mechanism doesn't work when a tmpfs filesystem is exported via NFS. This is because NFS clients do not open directories. Each server-side READDIR operation has to open the directory, read it, then close it. The cursor state for that directory, being associated strictly with the opened struct file, is thus discarded after each NFS READDIR operation. Directory offsets are cached not only by NFS clients, but also by user space libraries on those clients. Essentially there is no way to invalidate those caches when directory offsets have changed on an NFS server after the offset-to-dentry mapping changes. Thus the whole application stack depends on unchanging directory offsets. The solution we've come up with is to make the directory offset for each file in a tmpfs filesystem stable for the life of the directory entry it represents. shmem_readdir() and shmem_dir_llseek() now use an xarray to map each directory offset (an loff_t integer) to the memory address of a struct dentry. Signed-off-by: Chuck Lever Message-Id: <168814734331.530310.3911190551060453102.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner --- include/linux/shmem_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index c0058f3bba70..9b2d2faff1d0 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -34,6 +34,7 @@ struct shmem_inode_info { #ifdef CONFIG_TMPFS_QUOTA struct dquot *i_dquot[MAXQUOTAS]; #endif + struct offset_ctx dir_offsets; /* stable entry offsets */ struct inode vfs_inode; }; -- cgit v1.2.3 From 5de75970c9fd7220e394b76e6d20fbafa1369b5a Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 8 Aug 2023 21:30:59 -0700 Subject: xattr: simple_xattr_set() return old_xattr to be freed tmpfs wants to support limited user extended attributes, but kernfs (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR) already supports user extended attributes through simple xattrs: but limited by a policy (128KiB per inode) too liberal to be used on tmpfs. To allow a different limiting policy for tmpfs, without affecting the policy for kernfs, change simple_xattr_set() to return the replaced or removed xattr (if any), leaving the caller to update their accounting then free the xattr (by simple_xattr_free(), renamed from the static free_simple_xattr()). Signed-off-by: Hugh Dickins Reviewed-by: Jan Kara Reviewed-by: Christian Brauner Reviewed-by: Carlos Maiolino Message-Id: <158c6585-2aa7-d4aa-90ff-f7c3f8fe407c@google.com> Signed-off-by: Christian Brauner --- include/linux/xattr.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/xattr.h b/include/linux/xattr.h index d591ef59aa98..e37fe667ae04 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -116,11 +116,12 @@ struct simple_xattr { void simple_xattrs_init(struct simple_xattrs *xattrs); void simple_xattrs_free(struct simple_xattrs *xattrs); struct simple_xattr *simple_xattr_alloc(const void *value, size_t size); +void simple_xattr_free(struct simple_xattr *xattr); int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, void *buffer, size_t size); -int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, - const void *value, size_t size, int flags, - ssize_t *removed_size); +struct simple_xattr *simple_xattr_set(struct simple_xattrs *xattrs, + const char *name, const void *value, + size_t size, int flags); ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, char *buffer, size_t size); void simple_xattr_add(struct simple_xattrs *xattrs, -- cgit v1.2.3 From e07c469e979c104464300aaa3b7923f929055cd0 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 8 Aug 2023 21:32:21 -0700 Subject: tmpfs: track free_ispace instead of free_inodes In preparation for assigning some inode space to extended attributes, keep track of free_ispace instead of number of free_inodes: as if one tmpfs inode (and accompanying dentry) occupies very approximately 1KiB. Unsigned long is large enough for free_ispace, on 64-bit and on 32-bit: but take care to enforce the maximum. And fix the nr_blocks maximum on 32-bit: S64_MAX would be too big for it there, so say LONG_MAX instead. Delete the incorrect limited<->unlimited blocks/inodes comment above shmem_reconfigure(): leave it to the error messages below to describe. Signed-off-by: Hugh Dickins Reviewed-by: Jan Kara Reviewed-by: Carlos Maiolino Message-Id: <4fe1739-d9e7-8dfd-5bce-12e7339711da@google.com> Signed-off-by: Christian Brauner --- include/linux/shmem_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 9b2d2faff1d0..6b0c626620f5 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -54,7 +54,7 @@ struct shmem_sb_info { unsigned long max_blocks; /* How many blocks are allowed */ struct percpu_counter used_blocks; /* How many are allocated */ unsigned long max_inodes; /* How many inodes are allowed */ - unsigned long free_inodes; /* How many are left for allocation */ + unsigned long free_ispace; /* How much ispace left for allocation */ raw_spinlock_t stat_lock; /* Serialize shmem_sb_info changes */ umode_t mode; /* Mount mode for root directory */ unsigned char huge; /* Whether to try for hugepages */ -- cgit v1.2.3 From 12e6ac69cc7e7d3367599ae26a92a0f9a18bc728 Mon Sep 17 00:00:00 2001 From: Xu Yang Date: Wed, 9 Aug 2023 10:44:32 +0800 Subject: usb: chipidea: add workaround for chipidea PEC bug Some NXP processors using ChipIdea USB IP have a bug when frame babble is detected. Issue description: In USB camera test, our controller is host in HS mode. In ISOC IN, when device sends data across the micro frame, it causes the babble in host controller. This will clear the PE bit. In spec, it also requires to set the PEC bit and then set the PCI bit. Without the PCI interrupt, the software does not know the PE is cleared. This will add a flag CI_HDRC_HAS_PORTSC_PEC_MISSED to some impacted platform datas. And the ehci host driver will assert PEC by SW when specific conditions are satisfied. Signed-off-by: Xu Yang Link: https://lore.kernel.org/r/20230809024432.535160-2-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/chipidea.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/chipidea.h b/include/linux/usb/chipidea.h index ee38835ed77c..0b4f2d5faa08 100644 --- a/include/linux/usb/chipidea.h +++ b/include/linux/usb/chipidea.h @@ -63,6 +63,7 @@ struct ci_hdrc_platform_data { #define CI_HDRC_IMX_IS_HSIC BIT(14) #define CI_HDRC_PMQOS BIT(15) #define CI_HDRC_PHY_VBUS_CONTROL BIT(16) +#define CI_HDRC_HAS_PORTSC_PEC_MISSED BIT(17) enum usb_dr_mode dr_mode; #define CI_HDRC_CONTROLLER_RESET_EVENT 0 #define CI_HDRC_CONTROLLER_STOPPED_EVENT 1 -- cgit v1.2.3 From 1e4c574225cc5a0553115e5eb5787d1474db5b0f Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Tue, 8 Aug 2023 20:44:18 -0400 Subject: USB: Remove remnants of Wireless USB and UWB Wireless USB has long been defunct, and kernel support for it was removed in 2020 by commit caa6772db4c1 ("Staging: remove wusbcore and UWB from the kernel tree."). Nevertheless, some vestiges of the old implementation still clutter up the USB subsystem and one or two other places. Let's get rid of them once and for all. The only parts still left are the user-facing APIs in include/uapi/linux/usb/ch9.h. (There are also a couple of misleading instances, such as the Sierra Wireless USB modem, which is a USB modem made by Sierra Wireless.) Signed-off-by: Alan Stern Link: https://lore.kernel.org/r/b4f2710f-a2de-4fb0-b50f-76776f3a961b@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 12 ------------ include/linux/usb/ch9.h | 5 +---- include/linux/usb/composite.h | 23 ----------------------- include/linux/usb/hcd.h | 2 -- 4 files changed, 1 insertion(+), 41 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index 25f8e62a30ec..a21074861f91 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -25,7 +25,6 @@ struct usb_device; struct usb_driver; -struct wusb_dev; /*-------------------------------------------------------------------------*/ @@ -425,7 +424,6 @@ struct usb_host_config { struct usb_host_bos { struct usb_bos_descriptor *desc; - /* wireless cap descriptor is handled by wusb */ struct usb_ext_cap_descriptor *ext_cap; struct usb_ss_cap_descriptor *ss_cap; struct usb_ssp_cap_descriptor *ssp_cap; @@ -612,7 +610,6 @@ struct usb3_lpm_parameters { * WUSB devices are not, until we authorize them from user space. * FIXME -- complete doc * @authenticated: Crypto authentication passed - * @wusb: device is Wireless USB * @lpm_capable: device supports LPM * @lpm_devinit_allow: Allow USB3 device initiated LPM, exit latency is in range * @usb2_hw_lpm_capable: device can perform USB2 hardware LPM @@ -634,8 +631,6 @@ struct usb3_lpm_parameters { * @do_remote_wakeup: remote wakeup should be enabled * @reset_resume: needs reset instead of resume * @port_is_suspended: the upstream port is suspended (L2 or U3) - * @wusb_dev: if this is a Wireless USB device, link to the WUSB - * specific data for the device. * @slot_id: Slot ID assigned by xHCI * @removable: Device can be physically removed from this port * @l1_params: best effor service latency for USB2 L1 LPM state, and L1 timeout. @@ -696,7 +691,6 @@ struct usb_device { unsigned have_langid:1; unsigned authorized:1; unsigned authenticated:1; - unsigned wusb:1; unsigned lpm_capable:1; unsigned lpm_devinit_allow:1; unsigned usb2_hw_lpm_capable:1; @@ -727,7 +721,6 @@ struct usb_device { unsigned reset_resume:1; unsigned port_is_suspended:1; - struct wusb_dev *wusb_dev; int slot_id; struct usb2_lpm_parameters l1_params; struct usb3_lpm_parameters u1_params; @@ -1742,11 +1735,6 @@ static inline void usb_fill_bulk_urb(struct urb *urb, * encoding of the endpoint interval, and express polling intervals in * microframes (eight per millisecond) rather than in frames (one per * millisecond). - * - * Wireless USB also uses the logarithmic encoding, but specifies it in units of - * 128us instead of 125us. For Wireless USB devices, the interval is passed - * through to the host controller, rather than being translated into microframe - * units. */ static inline void usb_fill_int_urb(struct urb *urb, struct usb_device *dev, diff --git a/include/linux/usb/ch9.h b/include/linux/usb/ch9.h index 969e7dba6358..c93b410b314a 100644 --- a/include/linux/usb/ch9.h +++ b/include/linux/usb/ch9.h @@ -3,7 +3,7 @@ * This file holds USB constants and structures that are needed for * USB device APIs. These are used by the USB device model, which is * defined in chapter 9 of the USB 2.0 specification and in the - * Wireless USB 1.0 (spread around). Linux has several APIs in C that + * Wireless USB 1.0 spec (now defunct). Linux has several APIs in C that * need these: * * - the host side Linux-USB kernel driver API; @@ -14,9 +14,6 @@ * act either as a USB host or as a USB device. That means the host and * device side APIs benefit from working well together. * - * There's also "Wireless USB", using low power short range radios for - * peripheral interconnection but otherwise building on the USB framework. - * * Note all descriptors are declared '__attribute__((packed))' so that: * * [a] they never get padded, either internally (USB spec writers diff --git a/include/linux/usb/composite.h b/include/linux/usb/composite.h index 07531c4f4350..6014340ba980 100644 --- a/include/linux/usb/composite.h +++ b/include/linux/usb/composite.h @@ -450,29 +450,6 @@ static inline struct usb_composite_driver *to_cdriver( * * One of these devices is allocated and initialized before the * associated device driver's bind() is called. - * - * OPEN ISSUE: it appears that some WUSB devices will need to be - * built by combining a normal (wired) gadget with a wireless one. - * This revision of the gadget framework should probably try to make - * sure doing that won't hurt too much. - * - * One notion for how to handle Wireless USB devices involves: - * - * (a) a second gadget here, discovery mechanism TBD, but likely - * needing separate "register/unregister WUSB gadget" calls; - * (b) updates to usb_gadget to include flags "is it wireless", - * "is it wired", plus (presumably in a wrapper structure) - * bandgroup and PHY info; - * (c) presumably a wireless_ep wrapping a usb_ep, and reporting - * wireless-specific parameters like maxburst and maxsequence; - * (d) configurations that are specific to wireless links; - * (e) function drivers that understand wireless configs and will - * support wireless for (additional) function instances; - * (f) a function to support association setup (like CBAF), not - * necessarily requiring a wireless adapter; - * (g) composite device setup that can create one or more wireless - * configs, including appropriate association setup support; - * (h) more, TBD. */ struct usb_composite_dev { struct usb_gadget *gadget; diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 4e9623e8492b..61d4f0b793dc 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -154,7 +154,6 @@ struct usb_hcd { /* The next flag is a stopgap, to be removed when all the HCDs * support the new root-hub polling mechanism. */ unsigned uses_new_polling:1; - unsigned wireless:1; /* Wireless USB HCD */ unsigned has_tt:1; /* Integrated TT in root hub */ unsigned amd_resume_bug:1; /* AMD remote wakeup quirk */ unsigned can_do_streams:1; /* HC supports streams */ @@ -249,7 +248,6 @@ struct hc_driver { #define HCD_SHARED 0x0004 /* Two (or more) usb_hcds share HW */ #define HCD_USB11 0x0010 /* USB 1.1 */ #define HCD_USB2 0x0020 /* USB 2.0 */ -#define HCD_USB25 0x0030 /* Wireless USB 1.0 (USB 2.5)*/ #define HCD_USB3 0x0040 /* USB 3.0 */ #define HCD_USB31 0x0050 /* USB 3.1 */ #define HCD_USB32 0x0060 /* USB 3.2 */ -- cgit v1.2.3 From 4298780126c298f20ae4bc8676591eaf8c48767e Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Wed, 9 Aug 2023 20:47:54 +0800 Subject: iommu: Generalize PASID 0 for normal DMA w/o PASID PCIe Process address space ID (PASID) is used to tag DMA traffic, it provides finer grained isolation than requester ID (RID). For each device/RID, 0 is a special PASID for the normal DMA (no PASID). This is universal across all architectures that supports PASID, therefore warranted to be reserved globally and declared in the common header. Consequently, we can avoid the conflict between different PASID use cases in the generic code. e.g. SVA and DMA API with PASIDs. This paved away for device drivers to choose global PASID policy while continue doing normal DMA. Noting that VT-d could support none-zero RID/NO_PASID, but currently not used. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Jean-Philippe Brucker Reviewed-by: Jason Gunthorpe Signed-off-by: Jacob Pan Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d31642596675..2870bc29d456 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -196,6 +196,7 @@ enum iommu_dev_features { IOMMU_DEV_FEAT_IOPF, }; +#define IOMMU_NO_PASID (0U) /* Reserved for DMA w/o PASID */ #define IOMMU_PASID_INVALID (-1U) typedef unsigned int ioasid_t; -- cgit v1.2.3 From 2dcebc7ddce7ffd4015824227c7623558b89d721 Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Wed, 9 Aug 2023 20:47:55 +0800 Subject: iommu: Move global PASID allocation from SVA to core Intel ENQCMD requires a single PASID to be shared between multiple devices, as the PASID is stored in a single MSR register per-process and userspace can use only that one PASID. This means that the PASID allocation for any ENQCMD using device driver must always come from a shared global pool, regardless of what kind of domain the PASID will be used with. Split the code for the global PASID allocator into iommu_alloc/free_global_pasid() so that drivers can attach non-SVA domains to PASIDs as well. This patch moves global PASID allocation APIs from SVA to IOMMU APIs. Reserved PASIDs, currently only RID_PASID, are excluded from the global PASID allocation. It is expected that device drivers will use the allocated PASIDs to attach to appropriate IOMMU domains for use. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Signed-off-by: Jacob Pan Link: https://lore.kernel.org/r/20230802212427.1497170-3-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 2870bc29d456..6b821cb715d5 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -197,6 +197,7 @@ enum iommu_dev_features { }; #define IOMMU_NO_PASID (0U) /* Reserved for DMA w/o PASID */ +#define IOMMU_FIRST_GLOBAL_PASID (1U) /*starting range for allocation */ #define IOMMU_PASID_INVALID (-1U) typedef unsigned int ioasid_t; @@ -728,6 +729,8 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct iommu_domain * iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid, unsigned int type); +ioasid_t iommu_alloc_global_pasid(struct device *dev); +void iommu_free_global_pasid(ioasid_t pasid); #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; @@ -1089,6 +1092,13 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid, { return NULL; } + +static inline ioasid_t iommu_alloc_global_pasid(struct device *dev) +{ + return IOMMU_PASID_INVALID; +} + +static inline void iommu_free_global_pasid(ioasid_t pasid) {} #endif /* CONFIG_IOMMU_API */ /** -- cgit v1.2.3 From a48ce36e2786f8bb33e9f42ea2d0c23ea4af3e0d Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 9 Aug 2023 20:48:02 +0800 Subject: iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy Cc: Alex Williamson Cc: Kevin Tian Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Acked-by: Joerg Roedel Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6b821cb715d5..54bae452975f 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -411,6 +411,7 @@ struct iommu_fault_param { * @priv: IOMMU Driver private data * @max_pasids: number of PASIDs this device can consume * @attach_deferred: the dma domain attachment is deferred + * @require_direct: device requires IOMMU_RESV_DIRECT regions * * TODO: migrate other per device data pointers under iommu_dev_data, e.g. * struct iommu_group *iommu_group; @@ -424,6 +425,7 @@ struct dev_iommu { void *priv; u32 max_pasids; u32 attach_deferred:1; + u32 require_direct:1; }; int iommu_device_register(struct iommu_device *iommu, -- cgit v1.2.3 From cb4396e0d8c498dcd50974ca931a176e5fa1e3fa Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 9 Aug 2023 20:48:06 +0800 Subject: iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope() Since commit 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") this is not used anymore, so can remove it. Signed-off-by: YueHaibing Link: https://lore.kernel.org/r/20230802133934.19712-1-yuehaibing@huawei.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel --- include/linux/dmar.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dmar.h b/include/linux/dmar.h index 27dbd4c64860..e34b601b71fd 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -106,8 +106,6 @@ static inline bool dmar_rcu_check(void) extern int dmar_table_init(void); extern int dmar_dev_scope_init(void); extern void dmar_register_bus_notifier(void); -extern int dmar_parse_dev_scope(void *start, void *end, int *cnt, - struct dmar_dev_scope **devices, u16 segment); extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt); extern void dmar_free_dev_scope(struct dmar_dev_scope **devices, int *cnt); extern int dmar_insert_dev_scope(struct dmar_pci_notify_info *info, -- cgit v1.2.3 From 8e9fad0e70b7b62848e0aeb1a873903b9ce4d7c4 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Tue, 27 Jun 2023 06:44:24 -0700 Subject: io_uring: Add io_uring command support for sockets Enable io_uring commands on network sockets. Create two new SOCKET_URING_OP commands that will operate on sockets. In order to call ioctl on sockets, use the file_operations->io_uring_cmd callbacks, and map it to a uring socket function, which handles the SOCKET_URING_OP accordingly, and calls socket ioctls. This patches was tested by creating a new test case in liburing. Link: https://github.com/leitao/liburing/tree/io_uring_cmd Signed-off-by: Breno Leitao Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230627134424.2784797-1-leitao@debian.org Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index bb9c666bd584..106cdc55ff3b 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -81,6 +81,7 @@ static inline void io_uring_free(struct task_struct *tsk) if (tsk->io_uring) __io_uring_free(tsk); } +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags); #else static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, struct iov_iter *iter, void *ioucmd) @@ -116,6 +117,11 @@ static inline const char *io_uring_get_opcode(u8 opcode) { return ""; } +static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + return -EOPNOTSUPP; +} #endif #endif -- cgit v1.2.3 From 0fb53e64705ae0fabd9593102e0f0e6812968802 Mon Sep 17 00:00:00 2001 From: Kelvin Cao Date: Fri, 23 Jun 2023 17:00:03 -0700 Subject: PCI: switchtec: Add support for PCIe Gen5 devices Advertise support of Gen5 devices in the driver's device ID table and add the same IDs for the switchtec quirks. Also update driver code to accommodate them. Link: https://lore.kernel.org/r/20230624000003.2315364-3-kelvin.cao@microchip.com Signed-off-by: Kelvin Cao Signed-off-by: Bjorn Helgaas Reviewed-by: Logan Gunthorpe --- include/linux/switchtec.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/switchtec.h b/include/linux/switchtec.h index 48fabe36509e..8d8fac1626bd 100644 --- a/include/linux/switchtec.h +++ b/include/linux/switchtec.h @@ -41,6 +41,7 @@ enum { enum switchtec_gen { SWITCHTEC_GEN3, SWITCHTEC_GEN4, + SWITCHTEC_GEN5, }; struct mrpc_regs { -- cgit v1.2.3 From 90ed8d3dc34bb36b5fb59671924d1ac35f01e75d Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 8 Aug 2023 22:46:10 +0800 Subject: net: phy: Remove two unused function declarations Commit 1e2dc14509fd ("net: ethtool: Add helpers for reporting test results") declared but never implemented these function. Signed-off-by: Yue Haibing Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20230808144610.19096-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index ba08b0e60279..b963ce22e7c7 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1732,10 +1732,6 @@ int phy_start_cable_test_tdr(struct phy_device *phydev, } #endif -int phy_cable_test_result(struct phy_device *phydev, u8 pair, u16 result); -int phy_cable_test_fault_length(struct phy_device *phydev, u8 pair, - u16 cm); - static inline void phy_device_reset(struct phy_device *phydev, int value) { mdio_device_reset(&phydev->mdio, value); -- cgit v1.2.3 From 2bc057692599a5b3dc93d75a3dff34f72576355d Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 8 Aug 2023 11:06:17 -0600 Subject: block: don't make REQ_POLLED imply REQ_NOWAIT Normally these two flags do go together, as the issuer of polled IO generally cannot wait for resources that will get freed as part of IO completion. This is because that very task is the one that will complete the request and free those resources, hence that would introduce a deadlock. But it is possible to have someone else issue the polled IO, eg via io_uring if the request is punted to io-wq. For that case, it's fine to have the task block on IO submission, as it is not the same task that will be completing the IO. It's completely up to the caller to ask for both polled and nowait IO separately! If we don't allow polled IO where IOCB_NOWAIT isn't set in the kiocb, then we can run into repeated -EAGAIN submissions and not make any progress. Reviewed-by: Bart Van Assche Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index c4f5b5228105..11984ed29cb8 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -791,7 +791,7 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page, static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb) { bio->bi_opf |= REQ_POLLED; - if (!is_sync_kiocb(kiocb)) + if (kiocb->ki_flags & IOCB_NOWAIT) bio->bi_opf |= REQ_NOWAIT; } -- cgit v1.2.3 From 1ded5e5a5931bb8b31e15b63b655fe232e3416b2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 8 Aug 2023 13:58:09 +0000 Subject: net: annotate data-races around sock->ops IPV6_ADDRFORM socket option is evil, because it can change sock->ops while other threads might read it. Same issue for sk->sk_family being set to AF_INET. Adding READ_ONCE() over sock->ops reads is needed for sockets that might be impacted by IPV6_ADDRFORM. Note that mptcp_is_tcpsk() can also overwrite sock->ops. Adding annotations for all sk->sk_family reads will require more patches :/ BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0: do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663 __sys_setsockopt+0x1c3/0x230 net/socket.c:2273 __do_sys_setsockopt net/socket.c:2284 [inline] __se_sys_setsockopt net/socket.c:2281 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1: sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x349/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff850e32b8 -> 0xffffffff850da890 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 Reported-by: syzbot Signed-off-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/net.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/net.h b/include/linux/net.h index 41c608c1b02c..c9b4a63791a4 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -123,7 +123,7 @@ struct socket { struct file *file; struct sock *sk; - const struct proto_ops *ops; + const struct proto_ops *ops; /* Might change with IPV6_ADDRFORM or MPTCP. */ struct socket_wq wq; }; -- cgit v1.2.3 From 1f507e80c700e31e358bf4213dc7e4dd614c7c72 Mon Sep 17 00:00:00 2001 From: Adham Faris Date: Mon, 7 Aug 2023 11:05:07 -0700 Subject: net/mlx5: Expose NIC temperature via hardware monitoring kernel API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Expose NIC temperature by implementing hwmon kernel API, which turns current thermal zone kernel API to redundant. For each one of the supported and exposed thermal diode sensors, expose the following attributes: 1) Input temperature. 2) Highest temperature. 3) Temperature label: Depends on the firmware capability, if firmware doesn't support sensors naming, the fallback naming convention would be: "sensorX", where X is the HW spec (MTMP register) sensor index. 4) Temperature critical max value: refers to the high threshold of Warning Event. Will be exposed as `tempY_crit` hwmon attribute (RO attribute). For example for ConnectX5 HCA's this temperature value will be 105 Celsius, 10 degrees lower than the HW shutdown temperature). 5) Temperature reset history: resets highest temperature. For example, for dualport ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]". Listing one of the directories above (hwmonX/Y) generates the corresponding output below: $ grep -H -d skip . /sys/class/hwmon/hwmon0/* Output ======================================================================= /sys/class/hwmon/hwmon0/name:mlx5 /sys/class/hwmon/hwmon0/temp1_crit:105000 /sys/class/hwmon/hwmon0/temp1_highest:48000 /sys/class/hwmon/hwmon0/temp1_input:46000 /sys/class/hwmon/hwmon0/temp1_label:asic grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied In addition, displaying the sensors data via lm_sensors generates the corresponding output below: $ sensors Output ======================================================================= mlx5-pci-0800 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) mlx5-pci-0801 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) CC: Jean Delvare Signed-off-by: Adham Faris Reviewed-by: Tariq Toukan Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed Acked-by: Guenter Roeck Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 3 ++- include/linux/mlx5/mlx5_ifc.h | 14 +++++++++++++- 2 files changed, 15 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 3e1017d764b7..e1c7e502a4fc 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -134,6 +134,7 @@ enum { MLX5_REG_PCAM = 0x507f, MLX5_REG_NODE_DESC = 0x6001, MLX5_REG_HOST_ENDIANNESS = 0x7004, + MLX5_REG_MTCAP = 0x9009, MLX5_REG_MTMP = 0x900A, MLX5_REG_MCIA = 0x9014, MLX5_REG_MFRL = 0x9028, @@ -805,7 +806,7 @@ struct mlx5_core_dev { struct mlx5_rsc_dump *rsc_dump; u32 vsc_addr; struct mlx5_hv_vhca *hv_vhca; - struct mlx5_thermal *thermal; + struct mlx5_hwmon *hwmon; u64 num_block_tc; u64 num_block_ipsec; }; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index b3ad6b9852ec..87fd6f9ed82c 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -10196,7 +10196,9 @@ struct mlx5_ifc_mcam_access_reg_bits { u8 mrtc[0x1]; u8 regs_44_to_32[0xd]; - u8 regs_31_to_0[0x20]; + u8 regs_31_to_10[0x16]; + u8 mtmp[0x1]; + u8 regs_8_to_0[0x9]; }; struct mlx5_ifc_mcam_access_reg_bits1 { @@ -10949,6 +10951,15 @@ struct mlx5_ifc_mrtc_reg_bits { u8 time_l[0x20]; }; +struct mlx5_ifc_mtcap_reg_bits { + u8 reserved_at_0[0x19]; + u8 sensor_count[0x7]; + + u8 reserved_at_20[0x20]; + + u8 sensor_map[0x40]; +}; + struct mlx5_ifc_mtmp_reg_bits { u8 reserved_at_0[0x14]; u8 sensor_index[0xc]; @@ -11036,6 +11047,7 @@ union mlx5_ifc_ports_control_registers_document_bits { struct mlx5_ifc_mfrl_reg_bits mfrl_reg; struct mlx5_ifc_mtutc_reg_bits mtutc_reg; struct mlx5_ifc_mrtc_reg_bits mrtc_reg; + struct mlx5_ifc_mtcap_reg_bits mtcap_reg; struct mlx5_ifc_mtmp_reg_bits mtmp_reg; u8 reserved_at_0[0x60e0]; }; -- cgit v1.2.3 From 40b0425f8ba17c32cf7182975032a3999c364dfc Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 7 Aug 2023 22:33:19 +0300 Subject: net: ptp: create a mock-up PTP Hardware Clock driver There are several cases where virtual net devices may benefit from having a PTP clock, and these have to do with testing. I can see at least netdevsim and veth as potential users of a common mock-up PTP hardware clock driver. The proposed idea is to create an object which emulates PTP clock operations on top of the unadjustable CLOCK_MONOTONIC_RAW plus a software-controlled time domain via a timecounter/cyclecounter and then link that PHC to the netdevsim device. The driver is fully functional for its intended purpose, and it successfully passes the PTP selftests. $ cd tools/testing/selftests/ptp/ $ ./phc.sh /dev/ptp2 TEST: settime [ OK ] TEST: adjtime [ OK ] TEST: adjfreq [ OK ] Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20230807193324.4128292-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/ptp_mock.h | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 include/linux/ptp_mock.h (limited to 'include/linux') diff --git a/include/linux/ptp_mock.h b/include/linux/ptp_mock.h new file mode 100644 index 000000000000..72eb401034d9 --- /dev/null +++ b/include/linux/ptp_mock.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Mock-up PTP Hardware Clock driver for virtual network devices + * + * Copyright 2023 NXP + */ + +#ifndef _PTP_MOCK_H_ +#define _PTP_MOCK_H_ + +struct device; +struct mock_phc; + +#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_MOCK) + +struct mock_phc *mock_phc_create(struct device *dev); +void mock_phc_destroy(struct mock_phc *phc); +int mock_phc_index(struct mock_phc *phc); + +#else + +static inline struct mock_phc *mock_phc_create(struct device *dev) +{ + return NULL; +} + +static inline void mock_phc_destroy(struct mock_phc *phc) +{ +} + +static inline int mock_phc_index(struct mock_phc *phc) +{ + return -1; +} + +#endif + +#endif /* _PTP_MOCK_H_ */ -- cgit v1.2.3 From 809e4dc71a0f2b8d2836035d98603694fff11d5d Mon Sep 17 00:00:00 2001 From: Xu Kuohai Date: Fri, 4 Aug 2023 03:37:38 -0400 Subject: bpf, sockmap: Fix bug that strp_done cannot be called strp_done is only called when psock->progs.stream_parser is not NULL, but stream_parser was set to NULL by sk_psock_stop_strp(), called by sk_psock_drop() earlier. So, strp_done can never be called. Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock. Change the condition for calling strp_done from judging whether stream_parser is set to judging whether this flag is set. This flag is only set once when strp_init() succeeds, and will never be cleared later. Fixes: c0d95d3380ee ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap") Signed-off-by: Xu Kuohai Reviewed-by: John Fastabend Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com Signed-off-by: Martin KaFai Lau --- include/linux/skmsg.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 054d7911bfc9..c1637515a8a4 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -62,6 +62,7 @@ struct sk_psock_progs { enum sk_psock_state_bits { SK_PSOCK_TX_ENABLED, + SK_PSOCK_RX_STRP_ENABLED, }; struct sk_psock_link { -- cgit v1.2.3 From cf6da236c27a73ab91b657232cd3841aab27c37a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 2 Aug 2023 17:41:20 +0200 Subject: fs: export setup_bdev_super We'll want to use setup_bdev_super instead of duplicating it in nilfs2. Signed-off-by: Christoph Hellwig Reviewed-by: Christian Brauner Message-Id: <20230802154131.2221419-2-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/fs_context.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index ff6341e09925..58ef8433a94b 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -158,6 +158,8 @@ extern int get_tree_keyed(struct fs_context *fc, struct fs_context *fc), void *key); +int setup_bdev_super(struct super_block *sb, int sb_flags, + struct fs_context *fc); extern int get_tree_bdev(struct fs_context *fc, int (*fill_super)(struct super_block *sb, struct fs_context *fc)); -- cgit v1.2.3 From 2daf18a7884dc03d5164ab9c7dc3f2ea70638469 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 8 Aug 2023 21:33:56 -0700 Subject: tmpfs,xattr: enable limited user extended attributes Enable "user." extended attributes on tmpfs, limiting them by tracking the space they occupy, and deducting that space from the limited ispace (unless tmpfs mounted with nr_inodes=0 to leave that ispace unlimited). tmpfs inodes and simple xattrs are both unswappable, and have to be in lowmem on a 32-bit highmem kernel: so the ispace limit is appropriate for xattrs, without any need for a further mount option. Add simple_xattr_space() to give approximate but deterministic estimate of the space taken up by each xattr: with simple_xattrs_free() outputting the space freed if required (but kernfs and even some tmpfs usages do not require that, so don't waste time on strlen'ing if not needed). Security and trusted xattrs were already supported: for consistency and simplicity, account them from the same pool; though there's a small risk that a tmpfs with enough space before would now be considered too small. When extended attributes are used, "df -i" does show more IUsed and less IFree than can be explained by the inodes: document that (manpage later). xfstests tests/generic which were not run on tmpfs before but now pass: 020 037 062 070 077 097 103 117 337 377 454 486 523 533 611 618 728 with no new failures. Signed-off-by: Hugh Dickins Reviewed-by: Jan Kara Reviewed-by: Carlos Maiolino Message-Id: <2e63b26e-df46-5baa-c7d6-f9a8dd3282c5@google.com> Signed-off-by: Christian Brauner --- include/linux/xattr.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/xattr.h b/include/linux/xattr.h index e37fe667ae04..d20051865800 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -114,7 +114,8 @@ struct simple_xattr { }; void simple_xattrs_init(struct simple_xattrs *xattrs); -void simple_xattrs_free(struct simple_xattrs *xattrs); +void simple_xattrs_free(struct simple_xattrs *xattrs, size_t *freed_space); +size_t simple_xattr_space(const char *name, size_t size); struct simple_xattr *simple_xattr_alloc(const void *value, size_t size); void simple_xattr_free(struct simple_xattr *xattr); int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, -- cgit v1.2.3 From c64016609b6f66b753b5f37929a191477fa584c0 Mon Sep 17 00:00:00 2001 From: Avadhut Naik Date: Tue, 8 Aug 2023 22:52:42 -0500 Subject: x86/amd_nb: Add PCI IDs for AMD Family 1Ah-based models Add new PCI Device IDs required to support AMD's new Family 1Ah-based models 00h-1Fh, 20h and 40h-4Fh. [ bp: Zap a useless sentence. ] Co-developed-by: Mario Limonciello Signed-off-by: Mario Limonciello Signed-off-by: Avadhut Naik Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/r/20230809035244.2722455-2-avadhut.naik@amd.com --- include/linux/pci_ids.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 2dc75df1437f..8f9a459e1671 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -576,6 +576,8 @@ #define PCI_DEVICE_ID_AMD_19H_M60H_DF_F3 0x14e3 #define PCI_DEVICE_ID_AMD_19H_M70H_DF_F3 0x14f3 #define PCI_DEVICE_ID_AMD_19H_M78H_DF_F3 0x12fb +#define PCI_DEVICE_ID_AMD_1AH_M00H_DF_F3 0x12c3 +#define PCI_DEVICE_ID_AMD_1AH_M20H_DF_F3 0x16fb #define PCI_DEVICE_ID_AMD_MI200_DF_F3 0x14d3 #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703 #define PCI_DEVICE_ID_AMD_LANCE 0x2000 -- cgit v1.2.3 From 5e70d0acf0825f439079736080350371f8d6699a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Mon, 17 Jul 2023 15:04:53 +0300 Subject: PCI: Add locking to RMW PCI Express Capability Register accessors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Many places in the kernel write the Link Control and Root Control PCI Express Capability Registers without proper concurrency control and this could result in losing the changes one of the writers intended to make. Add pcie_cap_lock spinlock into the struct pci_dev and use it to protect bit changes made in the RMW capability accessors. Protect only a selected set of registers by differentiating the RMW accessor internally to locked/unlocked variants using a wrapper which has the same signature as pcie_capability_clear_and_set_word(). As the Capability Register (pos) given to the wrapper is always a constant, the compiler should be able to simplify all the dead-code away. So far only the Link Control Register (ASPM, hotplug, link retraining, various drivers) and the Root Control Register (AER & PME) seem to require RMW locking. Suggested-by: Lukas Wunner Fixes: c7f486567c1d ("PCI PM: PCIe PME root port service driver") Fixes: f12eb72a268b ("PCI/ASPM: Use PCI Express Capability accessors") Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") Fixes: affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM") Fixes: 849a9366cba9 ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC") Fixes: 3d1e7aa80d1c ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL") Fixes: c0e5f4e73a71 ("misc: rtsx: Add support for RTS5261") Fixes: 3df4fce739e2 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG") Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function") Fixes: 19f3bd548f27 ("mfd: rtsx: Remove LCTLR defination") Fixes: 773ccdfd9cc6 ("mfd: rtsx: Read vendor setting from config space") Fixes: 8275b77a1513 ("mfd: rts5249: Add support for RTS5250S power saving") Fixes: 5da4e04ae480 ("misc: rtsx: Add support for RTS5260") Fixes: 0f49bfbd0f2e ("tg3: Use PCI Express Capability accessors") Fixes: 5e7dfd0fb94a ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ") Fixes: b726e493e8dc ("r8169: sync existing 8168 device hardware start sequences with vendor driver") Fixes: e6de30d63eb1 ("r8169: more 8168dp support.") Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards") Fixes: 6f461f6c7c96 ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata") Fixes: 1eae4eb2a1c7 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants") Fixes: 8060e169e02f ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue") Fixes: 69ce674bfa69 ("ath9k: do btcoex ASPM disabling at initialization time") Fixes: f37f05503575 ("mt76: mt76x2e: disable pcie_aspm by default") Link: https://lore.kernel.org/r/20230717120503.15276-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen Signed-off-by: Bjorn Helgaas Reviewed-by: "Rafael J. Wysocki" --- include/linux/pci.h | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index c69a2cc1f412..7ee498cd1f37 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -467,6 +467,7 @@ struct pci_dev { pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ + spinlock_t pcie_cap_lock; /* Protects RMW ops in capability accessors */ u32 saved_config_space[16]; /* Config space saved at suspend time */ struct hlist_head saved_cap_space; int rom_attr_enabled; /* Display of ROM attribute enabled? */ @@ -1217,11 +1218,40 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val); int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val); int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val); int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val); -int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos, - u16 clear, u16 set); +int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos, + u16 clear, u16 set); +int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos, + u16 clear, u16 set); int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos, u32 clear, u32 set); +/** + * pcie_capability_clear_and_set_word - RMW accessor for PCI Express Capability Registers + * @dev: PCI device structure of the PCI Express device + * @pos: PCI Express Capability Register + * @clear: Clear bitmask + * @set: Set bitmask + * + * Perform a Read-Modify-Write (RMW) operation using @clear and @set + * bitmasks on PCI Express Capability Register at @pos. Certain PCI Express + * Capability Registers are accessed concurrently in RMW fashion, hence + * require locking which is handled transparently to the caller. + */ +static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev, + int pos, + u16 clear, u16 set) +{ + switch (pos) { + case PCI_EXP_LNKCTL: + case PCI_EXP_RTCTL: + return pcie_capability_clear_and_set_word_locked(dev, pos, + clear, set); + default: + return pcie_capability_clear_and_set_word_unlocked(dev, pos, + clear, set); + } +} + static inline int pcie_capability_set_word(struct pci_dev *dev, int pos, u16 set) { -- cgit v1.2.3 From a57c27c7ad85c420b7de44c6ee56692d51709dda Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 9 Aug 2023 15:04:59 +0200 Subject: x86/speculation: Add cpu_show_gds() prototype The newly added function has two definitions but no prototypes: drivers/base/cpu.c:605:16: error: no previous prototype for 'cpu_show_gds' [-Werror=missing-prototypes] Add a declaration next to the other ones for this file to avoid the warning. Fixes: 8974eb588283b ("x86/speculation: Add Gather Data Sampling mitigation") Signed-off-by: Arnd Bergmann Signed-off-by: Dave Hansen Tested-by: Daniel Sneddon Cc: stable@kernel.org Link: https://lore.kernel.org/all/20230809130530.1913368-1-arnd%40kernel.org --- include/linux/cpu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 23ac87be1ff1..e006c719182b 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -72,6 +72,8 @@ extern ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_spec_rstack_overflow(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_gds(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit v1.2.3 From 1fc04a0b973392df5975901f56addc913d2c8f4d Mon Sep 17 00:00:00 2001 From: Shenwei Wang Date: Mon, 7 Aug 2023 11:07:15 -0500 Subject: net: stmmac: add new mode parameter for fix_mac_speed A mode parameter has been added to the callback function of fix_mac_speed to indicate the physical layer type. The mode can be one the following: MLO_AN_PHY - Conventional PHY MLO_AN_FIXED - Fixed-link mode MLO_AN_INBAND - In-band protocol Signed-off-by: Shenwei Wang Link: https://lore.kernel.org/r/20230807160716.259072-2-shenwei.wang@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 652404c03944..784277d666eb 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -256,7 +256,7 @@ struct plat_stmmacenet_data { u8 tx_sched_algorithm; struct stmmac_rxq_cfg rx_queues_cfg[MTL_MAX_RX_QUEUES]; struct stmmac_txq_cfg tx_queues_cfg[MTL_MAX_TX_QUEUES]; - void (*fix_mac_speed)(void *priv, unsigned int speed); + void (*fix_mac_speed)(void *priv, unsigned int speed, unsigned int mode); int (*fix_soc_reset)(void *priv, void __iomem *ioaddr); int (*serdes_powerup)(struct net_device *ndev, void *priv); void (*serdes_powerdown)(struct net_device *ndev, void *priv); -- cgit v1.2.3 From 1dcc03c9a7a824a31eaaecdfaa03542fb25feb6c Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Tue, 8 Aug 2023 23:04:34 +0200 Subject: net: phy: phy_device: Call into the PHY driver to set LED offload Linux LEDs can be requested to perform hardware accelerated blinking to indicate link, RX, TX etc. Pass the rules for blinking to the PHY driver, if it implements the ops needed to determine if a given pattern can be offloaded, to offload it, and what the current offload is. Additionally implement the op needed to get what device the LED is for. Reviewed-by: Simon Horman Signed-off-by: Andrew Lunn Tested-by: Daniel Golle Link: https://lore.kernel.org/r/20230808210436.838995-3-andrew@lunn.ch Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index b963ce22e7c7..3c1ceedd1b77 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1105,6 +1105,39 @@ struct phy_driver { int (*led_blink_set)(struct phy_device *dev, u8 index, unsigned long *delay_on, unsigned long *delay_off); + /** + * @led_hw_is_supported: Can the HW support the given rules. + * @dev: PHY device which has the LED + * @index: Which LED of the PHY device + * @rules The core is interested in these rules + * + * Return 0 if yes, -EOPNOTSUPP if not, or an error code. + */ + int (*led_hw_is_supported)(struct phy_device *dev, u8 index, + unsigned long rules); + /** + * @led_hw_control_set: Set the HW to control the LED + * @dev: PHY device which has the LED + * @index: Which LED of the PHY device + * @rules The rules used to control the LED + * + * Returns 0, or a an error code. + */ + int (*led_hw_control_set)(struct phy_device *dev, u8 index, + unsigned long rules); + /** + * @led_hw_control_get: Get how the HW is controlling the LED + * @dev: PHY device which has the LED + * @index: Which LED of the PHY device + * @rules Pointer to the rules used to control the LED + * + * Set *@rules to how the HW is currently blinking. Returns 0 + * on success, or a error code if the current blinking cannot + * be represented in rules, or some other error happens. + */ + int (*led_hw_control_get)(struct phy_device *dev, u8 index, + unsigned long *rules); + }; #define to_phy_driver(d) container_of(to_mdio_common_driver(d), \ struct phy_driver, mdiodrv) -- cgit v1.2.3 From 23afc82fb22bccd3f1d2a856d3eccb70453f33b0 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Mon, 31 Jul 2023 17:13:31 +0800 Subject: soundwire: extend parameters of new_peripheral_assigned() callback The parameters are only the bus and the device number, manager ops may need additional details on the type of peripheral connected, such as whether it is wake-capable or not. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20230731091333.3593132-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index f523ceabd059..94676a3fd0b5 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -862,7 +862,9 @@ struct sdw_master_ops { int (*pre_bank_switch)(struct sdw_bus *bus); int (*post_bank_switch)(struct sdw_bus *bus); u32 (*read_ping_status)(struct sdw_bus *bus); - void (*new_peripheral_assigned)(struct sdw_bus *bus, int dev_num); + void (*new_peripheral_assigned)(struct sdw_bus *bus, + struct sdw_slave *slave, + int dev_num); }; /** -- cgit v1.2.3 From 39d80b0e5fed2c32f66093fead626358b7106974 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Mon, 31 Jul 2023 17:13:32 +0800 Subject: soundwire: bus: add callbacks for device_number allocation Rather than add logic in the core for vendor-specific usages, add callbacks for vendor-specific device_number allocation and release. This patch only moves the existing IDA-based allocator used only by Intel to the intel_auxdevice.c file and does not change the functionality. Follow-up patches will extend the behavior by modifying the Intel callbacks. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20230731091333.3593132-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 94676a3fd0b5..bde93ca6aaa6 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -847,6 +847,8 @@ struct sdw_defer { * @post_bank_switch: Callback for post bank switch * @read_ping_status: Read status from PING frames, reported with two bits per Device. * Bits 31:24 are reserved. + * @get_device_num: Callback for vendor-specific device_number allocation + * @put_device_num: Callback for vendor-specific device_number release * @new_peripheral_assigned: Callback to handle enumeration of new peripheral. */ struct sdw_master_ops { @@ -862,6 +864,8 @@ struct sdw_master_ops { int (*pre_bank_switch)(struct sdw_bus *bus); int (*post_bank_switch)(struct sdw_bus *bus); u32 (*read_ping_status)(struct sdw_bus *bus); + int (*get_device_num)(struct sdw_bus *bus, struct sdw_slave *slave); + void (*put_device_num)(struct sdw_bus *bus, struct sdw_slave *slave); void (*new_peripheral_assigned)(struct sdw_bus *bus, struct sdw_slave *slave, int dev_num); @@ -898,9 +902,6 @@ struct sdw_master_ops { * meaningful if multi_link is set. If set to 1, hardware-based * synchronization will be used even if a stream only uses a single * SoundWire segment. - * @dev_num_ida_min: if set, defines the minimum values for the IDA - * used to allocate system-unique device numbers. This value needs to be - * identical across all SoundWire bus in the system. */ struct sdw_bus { struct device *dev; @@ -927,7 +928,6 @@ struct sdw_bus { u32 bank_switch_timeout; bool multi_link; int hw_sync_min_links; - int dev_num_ida_min; }; int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent, -- cgit v1.2.3 From e66f91a2d10b9a25eedcaddee9d6f08c8132760a Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Mon, 31 Jul 2023 17:13:33 +0800 Subject: soundwire: intel_auxdevice: add hybrid IDA-based device_number allocation The IDA-based allocation is useful to simplify debug, but it was also introduced as a prerequisite to deal with the Intel Lunar Lake hardware programming sequences: the wake-ups have to be handled with a system-unique SDI address at the HDaudio controller level. At the time, the restriction introduced by the IDA to 8 devices total seemed perfectly fine, but recently hardware vendors created configurations with more than 8 devices. Add a new allocation strategy to allow for more than 8 devices using information on the type of devices, and only use the IDA-based allocation for devices capable of generating a wake. In theory the information on wake capabilities should come from firmware, but none of the existing ACPI tables provide it. The drivers set the 'wake_capable' property, but this cannot be used reliably: if the driver probe happens *after* the enumeration, then that property is not initialized yet. Trying to modify the device_number on-the-fly proved to be an impossible task generating race conditions left and right. The only reliable work-around to control the enumeration is to add a quirk table. It's ugly but until platform firmware improves, hopefully as a result of MIPI/SDCA stardization, we can expect that quirk table to grow for each new headset or microphone codec. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20230731091333.3593132-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 11fc88fb0d78..3a824cae7379 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -433,4 +433,11 @@ struct sdw_intel_hw_ops { extern const struct sdw_intel_hw_ops sdw_intel_cnl_hw_ops; extern const struct sdw_intel_hw_ops sdw_intel_lnl_hw_ops; +/* + * IDA min selected to allow for 5 unconstrained devices per link, + * and 6 system-unique Device Numbers for wake-capable devices. + */ + +#define SDW_INTEL_DEV_NUM_IDA_MIN 6 + #endif -- cgit v1.2.3 From 913e99287b98fd051ac1976140a2764a8ef9dfbf Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 7 Aug 2023 15:38:39 -0400 Subject: fs: drop the timespec64 argument from update_time Now that all of the update_time operations are prepared for it, we can drop the timespec64 argument from the update_time operation. Do that and remove it from some associated functions like inode_update_time and inode_needs_update_time. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bb3c2c4f871f..a83313f90fe3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1887,7 +1887,7 @@ struct inode_operations { ssize_t (*listxattr) (struct dentry *, char *, size_t); int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len); - int (*update_time)(struct inode *, struct timespec64 *, int); + int (*update_time)(struct inode *, int); int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode); @@ -2237,7 +2237,7 @@ enum file_time_flags { extern bool atime_needs_update(const struct path *, struct inode *); extern void touch_atime(const struct path *); -int inode_update_time(struct inode *inode, struct timespec64 *time, int flags); +int inode_update_time(struct inode *inode, int flags); static inline void file_accessed(struct file *file) { -- cgit v1.2.3 From ffb6cf19e06334062744b7e3493f71e500964f8e Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 7 Aug 2023 15:38:40 -0400 Subject: fs: add infrastructure for multigrain timestamps The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. What we need is a way to only use fine-grained timestamps when they are being actively queried. POSIX generally mandates that when the the mtime changes, the ctime must also change. The kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Use the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Later patches will convert individual filesystems to use the new infrastructure. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Message-Id: <20230807-mgctime-v7-9-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 46 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index a83313f90fe3..455835d0e963 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1474,18 +1474,47 @@ static inline bool fsuidgid_has_mapping(struct super_block *sb, kgid_has_mapping(fs_userns, kgid); } +struct timespec64 current_mgtime(struct inode *inode); struct timespec64 current_time(struct inode *inode); struct timespec64 inode_set_ctime_current(struct inode *inode); +/* + * Multigrain timestamps + * + * Conditionally use fine-grained ctime and mtime timestamps when there + * are users actively observing them via getattr. The primary use-case + * for this is NFS clients that use the ctime to distinguish between + * different states of the file, and that are often fooled by multiple + * operations that occur in the same coarse-grained timer tick. + * + * The kernel always keeps normalized struct timespec64 values in the ctime, + * which means that only the first 30 bits of the value are used. Use the + * 31st bit of the ctime's tv_nsec field as a flag to indicate that the value + * has been queried since it was last updated. + */ +#define I_CTIME_QUERIED (1L<<30) + /** * inode_get_ctime - fetch the current ctime from the inode * @inode: inode from which to fetch ctime * - * Grab the current ctime from the inode and return it. + * Grab the current ctime tv_nsec field from the inode, mask off the + * I_CTIME_QUERIED flag and return it. This is mostly intended for use by + * internal consumers of the ctime that aren't concerned with ensuring a + * fine-grained update on the next change (e.g. when preparing to store + * the value in the backing store for later retrieval). + * + * This is safe to call regardless of whether the underlying filesystem + * is using multigrain timestamps. */ static inline struct timespec64 inode_get_ctime(const struct inode *inode) { - return inode->__i_ctime; + struct timespec64 ctime; + + ctime.tv_sec = inode->__i_ctime.tv_sec; + ctime.tv_nsec = inode->__i_ctime.tv_nsec & ~I_CTIME_QUERIED; + + return ctime; } /** @@ -2259,6 +2288,7 @@ struct file_system_type { #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */ #define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */ +#define FS_MGTIME 64 /* FS uses multigrain timestamps */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ int (*init_fs_context)(struct fs_context *); const struct fs_parameter_spec *parameters; @@ -2282,6 +2312,17 @@ struct file_system_type { #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME) +/** + * is_mgtime: is this inode using multigrain timestamps + * @inode: inode to test for multigrain timestamps + * + * Return true if the inode uses multigrain timestamps, false otherwise. + */ +static inline bool is_mgtime(const struct inode *inode) +{ + return inode->i_sb->s_type->fs_flags & FS_MGTIME; +} + extern struct dentry *mount_bdev(struct file_system_type *fs_type, int flags, const char *dev_name, void *data, int (*fill_super)(struct super_block *, void *, int)); @@ -2918,6 +2959,7 @@ extern void page_put_link(void *); extern int page_symlink(struct inode *inode, const char *symname, int len); extern const struct inode_operations page_symlink_inode_operations; extern void kfree_link(void *); +void fill_mg_cmtime(struct kstat *stat, u32 request_mask, struct inode *inode); void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *); void generic_fill_statx_attr(struct inode *inode, struct kstat *stat); extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int); -- cgit v1.2.3 From 7ecd0b6f510005e120f5bc198bacb5951814cf36 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 2 Aug 2023 17:41:27 +0200 Subject: fs: export fs_holder_ops Export fs_holder_ops so that file systems that open additional block devices can use it as well. Signed-off-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Christian Brauner Message-Id: <20230802154131.2221419-9-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/blkdev.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ed44a997f629..83262702eea7 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1464,6 +1464,8 @@ struct blk_holder_ops { void (*mark_dead)(struct block_device *bdev); }; +extern const struct blk_holder_ops fs_holder_ops; + /* * Return the correct open flags for blkdev_get_by_* for super block flags * as stored in sb->s_flags. -- cgit v1.2.3 From 8314aa8af4f9905dded47e53b3e283e632736d41 Mon Sep 17 00:00:00 2001 From: Peng Fan Date: Mon, 7 Aug 2023 20:14:28 +0800 Subject: firmware: imx: scu: use EOPNOTSUPP Per checkpatch.pl, "ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP" So use EOPNOTSUPP to replace ENOTSUPP. Signed-off-by: Peng Fan Signed-off-by: Shawn Guo --- include/linux/firmware/imx/sci.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/imx/sci.h b/include/linux/firmware/imx/sci.h index 5cc63fe7e84d..7fa0f3b329b5 100644 --- a/include/linux/firmware/imx/sci.h +++ b/include/linux/firmware/imx/sci.h @@ -25,27 +25,27 @@ int imx_scu_soc_init(struct device *dev); #else static inline int imx_scu_soc_init(struct device *dev) { - return -ENOTSUPP; + return -EOPNOTSUPP; } static inline int imx_scu_enable_general_irq_channel(struct device *dev) { - return -ENOTSUPP; + return -EOPNOTSUPP; } static inline int imx_scu_irq_register_notifier(struct notifier_block *nb) { - return -ENOTSUPP; + return -EOPNOTSUPP; } static inline int imx_scu_irq_unregister_notifier(struct notifier_block *nb) { - return -ENOTSUPP; + return -EOPNOTSUPP; } static inline int imx_scu_irq_group_enable(u8 group, u32 mask, u8 enable) { - return -ENOTSUPP; + return -EOPNOTSUPP; } #endif #endif /* _SC_SCI_H */ -- cgit v1.2.3 From d2bd250cefabd2ce51a77bf69435e28f1fa47176 Mon Sep 17 00:00:00 2001 From: Peng Fan Date: Mon, 7 Aug 2023 20:14:30 +0800 Subject: firmware: imx: scu-irq: add imx_scu_irq_get_status Extract the scu irq get status code from imx_scu_irq_work_handler and make into a new function imx_scu_irq_get_status which could be used by others, such as SECO. Signed-off-by: Peng Fan Signed-off-by: Shawn Guo --- include/linux/firmware/imx/sci.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/imx/sci.h b/include/linux/firmware/imx/sci.h index 7fa0f3b329b5..df17196df5ff 100644 --- a/include/linux/firmware/imx/sci.h +++ b/include/linux/firmware/imx/sci.h @@ -21,6 +21,7 @@ int imx_scu_enable_general_irq_channel(struct device *dev); int imx_scu_irq_register_notifier(struct notifier_block *nb); int imx_scu_irq_unregister_notifier(struct notifier_block *nb); int imx_scu_irq_group_enable(u8 group, u32 mask, u8 enable); +int imx_scu_irq_get_status(u8 group, u32 *irq_status); int imx_scu_soc_init(struct device *dev); #else static inline int imx_scu_soc_init(struct device *dev) @@ -47,5 +48,10 @@ static inline int imx_scu_irq_group_enable(u8 group, u32 mask, u8 enable) { return -EOPNOTSUPP; } + +static inline int imx_scu_irq_get_status(u8 group, u32 *irq_status) +{ + return -EOPNOTSUPP; +} #endif #endif /* _SC_SCI_H */ -- cgit v1.2.3 From abb05ac9f78b3760d118d17e69e27367a5f0a9d7 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:36 +0200 Subject: tty: ldisc: document that ldops are optional There is no need to provide any hook in struct tty_ldisc_ops. Document that and write down that read/write return EIO in that case. The rest is simply ignored. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230810091510.13006-3-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_ldisc.h | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index 49dc172dedc7..62e089434995 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -71,7 +71,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * call to @receive_buf(). Returning an error will prevent the ldisc from * being attached. * - * Can sleep. + * Optional. Can sleep. * * @close: [TTY] ``void ()(struct tty_struct *tty)`` * @@ -80,7 +80,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * changed to use a new line discipline. At the point of execution no * further users will enter the ldisc code for this tty. * - * Can sleep. + * Optional. Can sleep. * * @flush_buffer: [TTY] ``void ()(struct tty_struct *tty)`` * @@ -88,6 +88,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * input characters it may have queued to be delivered to the user mode * process. It may be called at any point between open and close. * + * Optional. + * * @read: [TTY] ``ssize_t ()(struct tty_struct *tty, struct file *file, * unsigned char *buf, size_t nr)`` * @@ -97,7 +99,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * an %EIO error. Multiple read calls may occur in parallel and the ldisc * must deal with serialization issues. * - * Can sleep. + * Optional: %EIO unless provided. Can sleep. * * @write: [TTY] ``ssize_t ()(struct tty_struct *tty, struct file *file, * const unsigned char *buf, size_t nr)`` @@ -108,7 +110,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * characters first. If this function is not defined, the user will * receive an %EIO error. * - * Can sleep. + * Optional: %EIO unless provided. Can sleep. * * @ioctl: [TTY] ``int ()(struct tty_struct *tty, unsigned int cmd, * unsigned long arg)`` @@ -120,6 +122,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * discpline. So a low-level driver can "grab" an ioctl request before * the line discpline has a chance to see it. * + * Optional. + * * @compat_ioctl: [TTY] ``int ()(struct tty_struct *tty, unsigned int cmd, * unsigned long arg)`` * @@ -130,11 +134,15 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * a pointer to wordsize-sensitive structure belongs here, but most of * ldiscs will happily leave it %NULL. * + * Optional. + * * @set_termios: [TTY] ``void ()(struct tty_struct *tty, const struct ktermios *old)`` * * This function notifies the line discpline that a change has been made * to the termios structure. * + * Optional. + * * @poll: [TTY] ``int ()(struct tty_struct *tty, struct file *file, * struct poll_table_struct *wait)`` * @@ -142,6 +150,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * device. It is solely the responsibility of the line discipline to * handle poll requests. * + * Optional. + * * @hangup: [TTY] ``void ()(struct tty_struct *tty)`` * * Called on a hangup. Tells the discipline that it should cease I/O to @@ -149,7 +159,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * but should wait until any pending driver I/O is completed. No further * calls into the ldisc code will occur. * - * Can sleep. + * Optional. Can sleep. * * @receive_buf: [DRV] ``void ()(struct tty_struct *tty, * const unsigned char *cp, const char *fp, int count)`` @@ -161,6 +171,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * character was received with a parity error, etc. @fp may be %NULL to * indicate all data received is %TTY_NORMAL. * + * Optional. + * * @write_wakeup: [DRV] ``void ()(struct tty_struct *tty)`` * * This function is called by the low-level tty driver to signal that line @@ -170,11 +182,15 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * send, please arise a tasklet or workqueue to do the real data transfer. * Do not send data in this hook, it may lead to a deadlock. * + * Optional. + * * @dcd_change: [DRV] ``void ()(struct tty_struct *tty, bool active)`` * * Tells the discipline that the DCD pin has changed its status. Used * exclusively by the %N_PPS (Pulse-Per-Second) line discipline. * + * Optional. + * * @receive_buf2: [DRV] ``int ()(struct tty_struct *tty, * const unsigned char *cp, const char *fp, int count)`` * @@ -186,6 +202,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * indicate all data received is %TTY_NORMAL. If assigned, prefer this * function for automatic flow control. * + * Optional. + * * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, * const unsigned char *cp, const char *fp, int count)`` * @@ -198,6 +216,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * same characters (e.g. by skipping the actions for high-priority * characters already handled by ->lookahead_buf()). * + * Optional. + * * @owner: module containting this ldisc (for reference counting) * * This structure defines the interface between the tty line discipline -- cgit v1.2.3 From 0b7a2b282959d3311f158629f67c6d681a3dc2b3 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:43 +0200 Subject: tty: make tty_port_client_operations operate with u8 The parameters are already unsigned chars. So make them explicitly u8s, as the rest is going to be unified to u8 eventually too. Signed-off-by: "Jiri Slaby (SUSE)" Cc: Rob Herring Link: https://lore.kernel.org/r/20230810091510.13006-10-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_port.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_port.h b/include/linux/tty_port.h index edf685a24f7c..726575743367 100644 --- a/include/linux/tty_port.h +++ b/include/linux/tty_port.h @@ -39,9 +39,10 @@ struct tty_port_operations { }; struct tty_port_client_operations { - int (*receive_buf)(struct tty_port *port, const unsigned char *, const unsigned char *, size_t); - void (*lookahead_buf)(struct tty_port *port, const unsigned char *cp, - const unsigned char *fp, unsigned int count); + int (*receive_buf)(struct tty_port *port, const u8 *cp, const u8 *fp, + size_t count); + void (*lookahead_buf)(struct tty_port *port, const u8 *cp, + const u8 *fp, unsigned int count); void (*write_wakeup)(struct tty_port *port); }; -- cgit v1.2.3 From 0468a8071d7cfb0f5bc02b0888cec4525551299f Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:44 +0200 Subject: tty: make counts in tty_port_client_operations hooks size_t The counts in tty_port_client_operations hooks' are currently represented by all 'int', 'unsigned int', and 'size_t'. Unify them all to unsigned 'size_t' for clarity. Note that size_t is used already in tty_buffer.c. So, eventually, it is spread for counts everywhere and this is the beginning. So the two changes namely: * ::receive_buf() is called from tty_ldisc_receive_buf(). And that expects values ">= 0" from ::receive_buf(), so switch its rettype to size_t is fine. tty_ldisc_receive_buf() types will be changed separately. * ::lookahead_buf()'s count comes from lookahead_bufs() and is already 'unsigned int'. Signed-off-by: "Jiri Slaby (SUSE)" Cc: Rob Herring Link: https://lore.kernel.org/r/20230810091510.13006-11-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_port.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_port.h b/include/linux/tty_port.h index 726575743367..6b367eb17979 100644 --- a/include/linux/tty_port.h +++ b/include/linux/tty_port.h @@ -39,10 +39,10 @@ struct tty_port_operations { }; struct tty_port_client_operations { - int (*receive_buf)(struct tty_port *port, const u8 *cp, const u8 *fp, - size_t count); + size_t (*receive_buf)(struct tty_port *port, const u8 *cp, const u8 *fp, + size_t count); void (*lookahead_buf)(struct tty_port *port, const u8 *cp, - const u8 *fp, unsigned int count); + const u8 *fp, size_t count); void (*write_wakeup)(struct tty_port *port); }; -- cgit v1.2.3 From 8d9526f99fc39ebaca173eda646818023e7f0730 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:46 +0200 Subject: tty: switch count in tty_ldisc_receive_buf() to size_t It comes from both paste_selection() and tty_port_default_receive_buf() as unsigned (int and size_t respectively). Switch to size_t to converge to that eventually. Return the count as size_t too (the two callers above expect that). Switch paste_selection()'s type of 'count' too, so that the returned and passed type match. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230810091510.13006-13-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_flip.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index bfaaeee61a05..09c4dbcd0025 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -41,8 +41,8 @@ static inline int tty_insert_flip_string(struct tty_port *port, return tty_insert_flip_string_fixed_flag(port, chars, TTY_NORMAL, size); } -int tty_ldisc_receive_buf(struct tty_ldisc *ld, const unsigned char *p, - const char *f, int count); +size_t tty_ldisc_receive_buf(struct tty_ldisc *ld, const unsigned char *p, + const char *f, size_t count); void tty_buffer_lock_exclusive(struct tty_port *port); void tty_buffer_unlock_exclusive(struct tty_port *port); -- cgit v1.2.3 From e8161447bb0ce2d59277e9276012dd1c6f357850 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:49 +0200 Subject: tty: make tty_ldisc_ops::*buf*() hooks operate on size_t Count passed to tty_ldisc_ops::receive_buf*(), ::lookahead_buf(), and returned from ::receive_buf2() is expected to be size_t. So set it to size_t to unify with the rest of the code. Signed-off-by: "Jiri Slaby (SUSE)" Cc: William Hubbs Cc: Chris Brannon Cc: Kirk Reiser Cc: Samuel Thibault Cc: Marcel Holtmann Cc: Johan Hedberg Cc: Luiz Augusto von Dentz Cc: Dmitry Torokhov Cc: Arnd Bergmann Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Max Staudt Cc: Wolfgang Grandegger Cc: Marc Kleine-Budde Cc: Dario Binacchi Cc: Andreas Koensgen Cc: Jeremy Kerr Cc: Matt Johnston Cc: Krzysztof Kozlowski Cc: Liam Girdwood Cc: Mark Brown Cc: Jaroslav Kysela Cc: Takashi Iwai Acked-by: Mark Brown Link: https://lore.kernel.org/r/20230810091510.13006-16-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_ldisc.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index 62e089434995..f88529e6a783 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -162,7 +162,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * Optional. Can sleep. * * @receive_buf: [DRV] ``void ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, int count)`` + * const unsigned char *cp, const char *fp, size_t count)`` * * This function is called by the low-level tty driver to send characters * received by the hardware to the line discpline for processing. @cp is @@ -191,8 +191,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * * Optional. * - * @receive_buf2: [DRV] ``int ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, int count)`` + * @receive_buf2: [DRV] ``ssize_t ()(struct tty_struct *tty, + * const unsigned char *cp, const char *fp, size_t count)`` * * This function is called by the low-level tty driver to send characters * received by the hardware to the line discpline for processing. @cp is a @@ -205,7 +205,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * Optional. * * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, int count)`` + * const unsigned char *cp, const char *fp, size_t count)`` * * This function is called by the low-level tty driver for characters * not eaten by ->receive_buf() or ->receive_buf2(). It is useful for @@ -256,13 +256,13 @@ struct tty_ldisc_ops { * The following routines are called from below. */ void (*receive_buf)(struct tty_struct *tty, const unsigned char *cp, - const char *fp, int count); + const char *fp, size_t count); void (*write_wakeup)(struct tty_struct *tty); void (*dcd_change)(struct tty_struct *tty, bool active); - int (*receive_buf2)(struct tty_struct *tty, const unsigned char *cp, - const char *fp, int count); + size_t (*receive_buf2)(struct tty_struct *tty, const unsigned char *cp, + const char *fp, size_t count); void (*lookahead_buf)(struct tty_struct *tty, const unsigned char *cp, - const unsigned char *fp, unsigned int count); + const unsigned char *fp, size_t count); struct module *owner; }; -- cgit v1.2.3 From a8d9cd2318606627d3c0e4747dbd7bbc44c48e27 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:50 +0200 Subject: tty: use u8 for chars This makes all those 'unsigned char's an explicit 'u8'. This is part of the continuing unification of chars and flags to be consistent u8. This approaches tty_port_default_receive_buf(). Flags to be next. Signed-off-by: "Jiri Slaby (SUSE)" Cc: William Hubbs Cc: Chris Brannon Cc: Kirk Reiser Cc: Samuel Thibault Cc: Dmitry Torokhov Cc: Arnd Bergmann Cc: Max Staudt Cc: Wolfgang Grandegger Cc: Marc Kleine-Budde Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Dario Binacchi Cc: Andreas Koensgen Cc: Jeremy Kerr Cc: Matt Johnston Cc: Liam Girdwood Cc: Mark Brown Cc: Jaroslav Kysela Cc: Takashi Iwai Cc: Peter Ujfalusi Acked-by: Mark Brown Link: https://lore.kernel.org/r/20230810091510.13006-17-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_buffer.h | 4 ++-- include/linux/tty_flip.h | 22 ++++++++++------------ include/linux/tty_ldisc.h | 18 +++++++++--------- 3 files changed, 21 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_buffer.h b/include/linux/tty_buffer.h index 6ceb2789e6c8..6f2966b15093 100644 --- a/include/linux/tty_buffer.h +++ b/include/linux/tty_buffer.h @@ -22,9 +22,9 @@ struct tty_buffer { unsigned long data[]; }; -static inline unsigned char *char_buf_ptr(struct tty_buffer *b, int ofs) +static inline u8 *char_buf_ptr(struct tty_buffer *b, int ofs) { - return ((unsigned char *)b->data) + ofs; + return ((u8 *)b->data) + ofs; } static inline char *flag_buf_ptr(struct tty_buffer *b, int ofs) diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index 09c4dbcd0025..a0fcffeaaa25 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -10,17 +10,15 @@ struct tty_ldisc; int tty_buffer_set_limit(struct tty_port *port, int limit); unsigned int tty_buffer_space_avail(struct tty_port *port); int tty_buffer_request_room(struct tty_port *port, size_t size); -int tty_insert_flip_string_flags(struct tty_port *port, - const unsigned char *chars, const char *flags, size_t size); -int tty_insert_flip_string_fixed_flag(struct tty_port *port, - const unsigned char *chars, char flag, size_t size); -int tty_prepare_flip_string(struct tty_port *port, unsigned char **chars, - size_t size); +int tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, + const char *flags, size_t size); +int tty_insert_flip_string_fixed_flag(struct tty_port *port, const u8 *chars, + char flag, size_t size); +int tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); void tty_flip_buffer_push(struct tty_port *port); -int __tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag); +int __tty_insert_flip_char(struct tty_port *port, u8 ch, char flag); -static inline int tty_insert_flip_char(struct tty_port *port, - unsigned char ch, char flag) +static inline int tty_insert_flip_char(struct tty_port *port, u8 ch, char flag) { struct tty_buffer *tb = port->buf.tail; int change; @@ -36,13 +34,13 @@ static inline int tty_insert_flip_char(struct tty_port *port, } static inline int tty_insert_flip_string(struct tty_port *port, - const unsigned char *chars, size_t size) + const u8 *chars, size_t size) { return tty_insert_flip_string_fixed_flag(port, chars, TTY_NORMAL, size); } -size_t tty_ldisc_receive_buf(struct tty_ldisc *ld, const unsigned char *p, - const char *f, size_t count); +size_t tty_ldisc_receive_buf(struct tty_ldisc *ld, const u8 *p, const char *f, + size_t count); void tty_buffer_lock_exclusive(struct tty_port *port); void tty_buffer_unlock_exclusive(struct tty_port *port); diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index f88529e6a783..5551c4400e59 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -161,8 +161,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * * Optional. Can sleep. * - * @receive_buf: [DRV] ``void ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, size_t count)`` + * @receive_buf: [DRV] ``void ()(struct tty_struct *tty, const u8 *cp, + * const char *fp, size_t count)`` * * This function is called by the low-level tty driver to send characters * received by the hardware to the line discpline for processing. @cp is @@ -191,8 +191,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * * Optional. * - * @receive_buf2: [DRV] ``ssize_t ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, size_t count)`` + * @receive_buf2: [DRV] ``ssize_t ()(struct tty_struct *tty, const u8 *cp, + * const char *fp, size_t count)`` * * This function is called by the low-level tty driver to send characters * received by the hardware to the line discpline for processing. @cp is a @@ -204,8 +204,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * * Optional. * - * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, size_t count)`` + * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, const u8 *cp, + * const char *fp, size_t count)`` * * This function is called by the low-level tty driver for characters * not eaten by ->receive_buf() or ->receive_buf2(). It is useful for @@ -255,13 +255,13 @@ struct tty_ldisc_ops { /* * The following routines are called from below. */ - void (*receive_buf)(struct tty_struct *tty, const unsigned char *cp, + void (*receive_buf)(struct tty_struct *tty, const u8 *cp, const char *fp, size_t count); void (*write_wakeup)(struct tty_struct *tty); void (*dcd_change)(struct tty_struct *tty, bool active); - size_t (*receive_buf2)(struct tty_struct *tty, const unsigned char *cp, + size_t (*receive_buf2)(struct tty_struct *tty, const u8 *cp, const char *fp, size_t count); - void (*lookahead_buf)(struct tty_struct *tty, const unsigned char *cp, + void (*lookahead_buf)(struct tty_struct *tty, const u8 *cp, const unsigned char *fp, size_t count); struct module *owner; -- cgit v1.2.3 From 892bc209f250fb49ddca31c74d2c7b1126a7a61a Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:51 +0200 Subject: tty: use u8 for flags This makes all those 'char's an explicit 'u8'. This is part of the continuing unification of chars and flags to be consistent u8. This approaches tty_port_default_receive_buf(). Note that we do not change signedness as we compile with -funsigned-char. Signed-off-by: "Jiri Slaby (SUSE)" Cc: William Hubbs Cc: Chris Brannon Cc: Kirk Reiser Cc: Samuel Thibault Cc: Marcel Holtmann Cc: Johan Hedberg Cc: Luiz Augusto von Dentz Cc: Dmitry Torokhov Cc: Arnd Bergmann Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Max Staudt Cc: Wolfgang Grandegger Cc: Marc Kleine-Budde Cc: Dario Binacchi Cc: Andreas Koensgen Cc: Jeremy Kerr Cc: Matt Johnston Cc: Krzysztof Kozlowski Cc: Liam Girdwood Cc: Mark Brown Cc: Jaroslav Kysela Cc: Takashi Iwai Acked-by: Mark Brown Link: https://lore.kernel.org/r/20230810091510.13006-18-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_buffer.h | 4 ++-- include/linux/tty_flip.h | 10 +++++----- include/linux/tty_ldisc.h | 12 ++++++------ 3 files changed, 13 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_buffer.h b/include/linux/tty_buffer.h index 6f2966b15093..b11cc8c749d2 100644 --- a/include/linux/tty_buffer.h +++ b/include/linux/tty_buffer.h @@ -27,9 +27,9 @@ static inline u8 *char_buf_ptr(struct tty_buffer *b, int ofs) return ((u8 *)b->data) + ofs; } -static inline char *flag_buf_ptr(struct tty_buffer *b, int ofs) +static inline u8 *flag_buf_ptr(struct tty_buffer *b, int ofs) { - return (char *)char_buf_ptr(b, ofs) + b->size; + return char_buf_ptr(b, ofs) + b->size; } struct tty_bufhead { diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index a0fcffeaaa25..d33aed2172c7 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -11,14 +11,14 @@ int tty_buffer_set_limit(struct tty_port *port, int limit); unsigned int tty_buffer_space_avail(struct tty_port *port); int tty_buffer_request_room(struct tty_port *port, size_t size); int tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, - const char *flags, size_t size); + const u8 *flags, size_t size); int tty_insert_flip_string_fixed_flag(struct tty_port *port, const u8 *chars, - char flag, size_t size); + u8 flag, size_t size); int tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); void tty_flip_buffer_push(struct tty_port *port); -int __tty_insert_flip_char(struct tty_port *port, u8 ch, char flag); +int __tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag); -static inline int tty_insert_flip_char(struct tty_port *port, u8 ch, char flag) +static inline int tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) { struct tty_buffer *tb = port->buf.tail; int change; @@ -39,7 +39,7 @@ static inline int tty_insert_flip_string(struct tty_port *port, return tty_insert_flip_string_fixed_flag(port, chars, TTY_NORMAL, size); } -size_t tty_ldisc_receive_buf(struct tty_ldisc *ld, const u8 *p, const char *f, +size_t tty_ldisc_receive_buf(struct tty_ldisc *ld, const u8 *p, const u8 *f, size_t count); void tty_buffer_lock_exclusive(struct tty_port *port); diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index 5551c4400e59..a661d7df5497 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -162,7 +162,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * Optional. Can sleep. * * @receive_buf: [DRV] ``void ()(struct tty_struct *tty, const u8 *cp, - * const char *fp, size_t count)`` + * const u8 *fp, size_t count)`` * * This function is called by the low-level tty driver to send characters * received by the hardware to the line discpline for processing. @cp is @@ -192,7 +192,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * Optional. * * @receive_buf2: [DRV] ``ssize_t ()(struct tty_struct *tty, const u8 *cp, - * const char *fp, size_t count)`` + * const u8 *fp, size_t count)`` * * This function is called by the low-level tty driver to send characters * received by the hardware to the line discpline for processing. @cp is a @@ -205,7 +205,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * Optional. * * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, const u8 *cp, - * const char *fp, size_t count)`` + * const u8 *fp, size_t count)`` * * This function is called by the low-level tty driver for characters * not eaten by ->receive_buf() or ->receive_buf2(). It is useful for @@ -256,13 +256,13 @@ struct tty_ldisc_ops { * The following routines are called from below. */ void (*receive_buf)(struct tty_struct *tty, const u8 *cp, - const char *fp, size_t count); + const u8 *fp, size_t count); void (*write_wakeup)(struct tty_struct *tty); void (*dcd_change)(struct tty_struct *tty, bool active); size_t (*receive_buf2)(struct tty_struct *tty, const u8 *cp, - const char *fp, size_t count); + const u8 *fp, size_t count); void (*lookahead_buf)(struct tty_struct *tty, const u8 *cp, - const unsigned char *fp, size_t count); + const u8 *fp, size_t count); struct module *owner; }; -- cgit v1.2.3 From ead03e721f410d83825835b5abd8e84fcb4305fa Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:52 +0200 Subject: misc: ti-st: make st_recv() conforming to tty_ldisc_ops::receive_buf() That is change data type to u8 and count to unsigned int. And propagate to both hooks (st_kim_recv() and kim_int_recv()). Signed-off-by: "Jiri Slaby (SUSE)" Cc: Arnd Bergmann Link: https://lore.kernel.org/r/20230810091510.13006-19-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/ti_wilink_st.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ti_wilink_st.h b/include/linux/ti_wilink_st.h index 44a7f9169ac6..10642d4844f0 100644 --- a/include/linux/ti_wilink_st.h +++ b/include/linux/ti_wilink_st.h @@ -271,7 +271,7 @@ long st_kim_stop(void *); void st_kim_complete(void *); void kim_st_list_protocols(struct st_data_s *, void *); -void st_kim_recv(void *, const unsigned char *, long); +void st_kim_recv(void *disc_data, const u8 *data, size_t count); /* -- cgit v1.2.3 From 5db35be97cca6dd250df5c8144fcf7429ceb7f12 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:53 +0200 Subject: tty: make char_buf_ptr()/flag_buf_ptr()'s offset unsigned The offset is meant from the beginning of data, so unsigned. Make it as such for clarity. All struct tty_buffer's members should be unsigned too -- see the next patch. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230810091510.13006-20-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_buffer.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_buffer.h b/include/linux/tty_buffer.h index b11cc8c749d2..391a875be20c 100644 --- a/include/linux/tty_buffer.h +++ b/include/linux/tty_buffer.h @@ -22,12 +22,12 @@ struct tty_buffer { unsigned long data[]; }; -static inline u8 *char_buf_ptr(struct tty_buffer *b, int ofs) +static inline u8 *char_buf_ptr(struct tty_buffer *b, unsigned int ofs) { return ((u8 *)b->data) + ofs; } -static inline u8 *flag_buf_ptr(struct tty_buffer *b, int ofs) +static inline u8 *flag_buf_ptr(struct tty_buffer *b, unsigned int ofs) { return char_buf_ptr(b, ofs) + b->size; } -- cgit v1.2.3 From b97552eb064d2257f4d5657eb6f375fd82a8a481 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:14:54 +0200 Subject: tty: tty_buffer: make all offsets unsigned All these are supposed/expected to be unsigned as they are either counts or offsets. So switch to unsigned for clarity. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230810091510.13006-21-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_buffer.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_buffer.h b/include/linux/tty_buffer.h index 391a875be20c..e45cba81d0e9 100644 --- a/include/linux/tty_buffer.h +++ b/include/linux/tty_buffer.h @@ -12,11 +12,11 @@ struct tty_buffer { struct tty_buffer *next; struct llist_node free; }; - int used; - int size; - int commit; - int lookahead; /* Lazy update on recv, can become less than "read" */ - int read; + unsigned int used; + unsigned int size; + unsigned int commit; + unsigned int lookahead; /* Lazy update on recv, can become less than "read" */ + unsigned int read; bool flags; /* Data points here */ unsigned long data[]; -- cgit v1.2.3 From 69851e4ab8feeb369119a44ddca430c0ee15f0d8 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:15:01 +0200 Subject: tty: propagate u8 data to tty_operations::write() Data are now typed as u8. Propagate this change to tty_operations::write(). Signed-off-by: "Jiri Slaby (SUSE)" Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Geert Uytterhoeven Cc: Richard Weinberger Cc: Anton Ivanov Cc: Johannes Berg Cc: Chris Zankel Cc: Max Filippov Cc: Arnd Bergmann Cc: Vaibhav Gupta Cc: Jens Taprogge Cc: Karsten Keil Cc: Scott Branden Cc: Ulf Hansson Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Alexander Gordeev Cc: Christian Borntraeger Cc: Sven Schnelle Cc: David Lin Cc: Johan Hovold Cc: Alex Elder Cc: Laurentiu Tudor Cc: Jiri Kosina Cc: David Sterba Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Arnaud Pouliquen Cc: Oliver Neukum Cc: Mathias Nyman Cc: Marcel Holtmann Cc: Johan Hedberg Cc: Luiz Augusto von Dentz Link: https://lore.kernel.org/r/20230810091510.13006-28-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_driver.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index e00034118c7b..a7bd8060ac96 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -356,8 +356,7 @@ struct tty_operations { void (*close)(struct tty_struct * tty, struct file * filp); void (*shutdown)(struct tty_struct *tty); void (*cleanup)(struct tty_struct *tty); - int (*write)(struct tty_struct * tty, - const unsigned char *buf, int count); + int (*write)(struct tty_struct *tty, const u8 *buf, int count); int (*put_char)(struct tty_struct *tty, unsigned char ch); void (*flush_chars)(struct tty_struct *tty); unsigned int (*write_room)(struct tty_struct *tty); -- cgit v1.2.3 From dcaafbe6ee3b39f2df11a1352f85172f8ade17a5 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:15:02 +0200 Subject: tty: propagate u8 data to tty_operations::put_char() Data are now typed as u8. Propagate this change to tty_operations::put_char(). Signed-off-by: "Jiri Slaby (SUSE)" Cc: Geert Uytterhoeven Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Alexander Gordeev Cc: Christian Borntraeger Cc: Sven Schnelle Cc: Karsten Keil Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Mathias Nyman Link: https://lore.kernel.org/r/20230810091510.13006-29-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index a7bd8060ac96..c5299d952e59 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -357,7 +357,7 @@ struct tty_operations { void (*shutdown)(struct tty_struct *tty); void (*cleanup)(struct tty_struct *tty); int (*write)(struct tty_struct *tty, const u8 *buf, int count); - int (*put_char)(struct tty_struct *tty, unsigned char ch); + int (*put_char)(struct tty_struct *tty, u8 ch); void (*flush_chars)(struct tty_struct *tty); unsigned int (*write_room)(struct tty_struct *tty); unsigned int (*chars_in_buffer)(struct tty_struct *tty); -- cgit v1.2.3 From 95713967ba52389f7cea75704d0cf048080ec218 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:15:03 +0200 Subject: tty: make tty_operations::write()'s count size_t Unify with the rest of the code. Use size_t for counts and ssize_t for retval. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230810091510.13006-30-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_driver.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index c5299d952e59..18beff0cec1a 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -72,8 +72,8 @@ struct serial_struct; * is closed for the last time freeing up the resources. This is * actually the second part of shutdown for routines that might sleep. * - * @write: ``int ()(struct tty_struct *tty, const unsigned char *buf, - * int count)`` + * @write: ``ssize_t ()(struct tty_struct *tty, const unsigned char *buf, + * size_t count)`` * * This routine is called by the kernel to write a series (@count) of * characters (@buf) to the @tty device. The characters may come from @@ -356,7 +356,7 @@ struct tty_operations { void (*close)(struct tty_struct * tty, struct file * filp); void (*shutdown)(struct tty_struct *tty); void (*cleanup)(struct tty_struct *tty); - int (*write)(struct tty_struct *tty, const u8 *buf, int count); + ssize_t (*write)(struct tty_struct *tty, const u8 *buf, size_t count); int (*put_char)(struct tty_struct *tty, u8 ch); void (*flush_chars)(struct tty_struct *tty); unsigned int (*write_room)(struct tty_struct *tty); -- cgit v1.2.3 From 49b8220cee4aa3d7aaaf42fd7f316b7f8fb710da Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Thu, 10 Aug 2023 11:15:05 +0200 Subject: tty: ldops: unify to u8 Some hooks in struct tty_ldisc_ops still reference buffers by 'unsigned char'. Unify to 'u8' as the rest of the tty layer does. Signed-off-by: "Jiri Slaby (SUSE)" Cc: Marcel Holtmann Cc: Johan Hedberg Cc: Luiz Augusto von Dentz Cc: Dmitry Torokhov Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20230810091510.13006-32-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_ldisc.h | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index a661d7df5497..af01e89074b2 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -90,8 +90,8 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * * Optional. * - * @read: [TTY] ``ssize_t ()(struct tty_struct *tty, struct file *file, - * unsigned char *buf, size_t nr)`` + * @read: [TTY] ``ssize_t ()(struct tty_struct *tty, struct file *file, u8 *buf, + * size_t nr)`` * * This function is called when the user requests to read from the @tty. * The line discipline will return whatever characters it has buffered up @@ -102,7 +102,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * Optional: %EIO unless provided. Can sleep. * * @write: [TTY] ``ssize_t ()(struct tty_struct *tty, struct file *file, - * const unsigned char *buf, size_t nr)`` + * const u8 *buf, size_t nr)`` * * This function is called when the user requests to write to the @tty. * The line discipline will deliver the characters to the low-level tty @@ -238,11 +238,10 @@ struct tty_ldisc_ops { int (*open)(struct tty_struct *tty); void (*close)(struct tty_struct *tty); void (*flush_buffer)(struct tty_struct *tty); - ssize_t (*read)(struct tty_struct *tty, struct file *file, - unsigned char *buf, size_t nr, - void **cookie, unsigned long offset); + ssize_t (*read)(struct tty_struct *tty, struct file *file, u8 *buf, + size_t nr, void **cookie, unsigned long offset); ssize_t (*write)(struct tty_struct *tty, struct file *file, - const unsigned char *buf, size_t nr); + const u8 *buf, size_t nr); int (*ioctl)(struct tty_struct *tty, unsigned int cmd, unsigned long arg); int (*compat_ioctl)(struct tty_struct *tty, unsigned int cmd, -- cgit v1.2.3 From 01b8539655635288dcd46366806abfacbb9b1f6c Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 9 Aug 2023 22:05:56 +0800 Subject: bpf: Remove unused declaration bpf_link_new_file() Commit a3b80e107894 ("bpf: Allocate ID for bpf_link") removed the implementation but not the declaration. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230809140556.45836-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index db3fe5a61b05..cfabbcf47bdb 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2120,7 +2120,6 @@ void bpf_link_cleanup(struct bpf_link_primer *primer); void bpf_link_inc(struct bpf_link *link); void bpf_link_put(struct bpf_link *link); int bpf_link_new_fd(struct bpf_link *link); -struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd); struct bpf_link *bpf_link_get_from_fd(u32 ufd); struct bpf_link *bpf_link_get_curr_or_next(u32 *id); -- cgit v1.2.3 From c8afaa1b0f8bc93d013ab2ea6b9649958af3f1d3 Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Sat, 12 Aug 2023 18:15:54 +0200 Subject: locking: remove spin_lock_prefetch The only remaining consumer is new_inode, where it showed up in 2001 as commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1] with a changelog which does not mention it. Since then the line got only touched up to keep compiling. While it may have been of benefit back in the day, it is guaranteed to at best not get in the way in the multicore setting -- as the code performs *a lot* of work between the prefetch and actual lock acquire, any contention means the cacheline is already invalid by the time the routine calls spin_lock(). It adds spurious traffic, for short. On top of it prefetch is notoriously tricky to use for single-threaded purposes, making it questionable from the get go. As such, remove it. I admit upfront I did not see value in benchmarking this change, but I can do it if that is deemed appropriate. Removal from new_inode and of the entire thing are in the same patch as requested by Linus, so whatever weird looks can be directed at that guy. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1] Signed-off-by: Mateusz Guzik Signed-off-by: Linus Torvalds --- include/linux/prefetch.h | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/prefetch.h b/include/linux/prefetch.h index b83a3f944f28..b068e2e60939 100644 --- a/include/linux/prefetch.h +++ b/include/linux/prefetch.h @@ -25,11 +25,10 @@ struct page; prefetch() should be defined by the architecture, if not, the #define below provides a no-op define. - There are 3 prefetch() macros: + There are 2 prefetch() macros: prefetch(x) - prefetches the cacheline at "x" for read prefetchw(x) - prefetches the cacheline at "x" for write - spin_lock_prefetch(x) - prefetches the spinlock *x for taking there is also PREFETCH_STRIDE which is the architecure-preferred "lookahead" size for prefetching streamed operations. @@ -44,10 +43,6 @@ struct page; #define prefetchw(x) __builtin_prefetch(x,1) #endif -#ifndef ARCH_HAS_SPINLOCK_PREFETCH -#define spin_lock_prefetch(x) prefetchw(x) -#endif - #ifndef PREFETCH_STRIDE #define PREFETCH_STRIDE (4*L1_CACHE_BYTES) #endif -- cgit v1.2.3 From 59e09100836fdb618b107c37189d6001b5825872 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Fri, 11 Aug 2023 13:58:36 -0700 Subject: soc: qcom: aoss: Move length requirements from caller The existing implementation of qmp_send() requires the caller to provide a buffer which is of word-aligned. The underlying reason for this is that message ram only supports word accesses, but pushing this requirement onto the clients results in the same boiler plate code sprinkled in every call site. By using a temporary buffer in qmp_send() we can hide the underlying hardware limitations from the clients and allow them to pass their NUL-terminates C string directly. Signed-off-by: Bjorn Andersson Reviewed-by: Konrad Dybcio Link: https://lore.kernel.org/r/20230811205839.727373-2-quic_bjorande@quicinc.com Signed-off-by: Bjorn Andersson --- include/linux/soc/qcom/qcom_aoss.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/qcom_aoss.h b/include/linux/soc/qcom/qcom_aoss.h index 3c2a82e606f8..7a71406b6050 100644 --- a/include/linux/soc/qcom/qcom_aoss.h +++ b/include/linux/soc/qcom/qcom_aoss.h @@ -13,13 +13,13 @@ struct qmp; #if IS_ENABLED(CONFIG_QCOM_AOSS_QMP) -int qmp_send(struct qmp *qmp, const void *data, size_t len); +int qmp_send(struct qmp *qmp, const void *data); struct qmp *qmp_get(struct device *dev); void qmp_put(struct qmp *qmp); #else -static inline int qmp_send(struct qmp *qmp, const void *data, size_t len) +static inline int qmp_send(struct qmp *qmp, const void *data) { return -ENODEV; } -- cgit v1.2.3 From 8873d1e2f88afbe89c99d8f49f88934a2da2991f Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Fri, 11 Aug 2023 13:58:38 -0700 Subject: soc: qcom: aoss: Format string in qmp_send() The majority of callers to qmp_send() composes the message dynamically using some form of sprintf(), resulting in unnecessary complication and stack usage. By changing the interface of qmp_send() to take a format string and arguments, the duplicated composition of the commands can be moved to a single location. Reviewed-by: Konrad Dybcio Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20230811205839.727373-4-quic_bjorande@quicinc.com Signed-off-by: Bjorn Andersson --- include/linux/soc/qcom/qcom_aoss.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/qcom_aoss.h b/include/linux/soc/qcom/qcom_aoss.h index 7a71406b6050..7361ca028752 100644 --- a/include/linux/soc/qcom/qcom_aoss.h +++ b/include/linux/soc/qcom/qcom_aoss.h @@ -13,13 +13,13 @@ struct qmp; #if IS_ENABLED(CONFIG_QCOM_AOSS_QMP) -int qmp_send(struct qmp *qmp, const void *data); +int qmp_send(struct qmp *qmp, const char *fmt, ...); struct qmp *qmp_get(struct device *dev); void qmp_put(struct qmp *qmp); #else -static inline int qmp_send(struct qmp *qmp, const void *data) +static inline int qmp_send(struct qmp *qmp, const char *fmt, ...) { return -ENODEV; } -- cgit v1.2.3 From 83b5f0253b1ef352f4333c4fb2d24eff23045f6b Mon Sep 17 00:00:00 2001 From: Gabor Juhos Date: Fri, 11 Aug 2023 13:10:07 +0200 Subject: net: phy: Introduce PSGMII PHY interface mode The PSGMII interface is similar to QSGMII. The main difference is that the PSGMII interface combines five SGMII lines into a single link while in QSGMII only four lines are combined. Similarly to the QSGMII, this interface mode might also needs special handling within the MAC driver. It is commonly used by Qualcomm with their QCA807x PHY series and modern WiSoC-s. Add definitions for the PHY layer to allow to express this type of connection between the MAC and PHY. Signed-off-by: Gabor Juhos Signed-off-by: Robert Marko Signed-off-by: David S. Miller --- include/linux/phy.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 3c1ceedd1b77..1351b802ffcf 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -110,6 +110,7 @@ extern const int phy_10gbit_features_array[1]; * @PHY_INTERFACE_MODE_XGMII: 10 gigabit media-independent interface * @PHY_INTERFACE_MODE_XLGMII:40 gigabit media-independent interface * @PHY_INTERFACE_MODE_MOCA: Multimedia over Coax + * @PHY_INTERFACE_MODE_PSGMII: Penta SGMII * @PHY_INTERFACE_MODE_QSGMII: Quad SGMII * @PHY_INTERFACE_MODE_TRGMII: Turbo RGMII * @PHY_INTERFACE_MODE_100BASEX: 100 BaseX @@ -147,6 +148,7 @@ typedef enum { PHY_INTERFACE_MODE_XGMII, PHY_INTERFACE_MODE_XLGMII, PHY_INTERFACE_MODE_MOCA, + PHY_INTERFACE_MODE_PSGMII, PHY_INTERFACE_MODE_QSGMII, PHY_INTERFACE_MODE_TRGMII, PHY_INTERFACE_MODE_100BASEX, @@ -254,6 +256,8 @@ static inline const char *phy_modes(phy_interface_t interface) return "xlgmii"; case PHY_INTERFACE_MODE_MOCA: return "moca"; + case PHY_INTERFACE_MODE_PSGMII: + return "psgmii"; case PHY_INTERFACE_MODE_QSGMII: return "qsgmii"; case PHY_INTERFACE_MODE_TRGMII: -- cgit v1.2.3 From 276e14e6c3993317257e1787e93b7166fbc30905 Mon Sep 17 00:00:00 2001 From: Illia Ostapyshyn Date: Tue, 13 Jun 2023 17:26:00 +0200 Subject: HID: input: Support devices sending Eraser without Invert Some digitizers (notably XP-Pen Artist 24) do not report the Invert usage when erasing. This causes the device to be permanently stuck with the BTN_TOOL_RUBBER tool after sending Eraser, as Invert is the only usage that can release the tool. In this state, Touch and Inrange are no longer reported to userspace, rendering the pen unusable. Prior to commit 87562fcd1342 ("HID: input: remove the need for HID_QUIRK_INVERT"), BTN_TOOL_RUBBER was never set and Eraser events were simply translated into BTN_TOUCH without causing an inconsistent state. Introduce HID_QUIRK_NOINVERT for such digitizers and detect them during hidinput_configure_usage(). This quirk causes the tool to be released as soon as Eraser is reported as not set. Set BTN_TOOL_RUBBER in input->keybit when mapping Eraser. Fixes: 87562fcd1342 ("HID: input: remove the need for HID_QUIRK_INVERT") Co-developed-by: Nils Fuhler Signed-off-by: Nils Fuhler Signed-off-by: Illia Ostapyshyn Signed-off-by: Jiri Kosina --- include/linux/hid.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index 39e21e3815ad..9e8f87800e21 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -360,6 +360,7 @@ struct hid_item { #define HID_QUIRK_NO_OUTPUT_REPORTS_ON_INTR_EP BIT(18) #define HID_QUIRK_HAVE_SPECIAL_DRIVER BIT(19) #define HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE BIT(20) +#define HID_QUIRK_NOINVERT BIT(21) #define HID_QUIRK_FULLSPEED_INTERVAL BIT(28) #define HID_QUIRK_NO_INIT_REPORTS BIT(29) #define HID_QUIRK_NO_IGNORE BIT(30) -- cgit v1.2.3 From 574d06ceb88fb735a7d88c22e6531f4849aada51 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 18 Jun 2023 11:11:58 +0200 Subject: HID: Reorder fields in 'struct hid_input' Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct hid_input' from 72 to 64 bytes. It saves a few bytes of memory and is more cache-line friendly. Signed-off-by: Christophe JAILLET Signed-off-by: Jiri Kosina --- include/linux/hid.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index 9e8f87800e21..be9e16cb8bd1 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -556,9 +556,9 @@ struct hid_input { struct hid_report *report; struct input_dev *input; const char *name; - bool registered; struct list_head reports; /* the list of reports */ unsigned int application; /* application usage for this input */ + bool registered; }; enum hid_type { -- cgit v1.2.3 From fadfcf36016100dc9da0f1ab062c758e41f76b11 Mon Sep 17 00:00:00 2001 From: Ivan Orlov Date: Tue, 20 Jun 2023 20:31:42 +0200 Subject: HID: roccat: make all 'class' structures const Now that the driver core allows for struct class to be in read-only memory, making all 'class' structures to be declared at build time placing them into read-only memory, instead of having to be dynamically allocated at load time. Cc: Stefan Achatz Cc: Jiri Kosina Cc: Benjamin Tissoires Cc: linux-input@vger.kernel.org Suggested-by: Greg Kroah-Hartman Signed-off-by: Ivan Orlov Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jiri Kosina --- include/linux/hid-roccat.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hid-roccat.h b/include/linux/hid-roccat.h index 3214fb0815fc..753654fff07f 100644 --- a/include/linux/hid-roccat.h +++ b/include/linux/hid-roccat.h @@ -16,7 +16,7 @@ #ifdef __KERNEL__ -int roccat_connect(struct class *klass, struct hid_device *hid, +int roccat_connect(const struct class *klass, struct hid_device *hid, int report_size); void roccat_disconnect(int minor); int roccat_report_event(int minor, u8 const *data); -- cgit v1.2.3 From e062abaec65b970c4d7ecf26bc1558a1b6f92970 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 2 Aug 2023 13:57:03 +0200 Subject: super: remove get_tree_single_reconf() The get_tree_single_reconf() helper isn't used anywhere. Remove it. Reviewed-by: Josef Bacik Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Aleksa Sarai Message-Id: <20230802-vfs-super-exclusive-v2-1-95dc4e41b870@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs_context.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index ff6341e09925..851b3fe2549c 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -150,9 +150,6 @@ extern int get_tree_nodev(struct fs_context *fc, extern int get_tree_single(struct fs_context *fc, int (*fill_super)(struct super_block *sb, struct fs_context *fc)); -extern int get_tree_single_reconf(struct fs_context *fc, - int (*fill_super)(struct super_block *sb, - struct fs_context *fc)); extern int get_tree_keyed(struct fs_context *fc, int (*fill_super)(struct super_block *sb, struct fs_context *fc), -- cgit v1.2.3 From 22ed7ecdaefe0cac0c6e6295e83048af60435b13 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 2 Aug 2023 13:57:06 +0200 Subject: fs: add FSCONFIG_CMD_CREATE_EXCL Summary ======= This introduces FSCONFIG_CMD_CREATE_EXCL which will allows userspace to implement something like mount -t ext4 --exclusive /dev/sda /B which fails if a superblock for the requested filesystem does already exist: Before this patch ----------------- $ sudo ./move-mount -f xfs -o source=/dev/sda4 /A Requesting filesystem type xfs Mount options requested: source=/dev/sda4 Attaching mount at /A Moving single attached mount Setting key(source) with val(/dev/sda4) $ sudo ./move-mount -f xfs -o source=/dev/sda4 /B Requesting filesystem type xfs Mount options requested: source=/dev/sda4 Attaching mount at /B Moving single attached mount Setting key(source) with val(/dev/sda4) After this patch with --exclusive as a switch for FSCONFIG_CMD_CREATE_EXCL -------------------------------------------------------------------------- $ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A Requesting filesystem type xfs Request exclusive superblock creation Mount options requested: source=/dev/sda4 Attaching mount at /A Moving single attached mount Setting key(source) with val(/dev/sda4) $ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /B Requesting filesystem type xfs Request exclusive superblock creation Mount options requested: source=/dev/sda4 Attaching mount at /B Moving single attached mount Setting key(source) with val(/dev/sda4) Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed Details ======= As mentioned on the list (cf. [1]-[3]) mount requests like mount -t ext4 /dev/sda /A are ambigous for userspace. Either a new superblock has been created and mounted or an existing superblock has been reused and a bind-mount has been created. This becomes clear in the following example where two processes create the same mount for the same block device: P1 P2 fd_fs = fsopen("ext4"); fd_fs = fsopen("ext4"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000"); // wins and creates superblock fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...) // finds compatible superblock of P1 // spins until P1 sets SB_BORN and grabs a reference fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...) fd_mnt1 = fsmount(fd_fs); fd_mnt2 = fsmount(fd_fs); move_mount(fd_mnt1, "/A") move_mount(fd_mnt2, "/B") Not just does P2 get a bind-mount but the mount options that P2 requestes are silently ignored. The VFS itself doesn't, can't and shouldn't enforce filesystem specific mount option compatibility. It only enforces incompatibility for read-only <-> read-write transitions: mount -t ext4 /dev/sda /A mount -t ext4 -o ro /dev/sda /B The read-only request will fail with EBUSY as the VFS can't just silently transition a superblock from read-write to read-only or vica versa without risking security issues. To userspace this silent superblock reuse can become a security issue in because there is currently no straightforward way for userspace to know that they did indeed manage to create a new superblock and didn't just reuse an existing one. This adds a new FSCONFIG_CMD_CREATE_EXCL command to fsconfig() that returns EBUSY if an existing superblock would be reused. Userspace that needs to be sure that it did create a new superblock with the requested mount options can request superblock creation using this command. If the command succeeds they can be sure that they did create a new superblock with the requested mount options. This requires the new mount api. With the old mount api it would be necessary to plumb this through every legacy filesystem's file_system_type->mount() method. If they want this feature they are most welcome to switch to the new mount api. Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on each high-level superblock creation helper: (1) get_tree_nodev() Always allocate new superblock. Hence, FSCONFIG_CMD_CREATE and FSCONFIG_CMD_CREATE_EXCL are equivalent. The binderfs or overlayfs filesystems are examples. (4) get_tree_keyed() Finds an existing superblock based on sb->s_fs_info. Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY. The mqueue or nfsd filesystems are examples. (2) get_tree_bdev() This effectively works like get_tree_keyed(). The ext4 or xfs filesystems are examples. (3) get_tree_single() Only one superblock of this filesystem type can ever exist. Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY. The securityfs or configfs filesystems are examples. Note that some single-instance filesystems never destroy the superblock once it has been created during the first mount. For example, if securityfs has been mounted at least onces then the created superblock will never be destroyed again as long as there is still an LSM making use it. Consequently, even if securityfs is unmounted and the superblock seemingly destroyed it really isn't which means that FSCONFIG_CMD_CREATE_EXCL will continue rejecting reusing an existing superblock. This is acceptable thugh since special purpose filesystems such as this shouldn't have a need to use FSCONFIG_CMD_CREATE_EXCL anyway and if they do it's probably to make sure that mount options aren't ignored. Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on filesystems that make use of the low-level sget_fc() helper directly. They're all effectively variants on get_tree_keyed(), get_tree_bdev(), or get_tree_nodev(): (5) mtd_get_sb() Similar logic to get_tree_keyed(). (6) afs_get_tree() Similar logic to get_tree_keyed(). (7) ceph_get_tree() Similar logic to get_tree_keyed(). Already explicitly allows forcing the allocation of a new superblock via CEPH_OPT_NOSHARE. This turns it into get_tree_nodev(). (8) fuse_get_tree_submount() Similar logic to get_tree_nodev(). (9) fuse_get_tree() Forces reuse of existing FUSE superblock. Forces reuse of existing superblock if passed in file refers to an existing FUSE connection. If FSCONFIG_CMD_CREATE_EXCL is specified together with an fd referring to an existing FUSE connections this would cause the superblock reusal to fail. If reusing is the intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified. (10) fuse_get_tree() -> get_tree_nodev() Same logic as in get_tree_nodev(). (11) fuse_get_tree() -> get_tree_bdev() Same logic as in get_tree_bdev(). (12) virtio_fs_get_tree() Same logic as get_tree_keyed(). (13) gfs2_meta_get_tree() Forces reuse of existing gfs2 superblock. Mounting gfs2meta enforces that a gf2s superblock must already exist. If not, it will error out. Consequently, mounting gfs2meta with FSCONFIG_CMD_CREATE_EXCL would always fail. If reusing is the intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified. (14) kernfs_get_tree() Similar logic to get_tree_keyed(). (15) nfs_get_tree_common() Similar logic to get_tree_keyed(). Already explicitly allows forcing the allocation of a new superblock via NFS_MOUNT_UNSHARED. This effectively turns it into get_tree_nodev(). Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner Reviewed-by: Josef Bacik Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Aleksa Sarai Message-Id: <20230802-vfs-super-exclusive-v2-4-95dc4e41b870@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs_context.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index 851b3fe2549c..a33a3b1d9016 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -109,6 +109,7 @@ struct fs_context { bool need_free:1; /* Need to call ops->free() */ bool global:1; /* Goes into &init_user_ns */ bool oldapi:1; /* Coming from mount(2) */ + bool exclusive:1; /* create new superblock, reject existing one */ }; struct fs_context_operations { -- cgit v1.2.3 From 87382eaddeed3d32afc9e4524f9488eb1bc999c2 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 10 Aug 2023 16:19:22 +0200 Subject: PCI/sysfs: Move declarations to linux/pci.h A couple of architectures build the __weak versions of pci_create_resource_files() and pci_remove_resource_files() but don't have prototypes for these, which causes warnings: drivers/pci/pci-sysfs.c:1253:12: error: no previous prototype for 'pci_create_resource_files' [-Werror=missing-prototypes] 1253 | int __weak pci_create_resource_files(struct pci_dev *dev) { return 0; } drivers/pci/pci-sysfs.c:1254:13: error: no previous prototype for 'pci_remove_resource_files' [-Werror=missing-prototypes] 1254 | void __weak pci_remove_resource_files(struct pci_dev *dev) { return; } Move the prototypes from alpha architecture into the global header to avoid these warnings for all of them. Link: https://lore.kernel.org/r/20230810141947.1236730-5-arnd@kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Bjorn Helgaas --- include/linux/pci.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 0ff7500772e6..b4eae9f2e1fc 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2260,6 +2260,11 @@ int pcibios_alloc_irq(struct pci_dev *dev); void pcibios_free_irq(struct pci_dev *dev); resource_size_t pcibios_default_alignment(void); +#if !defined(HAVE_PCI_MMAP) && !defined(ARCH_GENERIC_PCI_MMAP_RESOURCE) +extern int pci_create_resource_files(struct pci_dev *dev); +extern void pci_remove_resource_files(struct pci_dev *dev); +#endif + #if defined(CONFIG_PCI_MMCONFIG) || defined(CONFIG_ACPI_MCFG) void __init pci_mmcfg_early_init(void); void __init pci_mmcfg_late_init(void); -- cgit v1.2.3 From a9f168e4c6e1f623dcf2640b9d76a4f61b9731e5 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 31 May 2023 13:50:21 +0300 Subject: net/mlx5: Check with FW that sync reset completed successfully Even if the PF driver had no error on his part of the sync reset flow, the firmware can see wider picture as it syncs all the PFs in the flow. So add at end of sync reset flow check with firmware by reading MFRL register and initialization segment that the flow had no issue from firmware point of view too. Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 87fd6f9ed82c..9aed7e9b9f29 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -10858,8 +10858,9 @@ enum { MLX5_MFRL_REG_RESET_STATE_IDLE = 0, MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION = 1, MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS = 2, - MLX5_MFRL_REG_RESET_STATE_TIMEOUT = 3, + MLX5_MFRL_REG_RESET_STATE_NEG_TIMEOUT = 3, MLX5_MFRL_REG_RESET_STATE_NACK = 4, + MLX5_MFRL_REG_RESET_STATE_UNLOAD_TIMEOUT = 5, }; enum { -- cgit v1.2.3 From 0b4eb603d635ca47c1c372f69b4b96672e4c2c05 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Sun, 2 Jan 2022 14:57:39 +0200 Subject: net/mlx5: Remove unused CAPs mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but there isn't any user who requires them. As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used. Thus, drop all usages and definitions of the mentioned caps above. Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 17 +---------------- include/linux/mlx5/mlx5_ifc.h | 43 ------------------------------------------- 2 files changed, 1 insertion(+), 59 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 80cc12a9a531..5041965250f6 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1208,9 +1208,7 @@ enum mlx5_cap_type { MLX5_CAP_FLOW_TABLE, MLX5_CAP_ESWITCH_FLOW_TABLE, MLX5_CAP_ESWITCH, - MLX5_CAP_RESERVED, - MLX5_CAP_VECTOR_CALC, - MLX5_CAP_QOS, + MLX5_CAP_QOS = 0xc, MLX5_CAP_DEBUG, MLX5_CAP_RESERVED_14, MLX5_CAP_DEV_MEM, @@ -1220,7 +1218,6 @@ enum mlx5_cap_type { MLX5_CAP_DEV_EVENT = 0x14, MLX5_CAP_IPSEC, MLX5_CAP_CRYPTO = 0x1a, - MLX5_CAP_DEV_SHAMPO = 0x1d, MLX5_CAP_MACSEC = 0x1f, MLX5_CAP_GENERAL_2 = 0x20, MLX5_CAP_PORT_SELECTION = 0x25, @@ -1239,7 +1236,6 @@ enum mlx5_pcam_feature_groups { enum mlx5_mcam_reg_groups { MLX5_MCAM_REGS_FIRST_128 = 0x0, - MLX5_MCAM_REGS_0x9080_0x90FF = 0x1, MLX5_MCAM_REGS_0x9100_0x917F = 0x2, MLX5_MCAM_REGS_NUM = 0x3, }; @@ -1416,10 +1412,6 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_ODP_MAX(mdev, cap)\ MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->max, cap) -#define MLX5_CAP_VECTOR_CALC(mdev, cap) \ - MLX5_GET(vector_calc_cap, \ - mdev->caps.hca[MLX5_CAP_VECTOR_CALC]->cur, cap) - #define MLX5_CAP_QOS(mdev, cap)\ MLX5_GET(qos_cap, mdev->caps.hca[MLX5_CAP_QOS]->cur, cap) @@ -1436,10 +1428,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \ mng_access_reg_cap_mask.access_regs.reg) -#define MLX5_CAP_MCAM_REG1(mdev, reg) \ - MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \ - mng_access_reg_cap_mask.access_regs1.reg) - #define MLX5_CAP_MCAM_REG2(mdev, reg) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \ mng_access_reg_cap_mask.access_regs2.reg) @@ -1485,9 +1473,6 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_CRYPTO(mdev, cap)\ MLX5_GET(crypto_cap, (mdev)->caps.hca[MLX5_CAP_CRYPTO]->cur, cap) -#define MLX5_CAP_DEV_SHAMPO(mdev, cap)\ - MLX5_GET(shampo_cap, mdev->caps.hca_cur[MLX5_CAP_DEV_SHAMPO], cap) - #define MLX5_CAP_MACSEC(mdev, cap)\ MLX5_GET(macsec_cap, (mdev)->caps.hca[MLX5_CAP_MACSEC]->cur, cap) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 9aed7e9b9f29..08dcb1f43be7 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1314,33 +1314,6 @@ struct mlx5_ifc_odp_cap_bits { u8 reserved_at_120[0x6E0]; }; -struct mlx5_ifc_calc_op { - u8 reserved_at_0[0x10]; - u8 reserved_at_10[0x9]; - u8 op_swap_endianness[0x1]; - u8 op_min[0x1]; - u8 op_xor[0x1]; - u8 op_or[0x1]; - u8 op_and[0x1]; - u8 op_max[0x1]; - u8 op_add[0x1]; -}; - -struct mlx5_ifc_vector_calc_cap_bits { - u8 calc_matrix[0x1]; - u8 reserved_at_1[0x1f]; - u8 reserved_at_20[0x8]; - u8 max_vec_count[0x8]; - u8 reserved_at_30[0xd]; - u8 max_chunk_size[0x3]; - struct mlx5_ifc_calc_op calc0; - struct mlx5_ifc_calc_op calc1; - struct mlx5_ifc_calc_op calc2; - struct mlx5_ifc_calc_op calc3; - - u8 reserved_at_c0[0x720]; -}; - struct mlx5_ifc_tls_cap_bits { u8 tls_1_2_aes_gcm_128[0x1]; u8 tls_1_3_aes_gcm_128[0x1]; @@ -3435,20 +3408,6 @@ struct mlx5_ifc_roce_addr_layout_bits { u8 reserved_at_e0[0x20]; }; -struct mlx5_ifc_shampo_cap_bits { - u8 reserved_at_0[0x3]; - u8 shampo_log_max_reservation_size[0x5]; - u8 reserved_at_8[0x3]; - u8 shampo_log_min_reservation_size[0x5]; - u8 shampo_min_mss_size[0x10]; - - u8 reserved_at_20[0x3]; - u8 shampo_max_log_headers_entry_size[0x5]; - u8 reserved_at_28[0x18]; - - u8 reserved_at_40[0x7c0]; -}; - struct mlx5_ifc_crypto_cap_bits { u8 reserved_at_0[0x3]; u8 synchronize_dek[0x1]; @@ -3484,14 +3443,12 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_flow_table_eswitch_cap_bits flow_table_eswitch_cap; struct mlx5_ifc_e_switch_cap_bits e_switch_cap; struct mlx5_ifc_port_selection_cap_bits port_selection_cap; - struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap; struct mlx5_ifc_qos_cap_bits qos_cap; struct mlx5_ifc_debug_cap_bits debug_cap; struct mlx5_ifc_fpga_cap_bits fpga_cap; struct mlx5_ifc_tls_cap_bits tls_cap; struct mlx5_ifc_device_mem_cap_bits device_mem_cap; struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap; - struct mlx5_ifc_shampo_cap_bits shampo_cap; struct mlx5_ifc_macsec_cap_bits macsec_cap; struct mlx5_ifc_crypto_cap_bits crypto_cap; u8 reserved_at_0[0x8000]; -- cgit v1.2.3 From a41cb59117fa12ee17cda5e5c781eecfcb15dc0f Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 11 Jul 2023 15:56:08 +0300 Subject: net/mlx5: Remove unused MAX HCA capabilities Each device cap has two modes: MAX and CUR. The driver maintains a cache of both modes of the capabilities. For most device caps, the MAX cap mode is never used. Hence, remove all driver queries of the MAX mode of the said caps as well as their helper MACROs. Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 52 --------------------------------------------- include/linux/mlx5/driver.h | 1 - 2 files changed, 53 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 5041965250f6..93399802ba77 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1275,10 +1275,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET(per_protocol_networking_offload_caps,\ mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->cur, cap) -#define MLX5_CAP_ETH_MAX(mdev, cap) \ - MLX5_GET(per_protocol_networking_offload_caps,\ - mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->max, cap) - #define MLX5_CAP_IPOIB_ENHANCED(mdev, cap) \ MLX5_GET(per_protocol_networking_offload_caps,\ mdev->caps.hca[MLX5_CAP_IPOIB_ENHANCED_OFFLOADS]->cur, cap) @@ -1301,77 +1297,40 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP64_FLOWTABLE(mdev, cap) \ MLX5_GET64(flow_table_nic_cap, (mdev)->caps.hca[MLX5_CAP_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_FLOWTABLE_MAX(mdev, cap) \ - MLX5_GET(flow_table_nic_cap, mdev->caps.hca[MLX5_CAP_FLOW_TABLE]->max, cap) - #define MLX5_CAP_FLOWTABLE_NIC_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive.cap) -#define MLX5_CAP_FLOWTABLE_NIC_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive.cap) - #define MLX5_CAP_FLOWTABLE_NIC_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit.cap) -#define MLX5_CAP_FLOWTABLE_NIC_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit.cap) - #define MLX5_CAP_FLOWTABLE_SNIFFER_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_sniffer.cap) -#define MLX5_CAP_FLOWTABLE_SNIFFER_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_sniffer.cap) - #define MLX5_CAP_FLOWTABLE_SNIFFER_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_sniffer.cap) -#define MLX5_CAP_FLOWTABLE_SNIFFER_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_sniffer.cap) - #define MLX5_CAP_FLOWTABLE_RDMA_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_rdma.cap) -#define MLX5_CAP_FLOWTABLE_RDMA_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_rdma.cap) - #define MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_rdma.cap) -#define MLX5_CAP_FLOWTABLE_RDMA_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_rdma.cap) - #define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \ MLX5_GET(flow_table_eswitch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, cap) \ - MLX5_GET(flow_table_eswitch_cap, \ - mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->max, cap) - #define MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_nic_esw_fdb.cap) -#define MLX5_CAP_ESW_FLOWTABLE_FDB_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_nic_esw_fdb.cap) - #define MLX5_CAP_ESW_EGRESS_ACL(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_egress.cap) -#define MLX5_CAP_ESW_EGRESS_ACL_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_egress.cap) - #define MLX5_CAP_ESW_INGRESS_ACL(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_ingress.cap) -#define MLX5_CAP_ESW_INGRESS_ACL_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_ingress.cap) - #define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, ft_field_support_2_esw_fdb.cap) -#define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, ft_field_support_2_esw_fdb.cap) - #define MLX5_CAP_ESW(mdev, cap) \ MLX5_GET(e_switch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH]->cur, cap) @@ -1380,10 +1339,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET64(flow_table_eswitch_cap, \ (mdev)->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_ESW_MAX(mdev, cap) \ - MLX5_GET(e_switch_cap, \ - mdev->caps.hca[MLX5_CAP_ESWITCH]->max, cap) - #define MLX5_CAP_PORT_SELECTION(mdev, cap) \ MLX5_GET(port_selection_cap, \ mdev->caps.hca[MLX5_CAP_PORT_SELECTION]->cur, cap) @@ -1396,16 +1351,9 @@ enum mlx5_qcam_feature_groups { MLX5_GET(adv_virtualization_cap, \ mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->cur, cap) -#define MLX5_CAP_ADV_VIRTUALIZATION_MAX(mdev, cap) \ - MLX5_GET(adv_virtualization_cap, \ - mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->max, cap) - #define MLX5_CAP_FLOWTABLE_PORT_SELECTION(mdev, cap) \ MLX5_CAP_PORT_SELECTION(mdev, flow_table_properties_port_selection.cap) -#define MLX5_CAP_FLOWTABLE_PORT_SELECTION_MAX(mdev, cap) \ - MLX5_CAP_PORT_SELECTION_MAX(mdev, flow_table_properties_port_selection.cap) - #define MLX5_CAP_ODP(mdev, cap)\ MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->cur, cap) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e1c7e502a4fc..c9d82e74daaa 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1022,7 +1022,6 @@ bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); void mlx5_core_uplink_netdev_set(struct mlx5_core_dev *mdev, struct net_device *netdev); void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *mdev); -int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev); -- cgit v1.2.3 From 7ba3792718709d410be5d971732b9251cbda67b6 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sun, 13 Aug 2023 14:26:34 -0400 Subject: block: Add some exports for bcachefs - bio_set_pages_dirty(), bio_check_pages_dirty() - dio path - blk_status_to_str() - error messages - bio_add_folio() - this should definitely be exported for everyone, it's the modern version of bio_add_page() Signed-off-by: Kent Overstreet Cc: linux-block@vger.kernel.org Cc: Jens Axboe Signed-off-by: Kent Overstreet Link: https://lore.kernel.org/r/20230813182636.2966159-2-kent.overstreet@linux.dev Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2f5371b8482c..4feed1fc141f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -847,6 +847,7 @@ extern const char *blk_op_str(enum req_op op); int blk_status_to_errno(blk_status_t status); blk_status_t errno_to_blk_status(int errno); +const char *blk_status_to_str(blk_status_t status); /* only poll the hardware once, don't continue until a completion was found */ #define BLK_POLL_ONESHOT (1 << 0) -- cgit v1.2.3 From 649f070e69739d22c57c22dbce0788b72cd93fac Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sun, 13 Aug 2023 14:26:36 -0400 Subject: block: Bring back zero_fill_bio_iter This reverts 6f822e1b5d9dda3d20e87365de138046e3baa03a - this helper is used by bcachefs. Signed-off-by: Kent Overstreet Cc: Jens Axboe Cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20230813182636.2966159-4-kent.overstreet@linux.dev Signed-off-by: Jens Axboe --- include/linux/bio.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index c4f5b5228105..8b99210eb7fb 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -488,7 +488,12 @@ extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter, extern void bio_copy_data(struct bio *dst, struct bio *src); extern void bio_free_pages(struct bio *bio); void guard_bio_eod(struct bio *bio); -void zero_fill_bio(struct bio *bio); +void zero_fill_bio_iter(struct bio *bio, struct bvec_iter iter); + +static inline void zero_fill_bio(struct bio *bio) +{ + zero_fill_bio_iter(bio, bio->bi_iter); +} static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { -- cgit v1.2.3 From 67d5404d274376890d6d095a10e6565854918f8e Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 19 Jul 2023 15:50:07 -0700 Subject: torture: Add a kthread-creation callback to _torture_create_kthread() This commit adds a kthread-creation callback to the _torture_create_kthread() function, which allows callers of a new torture_create_kthread_cb() macro to specify a function to be invoked after the kthread is created but before it is awakened for the first time. Signed-off-by: Paul E. McKenney Cc: Dietmar Eggemann Cc: Josh Triplett Cc: Juri Lelli Cc: Valentin Schneider Cc: Dietmar Eggemann Cc: kernel-team@android.com Reviewed-by: Joel Fernandes (Google) Acked-by: John Stultz --- include/linux/torture.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/torture.h b/include/linux/torture.h index 7038104463e4..bb466eec01e4 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -108,12 +108,15 @@ bool torture_must_stop(void); bool torture_must_stop_irq(void); void torture_kthread_stopping(char *title); int _torture_create_kthread(int (*fn)(void *arg), void *arg, char *s, char *m, - char *f, struct task_struct **tp); + char *f, struct task_struct **tp, void (*cbf)(struct task_struct *tp)); void _torture_stop_kthread(char *m, struct task_struct **tp); #define torture_create_kthread(n, arg, tp) \ _torture_create_kthread(n, (arg), #n, "Creating " #n " task", \ - "Failed to create " #n, &(tp)) + "Failed to create " #n, &(tp), NULL) +#define torture_create_kthread_cb(n, arg, tp, cbf) \ + _torture_create_kthread(n, (arg), #n, "Creating " #n " task", \ + "Failed to create " #n, &(tp), cbf) #define torture_stop_kthread(n, tp) \ _torture_stop_kthread("Stopping " #n " task", &(tp)) -- cgit v1.2.3 From bb48cf1679d294d4fd3bfaa88289ed9004cbb025 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Mon, 14 Aug 2023 13:59:08 -0500 Subject: bpf: Document struct bpf_struct_ops fields Subsystems that want to implement a struct bpf_struct_ops structure to enable struct_ops maps must currently reverse engineer how the structure works. Given that this is meant to be a way for subsystem maintainers to extend their subsystems using BPF, let's document it to make it a bit easier on them. Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230814185908.700553-3-void@manifault.com Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cfabbcf47bdb..eced6400f778 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1550,6 +1550,53 @@ struct bpf_struct_ops_value; struct btf_member; #define BPF_STRUCT_OPS_MAX_NR_MEMBERS 64 +/** + * struct bpf_struct_ops - A structure of callbacks allowing a subsystem to + * define a BPF_MAP_TYPE_STRUCT_OPS map type composed + * of BPF_PROG_TYPE_STRUCT_OPS progs. + * @verifier_ops: A structure of callbacks that are invoked by the verifier + * when determining whether the struct_ops progs in the + * struct_ops map are valid. + * @init: A callback that is invoked a single time, and before any other + * callback, to initialize the structure. A nonzero return value means + * the subsystem could not be initialized. + * @check_member: When defined, a callback invoked by the verifier to allow + * the subsystem to determine if an entry in the struct_ops map + * is valid. A nonzero return value means that the map is + * invalid and should be rejected by the verifier. + * @init_member: A callback that is invoked for each member of the struct_ops + * map to allow the subsystem to initialize the member. A nonzero + * value means the member could not be initialized. This callback + * is exclusive with the @type, @type_id, @value_type, and + * @value_id fields. + * @reg: A callback that is invoked when the struct_ops map has been + * initialized and is being attached to. Zero means the struct_ops map + * has been successfully registered and is live. A nonzero return value + * means the struct_ops map could not be registered. + * @unreg: A callback that is invoked when the struct_ops map should be + * unregistered. + * @update: A callback that is invoked when the live struct_ops map is being + * updated to contain new values. This callback is only invoked when + * the struct_ops map is loaded with BPF_F_LINK. If not defined, the + * it is assumed that the struct_ops map cannot be updated. + * @validate: A callback that is invoked after all of the members have been + * initialized. This callback should perform static checks on the + * map, meaning that it should either fail or succeed + * deterministically. A struct_ops map that has been validated may + * not necessarily succeed in being registered if the call to @reg + * fails. For example, a valid struct_ops map may be loaded, but + * then fail to be registered due to there being another active + * struct_ops map on the system in the subsystem already. For this + * reason, if this callback is not defined, the check is skipped as + * the struct_ops map will have final verification performed in + * @reg. + * @type: BTF type. + * @value_type: Value type. + * @name: The name of the struct bpf_struct_ops object. + * @func_models: Func models + * @type_id: BTF type id. + * @value_id: BTF value id. + */ struct bpf_struct_ops { const struct bpf_verifier_ops *verifier_ops; int (*init)(struct btf *btf); -- cgit v1.2.3 From d80a8f1b58c2bc8d7c6bfb65401ea4f7ec8cddc2 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 8 Aug 2023 07:34:20 -0400 Subject: vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing When NFS superblocks are created by automounting, their LSM parameters aren't set in the fs_context struct prior to sget_fc() being called, leading to failure to match existing superblocks. This bug leads to messages like the following appearing in dmesg when fscache is enabled: NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1) Fix this by adding a new LSM hook to load fc->security for submount creation. Signed-off-by: David Howells Signed-off-by: Jeff Layton Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5 Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.") Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode") Tested-by: Jeff Layton Acked-by: Casey Schaufler Acked-by: "Christian Brauner (Microsoft)" Acked-by: Paul Moore Reviewed-by: Jeff Layton Message-Id: <20230808-master-v9-1-e0ecde888221@kernel.org> Signed-off-by: Christian Brauner --- include/linux/lsm_hook_defs.h | 1 + include/linux/security.h | 6 ++++++ 2 files changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 7308a1a7599b..af796986baee 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -54,6 +54,7 @@ LSM_HOOK(int, 0, bprm_creds_from_file, struct linux_binprm *bprm, struct file *f LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm) LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm) LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm) +LSM_HOOK(int, 0, fs_context_submount, struct fs_context *fc, struct super_block *reference) LSM_HOOK(int, 0, fs_context_dup, struct fs_context *fc, struct fs_context *src_sc) LSM_HOOK(int, -ENOPARAM, fs_context_parse_param, struct fs_context *fc, diff --git a/include/linux/security.h b/include/linux/security.h index 32828502f09e..bac98ea18f78 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -293,6 +293,7 @@ int security_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file); int security_bprm_check(struct linux_binprm *bprm); void security_bprm_committing_creds(struct linux_binprm *bprm); void security_bprm_committed_creds(struct linux_binprm *bprm); +int security_fs_context_submount(struct fs_context *fc, struct super_block *reference); int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc); int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param); int security_sb_alloc(struct super_block *sb); @@ -629,6 +630,11 @@ static inline void security_bprm_committed_creds(struct linux_binprm *bprm) { } +static inline int security_fs_context_submount(struct fs_context *fc, + struct super_block *reference) +{ + return 0; +} static inline int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc) { -- cgit v1.2.3 From 45e0d4b95b65fedd18aa3e1b6571e16a2e6e0aa9 Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Fri, 11 Aug 2023 21:48:14 +0200 Subject: vfs: fix up the assert in i_readcount_dec Drops a race where 2 threads could spot a positive value and both proceed to dec to -1, without reporting anything. Signed-off-by: Mateusz Guzik Message-Id: <20230811194814.1612336-1-mjguzik@gmail.com> Signed-off-by: Christian Brauner --- include/linux/fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 11055d00fab8..57e2a0a9eea1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2607,8 +2607,7 @@ static inline bool inode_is_open_for_write(const struct inode *inode) #if defined(CONFIG_IMA) || defined(CONFIG_FILE_LOCKING) static inline void i_readcount_dec(struct inode *inode) { - BUG_ON(!atomic_read(&inode->i_readcount)); - atomic_dec(&inode->i_readcount); + BUG_ON(atomic_dec_return(&inode->i_readcount) < 0); } static inline void i_readcount_inc(struct inode *inode) { -- cgit v1.2.3 From 0242737dc4eb9f6e9a5ea594b3f93efa0b12f28d Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Mon, 14 Aug 2023 20:40:12 +0800 Subject: perf/smmuv3: Enable HiSilicon Erratum 162001900 quirk for HIP08/09 Some HiSilicon SMMU PMCG suffers the erratum 162001900 that the PMU disable control sometimes fail to disable the counters. This will lead to error or inaccurate data since before we enable the counters the counter's still counting for the event used in last perf session. This patch tries to fix this by hardening the global disable process. Before disable the PMU, writing an invalid event type (0xffff) to focibly stop the counters. Correspondingly restore each events on pmu::pmu_enable(). Signed-off-by: Yicong Yang Link: https://lore.kernel.org/r/20230814124012.58013-1-yangyicong@huawei.com Signed-off-by: Will Deacon --- include/linux/acpi_iort.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index ee7cb6aaff71..1cb65592c95d 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -21,6 +21,7 @@ */ #define IORT_SMMU_V3_PMCG_GENERIC 0x00000000 /* Generic SMMUv3 PMCG */ #define IORT_SMMU_V3_PMCG_HISI_HIP08 0x00000001 /* HiSilicon HIP08 PMCG */ +#define IORT_SMMU_V3_PMCG_HISI_HIP09 0x00000002 /* HiSilicon HIP09 PMCG */ int iort_register_domain_token(int trans_id, phys_addr_t base, struct fwnode_handle *fw_node); -- cgit v1.2.3 From 8e4672d6f902d5c4db1e87e8aa9f530149d85bc6 Mon Sep 17 00:00:00 2001 From: Khadija Kamran Date: Sat, 12 Aug 2023 20:31:08 +0500 Subject: lsm: constify the 'file' parameter in security_binder_transfer_file() SELinux registers the implementation for the "binder_transfer_file" hook. Looking at the function implementation we observe that the parameter "file" is not changing. Mark the "file" parameter of LSM hook security_binder_transfer_file() as "const" since it will not be changing in the LSM hook. Signed-off-by: Khadija Kamran [PM: subject line whitespace fix] Signed-off-by: Paul Moore --- include/linux/lsm_hook_defs.h | 2 +- include/linux/security.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 540caa703573..4bdddb52a8fe 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -32,7 +32,7 @@ LSM_HOOK(int, 0, binder_transaction, const struct cred *from, LSM_HOOK(int, 0, binder_transfer_binder, const struct cred *from, const struct cred *to) LSM_HOOK(int, 0, binder_transfer_file, const struct cred *from, - const struct cred *to, struct file *file) + const struct cred *to, const struct file *file) LSM_HOOK(int, 0, ptrace_access_check, struct task_struct *child, unsigned int mode) LSM_HOOK(int, 0, ptrace_traceme, struct task_struct *parent) diff --git a/include/linux/security.h b/include/linux/security.h index 7665f56d920a..dcb3604ffab8 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -269,7 +269,7 @@ int security_binder_transaction(const struct cred *from, int security_binder_transfer_binder(const struct cred *from, const struct cred *to); int security_binder_transfer_file(const struct cred *from, - const struct cred *to, struct file *file); + const struct cred *to, const struct file *file); int security_ptrace_access_check(struct task_struct *child, unsigned int mode); int security_ptrace_traceme(struct task_struct *parent); int security_capget(const struct task_struct *target, @@ -538,7 +538,7 @@ static inline int security_binder_transfer_binder(const struct cred *from, static inline int security_binder_transfer_file(const struct cred *from, const struct cred *to, - struct file *file) + const struct file *file) { return 0; } -- cgit v1.2.3 From 7a0fd5e1678505534573b3c14c6ff69ed8592596 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Fri, 11 Aug 2023 17:18:38 +0200 Subject: compiler_types: Introduce the Clang __preserve_most function attribute [1]: "On X86-64 and AArch64 targets, this attribute changes the calling convention of a function. The preserve_most calling convention attempts to make the code in the caller as unintrusive as possible. This convention behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This alleviates the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn't apply for values returned in callee-saved registers. * On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Floating-point registers (XMMs/YMMs) are not preserved and need to be saved by the caller. * On AArch64 the callee preserve all general purpose registers, except x0-X8 and X16-X18." [1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most Introduce the attribute to compiler_types.h as __preserve_most. Use of this attribute results in better code generation for calls to very rarely called functions, such as error-reporting functions, or rarely executed slow paths. Beware that the attribute conflicts with instrumentation calls inserted on function entry which do not use __preserve_most themselves. Notably, function tracing which assumes the normal C calling convention for the given architecture. Where the attribute is supported, __preserve_most will imply notrace. It is recommended to restrict use of the attribute to functions that should or already disable tracing. Note: The additional preprocessor check against architecture should not be necessary if __has_attribute() only returns true where supported; also see https://github.com/ClangBuiltLinux/linux/issues/1908. But until __has_attribute() does the right thing, we also guard by known-supported architectures to avoid build warnings on other architectures. The attribute may be supported by a future GCC version (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899). Signed-off-by: Marco Elver Reviewed-by: Miguel Ojeda Reviewed-by: Nick Desaulniers Acked-by: "Steven Rostedt (Google)" Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20230811151847.1594958-1-elver@google.com Signed-off-by: Kees Cook --- include/linux/compiler_types.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 547ea1ff806e..c523c6683789 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -106,6 +106,34 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } #define __cold #endif +/* + * On x86-64 and arm64 targets, __preserve_most changes the calling convention + * of a function to make the code in the caller as unintrusive as possible. This + * convention behaves identically to the C calling convention on how arguments + * and return values are passed, but uses a different set of caller- and callee- + * saved registers. + * + * The purpose is to alleviates the burden of saving and recovering a large + * register set before and after the call in the caller. This is beneficial for + * rarely taken slow paths, such as error-reporting functions that may be called + * from hot paths. + * + * Note: This may conflict with instrumentation inserted on function entry which + * does not use __preserve_most or equivalent convention (if in assembly). Since + * function tracing assumes the normal C calling convention, where the attribute + * is supported, __preserve_most implies notrace. It is recommended to restrict + * use of the attribute to functions that should or already disable tracing. + * + * Optional: not supported by gcc. + * + * clang: https://clang.llvm.org/docs/AttributeReference.html#preserve-most + */ +#if __has_attribute(__preserve_most__) && (defined(CONFIG_X86_64) || defined(CONFIG_ARM64)) +# define __preserve_most notrace __attribute__((__preserve_most__)) +#else +# define __preserve_most +#endif + /* Builtins */ /* -- cgit v1.2.3 From b16c42c8fde808b4f047d94f1f2aeda93487670d Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Fri, 11 Aug 2023 17:18:39 +0200 Subject: list_debug: Introduce inline wrappers for debug checks Turn the list debug checking functions __list_*_valid() into inline functions that wrap the out-of-line functions. Care is taken to ensure the inline wrappers are always inlined, so that additional compiler instrumentation (such as sanitizers) does not result in redundant outlining. This change is preparation for performing checks in the inline wrappers. No functional change intended. Signed-off-by: Marco Elver Link: https://lore.kernel.org/r/20230811151847.1594958-2-elver@google.com Signed-off-by: Kees Cook --- include/linux/list.h | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list.h b/include/linux/list.h index f10344dbad4d..130c6a1bb45c 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -39,10 +39,39 @@ static inline void INIT_LIST_HEAD(struct list_head *list) } #ifdef CONFIG_DEBUG_LIST -extern bool __list_add_valid(struct list_head *new, - struct list_head *prev, - struct list_head *next); -extern bool __list_del_entry_valid(struct list_head *entry); +/* + * Performs the full set of list corruption checks before __list_add(). + * On list corruption reports a warning, and returns false. + */ +extern bool __list_add_valid_or_report(struct list_head *new, + struct list_head *prev, + struct list_head *next); + +/* + * Performs list corruption checks before __list_add(). Returns false if a + * corruption is detected, true otherwise. + */ +static __always_inline bool __list_add_valid(struct list_head *new, + struct list_head *prev, + struct list_head *next) +{ + return __list_add_valid_or_report(new, prev, next); +} + +/* + * Performs the full set of list corruption checks before __list_del_entry(). + * On list corruption reports a warning, and returns false. + */ +extern bool __list_del_entry_valid_or_report(struct list_head *entry); + +/* + * Performs list corruption checks before __list_del_entry(). Returns false if a + * corruption is detected, true otherwise. + */ +static __always_inline bool __list_del_entry_valid(struct list_head *entry) +{ + return __list_del_entry_valid_or_report(entry); +} #else static inline bool __list_add_valid(struct list_head *new, struct list_head *prev, -- cgit v1.2.3 From aebc7b0d8d91bbc69e976909963046bc48bca4fd Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Fri, 11 Aug 2023 17:18:40 +0200 Subject: list: Introduce CONFIG_LIST_HARDENED Numerous production kernel configs (see [1, 2]) are choosing to enable CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened configs [3]. The motivation behind this is that the option can be used as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025 are mitigated by the option [4]). The feature has never been designed with performance in mind, yet common list manipulation is happening across hot paths all over the kernel. Introduce CONFIG_LIST_HARDENED, which performs list pointer checking inline, and only upon list corruption calls the reporting slow path. To generate optimal machine code with CONFIG_LIST_HARDENED: 1. Elide checking for pointer values which upon dereference would result in an immediate access fault (i.e. minimal hardening checks). The trade-off is lower-quality error reports. 2. Use the __preserve_most function attribute (available with Clang, but not yet with GCC) to minimize the code footprint for calling the reporting slow path. As a result, function size of callers is reduced by avoiding saving registers before calling the rarely called reporting slow path. Note that all TUs in lib/Makefile already disable function tracing, including list_debug.c, and __preserve_most's implied notrace has no effect in this case. 3. Because the inline checks are a subset of the full set of checks in __list_*_valid_or_report(), always return false if the inline checks failed. This avoids redundant compare and conditional branch right after return from the slow path. As a side-effect of the checks being inline, if the compiler can prove some condition to always be true, it can completely elide some checks. Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the Kconfig variables are changed to reflect that: DEBUG_LIST selects LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on DEBUG_LIST. Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with "preserve_most") shows throughput improvements, in my case of ~7% on average (up to 20-30% on some test cases). Link: https://r.android.com/1266735 [1] Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2] Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3] Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4] Signed-off-by: Marco Elver Link: https://lore.kernel.org/r/20230811151847.1594958-3-elver@google.com Signed-off-by: Kees Cook --- include/linux/list.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 58 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list.h b/include/linux/list.h index 130c6a1bb45c..164b4d0e9d2a 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -38,39 +38,91 @@ static inline void INIT_LIST_HEAD(struct list_head *list) WRITE_ONCE(list->prev, list); } +#ifdef CONFIG_LIST_HARDENED + #ifdef CONFIG_DEBUG_LIST +# define __list_valid_slowpath +#else +# define __list_valid_slowpath __cold __preserve_most +#endif + /* * Performs the full set of list corruption checks before __list_add(). * On list corruption reports a warning, and returns false. */ -extern bool __list_add_valid_or_report(struct list_head *new, - struct list_head *prev, - struct list_head *next); +extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new, + struct list_head *prev, + struct list_head *next); /* * Performs list corruption checks before __list_add(). Returns false if a * corruption is detected, true otherwise. + * + * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking + * inline to catch non-faulting corruptions, and only if a corruption is + * detected calls the reporting function __list_add_valid_or_report(). */ static __always_inline bool __list_add_valid(struct list_head *new, struct list_head *prev, struct list_head *next) { - return __list_add_valid_or_report(new, prev, next); + bool ret = true; + + if (!IS_ENABLED(CONFIG_DEBUG_LIST)) { + /* + * With the hardening version, elide checking if next and prev + * are NULL, since the immediate dereference of them below would + * result in a fault if NULL. + * + * With the reduced set of checks, we can afford to inline the + * checks, which also gives the compiler a chance to elide some + * of them completely if they can be proven at compile-time. If + * one of the pre-conditions does not hold, the slow-path will + * show a report which pre-condition failed. + */ + if (likely(next->prev == prev && prev->next == next && new != prev && new != next)) + return true; + ret = false; + } + + ret &= __list_add_valid_or_report(new, prev, next); + return ret; } /* * Performs the full set of list corruption checks before __list_del_entry(). * On list corruption reports a warning, and returns false. */ -extern bool __list_del_entry_valid_or_report(struct list_head *entry); +extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry); /* * Performs list corruption checks before __list_del_entry(). Returns false if a * corruption is detected, true otherwise. + * + * With CONFIG_LIST_HARDENED only, performs minimal list integrity checking + * inline to catch non-faulting corruptions, and only if a corruption is + * detected calls the reporting function __list_del_entry_valid_or_report(). */ static __always_inline bool __list_del_entry_valid(struct list_head *entry) { - return __list_del_entry_valid_or_report(entry); + bool ret = true; + + if (!IS_ENABLED(CONFIG_DEBUG_LIST)) { + struct list_head *prev = entry->prev; + struct list_head *next = entry->next; + + /* + * With the hardening version, elide checking if next and prev + * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate + * dereference of them below would result in a fault. + */ + if (likely(prev->next == entry && next->prev == entry)) + return true; + ret = false; + } + + ret &= __list_del_entry_valid_or_report(entry); + return ret; } #else static inline bool __list_add_valid(struct list_head *new, -- cgit v1.2.3 From 1e887723545e037b5e200e77edf79802f58fc818 Mon Sep 17 00:00:00 2001 From: Joel Granados Date: Wed, 9 Aug 2023 12:49:55 +0200 Subject: sysctl: Add ctl_table_size to ctl_table_header The new ctl_table_size element will hold the size of the ctl_table arrays contained in the ctl_table_header. This value should eventually be passed by the callers to the sysctl register infrastructure. And while this commit introduces the variable, it does not set nor use it because that requires case by case considerations for each caller. It provides two important things: (1) A place to put the result of the ctl_table array calculation when it gets introduced for each caller. And (2) the size that will be used as the additional stopping criteria in the list_for_each_table_entry macro (to be added when all the callers are migrated) Signed-off-by: Joel Granados Signed-off-by: Luis Chamberlain --- include/linux/sysctl.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 59d451f455bf..33252ad58ebe 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -159,12 +159,22 @@ struct ctl_node { struct ctl_table_header *header; }; -/* struct ctl_table_header is used to maintain dynamic lists of - struct ctl_table trees. */ +/** + * struct ctl_table_header - maintains dynamic lists of struct ctl_table trees + * @ctl_table: pointer to the first element in ctl_table array + * @ctl_table_size: number of elements pointed by @ctl_table + * @used: The entry will never be touched when equal to 0. + * @count: Upped every time something is added to @inodes and downed every time + * something is removed from inodes + * @nreg: When nreg drops to 0 the ctl_table_header will be unregistered. + * @rcu: Delays the freeing of the inode. Introduced with "unfuck proc_sysctl ->d_compare()" + * + */ struct ctl_table_header { union { struct { struct ctl_table *ctl_table; + int ctl_table_size; int used; int count; int nreg; -- cgit v1.2.3 From bff97cf11b261972cae90299432238cc9a9a6a51 Mon Sep 17 00:00:00 2001 From: Joel Granados Date: Wed, 9 Aug 2023 12:49:57 +0200 Subject: sysctl: Add a size arg to __register_sysctl_table We make these changes in order to prepare __register_sysctl_table and its callers for when we remove the sentinel element (empty element at the end of ctl_table arrays). We don't actually remove any sentinels in this commit, but we *do* make sure to use ARRAY_SIZE so the table_size is available when the removal occurs. We add a table_size argument to __register_sysctl_table and adjust callers, all of which pass ctl_table pointers and need an explicit call to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in order to forward the size of the array pointer received from the network register calls. The new table_size argument does not yet have any effect in the init_header call which is still dependent on the sentinel's presence. table_size *does* however drive the `kzalloc` allocation in __register_sysctl_table with no adverse effects as the allocated memory is either one element greater than the calculated ctl_table array (for the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size of the calculated ctl_table array (for the call from sysctl_net.c and register_sysctl). This approach will allows us to "just" remove the sentinel without further changes to __register_sysctl_table as table_size will represent the exact size for all the callers at that point. Signed-off-by: Joel Granados Signed-off-by: Luis Chamberlain --- include/linux/sysctl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 33252ad58ebe..0495c858989f 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -226,7 +226,7 @@ extern void retire_sysctl_set(struct ctl_table_set *set); struct ctl_table_header *__register_sysctl_table( struct ctl_table_set *set, - const char *path, struct ctl_table *table); + const char *path, struct ctl_table *table, size_t table_size); struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table); void unregister_sysctl_table(struct ctl_table_header * table); -- cgit v1.2.3 From 9edbfe92a0a1355bae1e47c8f542ac0d39f19f8c Mon Sep 17 00:00:00 2001 From: Joel Granados Date: Wed, 9 Aug 2023 12:49:58 +0200 Subject: sysctl: Add size to register_sysctl This commit adds table_size to register_sysctl in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by either passing the table_size explicitly or using ARRAY_SIZE on the ctl_table arrays. We replace the register_syctl function with a macro that will add the ARRAY_SIZE to the new register_sysctl_sz function. In this way the callers that are already using an array of ctl_table structs do not change. For the callers that pass a ctl_table array pointer, we pass the table_size to register_sysctl_sz instead of the macro. Signed-off-by: Joel Granados Suggested-by: Greg Kroah-Hartman Signed-off-by: Luis Chamberlain --- include/linux/sysctl.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 0495c858989f..b1168ae281c9 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -215,6 +215,9 @@ struct ctl_path { const char *procname; }; +#define register_sysctl(path, table) \ + register_sysctl_sz(path, table, ARRAY_SIZE(table)) + #ifdef CONFIG_SYSCTL void proc_sys_poll_notify(struct ctl_table_poll *poll); @@ -227,7 +230,8 @@ extern void retire_sysctl_set(struct ctl_table_set *set); struct ctl_table_header *__register_sysctl_table( struct ctl_table_set *set, const char *path, struct ctl_table *table, size_t table_size); -struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table); +struct ctl_table_header *register_sysctl_sz(const char *path, struct ctl_table *table, + size_t table_size); void unregister_sysctl_table(struct ctl_table_header * table); extern int sysctl_init_bases(void); @@ -262,7 +266,9 @@ static inline struct ctl_table_header *register_sysctl_mount_point(const char *p return NULL; } -static inline struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table) +static inline struct ctl_table_header *register_sysctl_sz(const char *path, + struct ctl_table *table, + size_t table_size) { return NULL; } -- cgit v1.2.3 From 3bc269cfd3e119be84b69d95aec3a9e147016bb2 Mon Sep 17 00:00:00 2001 From: Joel Granados Date: Wed, 9 Aug 2023 12:49:59 +0200 Subject: sysctl: Add size arg to __register_sysctl_init This commit adds table_size to __register_sysctl_init in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do *not* remove any sentinels in this commit, we set things up by calculating the ctl_table array size with ARRAY_SIZE. We add a table_size argument to __register_sysctl_init and modify the register_sysctl_init macro to calculate the array size with ARRAY_SIZE. The original callers do not need to be updated as they will go through the new macro. Signed-off-by: Joel Granados Suggested-by: Greg Kroah-Hartman Signed-off-by: Luis Chamberlain --- include/linux/sysctl.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index b1168ae281c9..09d7429d67c0 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -236,8 +236,9 @@ void unregister_sysctl_table(struct ctl_table_header * table); extern int sysctl_init_bases(void); extern void __register_sysctl_init(const char *path, struct ctl_table *table, - const char *table_name); -#define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table) + const char *table_name, size_t table_size); +#define register_sysctl_init(path, table) \ + __register_sysctl_init(path, table, #table, ARRAY_SIZE(table)) extern struct ctl_table_header *register_sysctl_mount_point(const char *path); void do_sysctl_args(void); -- cgit v1.2.3 From ac8a52962164a50e693fa021d3564d7745b83a7f Mon Sep 17 00:00:00 2001 From: Abel Wu Date: Mon, 14 Aug 2023 15:09:11 +0800 Subject: net-memcg: Fix scope of sockmem pressure indicators Now there are two indicators of socket memory pressure sit inside struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating memory reclaim pressure in memcg->memory and ->tcpmem respectively. When in legacy mode (cgroupv1), the socket memory is charged into ->tcpmem which is independent of ->memory, so socket_pressure has nothing to do with socket's pressure at all. Things could be worse by taking socket_pressure into consideration in legacy mode, as a pressure in ->memory can lead to premature reclamation/throttling in socket. While for the default mode (cgroupv2), the socket memory is charged into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used. So {socket,tcpmem}_pressure are only used in default/legacy mode respectively for indicating socket memory pressure. This patch fixes the pieces of code that make mixed use of both. Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure") Signed-off-by: Abel Wu Acked-by: Shakeel Butt Signed-off-by: David S. Miller --- include/linux/memcontrol.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 5818af8eca5a..dbf26bc89dd4 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -284,6 +284,11 @@ struct mem_cgroup { atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS]; atomic_long_t memory_events_local[MEMCG_NR_MEMORY_EVENTS]; + /* + * Hint of reclaim pressure for socket memroy management. Note + * that this indicator should NOT be used in legacy cgroup mode + * where socket memory is accounted/charged separately. + */ unsigned long socket_pressure; /* Legacy tcp memory accounting */ @@ -1727,8 +1732,8 @@ void mem_cgroup_sk_alloc(struct sock *sk); void mem_cgroup_sk_free(struct sock *sk); static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) { - if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure) - return true; + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return !!memcg->tcpmem_pressure; do { if (time_before(jiffies, READ_ONCE(memcg->socket_pressure))) return true; -- cgit v1.2.3 From 83bc0982bf25edfcb261bebee51abc76ce8a975b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Tue, 18 Jul 2023 18:20:15 +0200 Subject: counter: Declare counter_priv() to be const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to the gcc manual functions "whose return value is not affected by changes to the observable state of the program and that have no observable effects on such state other than to return a value may lend themselves to optimizations such as common subexpression elimination. Declaring such functions with the 'const' attribute allows GCC to avoid emitting some calls in repeated invocations of the function with the same argument values." counter_priv() is such a function and so can be marked with the const function attribute. The effect for an arm allyesconfig build according to bloat-o-meter (on top of v6.5-rc2) is: add/remove: 0/1 grow/shrink: 2/14 up/down: 524/-1064 (-540) Function old new delta rz_mtu3_count_enable_write 632 1152 +520 stm32_count_enable_write 372 376 +4 ti_eqep_action_read 456 452 -4 stm32_lptim_cnt_action_write 400 392 -8 stm32_lptim_cnt_action_read 300 288 -12 rz_mtu3_count_write 296 284 -12 rz_mtu3_count_read 304 292 -12 rz_mtu3_count_function_read 212 200 -12 rz_mtu3_count_direction_read 268 256 -12 rz_mtu3_action_read 628 616 -12 rz_mtu3_count_function_write 328 312 -16 ecap_cnt_suspend 364 340 -24 ecap_cnt_resume 300 276 -24 rz_mtu3_count_ceiling_write 596 560 -36 rz_mtu3_count_enable_read 332 288 -44 rz_mtu3_count_ceiling_read 384 340 -44 rz_mtu3_initialize_counter 792 - -792 Total: Before=60715, After=60175, chg -0.89% Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20230718162015.3940148-1-u.kleine-koenig@pengutronix.de/ Signed-off-by: William Breathitt Gray --- include/linux/counter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/counter.h b/include/linux/counter.h index b63746637de2..702e9108bbb4 100644 --- a/include/linux/counter.h +++ b/include/linux/counter.h @@ -399,7 +399,7 @@ struct counter_device { struct mutex ops_exist_lock; }; -void *counter_priv(const struct counter_device *const counter); +void *counter_priv(const struct counter_device *const counter) __attribute_const__; struct counter_device *counter_alloc(size_t sizeof_priv); void counter_put(struct counter_device *const counter); -- cgit v1.2.3 From 53834a0c09252dea7918a9e1788bad880690900b Mon Sep 17 00:00:00 2001 From: Benjamin Gray Date: Tue, 1 Aug 2023 11:17:44 +1000 Subject: perf/hw_breakpoint: Remove arch breakpoint hooks PowerPC was the only user of these hooks, and has been refactored to no longer require them. There is no need to keep them around, so remove them to reduce complexity. Signed-off-by: Benjamin Gray Signed-off-by: Michael Ellerman Link: https://msgid.link/20230801011744.153973-8-bgray@linux.ibm.com --- include/linux/hw_breakpoint.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h index 7fbb45911273..db199d653dd1 100644 --- a/include/linux/hw_breakpoint.h +++ b/include/linux/hw_breakpoint.h @@ -90,9 +90,6 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp); extern int dbg_release_bp_slot(struct perf_event *bp); extern int reserve_bp_slot(struct perf_event *bp); extern void release_bp_slot(struct perf_event *bp); -int arch_reserve_bp_slot(struct perf_event *bp); -void arch_release_bp_slot(struct perf_event *bp); -void arch_unregister_hw_breakpoint(struct perf_event *bp); extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk); -- cgit v1.2.3 From 366d259ff597e81d90639ae21269b3f82cd4ebb7 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti Date: Wed, 2 Aug 2023 10:03:19 +0200 Subject: perf: Fix wrong comment about default event_idx Since commit c719f56092ad ("perf: Fix and clean up initialization of pmu::event_idx"), event_idx default implementation has returned 0, not idx + 1, so fix the comment that can be misleading. Signed-off-by: Alexandre Ghiti Reviewed-by: Andrew Jones Reviewed-by: Atish Patra --- include/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2166a69e3bf2..1269c96bc3b6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -445,7 +445,8 @@ struct pmu { /* * Will return the value for perf_event_mmap_page::index for this event, - * if no implementation is provided it will default to: event->hw.idx + 1. + * if no implementation is provided it will default to 0 (see + * perf_event_idx_default). */ int (*event_idx) (struct perf_event *event); /*optional */ -- cgit v1.2.3 From f117ae55b0198dd6fc36ce521e06d8b44a4bb203 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti Date: Wed, 2 Aug 2023 10:03:20 +0200 Subject: include: riscv: Fix wrong include guard in riscv_pmu.h The current include guard prevents the inclusion of asm/perf_event.h which uses the same include guard: fix the one in riscv_pmu.h so that it matches the file name. Signed-off-by: Alexandre Ghiti Reviewed-by: Conor Dooley Reviewed-by: Andrew Jones Reviewed-by: Atish Patra --- include/linux/perf/riscv_pmu.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 43fc892aa7d9..9f70d94942e0 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -6,8 +6,8 @@ * */ -#ifndef _ASM_RISCV_PERF_EVENT_H -#define _ASM_RISCV_PERF_EVENT_H +#ifndef _RISCV_PMU_H +#define _RISCV_PMU_H #include #include @@ -81,4 +81,4 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr); #endif /* CONFIG_RISCV_PMU */ -#endif /* _ASM_RISCV_PERF_EVENT_H */ +#endif /* _RISCV_PMU_H */ -- cgit v1.2.3 From d5ac062d82d87124ac75e4273e3887578a7fae60 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti Date: Wed, 2 Aug 2023 10:03:22 +0200 Subject: drivers: perf: Rename riscv pmu sbi driver That's just cosmetic, no functional changes. Signed-off-by: Alexandre Ghiti Reviewed-by: Andrew Jones Reviewed-by: Atish Patra --- include/linux/perf/riscv_pmu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 9f70d94942e0..5deeea0be7cb 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -21,7 +21,7 @@ #define RISCV_MAX_COUNTERS 64 #define RISCV_OP_UNSUPP (-EOPNOTSUPP) -#define RISCV_PMU_PDEV_NAME "riscv-pmu" +#define RISCV_PMU_SBI_PDEV_NAME "riscv-pmu-sbi" #define RISCV_PMU_LEGACY_PDEV_NAME "riscv-pmu-legacy" #define RISCV_PMU_STOP_FLAG_RESET 1 -- cgit v1.2.3 From 83c5e13b8cbbed9479cf568e03a5010d827e9781 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti Date: Wed, 2 Aug 2023 10:03:23 +0200 Subject: riscv: Prepare for user-space perf event mmap support Provide all the necessary bits in the generic riscv pmu driver to be able to mmap perf events in userspace: the heavy lifting lies in the driver backend, namely the legacy and sbi implementations. Note that arch_perf_update_userpage is almost a copy of arm64 code. Signed-off-by: Alexandre Ghiti Reviewed-by: Andrew Jones Reviewed-by: Atish Patra --- include/linux/perf/riscv_pmu.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 5deeea0be7cb..43282e22ebe1 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -55,6 +55,10 @@ struct riscv_pmu { void (*ctr_start)(struct perf_event *event, u64 init_val); void (*ctr_stop)(struct perf_event *event, unsigned long flag); int (*event_map)(struct perf_event *event, u64 *config); + void (*event_init)(struct perf_event *event); + void (*event_mapped)(struct perf_event *event, struct mm_struct *mm); + void (*event_unmapped)(struct perf_event *event, struct mm_struct *mm); + uint8_t (*csr_index)(struct perf_event *event); struct cpu_hw_events __percpu *hw_events; struct hlist_node node; -- cgit v1.2.3 From dd2e84bb3804103ad1c26a21deb4b35b0e166746 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 28 Jul 2023 17:52:05 +0200 Subject: virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1 The two most problematic virtchnl structures are virtchnl_rss_key and virtchnl_rss_lut. Their "flex" arrays have the type of u8, thus, when allocating / checking, the actual size is calculated as `sizeof + nents - 1 byte`. But their sizeof() is not 1 byte larger than the size of such structure with proper flex array, it's two bytes larger due to the padding. That said, their size is always 1 byte larger unless there are no tail elements -- then it's +2 bytes. Add virtchnl_struct_size() macro which will handle this case (and later other cases as well). Make its calling conv the same as we call struct_size() to allow it to be drop-in, even though it's unlikely to become possible to switch to generic API. The macro will calculate a proper size of a structure with a flex array at the end, so that it becomes transparent for the compilers, but add the difference from the old values, so that the real size of sorta-ABI-messages doesn't change. Use it on the allocation side in IAVF and the receiving side (defined as static inline in virtchnl.h) for the mentioned two structures. Signed-off-by: Alexander Lobakin Reviewed-by: Kees Cook Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index c15221dcb75e..3ab207c14809 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -866,18 +866,20 @@ VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_promisc_info); struct virtchnl_rss_key { u16 vsi_id; u16 key_len; - u8 key[1]; /* RSS hash key, packed bytes */ + u8 key[]; /* RSS hash key, packed bytes */ }; -VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_rss_key); +VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_rss_key); +#define virtchnl_rss_key_LEGACY_SIZEOF 6 struct virtchnl_rss_lut { u16 vsi_id; u16 lut_entries; - u8 lut[1]; /* RSS lookup table */ + u8 lut[]; /* RSS lookup table */ }; -VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_rss_lut); +VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_rss_lut); +#define virtchnl_rss_lut_LEGACY_SIZEOF 6 /* VIRTCHNL_OP_GET_RSS_HENA_CAPS * VIRTCHNL_OP_SET_RSS_HENA @@ -1367,6 +1369,17 @@ struct virtchnl_fdir_del { VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del); +#define __vss_byone(p, member, count, old) \ + (struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0))) + +#define __vss(type, func, p, member, count) \ + struct type: func(p, member, count, type##_LEGACY_SIZEOF) + +#define virtchnl_struct_size(p, m, c) \ + _Generic(*p, \ + __vss(virtchnl_rss_key, __vss_byone, p, m, c), \ + __vss(virtchnl_rss_lut, __vss_byone, p, m, c)) + /** * virtchnl_vc_validate_vf_msg * @ver: Virtchnl version info @@ -1479,19 +1492,21 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, } break; case VIRTCHNL_OP_CONFIG_RSS_KEY: - valid_len = sizeof(struct virtchnl_rss_key); + valid_len = virtchnl_rss_key_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_rss_key *vrk = (struct virtchnl_rss_key *)msg; - valid_len += vrk->key_len - 1; + valid_len = virtchnl_struct_size(vrk, key, + vrk->key_len); } break; case VIRTCHNL_OP_CONFIG_RSS_LUT: - valid_len = sizeof(struct virtchnl_rss_lut); + valid_len = virtchnl_rss_lut_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_rss_lut *vrl = (struct virtchnl_rss_lut *)msg; - valid_len += vrl->lut_entries - 1; + valid_len = virtchnl_struct_size(vrl, lut, + vrl->lut_entries); } break; case VIRTCHNL_OP_GET_RSS_HENA_CAPS: -- cgit v1.2.3 From 5e7f59fa07f86f554c301c7a383bba54d5ef9819 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 28 Jul 2023 17:52:06 +0200 Subject: virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1` There are five virtchnl structures, which are allocated and checked in the code as `nents + 1`, meaning that they always have memory for one excessive element regardless of their actual number. This comes from that their sizeof() includes space for 1 element and then they get allocated via struct_size() or its open-coded equivalents, passing the actual number of elements. Expand virtchnl_struct_size() to handle such structures and replace those 1-elem arrays with proper flex ones. Also fix several places which open-code %IAVF_VIRTCHNL_VF_RESOURCE_SIZE. Finally, let the virtchnl_ether_addr_list size be computed automatically when there's no enough space for the whole list, otherwise we have to open-code reverse struct_size() logics. Signed-off-by: Alexander Lobakin Reviewed-by: Kees Cook Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 57 +++++++++++++++++++++++++++----------------- 1 file changed, 35 insertions(+), 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index 3ab207c14809..c1a20b533fc0 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -268,10 +268,11 @@ struct virtchnl_vf_resource { u32 rss_key_size; u32 rss_lut_size; - struct virtchnl_vsi_resource vsi_res[1]; + struct virtchnl_vsi_resource vsi_res[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(36, virtchnl_vf_resource); +VIRTCHNL_CHECK_STRUCT_LEN(20, virtchnl_vf_resource); +#define virtchnl_vf_resource_LEGACY_SIZEOF 36 /* VIRTCHNL_OP_CONFIG_TX_QUEUE * VF sends this message to set up parameters for one TX queue. @@ -340,10 +341,11 @@ struct virtchnl_vsi_queue_config_info { u16 vsi_id; u16 num_queue_pairs; u32 pad; - struct virtchnl_queue_pair_info qpair[1]; + struct virtchnl_queue_pair_info qpair[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_vsi_queue_config_info); +VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_vsi_queue_config_info); +#define virtchnl_vsi_queue_config_info_LEGACY_SIZEOF 72 /* VIRTCHNL_OP_REQUEST_QUEUES * VF sends this message to request the PF to allocate additional queues to @@ -385,10 +387,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_vector_map); struct virtchnl_irq_map_info { u16 num_vectors; - struct virtchnl_vector_map vecmap[1]; + struct virtchnl_vector_map vecmap[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(14, virtchnl_irq_map_info); +VIRTCHNL_CHECK_STRUCT_LEN(2, virtchnl_irq_map_info); +#define virtchnl_irq_map_info_LEGACY_SIZEOF 14 /* VIRTCHNL_OP_ENABLE_QUEUES * VIRTCHNL_OP_DISABLE_QUEUES @@ -459,10 +462,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_ether_addr); struct virtchnl_ether_addr_list { u16 vsi_id; u16 num_elements; - struct virtchnl_ether_addr list[1]; + struct virtchnl_ether_addr list[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_ether_addr_list); +VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_ether_addr_list); +#define virtchnl_ether_addr_list_LEGACY_SIZEOF 12 /* VIRTCHNL_OP_ADD_VLAN * VF sends this message to add one or more VLAN tag filters for receives. @@ -481,10 +485,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_ether_addr_list); struct virtchnl_vlan_filter_list { u16 vsi_id; u16 num_elements; - u16 vlan_id[1]; + u16 vlan_id[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_vlan_filter_list); +VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_vlan_filter_list); +#define virtchnl_vlan_filter_list_LEGACY_SIZEOF 6 /* This enum is used for all of the VIRTCHNL_VF_OFFLOAD_VLAN_V2_CAPS related * structures and opcodes. @@ -1372,11 +1377,19 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del); #define __vss_byone(p, member, count, old) \ (struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0))) +#define __vss_full(p, member, count, old) \ + (struct_size(p, member, count) + (old - struct_size(p, member, 0))) + #define __vss(type, func, p, member, count) \ struct type: func(p, member, count, type##_LEGACY_SIZEOF) #define virtchnl_struct_size(p, m, c) \ _Generic(*p, \ + __vss(virtchnl_vf_resource, __vss_full, p, m, c), \ + __vss(virtchnl_vsi_queue_config_info, __vss_full, p, m, c), \ + __vss(virtchnl_irq_map_info, __vss_full, p, m, c), \ + __vss(virtchnl_ether_addr_list, __vss_full, p, m, c), \ + __vss(virtchnl_vlan_filter_list, __vss_full, p, m, c), \ __vss(virtchnl_rss_key, __vss_byone, p, m, c), \ __vss(virtchnl_rss_lut, __vss_byone, p, m, c)) @@ -1414,24 +1427,23 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, valid_len = sizeof(struct virtchnl_rxq_info); break; case VIRTCHNL_OP_CONFIG_VSI_QUEUES: - valid_len = sizeof(struct virtchnl_vsi_queue_config_info); + valid_len = virtchnl_vsi_queue_config_info_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_vsi_queue_config_info *vqc = (struct virtchnl_vsi_queue_config_info *)msg; - valid_len += (vqc->num_queue_pairs * - sizeof(struct - virtchnl_queue_pair_info)); + valid_len = virtchnl_struct_size(vqc, qpair, + vqc->num_queue_pairs); if (vqc->num_queue_pairs == 0) err_msg_format = true; } break; case VIRTCHNL_OP_CONFIG_IRQ_MAP: - valid_len = sizeof(struct virtchnl_irq_map_info); + valid_len = virtchnl_irq_map_info_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_irq_map_info *vimi = (struct virtchnl_irq_map_info *)msg; - valid_len += (vimi->num_vectors * - sizeof(struct virtchnl_vector_map)); + valid_len = virtchnl_struct_size(vimi, vecmap, + vimi->num_vectors); if (vimi->num_vectors == 0) err_msg_format = true; } @@ -1442,23 +1454,24 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, break; case VIRTCHNL_OP_ADD_ETH_ADDR: case VIRTCHNL_OP_DEL_ETH_ADDR: - valid_len = sizeof(struct virtchnl_ether_addr_list); + valid_len = virtchnl_ether_addr_list_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_ether_addr_list *veal = (struct virtchnl_ether_addr_list *)msg; - valid_len += veal->num_elements * - sizeof(struct virtchnl_ether_addr); + valid_len = virtchnl_struct_size(veal, list, + veal->num_elements); if (veal->num_elements == 0) err_msg_format = true; } break; case VIRTCHNL_OP_ADD_VLAN: case VIRTCHNL_OP_DEL_VLAN: - valid_len = sizeof(struct virtchnl_vlan_filter_list); + valid_len = virtchnl_vlan_filter_list_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_vlan_filter_list *vfl = (struct virtchnl_vlan_filter_list *)msg; - valid_len += vfl->num_elements * sizeof(u16); + valid_len = virtchnl_struct_size(vfl, vlan_id, + vfl->num_elements); if (vfl->num_elements == 0) err_msg_format = true; } -- cgit v1.2.3 From b0654e64dbaf62f565b5f2b4fbd92202e88dcba3 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 28 Jul 2023 17:52:07 +0200 Subject: virtchnl: fix fake 1-elem arrays for structures allocated as `nents` Finally, fix 3 structures which are allocated technically correctly, i.e. the calculated size equals to the one that struct_size() would return, except for sizeof(). For &virtchnl_vlan_filter_list_v2, use the same approach when there are no enough space as taken previously for &virtchnl_vlan_filter_list, i.e. let the maximum size be calculated automatically instead of trying to guestimate it using maths. Signed-off-by: Alexander Lobakin Reviewed-by: Kees Cook Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 39 ++++++++++++++++++++++++--------------- 1 file changed, 24 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index c1a20b533fc0..d0807ad43f93 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -716,10 +716,11 @@ struct virtchnl_vlan_filter_list_v2 { u16 vport_id; u16 num_elements; u8 pad[4]; - struct virtchnl_vlan_filter filters[1]; + struct virtchnl_vlan_filter filters[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_vlan_filter_list_v2); +VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_vlan_filter_list_v2); +#define virtchnl_vlan_filter_list_v2_LEGACY_SIZEOF 40 /* VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2 * VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 @@ -918,10 +919,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_channel_info); struct virtchnl_tc_info { u32 num_tc; u32 pad; - struct virtchnl_channel_info list[1]; + struct virtchnl_channel_info list[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_tc_info); +VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_tc_info); +#define virtchnl_tc_info_LEGACY_SIZEOF 24 /* VIRTCHNL_ADD_CLOUD_FILTER * VIRTCHNL_DEL_CLOUD_FILTER @@ -1059,10 +1061,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_rdma_qv_info); struct virtchnl_rdma_qvlist_info { u32 num_vectors; - struct virtchnl_rdma_qv_info qv_info[1]; + struct virtchnl_rdma_qv_info qv_info[]; }; -VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_rdma_qvlist_info); +VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_rdma_qvlist_info); +#define virtchnl_rdma_qvlist_info_LEGACY_SIZEOF 16 /* VF reset states - these are written into the RSTAT register: * VFGEN_RSTAT on the VF @@ -1377,6 +1380,9 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del); #define __vss_byone(p, member, count, old) \ (struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0))) +#define __vss_byelem(p, member, count, old) \ + (struct_size(p, member, count - 1) + (old - struct_size(p, member, 0))) + #define __vss_full(p, member, count, old) \ (struct_size(p, member, count) + (old - struct_size(p, member, 0))) @@ -1390,6 +1396,9 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del); __vss(virtchnl_irq_map_info, __vss_full, p, m, c), \ __vss(virtchnl_ether_addr_list, __vss_full, p, m, c), \ __vss(virtchnl_vlan_filter_list, __vss_full, p, m, c), \ + __vss(virtchnl_vlan_filter_list_v2, __vss_byelem, p, m, c), \ + __vss(virtchnl_tc_info, __vss_byelem, p, m, c), \ + __vss(virtchnl_rdma_qvlist_info, __vss_byelem, p, m, c), \ __vss(virtchnl_rss_key, __vss_byone, p, m, c), \ __vss(virtchnl_rss_lut, __vss_byone, p, m, c)) @@ -1495,13 +1504,13 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, case VIRTCHNL_OP_RELEASE_RDMA_IRQ_MAP: break; case VIRTCHNL_OP_CONFIG_RDMA_IRQ_MAP: - valid_len = sizeof(struct virtchnl_rdma_qvlist_info); + valid_len = virtchnl_rdma_qvlist_info_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_rdma_qvlist_info *qv = (struct virtchnl_rdma_qvlist_info *)msg; - valid_len += ((qv->num_vectors - 1) * - sizeof(struct virtchnl_rdma_qv_info)); + valid_len = virtchnl_struct_size(qv, qv_info, + qv->num_vectors); } break; case VIRTCHNL_OP_CONFIG_RSS_KEY: @@ -1534,12 +1543,12 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, valid_len = sizeof(struct virtchnl_vf_res_request); break; case VIRTCHNL_OP_ENABLE_CHANNELS: - valid_len = sizeof(struct virtchnl_tc_info); + valid_len = virtchnl_tc_info_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_tc_info *vti = (struct virtchnl_tc_info *)msg; - valid_len += (vti->num_tc - 1) * - sizeof(struct virtchnl_channel_info); + valid_len = virtchnl_struct_size(vti, list, + vti->num_tc); if (vti->num_tc == 0) err_msg_format = true; } @@ -1566,13 +1575,13 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, break; case VIRTCHNL_OP_ADD_VLAN_V2: case VIRTCHNL_OP_DEL_VLAN_V2: - valid_len = sizeof(struct virtchnl_vlan_filter_list_v2); + valid_len = virtchnl_vlan_filter_list_v2_LEGACY_SIZEOF; if (msglen >= valid_len) { struct virtchnl_vlan_filter_list_v2 *vfl = (struct virtchnl_vlan_filter_list_v2 *)msg; - valid_len += (vfl->num_elements - 1) * - sizeof(struct virtchnl_vlan_filter); + valid_len = virtchnl_struct_size(vfl, filters, + vfl->num_elements); if (vfl->num_elements == 0) { err_msg_format = true; -- cgit v1.2.3 From 9a4087fab303e7923ab839a6fe35059659a54649 Mon Sep 17 00:00:00 2001 From: Brett Creeley Date: Mon, 7 Aug 2023 13:57:48 -0700 Subject: vfio: Commonize combine_ranges for use in other VFIO drivers Currently only Mellanox uses the combine_ranges function. The new pds_vfio driver also needs this function. So, move it to a common location for other vendor drivers to use. Also, fix RCT ordering while moving/renaming the function. Cc: Yishai Hadas Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson Reviewed-by: Simon Horman Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20230807205755.29579-2-brett.creeley@amd.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 5a1dee983f17..454e9295970c 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -283,6 +283,9 @@ int vfio_mig_get_next_state(struct vfio_device *device, enum vfio_device_mig_state new_fsm, enum vfio_device_mig_state *next_fsm); +void vfio_combine_iova_ranges(struct rb_root_cached *root, u32 cur_nodes, + u32 req_nodes); + /* * External user API */ -- cgit v1.2.3 From b021d05e106e14b603a584b38ce62720e7d0f363 Mon Sep 17 00:00:00 2001 From: Brett Creeley Date: Mon, 7 Aug 2023 13:57:50 -0700 Subject: pds_core: Require callers of register/unregister to pass PF drvdata Pass a pointer to the PF's private data structure rather than bouncing in and out of the PF's PCI function address. Signed-off-by: Shannon Nelson Signed-off-by: Brett Creeley Reviewed-by: Kevin Tian Reviewed-by: Shameer Kolothum Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20230807205755.29579-4-brett.creeley@amd.com Signed-off-by: Alex Williamson --- include/linux/pds/pds_common.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h index 435c8e8161c2..04427dcc0a59 100644 --- a/include/linux/pds/pds_common.h +++ b/include/linux/pds/pds_common.h @@ -41,9 +41,11 @@ enum pds_core_vif_types { #define PDS_VDPA_DEV_NAME PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR +struct pdsc; + int pdsc_register_notify(struct notifier_block *nb); void pdsc_unregister_notify(struct notifier_block *nb); void *pdsc_get_pf_struct(struct pci_dev *vf_pdev); -int pds_client_register(struct pci_dev *pf_pdev, char *devname); -int pds_client_unregister(struct pci_dev *pf_pdev, u16 client_id); +int pds_client_register(struct pdsc *pf, char *devname); +int pds_client_unregister(struct pdsc *pf, u16 client_id); #endif /* _PDS_COMMON_H_ */ -- cgit v1.2.3 From 63f77a7161a2df9924eea9be3b6c63be10151252 Mon Sep 17 00:00:00 2001 From: Brett Creeley Date: Mon, 7 Aug 2023 13:57:51 -0700 Subject: vfio/pds: register with the pds_core PF The pds_core driver will supply adminq services, so find the PF and register with the DSC services. Use the following commands to enable a VF: echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson Reviewed-by: Simon Horman Reviewed-by: Kevin Tian Reviewed-by: Shameer Kolothum Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20230807205755.29579-5-brett.creeley@amd.com Signed-off-by: Alex Williamson --- include/linux/pds/pds_common.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h index 04427dcc0a59..30581e2e04cc 100644 --- a/include/linux/pds/pds_common.h +++ b/include/linux/pds/pds_common.h @@ -34,12 +34,13 @@ enum pds_core_vif_types { #define PDS_DEV_TYPE_CORE_STR "Core" #define PDS_DEV_TYPE_VDPA_STR "vDPA" -#define PDS_DEV_TYPE_VFIO_STR "VFio" +#define PDS_DEV_TYPE_VFIO_STR "vfio" #define PDS_DEV_TYPE_ETH_STR "Eth" #define PDS_DEV_TYPE_RDMA_STR "RDMA" #define PDS_DEV_TYPE_LM_STR "LM" #define PDS_VDPA_DEV_NAME PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR +#define PDS_VFIO_LM_DEV_NAME PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_LM_STR "." PDS_DEV_TYPE_VFIO_STR struct pdsc; -- cgit v1.2.3 From bb500dbe2ac622551d98c0bb2735a68f59489c98 Mon Sep 17 00:00:00 2001 From: Brett Creeley Date: Mon, 7 Aug 2023 13:57:52 -0700 Subject: vfio/pds: Add VFIO live migration support Add live migration support via the VFIO subsystem. The migration implementation aligns with the definition from uapi/vfio.h and uses the pds_core PF's adminq for device configuration. The ability to suspend, resume, and transfer VF device state data is included along with the required admin queue command structures and implementations. PDS_LM_CMD_SUSPEND and PDS_LM_CMD_SUSPEND_STATUS are added to support the VF device suspend operation. PDS_LM_CMD_RESUME is added to support the VF device resume operation. PDS_LM_CMD_STATE_SIZE is added to determine the exact size of the VF device state data. PDS_LM_CMD_SAVE is added to get the VF device state data. PDS_LM_CMD_RESTORE is added to restore the VF device with the previously saved data from PDS_LM_CMD_SAVE. PDS_LM_CMD_HOST_VF_STATUS is added to notify the DSC/firmware when a migration is in/not-in progress from the host's perspective. The DSC/firmware can use this to clear/setup any necessary state related to a migration. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson Reviewed-by: Simon Horman Reviewed-by: Kevin Tian Reviewed-by: Shameer Kolothum Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20230807205755.29579-6-brett.creeley@amd.com Signed-off-by: Alex Williamson --- include/linux/pds/pds_adminq.h | 197 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 197 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h index bcba7fda3cc9..9c79b3c8fc47 100644 --- a/include/linux/pds/pds_adminq.h +++ b/include/linux/pds/pds_adminq.h @@ -818,6 +818,194 @@ struct pds_vdpa_set_features_cmd { __le64 features; }; +#define PDS_LM_DEVICE_STATE_LENGTH 65536 +#define PDS_LM_CHECK_DEVICE_STATE_LENGTH(X) \ + PDS_CORE_SIZE_CHECK(union, PDS_LM_DEVICE_STATE_LENGTH, X) + +/* + * enum pds_lm_cmd_opcode - Live Migration Device commands + */ +enum pds_lm_cmd_opcode { + PDS_LM_CMD_HOST_VF_STATUS = 1, + + /* Device state commands */ + PDS_LM_CMD_STATE_SIZE = 16, + PDS_LM_CMD_SUSPEND = 18, + PDS_LM_CMD_SUSPEND_STATUS = 19, + PDS_LM_CMD_RESUME = 20, + PDS_LM_CMD_SAVE = 21, + PDS_LM_CMD_RESTORE = 22, +}; + +/** + * struct pds_lm_cmd - generic command + * @opcode: Opcode + * @rsvd: Word boundary padding + * @vf_id: VF id + * @rsvd2: Structure padding to 60 Bytes + */ +struct pds_lm_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 rsvd2[56]; +}; + +/** + * struct pds_lm_state_size_cmd - STATE_SIZE command + * @opcode: Opcode + * @rsvd: Word boundary padding + * @vf_id: VF id + */ +struct pds_lm_state_size_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; +}; + +/** + * struct pds_lm_state_size_comp - STATE_SIZE command completion + * @status: Status of the command (enum pds_core_status_code) + * @rsvd: Word boundary padding + * @comp_index: Index in the desc ring for which this is the completion + * @size: Size of the device state + * @rsvd2: Word boundary padding + * @color: Color bit + */ +struct pds_lm_state_size_comp { + u8 status; + u8 rsvd; + __le16 comp_index; + union { + __le64 size; + u8 rsvd2[11]; + } __packed; + u8 color; +}; + +enum pds_lm_suspend_resume_type { + PDS_LM_SUSPEND_RESUME_TYPE_FULL = 0, + PDS_LM_SUSPEND_RESUME_TYPE_P2P = 1, +}; + +/** + * struct pds_lm_suspend_cmd - SUSPEND command + * @opcode: Opcode PDS_LM_CMD_SUSPEND + * @rsvd: Word boundary padding + * @vf_id: VF id + * @type: Type of suspend (enum pds_lm_suspend_resume_type) + */ +struct pds_lm_suspend_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 type; +}; + +/** + * struct pds_lm_suspend_status_cmd - SUSPEND status command + * @opcode: Opcode PDS_AQ_CMD_LM_SUSPEND_STATUS + * @rsvd: Word boundary padding + * @vf_id: VF id + * @type: Type of suspend (enum pds_lm_suspend_resume_type) + */ +struct pds_lm_suspend_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 type; +}; + +/** + * struct pds_lm_resume_cmd - RESUME command + * @opcode: Opcode PDS_LM_CMD_RESUME + * @rsvd: Word boundary padding + * @vf_id: VF id + * @type: Type of resume (enum pds_lm_suspend_resume_type) + */ +struct pds_lm_resume_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 type; +}; + +/** + * struct pds_lm_sg_elem - Transmit scatter-gather (SG) descriptor element + * @addr: DMA address of SG element data buffer + * @len: Length of SG element data buffer, in bytes + * @rsvd: Word boundary padding + */ +struct pds_lm_sg_elem { + __le64 addr; + __le32 len; + __le16 rsvd[2]; +}; + +/** + * struct pds_lm_save_cmd - SAVE command + * @opcode: Opcode PDS_LM_CMD_SAVE + * @rsvd: Word boundary padding + * @vf_id: VF id + * @rsvd2: Word boundary padding + * @sgl_addr: IOVA address of the SGL to dma the device state + * @num_sge: Total number of SG elements + */ +struct pds_lm_save_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 rsvd2[4]; + __le64 sgl_addr; + __le32 num_sge; +} __packed; + +/** + * struct pds_lm_restore_cmd - RESTORE command + * @opcode: Opcode PDS_LM_CMD_RESTORE + * @rsvd: Word boundary padding + * @vf_id: VF id + * @rsvd2: Word boundary padding + * @sgl_addr: IOVA address of the SGL to dma the device state + * @num_sge: Total number of SG elements + */ +struct pds_lm_restore_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 rsvd2[4]; + __le64 sgl_addr; + __le32 num_sge; +} __packed; + +/** + * union pds_lm_dev_state - device state information + * @words: Device state words + */ +union pds_lm_dev_state { + __le32 words[PDS_LM_DEVICE_STATE_LENGTH / sizeof(__le32)]; +}; + +enum pds_lm_host_vf_status { + PDS_LM_STA_NONE = 0, + PDS_LM_STA_IN_PROGRESS, + PDS_LM_STA_MAX, +}; + +/** + * struct pds_lm_host_vf_status_cmd - HOST_VF_STATUS command + * @opcode: Opcode PDS_LM_CMD_HOST_VF_STATUS + * @rsvd: Word boundary padding + * @vf_id: VF id + * @status: Current LM status of host VF driver (enum pds_lm_host_status) + */ +struct pds_lm_host_vf_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 status; +}; + union pds_core_adminq_cmd { u8 opcode; u8 bytes[64]; @@ -844,6 +1032,13 @@ union pds_core_adminq_cmd { struct pds_vdpa_vq_init_cmd vdpa_vq_init; struct pds_vdpa_vq_reset_cmd vdpa_vq_reset; + struct pds_lm_suspend_cmd lm_suspend; + struct pds_lm_suspend_status_cmd lm_suspend_status; + struct pds_lm_resume_cmd lm_resume; + struct pds_lm_state_size_cmd lm_state_size; + struct pds_lm_save_cmd lm_save; + struct pds_lm_restore_cmd lm_restore; + struct pds_lm_host_vf_status_cmd lm_host_vf_status; }; union pds_core_adminq_comp { @@ -868,6 +1063,8 @@ union pds_core_adminq_comp { struct pds_vdpa_vq_init_comp vdpa_vq_init; struct pds_vdpa_vq_reset_comp vdpa_vq_reset; + + struct pds_lm_state_size_comp lm_state_size; }; #ifndef __CHECKER__ -- cgit v1.2.3 From f232836a9152c34ffd82bb5d5c242a1f6808be12 Mon Sep 17 00:00:00 2001 From: Brett Creeley Date: Mon, 7 Aug 2023 13:57:53 -0700 Subject: vfio/pds: Add support for dirty page tracking In order to support dirty page tracking, the driver has to implement the VFIO subsystem's vfio_log_ops. This includes log_start, log_stop, and log_read_and_clear. All of the tracker resources are allocated and dirty tracking on the device is started during log_start. The resources are cleaned up and dirty tracking on the device is stopped during log_stop. The dirty pages are determined and reported during log_read_and_clear. In order to support these callbacks admin queue commands are used. All of the adminq queue command structures and implementations are included as part of this patch. PDS_LM_CMD_DIRTY_STATUS is added to query the current status of dirty tracking on the device. This includes if it's enabled (i.e. number of regions being tracked from the device's perspective) and the maximum number of regions supported from the device's perspective. PDS_LM_CMD_DIRTY_ENABLE is added to enable dirty tracking on the specified number of regions and their iova ranges. PDS_LM_CMD_DIRTY_DISABLE is added to disable dirty tracking for all regions on the device. PDS_LM_CMD_READ_SEQ and PDS_LM_CMD_DIRTY_WRITE_ACK are added to support reading and acknowledging the currently dirtied pages. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson Reviewed-by: Simon Horman Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20230807205755.29579-7-brett.creeley@amd.com Signed-off-by: Alex Williamson --- include/linux/pds/pds_adminq.h | 178 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 178 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h index 9c79b3c8fc47..4b4e9a98b37b 100644 --- a/include/linux/pds/pds_adminq.h +++ b/include/linux/pds/pds_adminq.h @@ -835,6 +835,13 @@ enum pds_lm_cmd_opcode { PDS_LM_CMD_RESUME = 20, PDS_LM_CMD_SAVE = 21, PDS_LM_CMD_RESTORE = 22, + + /* Dirty page tracking commands */ + PDS_LM_CMD_DIRTY_STATUS = 32, + PDS_LM_CMD_DIRTY_ENABLE = 33, + PDS_LM_CMD_DIRTY_DISABLE = 34, + PDS_LM_CMD_DIRTY_READ_SEQ = 35, + PDS_LM_CMD_DIRTY_WRITE_ACK = 36, }; /** @@ -992,6 +999,172 @@ enum pds_lm_host_vf_status { PDS_LM_STA_MAX, }; +/** + * struct pds_lm_dirty_region_info - Memory region info for STATUS and ENABLE + * @dma_base: Base address of the DMA-contiguous memory region + * @page_count: Number of pages in the memory region + * @page_size_log2: Log2 page size in the memory region + * @rsvd: Word boundary padding + */ +struct pds_lm_dirty_region_info { + __le64 dma_base; + __le32 page_count; + u8 page_size_log2; + u8 rsvd[3]; +}; + +/** + * struct pds_lm_dirty_status_cmd - DIRTY_STATUS command + * @opcode: Opcode PDS_LM_CMD_DIRTY_STATUS + * @rsvd: Word boundary padding + * @vf_id: VF id + * @max_regions: Capacity of the region info buffer + * @rsvd2: Word boundary padding + * @regions_dma: DMA address of the region info buffer + * + * The minimum of max_regions (from the command) and num_regions (from the + * completion) of struct pds_lm_dirty_region_info will be written to + * regions_dma. + * + * The max_regions may be zero, in which case regions_dma is ignored. In that + * case, the completion will only report the maximum number of regions + * supported by the device, and the number of regions currently enabled. + */ +struct pds_lm_dirty_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 max_regions; + u8 rsvd2[3]; + __le64 regions_dma; +} __packed; + +/** + * enum pds_lm_dirty_bmp_type - Type of dirty page bitmap + * @PDS_LM_DIRTY_BMP_TYPE_NONE: No bitmap / disabled + * @PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK: Seq/Ack bitmap representation + */ +enum pds_lm_dirty_bmp_type { + PDS_LM_DIRTY_BMP_TYPE_NONE = 0, + PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK = 1, +}; + +/** + * struct pds_lm_dirty_status_comp - STATUS command completion + * @status: Status of the command (enum pds_core_status_code) + * @rsvd: Word boundary padding + * @comp_index: Index in the desc ring for which this is the completion + * @max_regions: Maximum number of regions supported by the device + * @num_regions: Number of regions currently enabled + * @bmp_type: Type of dirty bitmap representation + * @rsvd2: Word boundary padding + * @bmp_type_mask: Mask of supported bitmap types, bit index per type + * @rsvd3: Word boundary padding + * @color: Color bit + * + * This completion descriptor is used for STATUS, ENABLE, and DISABLE. + */ +struct pds_lm_dirty_status_comp { + u8 status; + u8 rsvd; + __le16 comp_index; + u8 max_regions; + u8 num_regions; + u8 bmp_type; + u8 rsvd2; + __le32 bmp_type_mask; + u8 rsvd3[3]; + u8 color; +}; + +/** + * struct pds_lm_dirty_enable_cmd - DIRTY_ENABLE command + * @opcode: Opcode PDS_LM_CMD_DIRTY_ENABLE + * @rsvd: Word boundary padding + * @vf_id: VF id + * @bmp_type: Type of dirty bitmap representation + * @num_regions: Number of entries in the region info buffer + * @rsvd2: Word boundary padding + * @regions_dma: DMA address of the region info buffer + * + * The num_regions must be nonzero, and less than or equal to the maximum + * number of regions supported by the device. + * + * The memory regions should not overlap. + * + * The information should be initialized by the driver. The device may modify + * the information on successful completion, such as by size-aligning the + * number of pages in a region. + * + * The modified number of pages will be greater than or equal to the page count + * given in the enable command, and at least as coarsly aligned as the given + * value. For example, the count might be aligned to a multiple of 64, but + * if the value is already a multiple of 128 or higher, it will not change. + * If the driver requires its own minimum alignment of the number of pages, the + * driver should account for that already in the region info of this command. + * + * This command uses struct pds_lm_dirty_status_comp for its completion. + */ +struct pds_lm_dirty_enable_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 bmp_type; + u8 num_regions; + u8 rsvd2[2]; + __le64 regions_dma; +} __packed; + +/** + * struct pds_lm_dirty_disable_cmd - DIRTY_DISABLE command + * @opcode: Opcode PDS_LM_CMD_DIRTY_DISABLE + * @rsvd: Word boundary padding + * @vf_id: VF id + * + * Dirty page tracking will be disabled. This may be called in any state, as + * long as dirty page tracking is supported by the device, to ensure that dirty + * page tracking is disabled. + * + * This command uses struct pds_lm_dirty_status_comp for its completion. On + * success, num_regions will be zero. + */ +struct pds_lm_dirty_disable_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; +}; + +/** + * struct pds_lm_dirty_seq_ack_cmd - DIRTY_READ_SEQ or _WRITE_ACK command + * @opcode: Opcode PDS_LM_CMD_DIRTY_[READ_SEQ|WRITE_ACK] + * @rsvd: Word boundary padding + * @vf_id: VF id + * @off_bytes: Byte offset in the bitmap + * @len_bytes: Number of bytes to transfer + * @num_sge: Number of DMA scatter gather elements + * @rsvd2: Word boundary padding + * @sgl_addr: DMA address of scatter gather list + * + * Read bytes from the SEQ bitmap, or write bytes into the ACK bitmap. + * + * This command treats the entire bitmap as a byte buffer. It does not + * distinguish between guest memory regions. The driver should refer to the + * number of pages in each region, according to PDS_LM_CMD_DIRTY_STATUS, to + * determine the region boundaries in the bitmap. Each region will be + * represented by exactly the number of bits as the page count for that region, + * immediately following the last bit of the previous region. + */ +struct pds_lm_dirty_seq_ack_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + __le32 off_bytes; + __le32 len_bytes; + __le16 num_sge; + u8 rsvd2[2]; + __le64 sgl_addr; +} __packed; + /** * struct pds_lm_host_vf_status_cmd - HOST_VF_STATUS command * @opcode: Opcode PDS_LM_CMD_HOST_VF_STATUS @@ -1039,6 +1212,10 @@ union pds_core_adminq_cmd { struct pds_lm_save_cmd lm_save; struct pds_lm_restore_cmd lm_restore; struct pds_lm_host_vf_status_cmd lm_host_vf_status; + struct pds_lm_dirty_status_cmd lm_dirty_status; + struct pds_lm_dirty_enable_cmd lm_dirty_enable; + struct pds_lm_dirty_disable_cmd lm_dirty_disable; + struct pds_lm_dirty_seq_ack_cmd lm_dirty_seq_ack; }; union pds_core_adminq_comp { @@ -1065,6 +1242,7 @@ union pds_core_adminq_comp { struct pds_vdpa_vq_reset_comp vdpa_vq_reset; struct pds_lm_state_size_comp lm_state_size; + struct pds_lm_dirty_status_comp lm_dirty_status; }; #ifndef __CHECKER__ -- cgit v1.2.3 From ddeea494a16f32522bce16ee65f191d05d4b8282 Mon Sep 17 00:00:00 2001 From: Sven Schnelle Date: Wed, 16 Aug 2023 17:49:26 +0200 Subject: tracing/synthetic: Use union instead of casts The current code uses a lot of casts to access the fields member in struct synth_trace_events with different sizes. This makes the code hard to read, and had already introduced an endianness bug. Use a union and struct instead. Link: https://lkml.kernel.org/r/20230816154928.4171614-2-svens@linux.ibm.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Fixes: 00cf3d672a9dd ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 3930e676436c..1e8bbdb8da90 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -59,6 +59,17 @@ int trace_raw_output_prep(struct trace_iterator *iter, extern __printf(2, 3) void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...); +/* Used to find the offset and length of dynamic fields in trace events */ +struct trace_dynamic_info { +#ifdef CONFIG_CPU_BIG_ENDIAN + u16 offset; + u16 len; +#else + u16 len; + u16 offset; +#endif +}; + /* * The trace entry - the most basic unit of tracing. This is what * is printed in the end as a single line in the trace output, such as: -- cgit v1.2.3 From ed2b9e1b6d82c729b2757b449097f28f108b023b Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 16 Jun 2023 16:07:24 -0700 Subject: srcu,notifier: Remove #ifdefs in favor of SRCU Tiny srcu_usage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit removes two #ifdef directives from include/linux/notifier.h by causing SRCU Tiny to provide a dummy srcu_usage structure and a dummy __SRCU_USAGE_INIT() macro. Suggested-by: Linus Torvalds Cc: Matthias Brugger Cc: "Rafael J. Wysocki" Cc: "Michał Mirosław" Cc: Dmitry Osipenko Cc: Sachin Sant Cc: Joel Fernandes (Google) Signed-off-by: Paul E. McKenney --- include/linux/notifier.h | 11 ----------- include/linux/srcutiny.h | 4 ++++ 2 files changed, 4 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/notifier.h b/include/linux/notifier.h index 86544707236a..45702bdcbceb 100644 --- a/include/linux/notifier.h +++ b/include/linux/notifier.h @@ -73,9 +73,7 @@ struct raw_notifier_head { struct srcu_notifier_head { struct mutex mutex; -#ifdef CONFIG_TREE_SRCU struct srcu_usage srcuu; -#endif struct srcu_struct srcu; struct notifier_block __rcu *head; }; @@ -106,7 +104,6 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); #define RAW_NOTIFIER_INIT(name) { \ .head = NULL } -#ifdef CONFIG_TREE_SRCU #define SRCU_NOTIFIER_INIT(name, pcpu) \ { \ .mutex = __MUTEX_INITIALIZER(name.mutex), \ @@ -114,14 +111,6 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); .srcuu = __SRCU_USAGE_INIT(name.srcuu), \ .srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \ } -#else -#define SRCU_NOTIFIER_INIT(name, pcpu) \ - { \ - .mutex = __MUTEX_INITIALIZER(name.mutex), \ - .head = NULL, \ - .srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \ - } -#endif #define ATOMIC_NOTIFIER_HEAD(name) \ struct atomic_notifier_head name = \ diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h index ebd72491af99..447133171d95 100644 --- a/include/linux/srcutiny.h +++ b/include/linux/srcutiny.h @@ -48,6 +48,10 @@ void srcu_drive_gp(struct work_struct *wp); #define DEFINE_STATIC_SRCU(name) \ static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name) +// Dummy structure for srcu_notifier_head. +struct srcu_usage { }; +#define __SRCU_USAGE_INIT(name) { } + void synchronize_srcu(struct srcu_struct *ssp); /* -- cgit v1.2.3 From efd04f8a8b45b8b98704b5860e363bab239b8bae Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Thu, 6 Jul 2023 10:28:48 +0000 Subject: rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls When the objects managed by rculist_nulls are allocated with SLAB_TYPESAFE_BY_RCU, old readers may still hold references to an object even though it is just now being added, which means the modification of ->next is visible to readers. This patch therefore uses WRITE_ONCE() for assignments to ->next. Signed-off-by: Alan Huang Signed-off-by: Paul E. McKenney Reviewed-by: Joel Fernandes (Google) --- include/linux/rculist_nulls.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h index ba4c00dd8005..89186c499dd4 100644 --- a/include/linux/rculist_nulls.h +++ b/include/linux/rculist_nulls.h @@ -101,7 +101,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, { struct hlist_nulls_node *first = h->first; - n->next = first; + WRITE_ONCE(n->next, first); WRITE_ONCE(n->pprev, &h->first); rcu_assign_pointer(hlist_nulls_first_rcu(h), n); if (!is_a_nulls(first)) @@ -137,7 +137,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, last = i; if (last) { - n->next = last->next; + WRITE_ONCE(n->next, last->next); n->pprev = &last->next; rcu_assign_pointer(hlist_nulls_next_rcu(last), n); } else { -- cgit v1.2.3 From d890cfc25fe9421ffdff3a9ea678172addb36762 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Mon, 7 Aug 2023 09:31:06 +0200 Subject: rtc: ds2404: Convert to GPIO descriptors This converts the DS2404 to use GPIO descriptors instead of hard-coded global GPIO numbers. The platform data can be deleted because there are no in-tree users and it only contained GPIO numbers which are now passed using descriptor tables (or device tree or ACPI). The driver was rewritten to use a state container for the device driver state (struct ds2404 *chip) and pass that around instead of using a global singleton storage for the GPIO handles. When declaring GPIO descriptor tables or other hardware descriptions for the RTC driver, implementers should take care to flag the RESET line as active low, such as by using the GPIOD_ACTIVE_LOW flag in the descriptor table. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20230807-descriptors-rtc-v1-1-ce0f9187576e@linaro.org Signed-off-by: Alexandre Belloni --- include/linux/platform_data/rtc-ds2404.h | 20 -------------------- 1 file changed, 20 deletions(-) delete mode 100644 include/linux/platform_data/rtc-ds2404.h (limited to 'include/linux') diff --git a/include/linux/platform_data/rtc-ds2404.h b/include/linux/platform_data/rtc-ds2404.h deleted file mode 100644 index 22c53825528f..000000000000 --- a/include/linux/platform_data/rtc-ds2404.h +++ /dev/null @@ -1,20 +0,0 @@ -/* - * ds2404.h - platform data structure for the DS2404 RTC. - * - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - * - * Copyright (C) 2012 Sven Schnelle - */ - -#ifndef __LINUX_DS2404_H -#define __LINUX_DS2404_H - -struct ds2404_platform_data { - - unsigned int gpio_rst; - unsigned int gpio_clk; - unsigned int gpio_dq; -}; -#endif -- cgit v1.2.3 From afb48153220d35f330d0d979792920a31f7d9a81 Mon Sep 17 00:00:00 2001 From: Jean-Jacques Hiblot Date: Fri, 28 Jul 2023 17:37:28 +0200 Subject: leds: Provide devm_of_led_get_optional() Add an optional variant of devm_of_led_get(). It behaves the same as devm_of_led_get() except where the LED doesn't exist. In this case, instead of returning -ENOENT, the function returns NULL. Signed-off-by: Jean-Jacques Hiblot Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20230728153731.3742339-2-jjhiblot@traphandler.com Signed-off-by: Lee Jones --- include/linux/leds.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/leds.h b/include/linux/leds.h index 7d428100b42b..8740b4e47f88 100644 --- a/include/linux/leds.h +++ b/include/linux/leds.h @@ -313,6 +313,8 @@ extern struct led_classdev *of_led_get(struct device_node *np, int index); extern void led_put(struct led_classdev *led_cdev); struct led_classdev *__must_check devm_of_led_get(struct device *dev, int index); +struct led_classdev *__must_check devm_of_led_get_optional(struct device *dev, + int index); /** * led_blink_set - set blinking with software fallback -- cgit v1.2.3 From c7d80059b086c4986cd994a1973ec7a5d75f8eea Mon Sep 17 00:00:00 2001 From: Jean-Jacques Hiblot Date: Fri, 28 Jul 2023 17:37:29 +0200 Subject: leds: class: Store the color index in struct led_classdev Store the color of the LED so that it is not lost after the LED's name has been composed. This color information can then be exposed to the user space or used by the LED consumer. Signed-off-by: Jean-Jacques Hiblot Link: https://lore.kernel.org/r/20230728153731.3742339-3-jjhiblot@traphandler.com Signed-off-by: Lee Jones --- include/linux/leds.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/leds.h b/include/linux/leds.h index 8740b4e47f88..aa16dc2a8230 100644 --- a/include/linux/leds.h +++ b/include/linux/leds.h @@ -100,6 +100,7 @@ struct led_classdev { const char *name; unsigned int brightness; unsigned int max_brightness; + unsigned int color; int flags; /* Lower 16 bits reflect status */ -- cgit v1.2.3 From a1342c8027288e345cc5fd16c6800f9d4eb788ed Mon Sep 17 00:00:00 2001 From: David Matlack Date: Fri, 11 Aug 2023 04:51:14 +0000 Subject: KVM: Rename kvm_arch_flush_remote_tlb() to kvm_arch_flush_remote_tlbs() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rename kvm_arch_flush_remote_tlb() and the associated macro __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB to kvm_arch_flush_remote_tlbs() and __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS respectively. Making the name plural matches kvm_flush_remote_tlbs() and makes it more clear that this function can affect more than one remote TLB. No functional change intended. Signed-off-by: David Matlack Signed-off-by: Raghavendra Rao Ananta Reviewed-by: Gavin Shan Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Shaoqin Huang Acked-by: Sean Christopherson Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20230811045127.3308641-2-rananta@google.com --- include/linux/kvm_host.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9d3ac7720da9..e3f968b38ae9 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1479,8 +1479,8 @@ static inline void kvm_arch_free_vm(struct kvm *kvm) } #endif -#ifndef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB -static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm) +#ifndef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS +static inline int kvm_arch_flush_remote_tlbs(struct kvm *kvm) { return -ENOTSUPP; } -- cgit v1.2.3 From cfb0c08e80120928dda1e951718be135abd49bae Mon Sep 17 00:00:00 2001 From: Raghavendra Rao Ananta Date: Fri, 11 Aug 2023 04:51:15 +0000 Subject: KVM: Declare kvm_arch_flush_remote_tlbs() globally There's no reason for the architectures to declare kvm_arch_flush_remote_tlbs() in their own headers. Hence to avoid this duplication, make the declaration global, leaving the architectures to define only __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS as needed. Signed-off-by: Raghavendra Rao Ananta Reviewed-by: Shaoqin Huang Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20230811045127.3308641-3-rananta@google.com --- include/linux/kvm_host.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index e3f968b38ae9..ade5d4500c2c 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1484,6 +1484,8 @@ static inline int kvm_arch_flush_remote_tlbs(struct kvm *kvm) { return -ENOTSUPP; } +#else +int kvm_arch_flush_remote_tlbs(struct kvm *kvm); #endif #ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA -- cgit v1.2.3 From d4788996051e3c07fadc6d9b214073fcf78810a8 Mon Sep 17 00:00:00 2001 From: David Matlack Date: Fri, 11 Aug 2023 04:51:18 +0000 Subject: KVM: Allow range-based TLB invalidation from common code Make kvm_flush_remote_tlbs_range() visible in common code and create a default implementation that just invalidates the whole TLB. This paves the way for several future features/cleanups: - Introduction of range-based TLBI on ARM. - Eliminating kvm_arch_flush_remote_tlbs_memslot() - Moving the KVM/x86 TDP MMU to common code. No functional change intended. Signed-off-by: David Matlack Signed-off-by: Raghavendra Rao Ananta Reviewed-by: Gavin Shan Reviewed-by: Shaoqin Huang Reviewed-by: Anup Patel Acked-by: Sean Christopherson Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20230811045127.3308641-6-rananta@google.com --- include/linux/kvm_host.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ade5d4500c2c..89d2614e4b7a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1359,6 +1359,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode); void kvm_flush_remote_tlbs(struct kvm *kvm); +void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages); #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min); @@ -1488,6 +1489,16 @@ static inline int kvm_arch_flush_remote_tlbs(struct kvm *kvm) int kvm_arch_flush_remote_tlbs(struct kvm *kvm); #endif +#ifndef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE +static inline int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, + gfn_t gfn, u64 nr_pages) +{ + return -EOPNOTSUPP; +} +#else +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages); +#endif + #ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA void kvm_arch_register_noncoherent_dma(struct kvm *kvm); void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm); -- cgit v1.2.3 From 619b5072443c05cf18c31b2c0320cdb42396d411 Mon Sep 17 00:00:00 2001 From: David Matlack Date: Fri, 11 Aug 2023 04:51:19 +0000 Subject: KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a range-based TLB invalidation where the range is defined by the memslot. Now that kvm_flush_remote_tlbs_range() can be called from common code we can just use that and drop a bunch of duplicate code from the arch directories. Note this adds a lockdep assertion for slots_lock being held when calling kvm_flush_remote_tlbs_memslot(), which was previously only asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(), but they all hold the slots_lock, so the lockdep assertion continues to hold true. Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating kvm_flush_remote_tlbs_memslot(), since it is no longer necessary. Signed-off-by: David Matlack Signed-off-by: Raghavendra Rao Ananta Reviewed-by: Gavin Shan Reviewed-by: Shaoqin Huang Acked-by: Anup Patel Acked-by: Sean Christopherson Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20230811045127.3308641-7-rananta@google.com --- include/linux/kvm_host.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 89d2614e4b7a..394db2ce11e2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1360,6 +1360,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode); void kvm_flush_remote_tlbs(struct kvm *kvm); void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages); +void kvm_flush_remote_tlbs_memslot(struct kvm *kvm, + const struct kvm_memory_slot *memslot); #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min); @@ -1388,10 +1390,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, unsigned long mask); void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot); -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm, - const struct kvm_memory_slot *memslot); -#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */ +#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log); int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log, int *is_dirty, struct kvm_memory_slot **memslot); -- cgit v1.2.3 From 6a4fec4f6d30a325a1b27be70729145484e6fe9f Mon Sep 17 00:00:00 2001 From: Liao Chang Date: Tue, 15 Aug 2023 09:40:17 +0800 Subject: cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases. The cpufreq framework used to use the zero of return value to reflect the cppc_cpufreq_get_rate() had failed to get current frequecy and treat all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a negative integer in error case, so it is better to convert the value to zero as the return value of cppc_cpufreq_get_rate(). Signed-off-by: Liao Chang Signed-off-by: Viresh Kumar --- include/linux/cpufreq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 172ff51c1b2a..99da27466b8f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -370,7 +370,7 @@ struct cpufreq_driver { int (*target_intermediate)(struct cpufreq_policy *policy, unsigned int index); - /* should be defined, if possible */ + /* should be defined, if possible, return 0 on error */ unsigned int (*get)(unsigned int cpu); /* Called to update policy limits on firmware notifications. */ -- cgit v1.2.3 From 9a99a996d1ec53819580eba93d60b38d72befe3d Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 17 Aug 2023 11:23:32 +0200 Subject: thermal: core: Introduce thermal_zone_device_exec() Introduce a new helper function, thermal_zone_device_exec(), that can be used by drivers to run a given callback routine under the zone lock. Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 87837094d549..4d40bfa14cdd 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -323,6 +323,10 @@ int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int, struct thermal_cooling_device *); void thermal_zone_device_update(struct thermal_zone_device *, enum thermal_notify_event); +void thermal_zone_device_exec(struct thermal_zone_device *tz, + void (*cb)(struct thermal_zone_device *, + unsigned long), + unsigned long data); struct thermal_cooling_device *thermal_cooling_device_register(const char *, void *, const struct thermal_cooling_device_ops *); -- cgit v1.2.3 From cba440fab30106c9ebd30dbbb64d4fb7b997a8e8 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 7 Aug 2023 20:04:47 +0200 Subject: thermal: core: Add priv pointer to struct thermal_trip Add a new field called priv to struct thermal_trip to allow thermal drivers to store pointers to their local data associated with trip points. Signed-off-by: Rafael J. Wysocki Acked-by: Daniel Lezcano --- include/linux/thermal.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 4d40bfa14cdd..f1b43d8a7bd3 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -81,11 +81,13 @@ struct thermal_zone_device_ops { * @temperature: temperature value in miliCelsius * @hysteresis: relative hysteresis in miliCelsius * @type: trip point type + * @priv: pointer to driver data associated with this trip */ struct thermal_trip { int temperature; int hysteresis; enum thermal_trip_type type; + void *priv; }; struct thermal_cooling_device_ops { -- cgit v1.2.3 From 96b8b4365db44af0a94b91a805cd5beed45d24fa Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 7 Aug 2023 20:11:07 +0200 Subject: thermal: core: Rework and rename __for_each_thermal_trip() Rework the currently unused __for_each_thermal_trip() to pass original pointers to struct thermal_trip objects to the callback, so it can be used for updating trip data (e.g. temperatures), rename it to for_each_thermal_trip() and make it available to modular drivers. Suggested-by: Daniel Lezcano Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index f1b43d8a7bd3..f1d73642177f 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -289,6 +289,9 @@ int thermal_zone_get_trip(struct thermal_zone_device *tz, int trip_id, int thermal_zone_set_trip(struct thermal_zone_device *tz, int trip_id, const struct thermal_trip *trip); +int for_each_thermal_trip(struct thermal_zone_device *tz, + int (*cb)(struct thermal_trip *, void *), + void *data); int thermal_zone_get_num_trips(struct thermal_zone_device *tz); int thermal_zone_get_crit_temp(struct thermal_zone_device *tz, int *temp); -- cgit v1.2.3 From 12a95123bfe1dd1a6020a35f5e67a560591bb02a Mon Sep 17 00:00:00 2001 From: Lucas Tanure Date: Fri, 4 Aug 2023 11:45:57 +0100 Subject: soundwire: bus: Allow SoundWire peripherals to register IRQ handlers Currently the in-band alerts for SoundWire peripherals can only be communicated to the driver through the interrupt_callback function. This however is slightly inconvenient for devices that wish to share IRQ handling code between SoundWire and I2C/SPI, the later would normally register an IRQ handler with the IRQ subsystem. However there is no reason the SoundWire in-band IRQs can not also be communicated as an actual IRQ to the driver. Add support for SoundWire peripherals to register a normal IRQ handler to receive SoundWire in-band alerts, allowing code to be shared across control buses. Note that we allow users to use both the interrupt_callback and the IRQ handler, this is useful for devices which must clear additional chip specific SoundWire registers that are not a part of the normal IRQ flow, or the SoundWire specification. Signed-off-by: Lucas Tanure Reviewed-by: Pierre-Louis Bossart Acked-by: Bard Liao Acked-by: Vinod Koul Signed-off-by: Charles Keepax Link: https://lore.kernel.org/r/20230804104602.395892-2-ckeepax@opensource.cirrus.com Signed-off-by: Lee Jones --- include/linux/soundwire/sdw.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index f523ceabd059..8923387a7405 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -6,6 +6,8 @@ #include #include +#include +#include #include #include @@ -370,6 +372,7 @@ struct sdw_dpn_prop { * @clock_reg_supported: the Peripheral implements the clock base and scale * registers introduced with the SoundWire 1.2 specification. SDCA devices * do not need to set this boolean property as the registers are required. + * @use_domain_irq: call actual IRQ handler on slave, as well as callback */ struct sdw_slave_prop { u32 mipi_revision; @@ -394,6 +397,7 @@ struct sdw_slave_prop { u8 scp_int1_mask; u32 quirks; bool clock_reg_supported; + bool use_domain_irq; }; #define SDW_SLAVE_QUIRKS_INVALID_INITIAL_PARITY BIT(0) @@ -641,6 +645,7 @@ struct sdw_slave_ops { * struct sdw_slave - SoundWire Slave * @id: MIPI device ID * @dev: Linux device + * @irq: IRQ number * @status: Status reported by the Slave * @bus: Bus handle * @prop: Slave properties @@ -670,6 +675,7 @@ struct sdw_slave_ops { struct sdw_slave { struct sdw_slave_id id; struct device dev; + int irq; enum sdw_slave_status status; struct sdw_bus *bus; struct sdw_slave_prop prop; @@ -885,6 +891,7 @@ struct sdw_master_ops { * is used to compute and program bus bandwidth, clock, frame shape, * transport and port parameters * @debugfs: Bus debugfs + * @domain: IRQ domain * @defer_msg: Defer message * @clk_stop_timeout: Clock stop timeout computed * @bank_switch_timeout: Bank switch timeout computed @@ -920,6 +927,8 @@ struct sdw_bus { #ifdef CONFIG_DEBUG_FS struct dentry *debugfs; #endif + struct irq_chip irq_chip; + struct irq_domain *domain; struct sdw_defer defer_msg; unsigned int clk_stop_timeout; u32 bank_switch_timeout; -- cgit v1.2.3 From ace6d14481386ec6c1b63cc2b24c71433a583dc2 Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Fri, 4 Aug 2023 11:45:59 +0100 Subject: mfd: cs42l43: Add support for cs42l43 core driver The CS42L43 is an audio CODEC with integrated MIPI SoundWire interface (Version 1.2.1 compliant), I2C, SPI, and I2S/TDM interfaces designed for portable applications. It provides a high dynamic range, stereo DAC for headphone output, two integrated Class D amplifiers for loudspeakers, and two ADCs for wired headset microphone input or stereo line input. PDM inputs are provided for digital microphones. The MFD component registers and initialises the device and provides PM/system power management. Signed-off-by: Charles Keepax Link: https://lore.kernel.org/r/20230804104602.395892-4-ckeepax@opensource.cirrus.com Signed-off-by: Lee Jones --- include/linux/mfd/cs42l43-regs.h | 1184 ++++++++++++++++++++++++++++++++++++++ include/linux/mfd/cs42l43.h | 102 ++++ 2 files changed, 1286 insertions(+) create mode 100644 include/linux/mfd/cs42l43-regs.h create mode 100644 include/linux/mfd/cs42l43.h (limited to 'include/linux') diff --git a/include/linux/mfd/cs42l43-regs.h b/include/linux/mfd/cs42l43-regs.h new file mode 100644 index 000000000000..c39a49269cb7 --- /dev/null +++ b/include/linux/mfd/cs42l43-regs.h @@ -0,0 +1,1184 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * cs42l43 register definitions + * + * Copyright (c) 2022-2023 Cirrus Logic, Inc. and + * Cirrus Logic International Semiconductor Ltd. + */ + +#ifndef CS42L43_CORE_REGS_H +#define CS42L43_CORE_REGS_H + +/* Registers */ +#define CS42L43_GEN_INT_STAT_1 0x000000C0 +#define CS42L43_GEN_INT_MASK_1 0x000000C1 +#define CS42L43_DEVID 0x00003000 +#define CS42L43_REVID 0x00003004 +#define CS42L43_RELID 0x0000300C +#define CS42L43_SFT_RESET 0x00003020 +#define CS42L43_DRV_CTRL1 0x00006004 +#define CS42L43_DRV_CTRL3 0x0000600C +#define CS42L43_DRV_CTRL4 0x00006010 +#define CS42L43_DRV_CTRL_5 0x00006014 +#define CS42L43_GPIO_CTRL1 0x00006034 +#define CS42L43_GPIO_CTRL2 0x00006038 +#define CS42L43_GPIO_STS 0x0000603C +#define CS42L43_GPIO_FN_SEL 0x00006040 +#define CS42L43_MCLK_SRC_SEL 0x00007004 +#define CS42L43_CCM_BLK_CLK_CONTROL 0x00007010 +#define CS42L43_SAMPLE_RATE1 0x00007014 +#define CS42L43_SAMPLE_RATE2 0x00007018 +#define CS42L43_SAMPLE_RATE3 0x0000701C +#define CS42L43_SAMPLE_RATE4 0x00007020 +#define CS42L43_PLL_CONTROL 0x00007034 +#define CS42L43_FS_SELECT1 0x00007038 +#define CS42L43_FS_SELECT2 0x0000703C +#define CS42L43_FS_SELECT3 0x00007040 +#define CS42L43_FS_SELECT4 0x00007044 +#define CS42L43_PDM_CONTROL 0x0000704C +#define CS42L43_ASP_CLK_CONFIG1 0x00007058 +#define CS42L43_ASP_CLK_CONFIG2 0x0000705C +#define CS42L43_OSC_DIV_SEL 0x00007068 +#define CS42L43_ADC_B_CTRL1 0x00008000 +#define CS42L43_ADC_B_CTRL2 0x00008004 +#define CS42L43_DECIM_HPF_WNF_CTRL1 0x0000803C +#define CS42L43_DECIM_HPF_WNF_CTRL2 0x00008040 +#define CS42L43_DECIM_HPF_WNF_CTRL3 0x00008044 +#define CS42L43_DECIM_HPF_WNF_CTRL4 0x00008048 +#define CS42L43_DMIC_PDM_CTRL 0x0000804C +#define CS42L43_DECIM_VOL_CTRL_CH1_CH2 0x00008050 +#define CS42L43_DECIM_VOL_CTRL_CH3_CH4 0x00008054 +#define CS42L43_DECIM_VOL_CTRL_UPDATE 0x00008058 +#define CS42L43_INTP_VOLUME_CTRL1 0x00009008 +#define CS42L43_INTP_VOLUME_CTRL2 0x0000900C +#define CS42L43_AMP1_2_VOL_RAMP 0x00009010 +#define CS42L43_ASP_CTRL 0x0000A000 +#define CS42L43_ASP_FSYNC_CTRL1 0x0000A004 +#define CS42L43_ASP_FSYNC_CTRL2 0x0000A008 +#define CS42L43_ASP_FSYNC_CTRL3 0x0000A00C +#define CS42L43_ASP_FSYNC_CTRL4 0x0000A010 +#define CS42L43_ASP_DATA_CTRL 0x0000A018 +#define CS42L43_ASP_RX_EN 0x0000A020 +#define CS42L43_ASP_TX_EN 0x0000A024 +#define CS42L43_ASP_RX_CH1_CTRL 0x0000A028 +#define CS42L43_ASP_RX_CH2_CTRL 0x0000A02C +#define CS42L43_ASP_RX_CH3_CTRL 0x0000A030 +#define CS42L43_ASP_RX_CH4_CTRL 0x0000A034 +#define CS42L43_ASP_RX_CH5_CTRL 0x0000A038 +#define CS42L43_ASP_RX_CH6_CTRL 0x0000A03C +#define CS42L43_ASP_TX_CH1_CTRL 0x0000A068 +#define CS42L43_ASP_TX_CH2_CTRL 0x0000A06C +#define CS42L43_ASP_TX_CH3_CTRL 0x0000A070 +#define CS42L43_ASP_TX_CH4_CTRL 0x0000A074 +#define CS42L43_ASP_TX_CH5_CTRL 0x0000A078 +#define CS42L43_ASP_TX_CH6_CTRL 0x0000A07C +#define CS42L43_OTP_REVISION_ID 0x0000B02C +#define CS42L43_ASPTX1_INPUT 0x0000C200 +#define CS42L43_ASPTX2_INPUT 0x0000C210 +#define CS42L43_ASPTX3_INPUT 0x0000C220 +#define CS42L43_ASPTX4_INPUT 0x0000C230 +#define CS42L43_ASPTX5_INPUT 0x0000C240 +#define CS42L43_ASPTX6_INPUT 0x0000C250 +#define CS42L43_SWIRE_DP1_CH1_INPUT 0x0000C280 +#define CS42L43_SWIRE_DP1_CH2_INPUT 0x0000C290 +#define CS42L43_SWIRE_DP1_CH3_INPUT 0x0000C2A0 +#define CS42L43_SWIRE_DP1_CH4_INPUT 0x0000C2B0 +#define CS42L43_SWIRE_DP2_CH1_INPUT 0x0000C2C0 +#define CS42L43_SWIRE_DP2_CH2_INPUT 0x0000C2D0 +#define CS42L43_SWIRE_DP3_CH1_INPUT 0x0000C2E0 +#define CS42L43_SWIRE_DP3_CH2_INPUT 0x0000C2F0 +#define CS42L43_SWIRE_DP4_CH1_INPUT 0x0000C300 +#define CS42L43_SWIRE_DP4_CH2_INPUT 0x0000C310 +#define CS42L43_ASRC_INT1_INPUT1 0x0000C400 +#define CS42L43_ASRC_INT2_INPUT1 0x0000C410 +#define CS42L43_ASRC_INT3_INPUT1 0x0000C420 +#define CS42L43_ASRC_INT4_INPUT1 0x0000C430 +#define CS42L43_ASRC_DEC1_INPUT1 0x0000C440 +#define CS42L43_ASRC_DEC2_INPUT1 0x0000C450 +#define CS42L43_ASRC_DEC3_INPUT1 0x0000C460 +#define CS42L43_ASRC_DEC4_INPUT1 0x0000C470 +#define CS42L43_ISRC1INT1_INPUT1 0x0000C500 +#define CS42L43_ISRC1INT2_INPUT1 0x0000C510 +#define CS42L43_ISRC1DEC1_INPUT1 0x0000C520 +#define CS42L43_ISRC1DEC2_INPUT1 0x0000C530 +#define CS42L43_ISRC2INT1_INPUT1 0x0000C540 +#define CS42L43_ISRC2INT2_INPUT1 0x0000C550 +#define CS42L43_ISRC2DEC1_INPUT1 0x0000C560 +#define CS42L43_ISRC2DEC2_INPUT1 0x0000C570 +#define CS42L43_EQ1MIX_INPUT1 0x0000C580 +#define CS42L43_EQ1MIX_INPUT2 0x0000C584 +#define CS42L43_EQ1MIX_INPUT3 0x0000C588 +#define CS42L43_EQ1MIX_INPUT4 0x0000C58C +#define CS42L43_EQ2MIX_INPUT1 0x0000C590 +#define CS42L43_EQ2MIX_INPUT2 0x0000C594 +#define CS42L43_EQ2MIX_INPUT3 0x0000C598 +#define CS42L43_EQ2MIX_INPUT4 0x0000C59C +#define CS42L43_SPDIF1_INPUT1 0x0000C600 +#define CS42L43_SPDIF2_INPUT1 0x0000C610 +#define CS42L43_AMP1MIX_INPUT1 0x0000C620 +#define CS42L43_AMP1MIX_INPUT2 0x0000C624 +#define CS42L43_AMP1MIX_INPUT3 0x0000C628 +#define CS42L43_AMP1MIX_INPUT4 0x0000C62C +#define CS42L43_AMP2MIX_INPUT1 0x0000C630 +#define CS42L43_AMP2MIX_INPUT2 0x0000C634 +#define CS42L43_AMP2MIX_INPUT3 0x0000C638 +#define CS42L43_AMP2MIX_INPUT4 0x0000C63C +#define CS42L43_AMP3MIX_INPUT1 0x0000C640 +#define CS42L43_AMP3MIX_INPUT2 0x0000C644 +#define CS42L43_AMP3MIX_INPUT3 0x0000C648 +#define CS42L43_AMP3MIX_INPUT4 0x0000C64C +#define CS42L43_AMP4MIX_INPUT1 0x0000C650 +#define CS42L43_AMP4MIX_INPUT2 0x0000C654 +#define CS42L43_AMP4MIX_INPUT3 0x0000C658 +#define CS42L43_AMP4MIX_INPUT4 0x0000C65C +#define CS42L43_ASRC_INT_ENABLES 0x0000E000 +#define CS42L43_ASRC_DEC_ENABLES 0x0000E004 +#define CS42L43_PDNCNTL 0x00010000 +#define CS42L43_RINGSENSE_DEB_CTRL 0x0001001C +#define CS42L43_TIPSENSE_DEB_CTRL 0x00010020 +#define CS42L43_TIP_RING_SENSE_INTERRUPT_STATUS 0x00010028 +#define CS42L43_HS2 0x00010040 +#define CS42L43_HS_STAT 0x00010048 +#define CS42L43_MCU_SW_INTERRUPT 0x00010094 +#define CS42L43_STEREO_MIC_CTRL 0x000100A4 +#define CS42L43_STEREO_MIC_CLAMP_CTRL 0x000100C4 +#define CS42L43_BLOCK_EN2 0x00010104 +#define CS42L43_BLOCK_EN3 0x00010108 +#define CS42L43_BLOCK_EN4 0x0001010C +#define CS42L43_BLOCK_EN5 0x00010110 +#define CS42L43_BLOCK_EN6 0x00010114 +#define CS42L43_BLOCK_EN7 0x00010118 +#define CS42L43_BLOCK_EN8 0x0001011C +#define CS42L43_BLOCK_EN9 0x00010120 +#define CS42L43_BLOCK_EN10 0x00010124 +#define CS42L43_BLOCK_EN11 0x00010128 +#define CS42L43_TONE_CH1_CTRL 0x00010134 +#define CS42L43_TONE_CH2_CTRL 0x00010138 +#define CS42L43_MIC_DETECT_CONTROL_1 0x00011074 +#define CS42L43_DETECT_STATUS_1 0x0001107C +#define CS42L43_HS_BIAS_SENSE_AND_CLAMP_AUTOCONTROL 0x00011090 +#define CS42L43_MIC_DETECT_CONTROL_ANDROID 0x000110B0 +#define CS42L43_ISRC1_CTRL 0x00012004 +#define CS42L43_ISRC2_CTRL 0x00013004 +#define CS42L43_CTRL_REG 0x00014000 +#define CS42L43_FDIV_FRAC 0x00014004 +#define CS42L43_CAL_RATIO 0x00014008 +#define CS42L43_SPI_CLK_CONFIG1 0x00016004 +#define CS42L43_SPI_CONFIG1 0x00016010 +#define CS42L43_SPI_CONFIG2 0x00016014 +#define CS42L43_SPI_CONFIG3 0x00016018 +#define CS42L43_SPI_CONFIG4 0x00016024 +#define CS42L43_SPI_STATUS1 0x00016100 +#define CS42L43_SPI_STATUS2 0x00016104 +#define CS42L43_TRAN_CONFIG1 0x00016200 +#define CS42L43_TRAN_CONFIG2 0x00016204 +#define CS42L43_TRAN_CONFIG3 0x00016208 +#define CS42L43_TRAN_CONFIG4 0x0001620C +#define CS42L43_TRAN_CONFIG5 0x00016220 +#define CS42L43_TRAN_CONFIG6 0x00016224 +#define CS42L43_TRAN_CONFIG7 0x00016228 +#define CS42L43_TRAN_CONFIG8 0x0001622C +#define CS42L43_TRAN_STATUS1 0x00016300 +#define CS42L43_TRAN_STATUS2 0x00016304 +#define CS42L43_TRAN_STATUS3 0x00016308 +#define CS42L43_TX_DATA 0x00016400 +#define CS42L43_RX_DATA 0x00016600 +#define CS42L43_DACCNFG1 0x00017000 +#define CS42L43_DACCNFG2 0x00017004 +#define CS42L43_HPPATHVOL 0x0001700C +#define CS42L43_PGAVOL 0x00017014 +#define CS42L43_LOADDETRESULTS 0x00017018 +#define CS42L43_LOADDETENA 0x00017024 +#define CS42L43_CTRL 0x00017028 +#define CS42L43_COEFF_DATA_IN0 0x00018000 +#define CS42L43_COEFF_RD_WR0 0x00018008 +#define CS42L43_INIT_DONE0 0x00018010 +#define CS42L43_START_EQZ0 0x00018014 +#define CS42L43_MUTE_EQ_IN0 0x0001801C +#define CS42L43_DECIM_INT 0x0001B000 +#define CS42L43_EQ_INT 0x0001B004 +#define CS42L43_ASP_INT 0x0001B008 +#define CS42L43_PLL_INT 0x0001B00C +#define CS42L43_SOFT_INT 0x0001B010 +#define CS42L43_SWIRE_INT 0x0001B014 +#define CS42L43_MSM_INT 0x0001B018 +#define CS42L43_ACC_DET_INT 0x0001B01C +#define CS42L43_I2C_TGT_INT 0x0001B020 +#define CS42L43_SPI_MSTR_INT 0x0001B024 +#define CS42L43_SW_TO_SPI_BRIDGE_INT 0x0001B028 +#define CS42L43_OTP_INT 0x0001B02C +#define CS42L43_CLASS_D_AMP_INT 0x0001B030 +#define CS42L43_GPIO_INT 0x0001B034 +#define CS42L43_ASRC_INT 0x0001B038 +#define CS42L43_HPOUT_INT 0x0001B03C +#define CS42L43_DECIM_MASK 0x0001B0A0 +#define CS42L43_EQ_MIX_MASK 0x0001B0A4 +#define CS42L43_ASP_MASK 0x0001B0A8 +#define CS42L43_PLL_MASK 0x0001B0AC +#define CS42L43_SOFT_MASK 0x0001B0B0 +#define CS42L43_SWIRE_MASK 0x0001B0B4 +#define CS42L43_MSM_MASK 0x0001B0B8 +#define CS42L43_ACC_DET_MASK 0x0001B0BC +#define CS42L43_I2C_TGT_MASK 0x0001B0C0 +#define CS42L43_SPI_MSTR_MASK 0x0001B0C4 +#define CS42L43_SW_TO_SPI_BRIDGE_MASK 0x0001B0C8 +#define CS42L43_OTP_MASK 0x0001B0CC +#define CS42L43_CLASS_D_AMP_MASK 0x0001B0D0 +#define CS42L43_GPIO_INT_MASK 0x0001B0D4 +#define CS42L43_ASRC_MASK 0x0001B0D8 +#define CS42L43_HPOUT_MASK 0x0001B0DC +#define CS42L43_DECIM_INT_SHADOW 0x0001B300 +#define CS42L43_EQ_MIX_INT_SHADOW 0x0001B304 +#define CS42L43_ASP_INT_SHADOW 0x0001B308 +#define CS42L43_PLL_INT_SHADOW 0x0001B30C +#define CS42L43_SOFT_INT_SHADOW 0x0001B310 +#define CS42L43_SWIRE_INT_SHADOW 0x0001B314 +#define CS42L43_MSM_INT_SHADOW 0x0001B318 +#define CS42L43_ACC_DET_INT_SHADOW 0x0001B31C +#define CS42L43_I2C_TGT_INT_SHADOW 0x0001B320 +#define CS42L43_SPI_MSTR_INT_SHADOW 0x0001B324 +#define CS42L43_SW_TO_SPI_BRIDGE_SHADOW 0x0001B328 +#define CS42L43_OTP_INT_SHADOW 0x0001B32C +#define CS42L43_CLASS_D_AMP_INT_SHADOW 0x0001B330 +#define CS42L43_GPIO_SHADOW 0x0001B334 +#define CS42L43_ASRC_SHADOW 0x0001B338 +#define CS42L43_HP_OUT_SHADOW 0x0001B33C +#define CS42L43_BOOT_CONTROL 0x00101000 +#define CS42L43_BLOCK_EN 0x00101008 +#define CS42L43_SHUTTER_CONTROL 0x0010100C +#define CS42L43_MCU_SW_REV 0x00114000 +#define CS42L43_PATCH_START_ADDR 0x00114004 +#define CS42L43_NEED_CONFIGS 0x0011400C +#define CS42L43_BOOT_STATUS 0x0011401C +#define CS42L43_FW_SH_BOOT_CFG_NEED_CONFIGS 0x0011F8F8 +#define CS42L43_FW_MISSION_CTRL_NEED_CONFIGS 0x0011FE00 +#define CS42L43_FW_MISSION_CTRL_HAVE_CONFIGS 0x0011FE04 +#define CS42L43_FW_MISSION_CTRL_MM_CTRL_SELECTION 0x0011FE0C +#define CS42L43_FW_MISSION_CTRL_MM_MCU_CFG_REG 0x0011FE10 +#define CS42L43_MCU_RAM_MAX 0x0011FFFF + +/* CS42L43_DEVID */ +#define CS42L43_DEVID_VAL 0x00042A43 + +/* CS42L43_GEN_INT_STAT_1 */ +#define CS42L43_INT_STAT_GEN1_MASK 0x00000001 +#define CS42L43_INT_STAT_GEN1_SHIFT 0 + +/* CS42L43_SFT_RESET */ +#define CS42L43_SFT_RESET_MASK 0xFF000000 +#define CS42L43_SFT_RESET_SHIFT 24 + +#define CS42L43_SFT_RESET_VAL 0x5A000000 + +/* CS42L43_DRV_CTRL1 */ +#define CS42L43_ASP_DOUT_DRV_MASK 0x00038000 +#define CS42L43_ASP_DOUT_DRV_SHIFT 15 +#define CS42L43_ASP_FSYNC_DRV_MASK 0x00000E00 +#define CS42L43_ASP_FSYNC_DRV_SHIFT 9 +#define CS42L43_ASP_BCLK_DRV_MASK 0x000001C0 +#define CS42L43_ASP_BCLK_DRV_SHIFT 6 + +/* CS42L43_DRV_CTRL3 */ +#define CS42L43_I2C_ADDR_DRV_MASK 0x30000000 +#define CS42L43_I2C_ADDR_DRV_SHIFT 28 +#define CS42L43_I2C_SDA_DRV_MASK 0x0C000000 +#define CS42L43_I2C_SDA_DRV_SHIFT 26 +#define CS42L43_PDMOUT2_CLK_DRV_MASK 0x00E00000 +#define CS42L43_PDMOUT2_CLK_DRV_SHIFT 21 +#define CS42L43_PDMOUT2_DATA_DRV_MASK 0x001C0000 +#define CS42L43_PDMOUT2_DATA_DRV_SHIFT 18 +#define CS42L43_PDMOUT1_CLK_DRV_MASK 0x00038000 +#define CS42L43_PDMOUT1_CLK_DRV_SHIFT 15 +#define CS42L43_PDMOUT1_DATA_DRV_MASK 0x00007000 +#define CS42L43_PDMOUT1_DATA_DRV_SHIFT 12 +#define CS42L43_SPI_MISO_DRV_MASK 0x00000038 +#define CS42L43_SPI_MISO_DRV_SHIFT 3 + +/* CS42L43_DRV_CTRL4 */ +#define CS42L43_GPIO3_DRV_MASK 0x00000E00 +#define CS42L43_GPIO3_DRV_SHIFT 9 +#define CS42L43_GPIO2_DRV_MASK 0x000001C0 +#define CS42L43_GPIO2_DRV_SHIFT 6 +#define CS42L43_GPIO1_DRV_MASK 0x00000038 +#define CS42L43_GPIO1_DRV_SHIFT 3 + +/* CS42L43_DRV_CTRL_5 */ +#define CS42L43_I2C_SCL_DRV_MASK 0x18000000 +#define CS42L43_I2C_SCL_DRV_SHIFT 27 +#define CS42L43_SPI_SCK_DRV_MASK 0x07000000 +#define CS42L43_SPI_SCK_DRV_SHIFT 24 +#define CS42L43_SPI_MOSI_DRV_MASK 0x00E00000 +#define CS42L43_SPI_MOSI_DRV_SHIFT 21 +#define CS42L43_SPI_SSB_DRV_MASK 0x001C0000 +#define CS42L43_SPI_SSB_DRV_SHIFT 18 +#define CS42L43_ASP_DIN_DRV_MASK 0x000001C0 +#define CS42L43_ASP_DIN_DRV_SHIFT 6 + +/* CS42L43_GPIO_CTRL1 */ +#define CS42L43_GPIO3_POL_MASK 0x00040000 +#define CS42L43_GPIO3_POL_SHIFT 18 +#define CS42L43_GPIO2_POL_MASK 0x00020000 +#define CS42L43_GPIO2_POL_SHIFT 17 +#define CS42L43_GPIO1_POL_MASK 0x00010000 +#define CS42L43_GPIO1_POL_SHIFT 16 +#define CS42L43_GPIO3_LVL_MASK 0x00000400 +#define CS42L43_GPIO3_LVL_SHIFT 10 +#define CS42L43_GPIO2_LVL_MASK 0x00000200 +#define CS42L43_GPIO2_LVL_SHIFT 9 +#define CS42L43_GPIO1_LVL_MASK 0x00000100 +#define CS42L43_GPIO1_LVL_SHIFT 8 +#define CS42L43_GPIO3_DIR_MASK 0x00000004 +#define CS42L43_GPIO3_DIR_SHIFT 2 +#define CS42L43_GPIO2_DIR_MASK 0x00000002 +#define CS42L43_GPIO2_DIR_SHIFT 1 +#define CS42L43_GPIO1_DIR_MASK 0x00000001 +#define CS42L43_GPIO1_DIR_SHIFT 0 + +/* CS42L43_GPIO_CTRL2 */ +#define CS42L43_GPIO3_DEGLITCH_BYP_MASK 0x00000004 +#define CS42L43_GPIO3_DEGLITCH_BYP_SHIFT 2 +#define CS42L43_GPIO2_DEGLITCH_BYP_MASK 0x00000002 +#define CS42L43_GPIO2_DEGLITCH_BYP_SHIFT 1 +#define CS42L43_GPIO1_DEGLITCH_BYP_MASK 0x00000001 +#define CS42L43_GPIO1_DEGLITCH_BYP_SHIFT 0 + +/* CS42L43_GPIO_STS */ +#define CS42L43_GPIO3_STS_MASK 0x00000004 +#define CS42L43_GPIO3_STS_SHIFT 2 +#define CS42L43_GPIO2_STS_MASK 0x00000002 +#define CS42L43_GPIO2_STS_SHIFT 1 +#define CS42L43_GPIO1_STS_MASK 0x00000001 +#define CS42L43_GPIO1_STS_SHIFT 0 + +/* CS42L43_GPIO_FN_SEL */ +#define CS42L43_GPIO3_FN_SEL_MASK 0x00000004 +#define CS42L43_GPIO3_FN_SEL_SHIFT 2 +#define CS42L43_GPIO1_FN_SEL_MASK 0x00000001 +#define CS42L43_GPIO1_FN_SEL_SHIFT 0 + +/* CS42L43_MCLK_SRC_SEL */ +#define CS42L43_OSC_PLL_MCLK_SEL_MASK 0x00000001 +#define CS42L43_OSC_PLL_MCLK_SEL_SHIFT 0 + +/* CS42L43_SAMPLE_RATE1..CS42L43_SAMPLE_RATE4 */ +#define CS42L43_SAMPLE_RATE_MASK 0x0000001F +#define CS42L43_SAMPLE_RATE_SHIFT 0 + +/* CS42L43_PLL_CONTROL */ +#define CS42L43_PLL_REFCLK_EN_MASK 0x00000008 +#define CS42L43_PLL_REFCLK_EN_SHIFT 3 +#define CS42L43_PLL_REFCLK_DIV_MASK 0x00000006 +#define CS42L43_PLL_REFCLK_DIV_SHIFT 1 +#define CS42L43_PLL_REFCLK_SRC_MASK 0x00000001 +#define CS42L43_PLL_REFCLK_SRC_SHIFT 0 + +/* CS42L43_FS_SELECT1 */ +#define CS42L43_ASP_RATE_MASK 0x00000003 +#define CS42L43_ASP_RATE_SHIFT 0 + +/* CS42L43_FS_SELECT2 */ +#define CS42L43_ASRC_DEC_OUT_RATE_MASK 0x000000C0 +#define CS42L43_ASRC_DEC_OUT_RATE_SHIFT 6 +#define CS42L43_ASRC_INT_OUT_RATE_MASK 0x00000030 +#define CS42L43_ASRC_INT_OUT_RATE_SHIFT 4 +#define CS42L43_ASRC_DEC_IN_RATE_MASK 0x0000000C +#define CS42L43_ASRC_DEC_IN_RATE_SHIFT 2 +#define CS42L43_ASRC_INT_IN_RATE_MASK 0x00000003 +#define CS42L43_ASRC_INT_IN_RATE_SHIFT 0 + +/* CS42L43_FS_SELECT3 */ +#define CS42L43_HPOUT_RATE_MASK 0x0000C000 +#define CS42L43_HPOUT_RATE_SHIFT 14 +#define CS42L43_EQZ_RATE_MASK 0x00003000 +#define CS42L43_EQZ_RATE_SHIFT 12 +#define CS42L43_DIAGGEN_RATE_MASK 0x00000C00 +#define CS42L43_DIAGGEN_RATE_SHIFT 10 +#define CS42L43_DECIM_CH4_RATE_MASK 0x00000300 +#define CS42L43_DECIM_CH4_RATE_SHIFT 8 +#define CS42L43_DECIM_CH3_RATE_MASK 0x000000C0 +#define CS42L43_DECIM_CH3_RATE_SHIFT 6 +#define CS42L43_DECIM_CH2_RATE_MASK 0x00000030 +#define CS42L43_DECIM_CH2_RATE_SHIFT 4 +#define CS42L43_DECIM_CH1_RATE_MASK 0x0000000C +#define CS42L43_DECIM_CH1_RATE_SHIFT 2 +#define CS42L43_AMP1_2_RATE_MASK 0x00000003 +#define CS42L43_AMP1_2_RATE_SHIFT 0 + +/* CS42L43_FS_SELECT4 */ +#define CS42L43_SW_DP7_RATE_MASK 0x00C00000 +#define CS42L43_SW_DP7_RATE_SHIFT 22 +#define CS42L43_SW_DP6_RATE_MASK 0x00300000 +#define CS42L43_SW_DP6_RATE_SHIFT 20 +#define CS42L43_SPDIF_RATE_MASK 0x000C0000 +#define CS42L43_SPDIF_RATE_SHIFT 18 +#define CS42L43_SW_DP5_RATE_MASK 0x00030000 +#define CS42L43_SW_DP5_RATE_SHIFT 16 +#define CS42L43_SW_DP4_RATE_MASK 0x0000C000 +#define CS42L43_SW_DP4_RATE_SHIFT 14 +#define CS42L43_SW_DP3_RATE_MASK 0x00003000 +#define CS42L43_SW_DP3_RATE_SHIFT 12 +#define CS42L43_SW_DP2_RATE_MASK 0x00000C00 +#define CS42L43_SW_DP2_RATE_SHIFT 10 +#define CS42L43_SW_DP1_RATE_MASK 0x00000300 +#define CS42L43_SW_DP1_RATE_SHIFT 8 +#define CS42L43_ISRC2_LOW_RATE_MASK 0x000000C0 +#define CS42L43_ISRC2_LOW_RATE_SHIFT 6 +#define CS42L43_ISRC2_HIGH_RATE_MASK 0x00000030 +#define CS42L43_ISRC2_HIGH_RATE_SHIFT 4 +#define CS42L43_ISRC1_LOW_RATE_MASK 0x0000000C +#define CS42L43_ISRC1_LOW_RATE_SHIFT 2 +#define CS42L43_ISRC1_HIGH_RATE_MASK 0x00000003 +#define CS42L43_ISRC1_HIGH_RATE_SHIFT 0 + +/* CS42L43_PDM_CONTROL */ +#define CS42L43_PDM2_CLK_DIV_MASK 0x0000000C +#define CS42L43_PDM2_CLK_DIV_SHIFT 2 +#define CS42L43_PDM1_CLK_DIV_MASK 0x00000003 +#define CS42L43_PDM1_CLK_DIV_SHIFT 0 + +/* CS42L43_ASP_CLK_CONFIG1 */ +#define CS42L43_ASP_BCLK_N_MASK 0x03FF0000 +#define CS42L43_ASP_BCLK_N_SHIFT 16 +#define CS42L43_ASP_BCLK_M_MASK 0x000003FF +#define CS42L43_ASP_BCLK_M_SHIFT 0 + +/* CS42L43_ASP_CLK_CONFIG2 */ +#define CS42L43_ASP_MASTER_MODE_MASK 0x00000002 +#define CS42L43_ASP_MASTER_MODE_SHIFT 1 +#define CS42L43_ASP_BCLK_INV_MASK 0x00000001 +#define CS42L43_ASP_BCLK_INV_SHIFT 0 + +/* CS42L43_OSC_DIV_SEL */ +#define CS42L43_OSC_DIV2_EN_MASK 0x00000001 +#define CS42L43_OSC_DIV2_EN_SHIFT 0 + +/* CS42L43_ADC_B_CTRL1..CS42L43_ADC_B_CTRL1 */ +#define CS42L43_PGA_WIDESWING_MODE_EN_MASK 0x00000080 +#define CS42L43_PGA_WIDESWING_MODE_EN_SHIFT 7 +#define CS42L43_ADC_AIN_SEL_MASK 0x00000010 +#define CS42L43_ADC_AIN_SEL_SHIFT 4 +#define CS42L43_ADC_PGA_GAIN_MASK 0x0000000F +#define CS42L43_ADC_PGA_GAIN_SHIFT 0 + +/* CS42L43_DECIM_HPF_WNF_CTRL1..CS42L43_DECIM_HPF_WNF_CTRL4 */ +#define CS42L43_DECIM_WNF_CF_MASK 0x00000070 +#define CS42L43_DECIM_WNF_CF_SHIFT 4 +#define CS42L43_DECIM_WNF_EN_MASK 0x00000008 +#define CS42L43_DECIM_WNF_EN_SHIFT 3 +#define CS42L43_DECIM_HPF_CF_MASK 0x00000006 +#define CS42L43_DECIM_HPF_CF_SHIFT 1 +#define CS42L43_DECIM_HPF_EN_MASK 0x00000001 +#define CS42L43_DECIM_HPF_EN_SHIFT 0 + +/* CS42L43_DMIC_PDM_CTRL */ +#define CS42L43_PDM2R_INV_MASK 0x00000020 +#define CS42L43_PDM2R_INV_SHIFT 5 +#define CS42L43_PDM2L_INV_MASK 0x00000010 +#define CS42L43_PDM2L_INV_SHIFT 4 +#define CS42L43_PDM1R_INV_MASK 0x00000008 +#define CS42L43_PDM1R_INV_SHIFT 3 +#define CS42L43_PDM1L_INV_MASK 0x00000004 +#define CS42L43_PDM1L_INV_SHIFT 2 + +/* CS42L43_DECIM_VOL_CTRL_CH1_CH2 */ +#define CS42L43_DECIM2_MUTE_MASK 0x80000000 +#define CS42L43_DECIM2_MUTE_SHIFT 31 +#define CS42L43_DECIM2_VOL_MASK 0x3FC00000 +#define CS42L43_DECIM2_VOL_SHIFT 22 +#define CS42L43_DECIM2_VD_RAMP_MASK 0x00380000 +#define CS42L43_DECIM2_VD_RAMP_SHIFT 19 +#define CS42L43_DECIM2_VI_RAMP_MASK 0x00070000 +#define CS42L43_DECIM2_VI_RAMP_SHIFT 16 +#define CS42L43_DECIM1_MUTE_MASK 0x00008000 +#define CS42L43_DECIM1_MUTE_SHIFT 15 +#define CS42L43_DECIM1_VOL_MASK 0x00003FC0 +#define CS42L43_DECIM1_VOL_SHIFT 6 +#define CS42L43_DECIM1_VD_RAMP_MASK 0x00000038 +#define CS42L43_DECIM1_VD_RAMP_SHIFT 3 +#define CS42L43_DECIM1_VI_RAMP_MASK 0x00000007 +#define CS42L43_DECIM1_VI_RAMP_SHIFT 0 + +/* CS42L43_DECIM_VOL_CTRL_CH3_CH4 */ +#define CS42L43_DECIM4_MUTE_MASK 0x80000000 +#define CS42L43_DECIM4_MUTE_SHIFT 31 +#define CS42L43_DECIM4_VOL_MASK 0x3FC00000 +#define CS42L43_DECIM4_VOL_SHIFT 22 +#define CS42L43_DECIM4_VD_RAMP_MASK 0x00380000 +#define CS42L43_DECIM4_VD_RAMP_SHIFT 19 +#define CS42L43_DECIM4_VI_RAMP_MASK 0x00070000 +#define CS42L43_DECIM4_VI_RAMP_SHIFT 16 +#define CS42L43_DECIM3_MUTE_MASK 0x00008000 +#define CS42L43_DECIM3_MUTE_SHIFT 15 +#define CS42L43_DECIM3_VOL_MASK 0x00003FC0 +#define CS42L43_DECIM3_VOL_SHIFT 6 +#define CS42L43_DECIM3_VD_RAMP_MASK 0x00000038 +#define CS42L43_DECIM3_VD_RAMP_SHIFT 3 +#define CS42L43_DECIM3_VI_RAMP_MASK 0x00000007 +#define CS42L43_DECIM3_VI_RAMP_SHIFT 0 + +/* CS42L43_DECIM_VOL_CTRL_UPDATE */ +#define CS42L43_DECIM4_VOL_UPDATE_MASK 0x00000008 +#define CS42L43_DECIM4_VOL_UPDATE_SHIFT 3 +#define CS42L43_DECIM3_VOL_UPDATE_MASK 0x00000004 +#define CS42L43_DECIM3_VOL_UPDATE_SHIFT 2 +#define CS42L43_DECIM2_VOL_UPDATE_MASK 0x00000002 +#define CS42L43_DECIM2_VOL_UPDATE_SHIFT 1 +#define CS42L43_DECIM1_VOL_UPDATE_MASK 0x00000001 +#define CS42L43_DECIM1_VOL_UPDATE_SHIFT 0 + +/* CS42L43_INTP_VOLUME_CTRL1..CS42L43_INTP_VOLUME_CTRL2 */ +#define CS42L43_AMP1_2_VU_MASK 0x00000200 +#define CS42L43_AMP1_2_VU_SHIFT 9 +#define CS42L43_AMP_MUTE_MASK 0x00000100 +#define CS42L43_AMP_MUTE_SHIFT 8 +#define CS42L43_AMP_VOL_MASK 0x000000FF +#define CS42L43_AMP_VOL_SHIFT 0 + +/* CS42L43_AMP1_2_VOL_RAMP */ +#define CS42L43_AMP1_2_VD_RAMP_MASK 0x00000070 +#define CS42L43_AMP1_2_VD_RAMP_SHIFT 4 +#define CS42L43_AMP1_2_VI_RAMP_MASK 0x00000007 +#define CS42L43_AMP1_2_VI_RAMP_SHIFT 0 + +/* CS42L43_ASP_CTRL */ +#define CS42L43_ASP_FSYNC_MODE_MASK 0x00000004 +#define CS42L43_ASP_FSYNC_MODE_SHIFT 2 +#define CS42L43_ASP_BCLK_EN_MASK 0x00000002 +#define CS42L43_ASP_BCLK_EN_SHIFT 1 +#define CS42L43_ASP_FSYNC_EN_MASK 0x00000001 +#define CS42L43_ASP_FSYNC_EN_SHIFT 0 + +/* CS42L43_ASP_FSYNC_CTRL1 */ +#define CS42L43_ASP_FSYNC_M_MASK 0x0007FFFF +#define CS42L43_ASP_FSYNC_M_SHIFT 0 + +/* CS42L43_ASP_FSYNC_CTRL3 */ +#define CS42L43_ASP_FSYNC_IN_INV_MASK 0x00000002 +#define CS42L43_ASP_FSYNC_IN_INV_SHIFT 1 +#define CS42L43_ASP_FSYNC_OUT_INV_MASK 0x00000001 +#define CS42L43_ASP_FSYNC_OUT_INV_SHIFT 0 + +/* CS42L43_ASP_FSYNC_CTRL4 */ +#define CS42L43_ASP_NUM_BCLKS_PER_FSYNC_MASK 0x00001FFE +#define CS42L43_ASP_NUM_BCLKS_PER_FSYNC_SHIFT 1 + +/* CS42L43_ASP_DATA_CTRL */ +#define CS42L43_ASP_FSYNC_FRAME_START_PHASE_MASK 0x00000008 +#define CS42L43_ASP_FSYNC_FRAME_START_PHASE_SHIFT 3 +#define CS42L43_ASP_FSYNC_FRAME_START_DLY_MASK 0x00000007 +#define CS42L43_ASP_FSYNC_FRAME_START_DLY_SHIFT 0 + +/* CS42L43_ASP_RX_EN */ +#define CS42L43_ASP_RX_CH6_EN_MASK 0x00000020 +#define CS42L43_ASP_RX_CH6_EN_SHIFT 5 +#define CS42L43_ASP_RX_CH5_EN_MASK 0x00000010 +#define CS42L43_ASP_RX_CH5_EN_SHIFT 4 +#define CS42L43_ASP_RX_CH4_EN_MASK 0x00000008 +#define CS42L43_ASP_RX_CH4_EN_SHIFT 3 +#define CS42L43_ASP_RX_CH3_EN_MASK 0x00000004 +#define CS42L43_ASP_RX_CH3_EN_SHIFT 2 +#define CS42L43_ASP_RX_CH2_EN_MASK 0x00000002 +#define CS42L43_ASP_RX_CH2_EN_SHIFT 1 +#define CS42L43_ASP_RX_CH1_EN_MASK 0x00000001 +#define CS42L43_ASP_RX_CH1_EN_SHIFT 0 + +/* CS42L43_ASP_TX_EN */ +#define CS42L43_ASP_TX_CH6_EN_MASK 0x00000020 +#define CS42L43_ASP_TX_CH6_EN_SHIFT 5 +#define CS42L43_ASP_TX_CH5_EN_MASK 0x00000010 +#define CS42L43_ASP_TX_CH5_EN_SHIFT 4 +#define CS42L43_ASP_TX_CH4_EN_MASK 0x00000008 +#define CS42L43_ASP_TX_CH4_EN_SHIFT 3 +#define CS42L43_ASP_TX_CH3_EN_MASK 0x00000004 +#define CS42L43_ASP_TX_CH3_EN_SHIFT 2 +#define CS42L43_ASP_TX_CH2_EN_MASK 0x00000002 +#define CS42L43_ASP_TX_CH2_EN_SHIFT 1 +#define CS42L43_ASP_TX_CH1_EN_MASK 0x00000001 +#define CS42L43_ASP_TX_CH1_EN_SHIFT 0 + +/* CS42L43_ASP_RX_CH1_CTRL..CS42L43_ASP_TX_CH6_CTRL */ +#define CS42L43_ASP_CH_WIDTH_MASK 0x001F0000 +#define CS42L43_ASP_CH_WIDTH_SHIFT 16 +#define CS42L43_ASP_CH_SLOT_MASK 0x00001FFE +#define CS42L43_ASP_CH_SLOT_SHIFT 1 +#define CS42L43_ASP_CH_SLOT_PHASE_MASK 0x00000001 +#define CS42L43_ASP_CH_SLOT_PHASE_SHIFT 0 + +/* CS42L43_ASPTX1_INPUT..CS42L43_AMP4MIX_INPUT4 */ +#define CS42L43_MIXER_VOL_MASK 0x00FE0000 +#define CS42L43_MIXER_VOL_SHIFT 17 +#define CS42L43_MIXER_SRC_MASK 0x000001FF +#define CS42L43_MIXER_SRC_SHIFT 0 + +/* CS42L43_ASRC_INT_ENABLES */ +#define CS42L43_ASRC_INT4_EN_MASK 0x00000008 +#define CS42L43_ASRC_INT4_EN_SHIFT 3 +#define CS42L43_ASRC_INT3_EN_MASK 0x00000004 +#define CS42L43_ASRC_INT3_EN_SHIFT 2 +#define CS42L43_ASRC_INT2_EN_MASK 0x00000002 +#define CS42L43_ASRC_INT2_EN_SHIFT 1 +#define CS42L43_ASRC_INT1_EN_MASK 0x00000001 +#define CS42L43_ASRC_INT1_EN_SHIFT 0 + +/* CS42L43_ASRC_DEC_ENABLES */ +#define CS42L43_ASRC_DEC4_EN_MASK 0x00000008 +#define CS42L43_ASRC_DEC4_EN_SHIFT 3 +#define CS42L43_ASRC_DEC3_EN_MASK 0x00000004 +#define CS42L43_ASRC_DEC3_EN_SHIFT 2 +#define CS42L43_ASRC_DEC2_EN_MASK 0x00000002 +#define CS42L43_ASRC_DEC2_EN_SHIFT 1 +#define CS42L43_ASRC_DEC1_EN_MASK 0x00000001 +#define CS42L43_ASRC_DEC1_EN_SHIFT 0 + +/* CS42L43_PDNCNTL */ +#define CS42L43_RING_SENSE_EN_MASK 0x00000002 +#define CS42L43_RING_SENSE_EN_SHIFT 1 + +/* CS42L43_RINGSENSE_DEB_CTRL */ +#define CS42L43_RINGSENSE_INV_MASK 0x00000080 +#define CS42L43_RINGSENSE_INV_SHIFT 7 +#define CS42L43_RINGSENSE_PULLUP_PDNB_MASK 0x00000040 +#define CS42L43_RINGSENSE_PULLUP_PDNB_SHIFT 6 +#define CS42L43_RINGSENSE_FALLING_DB_TIME_MASK 0x00000038 +#define CS42L43_RINGSENSE_FALLING_DB_TIME_SHIFT 3 +#define CS42L43_RINGSENSE_RISING_DB_TIME_MASK 0x00000007 +#define CS42L43_RINGSENSE_RISING_DB_TIME_SHIFT 0 + +/* CS42L43_TIPSENSE_DEB_CTRL */ +#define CS42L43_TIPSENSE_INV_MASK 0x00000080 +#define CS42L43_TIPSENSE_INV_SHIFT 7 +#define CS42L43_TIPSENSE_FALLING_DB_TIME_MASK 0x00000038 +#define CS42L43_TIPSENSE_FALLING_DB_TIME_SHIFT 3 +#define CS42L43_TIPSENSE_RISING_DB_TIME_MASK 0x00000007 +#define CS42L43_TIPSENSE_RISING_DB_TIME_SHIFT 0 + +/* CS42L43_TIP_RING_SENSE_INTERRUPT_STATUS */ +#define CS42L43_TIPSENSE_UNPLUG_DB_STS_MASK 0x00000008 +#define CS42L43_TIPSENSE_UNPLUG_DB_STS_SHIFT 3 +#define CS42L43_TIPSENSE_PLUG_DB_STS_MASK 0x00000004 +#define CS42L43_TIPSENSE_PLUG_DB_STS_SHIFT 2 +#define CS42L43_RINGSENSE_UNPLUG_DB_STS_MASK 0x00000002 +#define CS42L43_RINGSENSE_UNPLUG_DB_STS_SHIFT 1 +#define CS42L43_RINGSENSE_PLUG_DB_STS_MASK 0x00000001 +#define CS42L43_RINGSENSE_PLUG_DB_STS_SHIFT 0 + +/* CS42L43_HS2 */ +#define CS42L43_HS_CLAMP_DISABLE_MASK 0x10000000 +#define CS42L43_HS_CLAMP_DISABLE_SHIFT 28 +#define CS42L43_HSBIAS_RAMP_MASK 0x0C000000 +#define CS42L43_HSBIAS_RAMP_SHIFT 26 +#define CS42L43_HSDET_MODE_MASK 0x00018000 +#define CS42L43_HSDET_MODE_SHIFT 15 +#define CS42L43_HSDET_MANUAL_MODE_MASK 0x00006000 +#define CS42L43_HSDET_MANUAL_MODE_SHIFT 13 +#define CS42L43_AUTO_HSDET_TIME_MASK 0x00000700 +#define CS42L43_AUTO_HSDET_TIME_SHIFT 8 +#define CS42L43_AMP3_4_GNDREF_HS3_SEL_MASK 0x00000080 +#define CS42L43_AMP3_4_GNDREF_HS3_SEL_SHIFT 7 +#define CS42L43_AMP3_4_GNDREF_HS4_SEL_MASK 0x00000040 +#define CS42L43_AMP3_4_GNDREF_HS4_SEL_SHIFT 6 +#define CS42L43_HSBIAS_GNDREF_HS3_SEL_MASK 0x00000020 +#define CS42L43_HSBIAS_GNDREF_HS3_SEL_SHIFT 5 +#define CS42L43_HSBIAS_GNDREF_HS4_SEL_MASK 0x00000010 +#define CS42L43_HSBIAS_GNDREF_HS4_SEL_SHIFT 4 +#define CS42L43_HSBIAS_OUT_HS3_SEL_MASK 0x00000008 +#define CS42L43_HSBIAS_OUT_HS3_SEL_SHIFT 3 +#define CS42L43_HSBIAS_OUT_HS4_SEL_MASK 0x00000004 +#define CS42L43_HSBIAS_OUT_HS4_SEL_SHIFT 2 +#define CS42L43_HSGND_HS3_SEL_MASK 0x00000002 +#define CS42L43_HSGND_HS3_SEL_SHIFT 1 +#define CS42L43_HSGND_HS4_SEL_MASK 0x00000001 +#define CS42L43_HSGND_HS4_SEL_SHIFT 0 + +/* CS42L43_HS_STAT */ +#define CS42L43_HSDET_TYPE_STS_MASK 0x00000007 +#define CS42L43_HSDET_TYPE_STS_SHIFT 0 + +/* CS42L43_MCU_SW_INTERRUPT */ +#define CS42L43_CONTROL_IND_MASK 0x00000004 +#define CS42L43_CONTROL_IND_SHIFT 2 +#define CS42L43_CONFIGS_IND_MASK 0x00000002 +#define CS42L43_CONFIGS_IND_SHIFT 1 +#define CS42L43_PATCH_IND_MASK 0x00000001 +#define CS42L43_PATCH_IND_SHIFT 0 + +/* CS42L43_STEREO_MIC_CTRL */ +#define CS42L43_HS2_BIAS_SENSE_EN_MASK 0x00000020 +#define CS42L43_HS2_BIAS_SENSE_EN_SHIFT 5 +#define CS42L43_HS1_BIAS_SENSE_EN_MASK 0x00000010 +#define CS42L43_HS1_BIAS_SENSE_EN_SHIFT 4 +#define CS42L43_HS2_BIAS_EN_MASK 0x00000008 +#define CS42L43_HS2_BIAS_EN_SHIFT 3 +#define CS42L43_HS1_BIAS_EN_MASK 0x00000004 +#define CS42L43_HS1_BIAS_EN_SHIFT 2 +#define CS42L43_JACK_STEREO_CONFIG_MASK 0x00000003 +#define CS42L43_JACK_STEREO_CONFIG_SHIFT 0 + +/* CS42L43_STEREO_MIC_CLAMP_CTRL */ +#define CS42L43_SMIC_HPAMP_CLAMP_DIS_FRC_VAL_MASK 0x00000002 +#define CS42L43_SMIC_HPAMP_CLAMP_DIS_FRC_VAL_SHIFT 1 +#define CS42L43_SMIC_HPAMP_CLAMP_DIS_FRC_MASK 0x00000001 +#define CS42L43_SMIC_HPAMP_CLAMP_DIS_FRC_SHIFT 0 + +/* CS42L43_BLOCK_EN2 */ +#define CS42L43_SPI_MSTR_EN_MASK 0x00000001 +#define CS42L43_SPI_MSTR_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN3 */ +#define CS42L43_PDM2_DIN_R_EN_MASK 0x00000020 +#define CS42L43_PDM2_DIN_R_EN_SHIFT 5 +#define CS42L43_PDM2_DIN_L_EN_MASK 0x00000010 +#define CS42L43_PDM2_DIN_L_EN_SHIFT 4 +#define CS42L43_PDM1_DIN_R_EN_MASK 0x00000008 +#define CS42L43_PDM1_DIN_R_EN_SHIFT 3 +#define CS42L43_PDM1_DIN_L_EN_MASK 0x00000004 +#define CS42L43_PDM1_DIN_L_EN_SHIFT 2 +#define CS42L43_ADC2_EN_MASK 0x00000002 +#define CS42L43_ADC2_EN_SHIFT 1 +#define CS42L43_ADC1_EN_MASK 0x00000001 +#define CS42L43_ADC1_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN4 */ +#define CS42L43_ASRC_DEC_BANK_EN_MASK 0x00000002 +#define CS42L43_ASRC_DEC_BANK_EN_SHIFT 1 +#define CS42L43_ASRC_INT_BANK_EN_MASK 0x00000001 +#define CS42L43_ASRC_INT_BANK_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN5 */ +#define CS42L43_ISRC2_BANK_EN_MASK 0x00000002 +#define CS42L43_ISRC2_BANK_EN_SHIFT 1 +#define CS42L43_ISRC1_BANK_EN_MASK 0x00000001 +#define CS42L43_ISRC1_BANK_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN6 */ +#define CS42L43_MIXER_EN_MASK 0x00000001 +#define CS42L43_MIXER_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN7 */ +#define CS42L43_EQ_EN_MASK 0x00000001 +#define CS42L43_EQ_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN8 */ +#define CS42L43_HP_EN_MASK 0x00000001 +#define CS42L43_HP_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN9 */ +#define CS42L43_TONE_EN_MASK 0x00000001 +#define CS42L43_TONE_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN10 */ +#define CS42L43_AMP2_EN_MASK 0x00000002 +#define CS42L43_AMP2_EN_SHIFT 1 +#define CS42L43_AMP1_EN_MASK 0x00000001 +#define CS42L43_AMP1_EN_SHIFT 0 + +/* CS42L43_BLOCK_EN11 */ +#define CS42L43_SPDIF_EN_MASK 0x00000001 +#define CS42L43_SPDIF_EN_SHIFT 0 + +/* CS42L43_TONE_CH1_CTRL..CS42L43_TONE_CH2_CTRL */ +#define CS42L43_TONE_FREQ_MASK 0x00000070 +#define CS42L43_TONE_FREQ_SHIFT 4 +#define CS42L43_TONE_SEL_MASK 0x0000000F +#define CS42L43_TONE_SEL_SHIFT 0 + +/* CS42L43_MIC_DETECT_CONTROL_1 */ +#define CS42L43_BUTTON_DETECT_MODE_MASK 0x00000018 +#define CS42L43_BUTTON_DETECT_MODE_SHIFT 3 +#define CS42L43_HSBIAS_MODE_MASK 0x00000006 +#define CS42L43_HSBIAS_MODE_SHIFT 1 +#define CS42L43_MIC_LVL_DET_DISABLE_MASK 0x00000001 +#define CS42L43_MIC_LVL_DET_DISABLE_SHIFT 0 + +/* CS42L43_DETECT_STATUS_1 */ +#define CS42L43_HSDET_DC_STS_MASK 0x01FF0000 +#define CS42L43_HSDET_DC_STS_SHIFT 16 +#define CS42L43_JACKDET_STS_MASK 0x00000080 +#define CS42L43_JACKDET_STS_SHIFT 7 +#define CS42L43_HSBIAS_CLAMP_STS_MASK 0x00000040 +#define CS42L43_HSBIAS_CLAMP_STS_SHIFT 6 + +/* CS42L43_HS_BIAS_SENSE_AND_CLAMP_AUTOCONTROL */ +#define CS42L43_JACKDET_MODE_MASK 0xC0000000 +#define CS42L43_JACKDET_MODE_SHIFT 30 +#define CS42L43_JACKDET_INV_MASK 0x20000000 +#define CS42L43_JACKDET_INV_SHIFT 29 +#define CS42L43_JACKDET_DB_TIME_MASK 0x03000000 +#define CS42L43_JACKDET_DB_TIME_SHIFT 24 +#define CS42L43_S0_AUTO_ADCMUTE_DISABLE_MASK 0x00800000 +#define CS42L43_S0_AUTO_ADCMUTE_DISABLE_SHIFT 23 +#define CS42L43_HSBIAS_SENSE_EN_MASK 0x00000080 +#define CS42L43_HSBIAS_SENSE_EN_SHIFT 7 +#define CS42L43_AUTO_HSBIAS_CLAMP_EN_MASK 0x00000040 +#define CS42L43_AUTO_HSBIAS_CLAMP_EN_SHIFT 6 +#define CS42L43_JACKDET_SENSE_EN_MASK 0x00000020 +#define CS42L43_JACKDET_SENSE_EN_SHIFT 5 +#define CS42L43_HSBIAS_SENSE_TRIP_MASK 0x00000007 +#define CS42L43_HSBIAS_SENSE_TRIP_SHIFT 0 + +/* CS42L43_MIC_DETECT_CONTROL_ANDROID */ +#define CS42L43_HSDET_LVL_COMBWIDTH_MASK 0xC0000000 +#define CS42L43_HSDET_LVL_COMBWIDTH_SHIFT 30 +#define CS42L43_HSDET_LVL2_THRESH_MASK 0x01FF0000 +#define CS42L43_HSDET_LVL2_THRESH_SHIFT 16 +#define CS42L43_HSDET_LVL1_THRESH_MASK 0x000001FF +#define CS42L43_HSDET_LVL1_THRESH_SHIFT 0 + +/* CS42L43_ISRC1_CTRL..CS42L43_ISRC2_CTRL */ +#define CS42L43_ISRC_INT2_EN_MASK 0x00000200 +#define CS42L43_ISRC_INT2_EN_SHIFT 9 +#define CS42L43_ISRC_INT1_EN_MASK 0x00000100 +#define CS42L43_ISRC_INT1_EN_SHIFT 8 +#define CS42L43_ISRC_DEC2_EN_MASK 0x00000002 +#define CS42L43_ISRC_DEC2_EN_SHIFT 1 +#define CS42L43_ISRC_DEC1_EN_MASK 0x00000001 +#define CS42L43_ISRC_DEC1_EN_SHIFT 0 + +/* CS42L43_CTRL_REG */ +#define CS42L43_PLL_MODE_BYPASS_500_MASK 0x00000004 +#define CS42L43_PLL_MODE_BYPASS_500_SHIFT 2 +#define CS42L43_PLL_MODE_BYPASS_1029_MASK 0x00000002 +#define CS42L43_PLL_MODE_BYPASS_1029_SHIFT 1 +#define CS42L43_PLL_EN_MASK 0x00000001 +#define CS42L43_PLL_EN_SHIFT 0 + +/* CS42L43_FDIV_FRAC */ +#define CS42L43_PLL_DIV_INT_MASK 0xFF000000 +#define CS42L43_PLL_DIV_INT_SHIFT 24 +#define CS42L43_PLL_DIV_FRAC_BYTE2_MASK 0x00FF0000 +#define CS42L43_PLL_DIV_FRAC_BYTE2_SHIFT 16 +#define CS42L43_PLL_DIV_FRAC_BYTE1_MASK 0x0000FF00 +#define CS42L43_PLL_DIV_FRAC_BYTE1_SHIFT 8 +#define CS42L43_PLL_DIV_FRAC_BYTE0_MASK 0x000000FF +#define CS42L43_PLL_DIV_FRAC_BYTE0_SHIFT 0 + +/* CS42L43_CAL_RATIO */ +#define CS42L43_PLL_CAL_RATIO_MASK 0x000000FF +#define CS42L43_PLL_CAL_RATIO_SHIFT 0 + +/* CS42L43_SPI_CLK_CONFIG1 */ +#define CS42L43_SCLK_DIV_MASK 0x0000000F +#define CS42L43_SCLK_DIV_SHIFT 0 + +/* CS42L43_SPI_CONFIG1 */ +#define CS42L43_SPI_SS_IDLE_DUR_MASK 0x0F000000 +#define CS42L43_SPI_SS_IDLE_DUR_SHIFT 24 +#define CS42L43_SPI_SS_DELAY_DUR_MASK 0x000F0000 +#define CS42L43_SPI_SS_DELAY_DUR_SHIFT 16 +#define CS42L43_SPI_THREE_WIRE_MASK 0x00000100 +#define CS42L43_SPI_THREE_WIRE_SHIFT 8 +#define CS42L43_SPI_DPHA_MASK 0x00000040 +#define CS42L43_SPI_DPHA_SHIFT 6 +#define CS42L43_SPI_CPHA_MASK 0x00000020 +#define CS42L43_SPI_CPHA_SHIFT 5 +#define CS42L43_SPI_CPOL_MASK 0x00000010 +#define CS42L43_SPI_CPOL_SHIFT 4 +#define CS42L43_SPI_SS_SEL_MASK 0x00000007 +#define CS42L43_SPI_SS_SEL_SHIFT 0 + +/* CS42L43_SPI_CONFIG2 */ +#define CS42L43_SPI_SS_FRC_MASK 0x00000001 +#define CS42L43_SPI_SS_FRC_SHIFT 0 + +/* CS42L43_SPI_CONFIG3 */ +#define CS42L43_SPI_WDT_ENA_MASK 0x00000001 +#define CS42L43_SPI_WDT_ENA_SHIFT 0 + +/* CS42L43_SPI_CONFIG4 */ +#define CS42L43_SPI_STALL_ENA_MASK 0x00010000 +#define CS42L43_SPI_STALL_ENA_SHIFT 16 + +/* CS42L43_SPI_STATUS1 */ +#define CS42L43_SPI_ABORT_STS_MASK 0x00000002 +#define CS42L43_SPI_ABORT_STS_SHIFT 1 +#define CS42L43_SPI_DONE_STS_MASK 0x00000001 +#define CS42L43_SPI_DONE_STS_SHIFT 0 + +/* CS42L43_SPI_STATUS2 */ +#define CS42L43_SPI_RX_DONE_STS_MASK 0x00000010 +#define CS42L43_SPI_RX_DONE_STS_SHIFT 4 +#define CS42L43_SPI_TX_DONE_STS_MASK 0x00000001 +#define CS42L43_SPI_TX_DONE_STS_SHIFT 0 + +/* CS42L43_TRAN_CONFIG1 */ +#define CS42L43_SPI_START_MASK 0x00000001 +#define CS42L43_SPI_START_SHIFT 0 + +/* CS42L43_TRAN_CONFIG2 */ +#define CS42L43_SPI_ABORT_MASK 0x00000001 +#define CS42L43_SPI_ABORT_SHIFT 0 + +/* CS42L43_TRAN_CONFIG3 */ +#define CS42L43_SPI_WORD_SIZE_MASK 0x00070000 +#define CS42L43_SPI_WORD_SIZE_SHIFT 16 +#define CS42L43_SPI_CMD_MASK 0x00000003 +#define CS42L43_SPI_CMD_SHIFT 0 + +/* CS42L43_TRAN_CONFIG4 */ +#define CS42L43_SPI_TX_LENGTH_MASK 0x0000FFFF +#define CS42L43_SPI_TX_LENGTH_SHIFT 0 + +/* CS42L43_TRAN_CONFIG5 */ +#define CS42L43_SPI_RX_LENGTH_MASK 0x0000FFFF +#define CS42L43_SPI_RX_LENGTH_SHIFT 0 + +/* CS42L43_TRAN_CONFIG6 */ +#define CS42L43_SPI_TX_BLOCK_LENGTH_MASK 0x0000000F +#define CS42L43_SPI_TX_BLOCK_LENGTH_SHIFT 0 + +/* CS42L43_TRAN_CONFIG7 */ +#define CS42L43_SPI_RX_BLOCK_LENGTH_MASK 0x0000000F +#define CS42L43_SPI_RX_BLOCK_LENGTH_SHIFT 0 + +/* CS42L43_TRAN_CONFIG8 */ +#define CS42L43_SPI_RX_DONE_MASK 0x00000010 +#define CS42L43_SPI_RX_DONE_SHIFT 4 +#define CS42L43_SPI_TX_DONE_MASK 0x00000001 +#define CS42L43_SPI_TX_DONE_SHIFT 0 + +/* CS42L43_TRAN_STATUS1 */ +#define CS42L43_SPI_BUSY_STS_MASK 0x00000100 +#define CS42L43_SPI_BUSY_STS_SHIFT 8 +#define CS42L43_SPI_RX_REQUEST_MASK 0x00000010 +#define CS42L43_SPI_RX_REQUEST_SHIFT 4 +#define CS42L43_SPI_TX_REQUEST_MASK 0x00000001 +#define CS42L43_SPI_TX_REQUEST_SHIFT 0 + +/* CS42L43_TRAN_STATUS2 */ +#define CS42L43_SPI_TX_BYTE_COUNT_MASK 0x0000FFFF +#define CS42L43_SPI_TX_BYTE_COUNT_SHIFT 0 + +/* CS42L43_TRAN_STATUS3 */ +#define CS42L43_SPI_RX_BYTE_COUNT_MASK 0x0000FFFF +#define CS42L43_SPI_RX_BYTE_COUNT_SHIFT 0 + +/* CS42L43_TX_DATA */ +#define CS42L43_SPI_TX_DATA_MASK 0xFFFFFFFF +#define CS42L43_SPI_TX_DATA_SHIFT 0 + +/* CS42L43_RX_DATA */ +#define CS42L43_SPI_RX_DATA_MASK 0xFFFFFFFF +#define CS42L43_SPI_RX_DATA_SHIFT 0 + +/* CS42L43_DACCNFG1 */ +#define CS42L43_HP_MSTR_VOL_CTRL_EN_MASK 0x00000008 +#define CS42L43_HP_MSTR_VOL_CTRL_EN_SHIFT 3 +#define CS42L43_AMP4_INV_MASK 0x00000002 +#define CS42L43_AMP4_INV_SHIFT 1 +#define CS42L43_AMP3_INV_MASK 0x00000001 +#define CS42L43_AMP3_INV_SHIFT 0 + +/* CS42L43_DACCNFG2 */ +#define CS42L43_HP_AUTO_CLAMP_DISABLE_MASK 0x00000002 +#define CS42L43_HP_AUTO_CLAMP_DISABLE_SHIFT 1 +#define CS42L43_HP_HPF_EN_MASK 0x00000001 +#define CS42L43_HP_HPF_EN_SHIFT 0 + +/* CS42L43_HPPATHVOL */ +#define CS42L43_AMP4_PATH_VOL_MASK 0x01FF0000 +#define CS42L43_AMP4_PATH_VOL_SHIFT 16 +#define CS42L43_AMP3_PATH_VOL_MASK 0x000001FF +#define CS42L43_AMP3_PATH_VOL_SHIFT 0 + +/* CS42L43_PGAVOL */ +#define CS42L43_HP_PATH_VOL_RAMP_MASK 0x0003C000 +#define CS42L43_HP_PATH_VOL_RAMP_SHIFT 14 +#define CS42L43_HP_PATH_VOL_ZC_MASK 0x00002000 +#define CS42L43_HP_PATH_VOL_ZC_SHIFT 13 +#define CS42L43_HP_PATH_VOL_SFT_MASK 0x00001000 +#define CS42L43_HP_PATH_VOL_SFT_SHIFT 12 +#define CS42L43_HP_DIG_VOL_RAMP_MASK 0x00000F00 +#define CS42L43_HP_DIG_VOL_RAMP_SHIFT 8 +#define CS42L43_HP_ANA_VOL_RAMP_MASK 0x0000000F +#define CS42L43_HP_ANA_VOL_RAMP_SHIFT 0 + +/* CS42L43_LOADDETRESULTS */ +#define CS42L43_AMP3_RES_DET_MASK 0x00000003 +#define CS42L43_AMP3_RES_DET_SHIFT 0 + +/* CS42L43_LOADDETENA */ +#define CS42L43_HPLOAD_DET_EN_MASK 0x00000001 +#define CS42L43_HPLOAD_DET_EN_SHIFT 0 + +/* CS42L43_CTRL */ +#define CS42L43_ADPTPWR_MODE_MASK 0x00000007 +#define CS42L43_ADPTPWR_MODE_SHIFT 0 + +/* CS42L43_COEFF_RD_WR0 */ +#define CS42L43_WRITE_MODE_MASK 0x00000002 +#define CS42L43_WRITE_MODE_SHIFT 1 + +/* CS42L43_INIT_DONE0 */ +#define CS42L43_INITIALIZE_DONE_MASK 0x00000001 +#define CS42L43_INITIALIZE_DONE_SHIFT 0 + +/* CS42L43_START_EQZ0 */ +#define CS42L43_START_FILTER_MASK 0x00000001 +#define CS42L43_START_FILTER_SHIFT 0 + +/* CS42L43_MUTE_EQ_IN0 */ +#define CS42L43_MUTE_EQ_CH2_MASK 0x00000002 +#define CS42L43_MUTE_EQ_CH2_SHIFT 1 +#define CS42L43_MUTE_EQ_CH1_MASK 0x00000001 +#define CS42L43_MUTE_EQ_CH1_SHIFT 0 + +/* CS42L43_PLL_INT */ +#define CS42L43_PLL_LOST_LOCK_INT_MASK 0x00000002 +#define CS42L43_PLL_LOST_LOCK_INT_SHIFT 1 +#define CS42L43_PLL_READY_INT_MASK 0x00000001 +#define CS42L43_PLL_READY_INT_SHIFT 0 + +/* CS42L43_SOFT_INT */ +#define CS42L43_CONTROL_APPLIED_INT_MASK 0x00000010 +#define CS42L43_CONTROL_APPLIED_INT_SHIFT 4 +#define CS42L43_CONTROL_WARN_INT_MASK 0x00000008 +#define CS42L43_CONTROL_WARN_INT_SHIFT 3 +#define CS42L43_PATCH_WARN_INT_MASK 0x00000002 +#define CS42L43_PATCH_WARN_INT_SHIFT 1 +#define CS42L43_PATCH_APPLIED_INT_MASK 0x00000001 +#define CS42L43_PATCH_APPLIED_INT_SHIFT 0 + +/* CS42L43_MSM_INT */ +#define CS42L43_HP_STARTUP_DONE_INT_MASK 0x00000800 +#define CS42L43_HP_STARTUP_DONE_INT_SHIFT 11 +#define CS42L43_HP_SHUTDOWN_DONE_INT_MASK 0x00000400 +#define CS42L43_HP_SHUTDOWN_DONE_INT_SHIFT 10 +#define CS42L43_HSDET_DONE_INT_MASK 0x00000200 +#define CS42L43_HSDET_DONE_INT_SHIFT 9 +#define CS42L43_TIPSENSE_UNPLUG_DB_INT_MASK 0x00000080 +#define CS42L43_TIPSENSE_UNPLUG_DB_INT_SHIFT 7 +#define CS42L43_TIPSENSE_PLUG_DB_INT_MASK 0x00000040 +#define CS42L43_TIPSENSE_PLUG_DB_INT_SHIFT 6 +#define CS42L43_RINGSENSE_UNPLUG_DB_INT_MASK 0x00000020 +#define CS42L43_RINGSENSE_UNPLUG_DB_INT_SHIFT 5 +#define CS42L43_RINGSENSE_PLUG_DB_INT_MASK 0x00000010 +#define CS42L43_RINGSENSE_PLUG_DB_INT_SHIFT 4 +#define CS42L43_TIPSENSE_UNPLUG_PDET_INT_MASK 0x00000008 +#define CS42L43_TIPSENSE_UNPLUG_PDET_INT_SHIFT 3 +#define CS42L43_TIPSENSE_PLUG_PDET_INT_MASK 0x00000004 +#define CS42L43_TIPSENSE_PLUG_PDET_INT_SHIFT 2 +#define CS42L43_RINGSENSE_UNPLUG_PDET_INT_MASK 0x00000002 +#define CS42L43_RINGSENSE_UNPLUG_PDET_INT_SHIFT 1 +#define CS42L43_RINGSENSE_PLUG_PDET_INT_MASK 0x00000001 +#define CS42L43_RINGSENSE_PLUG_PDET_INT_SHIFT 0 + +/* CS42L43_ACC_DET_INT */ +#define CS42L43_HS2_BIAS_SENSE_INT_MASK 0x00000800 +#define CS42L43_HS2_BIAS_SENSE_INT_SHIFT 11 +#define CS42L43_HS1_BIAS_SENSE_INT_MASK 0x00000400 +#define CS42L43_HS1_BIAS_SENSE_INT_SHIFT 10 +#define CS42L43_DC_DETECT1_FALSE_INT_MASK 0x00000080 +#define CS42L43_DC_DETECT1_FALSE_INT_SHIFT 7 +#define CS42L43_DC_DETECT1_TRUE_INT_MASK 0x00000040 +#define CS42L43_DC_DETECT1_TRUE_INT_SHIFT 6 +#define CS42L43_HSBIAS_CLAMPED_INT_MASK 0x00000008 +#define CS42L43_HSBIAS_CLAMPED_INT_SHIFT 3 +#define CS42L43_HS3_4_BIAS_SENSE_INT_MASK 0x00000001 +#define CS42L43_HS3_4_BIAS_SENSE_INT_SHIFT 0 + +/* CS42L43_SPI_MSTR_INT */ +#define CS42L43_IRQ_SPI_STALLING_INT_MASK 0x00000004 +#define CS42L43_IRQ_SPI_STALLING_INT_SHIFT 2 +#define CS42L43_IRQ_SPI_STS_INT_MASK 0x00000002 +#define CS42L43_IRQ_SPI_STS_INT_SHIFT 1 +#define CS42L43_IRQ_SPI_BLOCK_INT_MASK 0x00000001 +#define CS42L43_IRQ_SPI_BLOCK_INT_SHIFT 0 + +/* CS42L43_SW_TO_SPI_BRIDGE_INT */ +#define CS42L43_SW2SPI_BUF_OVF_UDF_INT_MASK 0x00000001 +#define CS42L43_SW2SPI_BUF_OVF_UDF_INT_SHIFT 0 + +/* CS42L43_CLASS_D_AMP_INT */ +#define CS42L43_AMP2_CLK_STOP_FAULT_INT_MASK 0x00002000 +#define CS42L43_AMP2_CLK_STOP_FAULT_INT_SHIFT 13 +#define CS42L43_AMP1_CLK_STOP_FAULT_INT_MASK 0x00001000 +#define CS42L43_AMP1_CLK_STOP_FAULT_INT_SHIFT 12 +#define CS42L43_AMP2_VDDSPK_FAULT_INT_MASK 0x00000800 +#define CS42L43_AMP2_VDDSPK_FAULT_INT_SHIFT 11 +#define CS42L43_AMP1_VDDSPK_FAULT_INT_MASK 0x00000400 +#define CS42L43_AMP1_VDDSPK_FAULT_INT_SHIFT 10 +#define CS42L43_AMP2_SHUTDOWN_DONE_INT_MASK 0x00000200 +#define CS42L43_AMP2_SHUTDOWN_DONE_INT_SHIFT 9 +#define CS42L43_AMP1_SHUTDOWN_DONE_INT_MASK 0x00000100 +#define CS42L43_AMP1_SHUTDOWN_DONE_INT_SHIFT 8 +#define CS42L43_AMP2_STARTUP_DONE_INT_MASK 0x00000080 +#define CS42L43_AMP2_STARTUP_DONE_INT_SHIFT 7 +#define CS42L43_AMP1_STARTUP_DONE_INT_MASK 0x00000040 +#define CS42L43_AMP1_STARTUP_DONE_INT_SHIFT 6 +#define CS42L43_AMP2_THERM_SHDN_INT_MASK 0x00000020 +#define CS42L43_AMP2_THERM_SHDN_INT_SHIFT 5 +#define CS42L43_AMP1_THERM_SHDN_INT_MASK 0x00000010 +#define CS42L43_AMP1_THERM_SHDN_INT_SHIFT 4 +#define CS42L43_AMP2_THERM_WARN_INT_MASK 0x00000008 +#define CS42L43_AMP2_THERM_WARN_INT_SHIFT 3 +#define CS42L43_AMP1_THERM_WARN_INT_MASK 0x00000004 +#define CS42L43_AMP1_THERM_WARN_INT_SHIFT 2 +#define CS42L43_AMP2_SCDET_INT_MASK 0x00000002 +#define CS42L43_AMP2_SCDET_INT_SHIFT 1 +#define CS42L43_AMP1_SCDET_INT_MASK 0x00000001 +#define CS42L43_AMP1_SCDET_INT_SHIFT 0 + +/* CS42L43_GPIO_INT */ +#define CS42L43_GPIO3_FALL_INT_MASK 0x00000020 +#define CS42L43_GPIO3_FALL_INT_SHIFT 5 +#define CS42L43_GPIO3_RISE_INT_MASK 0x00000010 +#define CS42L43_GPIO3_RISE_INT_SHIFT 4 +#define CS42L43_GPIO2_FALL_INT_MASK 0x00000008 +#define CS42L43_GPIO2_FALL_INT_SHIFT 3 +#define CS42L43_GPIO2_RISE_INT_MASK 0x00000004 +#define CS42L43_GPIO2_RISE_INT_SHIFT 2 +#define CS42L43_GPIO1_FALL_INT_MASK 0x00000002 +#define CS42L43_GPIO1_FALL_INT_SHIFT 1 +#define CS42L43_GPIO1_RISE_INT_MASK 0x00000001 +#define CS42L43_GPIO1_RISE_INT_SHIFT 0 + +/* CS42L43_HPOUT_INT */ +#define CS42L43_HP_ILIMIT_INT_MASK 0x00000002 +#define CS42L43_HP_ILIMIT_INT_SHIFT 1 +#define CS42L43_HP_LOADDET_DONE_INT_MASK 0x00000001 +#define CS42L43_HP_LOADDET_DONE_INT_SHIFT 0 + +/* CS42L43_BOOT_CONTROL */ +#define CS42L43_LOCK_HW_STS_MASK 0x00000002 +#define CS42L43_LOCK_HW_STS_SHIFT 1 + +/* CS42L43_BLOCK_EN */ +#define CS42L43_MCU_EN_MASK 0x00000001 +#define CS42L43_MCU_EN_SHIFT 0 + +/* CS42L43_SHUTTER_CONTROL */ +#define CS42L43_STATUS_SPK_SHUTTER_MUTE_MASK 0x00008000 +#define CS42L43_STATUS_SPK_SHUTTER_MUTE_SHIFT 15 +#define CS42L43_SPK_SHUTTER_CFG_MASK 0x00000F00 +#define CS42L43_SPK_SHUTTER_CFG_SHIFT 8 +#define CS42L43_STATUS_MIC_SHUTTER_MUTE_MASK 0x00000080 +#define CS42L43_STATUS_MIC_SHUTTER_MUTE_SHIFT 7 +#define CS42L43_MIC_SHUTTER_CFG_MASK 0x0000000F +#define CS42L43_MIC_SHUTTER_CFG_SHIFT 0 + +/* CS42L43_MCU_SW_REV */ +#define CS42L43_BIOS_SUBMINOR_REV_MASK 0xFF000000 +#define CS42L43_BIOS_SUBMINOR_REV_SHIFT 24 +#define CS42L43_BIOS_MINOR_REV_MASK 0x00F00000 +#define CS42L43_BIOS_MINOR_REV_SHIFT 20 +#define CS42L43_BIOS_MAJOR_REV_MASK 0x000F0000 +#define CS42L43_BIOS_MAJOR_REV_SHIFT 16 +#define CS42L43_FW_SUBMINOR_REV_MASK 0x0000FF00 +#define CS42L43_FW_SUBMINOR_REV_SHIFT 8 +#define CS42L43_FW_MINOR_REV_MASK 0x000000F0 +#define CS42L43_FW_MINOR_REV_SHIFT 4 +#define CS42L43_FW_MAJOR_REV_MASK 0x0000000F +#define CS42L43_FW_MAJOR_REV_SHIFT 0 + +/* CS42L43_NEED_CONFIGS */ +#define CS42L43_FW_PATCH_NEED_CFG_MASK 0x80000000 +#define CS42L43_FW_PATCH_NEED_CFG_SHIFT 31 + +/* CS42L43_FW_MISSION_CTRL_MM_CTRL_SELECTION */ +#define CS42L43_FW_MM_CTRL_MCU_SEL_MASK 0x00000001 +#define CS42L43_FW_MM_CTRL_MCU_SEL_SHIFT 0 + +/* CS42L43_FW_MISSION_CTRL_MM_MCU_CFG_REG */ +#define CS42L43_FW_MISSION_CTRL_MM_MCU_CFG_DISABLE_VAL 0xF05AA50F + +#endif /* CS42L43_CORE_REGS_H */ diff --git a/include/linux/mfd/cs42l43.h b/include/linux/mfd/cs42l43.h new file mode 100644 index 000000000000..cf8263aab41b --- /dev/null +++ b/include/linux/mfd/cs42l43.h @@ -0,0 +1,102 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * CS42L43 core driver external data + * + * Copyright (C) 2022-2023 Cirrus Logic, Inc. and + * Cirrus Logic International Semiconductor Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef CS42L43_CORE_EXT_H +#define CS42L43_CORE_EXT_H + +#define CS42L43_N_SUPPLIES 3 + +enum cs42l43_irq_numbers { + CS42L43_PLL_LOST_LOCK, + CS42L43_PLL_READY, + + CS42L43_HP_STARTUP_DONE, + CS42L43_HP_SHUTDOWN_DONE, + CS42L43_HSDET_DONE, + CS42L43_TIPSENSE_UNPLUG_DB, + CS42L43_TIPSENSE_PLUG_DB, + CS42L43_RINGSENSE_UNPLUG_DB, + CS42L43_RINGSENSE_PLUG_DB, + CS42L43_TIPSENSE_UNPLUG_PDET, + CS42L43_TIPSENSE_PLUG_PDET, + CS42L43_RINGSENSE_UNPLUG_PDET, + CS42L43_RINGSENSE_PLUG_PDET, + + CS42L43_HS2_BIAS_SENSE, + CS42L43_HS1_BIAS_SENSE, + CS42L43_DC_DETECT1_FALSE, + CS42L43_DC_DETECT1_TRUE, + CS42L43_HSBIAS_CLAMPED, + CS42L43_HS3_4_BIAS_SENSE, + + CS42L43_AMP2_CLK_STOP_FAULT, + CS42L43_AMP1_CLK_STOP_FAULT, + CS42L43_AMP2_VDDSPK_FAULT, + CS42L43_AMP1_VDDSPK_FAULT, + CS42L43_AMP2_SHUTDOWN_DONE, + CS42L43_AMP1_SHUTDOWN_DONE, + CS42L43_AMP2_STARTUP_DONE, + CS42L43_AMP1_STARTUP_DONE, + CS42L43_AMP2_THERM_SHDN, + CS42L43_AMP1_THERM_SHDN, + CS42L43_AMP2_THERM_WARN, + CS42L43_AMP1_THERM_WARN, + CS42L43_AMP2_SCDET, + CS42L43_AMP1_SCDET, + + CS42L43_GPIO3_FALL, + CS42L43_GPIO3_RISE, + CS42L43_GPIO2_FALL, + CS42L43_GPIO2_RISE, + CS42L43_GPIO1_FALL, + CS42L43_GPIO1_RISE, + + CS42L43_HP_ILIMIT, + CS42L43_HP_LOADDET_DONE, +}; + +struct cs42l43 { + struct device *dev; + struct regmap *regmap; + struct sdw_slave *sdw; + + struct regulator *vdd_p; + struct regulator *vdd_d; + struct regulator_bulk_data core_supplies[CS42L43_N_SUPPLIES]; + + struct gpio_desc *reset; + + int irq; + struct regmap_irq_chip irq_chip; + struct regmap_irq_chip_data *irq_data; + + struct work_struct boot_work; + struct completion device_attach; + struct completion device_detach; + struct completion firmware_download; + int firmware_error; + + unsigned int sdw_freq; + /* Lock to gate control of the PLL and its sources. */ + struct mutex pll_lock; + + bool sdw_pll_active; + bool attached; + bool hw_lock; +}; + +#endif /* CS42L43_CORE_EXT_H */ -- cgit v1.2.3 From 588b51ddc7dc96c4876e71bc155bd38a6bd59a14 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Sat, 22 Jul 2023 10:55:05 +0800 Subject: ACPI: Remove unused extern declaration acpi_paddr_to_node() This is never used since commit 1e3590e2e4a3 ("[PATCH] pgdat allocation for new node add (get node id by acpi)"). Signed-off-by: YueHaibing Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 641dc4843987..58a0fdf68ca2 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -477,8 +477,6 @@ static inline int acpi_get_node(acpi_handle handle) return 0; } #endif -extern int acpi_paddr_to_node(u64 start_addr, u64 size); - extern int pnpacpi_disabled; #define PXM_INVAL (-1) -- cgit v1.2.3 From 3e1efe2b67d3d38116ec010968dbcd89d29e4561 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 28 Jul 2023 17:41:44 -0700 Subject: KVM: Wrap kvm_{gfn,hva}_range.pte in a per-action union Wrap kvm_{gfn,hva}_range.pte in a union so that future notifier events can pass event specific information up and down the stack without needing to constantly expand and churn the APIs. Lockless aging of SPTEs will pass around a bitmap, and support for memory attributes will pass around the new attributes for the range. Add a "KVM_NO_ARG" placeholder to simplify handling events without an argument (creating a dummy union variable is midly annoying). Opportunstically drop explicit zero-initialization of the "pte" field, as omitting the field (now a union) has the same effect. Cc: Yu Zhao Link: https://lore.kernel.org/all/CAOUHufagkd2Jk3_HrVoFFptRXM=hX2CV8f+M-dka-hJU4bP8kw@mail.gmail.com Reviewed-by: Oliver Upton Acked-by: Yu Zhao Link: https://lore.kernel.org/r/20230729004144.1054885-1-seanjc@google.com Signed-off-by: Sean Christopherson --- include/linux/kvm_host.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9d3ac7720da9..9125d0ab642d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -256,11 +256,15 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #endif #ifdef KVM_ARCH_WANT_MMU_NOTIFIER +union kvm_mmu_notifier_arg { + pte_t pte; +}; + struct kvm_gfn_range { struct kvm_memory_slot *slot; gfn_t start; gfn_t end; - pte_t pte; + union kvm_mmu_notifier_arg arg; bool may_block; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); -- cgit v1.2.3 From b616be6b97688f2f2bd7c4a47ab32f27f94fb2a9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 14:21:58 +0000 Subject: net: do not allow gso_size to be set to GSO_BY_FRAGS One missing check in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again [1] Do not allow gso_size to be set to GSO_BY_FRAGS (0xffff), because this magic value is used by the kernel. [1] general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 0 PID: 5039 Comm: syz-executor401 Not tainted 6.5.0-rc5-next-20230809-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 RIP: 0010:skb_segment+0x1a52/0x3ef0 net/core/skbuff.c:4500 Code: 00 00 00 e9 ab eb ff ff e8 6b 96 5d f9 48 8b 84 24 00 01 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e ea 21 00 00 48 8b 84 24 00 01 RSP: 0018:ffffc90003d3f1c8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 000000000001fffe RCX: 0000000000000000 RDX: 000000000000000e RSI: ffffffff882a3115 RDI: 0000000000000070 RBP: ffffc90003d3f378 R08: 0000000000000005 R09: 000000000000ffff R10: 000000000000ffff R11: 5ee4a93e456187d6 R12: 000000000001ffc6 R13: dffffc0000000000 R14: 0000000000000008 R15: 000000000000ffff FS: 00005555563f2380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020020000 CR3: 000000001626d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: udp6_ufo_fragment+0x9d2/0xd50 net/ipv6/udp_offload.c:109 ipv6_gso_segment+0x5c4/0x17b0 net/ipv6/ip6_offload.c:120 skb_mac_gso_segment+0x292/0x610 net/core/gso.c:53 __skb_gso_segment+0x339/0x710 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x3a5/0xf10 net/core/dev.c:3625 __dev_queue_xmit+0x8f0/0x3d60 net/core/dev.c:4329 dev_queue_xmit include/linux/netdevice.h:3082 [inline] packet_xmit+0x257/0x380 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x24c7/0x5570 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:727 [inline] sock_sendmsg+0xd9/0x180 net/socket.c:750 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2496 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2550 __sys_sendmsg+0x117/0x1e0 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7ff27cdb34d9 Fixes: 3953c46c3ac7 ("sk_buff: allow segmenting based on frag sizes") Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Xin Long Cc: "Michael S. Tsirkin" Cc: Jason Wang Reviewed-by: Willem de Bruijn Reviewed-by: Marcelo Ricardo Leitner Reviewed-by: Xuan Zhuo Link: https://lore.kernel.org/r/20230816142158.1779798-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/virtio_net.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index bdf8de2cdd93..7b4dd69555e4 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -155,6 +155,10 @@ retry: if (gso_type & SKB_GSO_UDP) nh_off -= thlen; + /* Kernel has a special handling for GSO_BY_FRAGS. */ + if (gso_size == GSO_BY_FRAGS) + return -EINVAL; + /* Too small packets are not really GSO ones. */ if (skb->len - nh_off > gso_size) { shinfo->gso_size = gso_size; -- cgit v1.2.3 From 1f8403953f05af591ab72cf749b9b9b837ea9595 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 14 Aug 2023 22:03:39 +0800 Subject: KVM: Remove unused kvm_device_{get,put}() declarations Commit 07f0a7bdec5c ("kvm: destroy emulated devices on VM exit") removed the functions but not these declarations. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230814140339.47732-1-yuehaibing@huawei.com [sean: split to separate patch] Signed-off-by: Sean Christopherson --- include/linux/kvm_host.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9125d0ab642d..a973a7cb45e0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2152,8 +2152,6 @@ struct kvm_device_ops { int (*mmap)(struct kvm_device *dev, struct vm_area_struct *vma); }; -void kvm_device_get(struct kvm_device *dev); -void kvm_device_put(struct kvm_device *dev); struct kvm_device *kvm_device_from_filp(struct file *filp); int kvm_register_device_ops(const struct kvm_device_ops *ops, u32 type); void kvm_unregister_device_ops(u32 type); -- cgit v1.2.3 From 458933d33af2cb3663bd8c0080c1efd1f9483db4 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 14 Aug 2023 22:03:39 +0800 Subject: KVM: Remove unused kvm_make_cpus_request_mask() declaration Commit 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") declared but never implemented kvm_make_cpus_request_mask() as kvm_make_vcpus_request_mask() already existed. Note, KVM's APIs are painfully inconsistent, as the inclusive variant uses "vcpus", whereas the exclusive/all variants use "cpus", which is likely what led to the spurious declaration. The "vcpus" terminology is more correct, especially since the helpers will kick _physical_ CPUs by calling kvm_kick_many_cpus(). But that's a cleanup for the future. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230814140339.47732-1-yuehaibing@huawei.com [sean: split to separate patch, call out inconsistent naming] Signed-off-by: Sean Christopherson --- include/linux/kvm_host.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a973a7cb45e0..199698a5ffa6 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -190,8 +190,6 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req, bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req); bool kvm_make_all_cpus_request_except(struct kvm *kvm, unsigned int req, struct kvm_vcpu *except); -bool kvm_make_cpus_request_mask(struct kvm *kvm, unsigned int req, - unsigned long *vcpu_bitmap); #define KVM_USERSPACE_IRQ_SOURCE_ID 0 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1 -- cgit v1.2.3 From c8248faf3ca276ebdf60f003b3e04bf764daba91 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 13:06:03 -0700 Subject: Compiler Attributes: counted_by: Adjust name and identifier expansion GCC and Clang's current RFCs name this attribute "counted_by", and have moved away from using a string for the member name. Update the kernel's macros to match. Additionally provide a UAPI no-op macro for UAPI structs that will gain annotations. Cc: Miguel Ojeda Cc: Nick Desaulniers Fixes: dd06e72e68bc ("Compiler Attributes: Add __counted_by macro") Acked-by: Miguel Ojeda Reviewed-by: Nathan Chancellor Link: https://lore.kernel.org/r/20230817200558.never.077-kees@kernel.org Signed-off-by: Kees Cook --- include/linux/compiler_attributes.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index 00efa35c350f..28566624f008 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -94,6 +94,19 @@ # define __copy(symbol) #endif +/* + * Optional: only supported since gcc >= 14 + * Optional: only supported since clang >= 18 + * + * gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896 + * clang: https://reviews.llvm.org/D148381 + */ +#if __has_attribute(__counted_by__) +# define __counted_by(member) __attribute__((__counted_by__(member))) +#else +# define __counted_by(member) +#endif + /* * Optional: not supported by gcc * Optional: only supported since clang >= 14.0 @@ -129,19 +142,6 @@ # define __designated_init #endif -/* - * Optional: only supported since gcc >= 14 - * Optional: only supported since clang >= 17 - * - * gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896 - * clang: https://reviews.llvm.org/D148381 - */ -#if __has_attribute(__element_count__) -# define __counted_by(member) __attribute__((__element_count__(#member))) -#else -# define __counted_by(member) -#endif - /* * Optional: only supported since clang >= 14.0 * -- cgit v1.2.3 From 35acbdfaf17c94d64ee336282f21b2981676748a Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 18 Aug 2023 20:42:27 +0800 Subject: regulator: db8500-prcmu: Remove unused declaration power_state_active_is_enabled() Commit 38e968380b27 ("regulators/db8500: split off shared dbx500 code") removed this but not its declaration. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230818124227.15084-1-yuehaibing@huawei.com Signed-off-by: Mark Brown --- include/linux/regulator/db8500-prcmu.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regulator/db8500-prcmu.h b/include/linux/regulator/db8500-prcmu.h index f90df9ee703e..d58ff273157e 100644 --- a/include/linux/regulator/db8500-prcmu.h +++ b/include/linux/regulator/db8500-prcmu.h @@ -35,10 +35,4 @@ enum db8500_regulator_id { DB8500_NUM_REGULATORS }; -/* - * Exported interface for CPUIdle only. This function is called with all - * interrupts turned off. - */ -int power_state_active_is_enabled(void); - #endif -- cgit v1.2.3 From 92766e1b953d6e419684b39f55dab574287dd144 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Fri, 18 Aug 2023 03:10:29 -0700 Subject: iommu: Move dev_iommu_ops() to private header dev_iommu_ops() is essentially only used in iommu subsystem, so move to a private header to avoid being abused by other drivers. Link: https://lore.kernel.org/r/20230818101033.4100-2-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Lu Baolu Reviewed-by: Jason Gunthorpe Signed-off-by: Yi Liu Signed-off-by: Jason Gunthorpe --- include/linux/iommu.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d31642596675..e0245aa82b75 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -450,17 +450,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather) }; } -static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) -{ - /* - * Assume that valid ops must be installed if iommu_probe_device() - * has succeeded. The device ops are essentially for internal use - * within the IOMMU subsystem itself, so we should be able to trust - * ourselves not to misuse the helper. - */ - return dev->iommu->iommu_dev->ops; -} - extern int bus_iommu_probe(const struct bus_type *bus); extern bool iommu_present(const struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap); -- cgit v1.2.3 From 60fedb262bbc632ab58bfdec7f6e47b2f94992d3 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Fri, 18 Aug 2023 03:10:30 -0700 Subject: iommu: Add new iommu op to get iommu hardware information Introduce a new iommu op to get the IOMMU hardware capabilities for iommufd. This information will be used by any vIOMMU driver which is owned by userspace. This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there's room in the design to grow a generic parameter set as well. No wrapper API is added as it is supposed to be used by iommufd only. Different IOMMU hardware would have different hardware information. So the information reported differs as well. To let the external user understand the difference, enum iommu_hw_info_type is defined. For the iommu drivers that are capable to report hardware information, it should have a unique iommu_hw_info_type and return to caller. For the driver doesn't report hardware information, caller just uses IOMMU_HW_INFO_TYPE_NONE if a type is required. Link: https://lore.kernel.org/r/20230818101033.4100-3-yi.l.liu@intel.com Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Co-developed-by: Nicolin Chen Signed-off-by: Nicolin Chen Signed-off-by: Yi Liu Signed-off-by: Jason Gunthorpe --- include/linux/iommu.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index e0245aa82b75..bd6a1110b294 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -228,6 +228,10 @@ struct iommu_iotlb_gather { /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability + * @hw_info: report iommu hardware information. The data buffer returned by this + * op is allocated in the iommu driver and freed by the caller after + * use. The information type is one of enum iommu_hw_info_type defined + * in include/uapi/linux/iommufd.h. * @domain_alloc: allocate iommu domain * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling @@ -257,6 +261,7 @@ struct iommu_iotlb_gather { */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap); + void *(*hw_info)(struct device *dev, u32 *length, u32 *type); /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); -- cgit v1.2.3 From d11a69873d9a7435fe6a48531e165ab80a8b1221 Mon Sep 17 00:00:00 2001 From: Tomislav Novak Date: Mon, 5 Jun 2023 12:19:23 -0700 Subject: hw_breakpoint: fix single-stepping when using bpf_overflow_handler Arm platforms use is_default_overflow_handler() to determine if the hw_breakpoint code should single-step over the breakpoint trigger or let the custom handler deal with it. Since bpf_overflow_handler() currently isn't recognized as a default handler, attaching a BPF program to a PERF_TYPE_BREAKPOINT event causes it to keep firing (the instruction triggering the data abort exception is never skipped). For example: # bpftrace -e 'watchpoint:0x10000:4:w { print("hit") }' -c ./test Attaching 1 probe... hit hit [...] ^C (./test performs a single 4-byte store to 0x10000) This patch replaces the check with uses_default_overflow_handler(), which accounts for the bpf_overflow_handler() case by also testing if one of the perf_event_output functions gets invoked indirectly, via orig_default_handler. Signed-off-by: Tomislav Novak Tested-by: Samuel Gosselin # arm64 Reviewed-by: Catalin Marinas Acked-by: Alexei Starovoitov Link: https://lore.kernel.org/linux-arm-kernel/20220923203644.2731604-1-tnovak@fb.com/ Link: https://lore.kernel.org/r/20230605191923.1219974-1-tnovak@meta.com Signed-off-by: Will Deacon --- include/linux/perf_event.h | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2166a69e3bf2..e657916c9509 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1316,15 +1316,31 @@ extern int perf_event_output(struct perf_event *event, struct pt_regs *regs); static inline bool -is_default_overflow_handler(struct perf_event *event) +__is_default_overflow_handler(perf_overflow_handler_t overflow_handler) { - if (likely(event->overflow_handler == perf_event_output_forward)) + if (likely(overflow_handler == perf_event_output_forward)) return true; - if (unlikely(event->overflow_handler == perf_event_output_backward)) + if (unlikely(overflow_handler == perf_event_output_backward)) return true; return false; } +#define is_default_overflow_handler(event) \ + __is_default_overflow_handler((event)->overflow_handler) + +#ifdef CONFIG_BPF_SYSCALL +static inline bool uses_default_overflow_handler(struct perf_event *event) +{ + if (likely(is_default_overflow_handler(event))) + return true; + + return __is_default_overflow_handler(event->orig_overflow_handler); +} +#else +#define uses_default_overflow_handler(event) \ + is_default_overflow_handler(event) +#endif + extern void perf_event_header__init_id(struct perf_event_header *header, struct perf_sample_data *data, -- cgit v1.2.3 From 1aa3d0274a4aac338ee45a3dfc3b17c944bcc2bc Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Thu, 17 Aug 2023 11:24:03 +0530 Subject: arm_pmu: acpi: Add a representative platform device for TRBE ACPI TRBE does not have a HID for identification which could create and add a platform device into the platform bus. Also without a platform device, it cannot be probed and bound to a platform driver. This creates a dummy platform device for TRBE after ascertaining that ACPI provides required interrupts uniformly across all cpus on the system. This device gets created inside drivers/perf/arm_pmu_acpi.c to accommodate TRBE being built as a module. Cc: Catalin Marinas Cc: Will Deacon Cc: Mark Rutland Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual Link: https://lore.kernel.org/r/20230817055405.249630-3-anshuman.khandual@arm.com Signed-off-by: Will Deacon --- include/linux/perf/arm_pmu.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index a0801f68762b..143fbc10ecfe 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -187,5 +187,6 @@ void armpmu_free_irq(int irq, int cpu); #endif /* CONFIG_ARM_PMU */ #define ARMV8_SPE_PDEV_NAME "arm,spe-v1" +#define ARMV8_TRBE_PDEV_NAME "arm,trbe" #endif /* __ARM_PMU_H__ */ -- cgit v1.2.3 From fad9c80e6371ee04a3fa5728efe20b88d8e4cccd Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 23 May 2023 22:51:01 +0200 Subject: maple_tree: fix a few documentation issues The documentation of mt_next() claims that it starts the search at the provided index. That's incorrect as it starts the search after the provided index. The documentation of mt_find() is slightly confusing. "Handles locking" is not really helpful as it does not explain how the "locking" works. Also the documentation of index talks about a range, while in reality the index is updated on a succesful search to the index of the found entry plus one. Fix similar issues for mt_find_after() and mt_prev(). Reword the confusing "Note: Will not return the zero entry." comment on mt_for_each() and document @__index correctly. Link: https://lkml.kernel.org/r/87ttw2n556.ffs@tglx Signed-off-by: Thomas Gleixner Reviewed-by: Liam R. Howlett Cc: Matthew Wilcox Cc: Shanker Donthineni Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 295548cca8b3..6e5bd2c9875d 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -662,10 +662,11 @@ void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max); * mt_for_each - Iterate over each entry starting at index until max. * @__tree: The Maple Tree * @__entry: The current entry - * @__index: The index to update to track the location in the tree + * @__index: The index to start the search from. Subsequently used as iterator. * @__max: The maximum limit for @index * - * Note: Will not return the zero entry. + * This iterator skips all entries, which resolve to a NULL pointer, + * e.g. entries which has been reserved with XA_ZERO_ENTRY. */ #define mt_for_each(__tree, __entry, __index, __max) \ for (__entry = mt_find(__tree, &(__index), __max); \ -- cgit v1.2.3 From fc1878ec70ede56ee48f2d65525d4f7c6888b496 Mon Sep 17 00:00:00 2001 From: ZhangPeng Date: Sat, 1 Jul 2023 11:28:53 +0800 Subject: mm: remove page_rmapping() After converting the last user to folio_raw_mapping(), we can safely remove the function. Link: https://lkml.kernel.org/r/20230701032853.258697-3-zhangpeng362@huawei.com Signed-off-by: ZhangPeng Reviewed-by: Sidhartha Kumar Reviewed-by: Matthew Wilcox (Oracle) Cc: Kefeng Wang Cc: Nanyong Sun Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 406ab9ea818f..6d150990e35c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2170,7 +2170,6 @@ static inline void *folio_address(const struct folio *folio) return page_address(&folio->page); } -extern void *page_rmapping(struct page *page); extern pgoff_t __page_file_index(struct page *page); /* -- cgit v1.2.3 From 527ed4f7d902d362471a93e1a4afb604c18ceb48 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Fri, 30 Jun 2023 14:22:52 +0800 Subject: mm: remove arguments of show_mem() All callers of show_mem() pass 0 and NULL, so we can remove the two arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem(). Link: https://lkml.kernel.org/r/20230630062253.189440-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Cc: Christophe Leroy Cc: Greg Kroah-Hartman Cc: Matthew Wilcox Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- include/linux/mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 6d150990e35c..8c3e3eec9008 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3068,9 +3068,9 @@ extern void mem_init(void); extern void __init mmap_init(void); extern void __show_mem(unsigned int flags, nodemask_t *nodemask, int max_zone_idx); -static inline void show_mem(unsigned int flags, nodemask_t *nodemask) +static inline void show_mem(void) { - __show_mem(flags, nodemask, MAX_NR_ZONES - 1); + __show_mem(0, NULL, MAX_NR_ZONES - 1); } extern long si_mem_available(void); extern void si_meminfo(struct sysinfo * val); -- cgit v1.2.3 From 1279aa0656bbebbeecbd3bec0dd50faf35e5db38 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Fri, 30 Jun 2023 14:22:53 +0800 Subject: mm: make show_free_areas() static All callers of show_free_areas() pass 0 and NULL, so we can directly use show_mem() instead of show_free_areas(0, NULL), which could make show_free_areas() a static function. Link: https://lkml.kernel.org/r/20230630062253.189440-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Cc: Christophe Leroy Cc: Greg Kroah-Hartman Cc: Matthew Wilcox Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- include/linux/mm.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 8c3e3eec9008..d1ad22980ebe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2236,18 +2236,6 @@ extern void pagefault_out_of_memory(void); #define offset_in_thp(page, p) ((unsigned long)(p) & (thp_size(page) - 1)) #define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 1)) -/* - * Flags passed to show_mem() and show_free_areas() to suppress output in - * various contexts. - */ -#define SHOW_MEM_FILTER_NODES (0x0001u) /* disallowed nodes */ - -extern void __show_free_areas(unsigned int flags, nodemask_t *nodemask, int max_zone_idx); -static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t *nodemask) -{ - __show_free_areas(flags, nodemask, MAX_NR_ZONES - 1); -} - /* * Parameter block passed down to zap_pte_range in exceptional cases. */ -- cgit v1.2.3 From 5502ea44f5ade35d32a397353956bc026b870400 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Wed, 28 Jun 2023 17:53:05 -0400 Subject: mm/hugetlb: add page_mask for hugetlb_follow_page_mask() follow_page() doesn't need it, but we'll start to need it when unifying gup for hugetlb. Link: https://lkml.kernel.org/r/20230628215310.73782-4-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: David Hildenbrand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: James Houghton Cc: Jason Gunthorpe Cc: John Hubbard Cc: Kirill A . Shutemov Cc: Lorenzo Stoakes Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Vlastimil Babka Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ca3c8e10f24a..9f282f370d96 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -131,7 +131,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *, struct vm_area_struct *); struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, - unsigned long address, unsigned int flags); + unsigned long address, unsigned int flags, + unsigned int *page_mask); long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, unsigned long *, unsigned long *, long, unsigned int, int *); @@ -297,8 +298,9 @@ static inline void adjust_range_if_pmd_sharing_possible( { } -static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, - unsigned long address, unsigned int flags) +static inline struct page *hugetlb_follow_page_mask( + struct vm_area_struct *vma, unsigned long address, unsigned int flags, + unsigned int *page_mask) { BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/ } -- cgit v1.2.3 From 4849807114b83e1897381ed3f851632f376a0b7e Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Wed, 28 Jun 2023 17:53:08 -0400 Subject: mm/gup: retire follow_hugetlb_page() Now __get_user_pages() should be well prepared to handle thp completely, as long as hugetlb gup requests even without the hugetlb's special path. Time to retire follow_hugetlb_page(). Tweak misc comments to reflect reality of follow_hugetlb_page()'s removal. Link: https://lkml.kernel.org/r/20230628215310.73782-7-peterx@redhat.com Signed-off-by: Peter Xu Acked-by: David Hildenbrand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: James Houghton Cc: Jason Gunthorpe Cc: John Hubbard Cc: Kirill A . Shutemov Cc: Lorenzo Stoakes Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Vlastimil Babka Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 9f282f370d96..9bc3c2d71b71 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -133,9 +133,6 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, unsigned long address, unsigned int flags, unsigned int *page_mask); -long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, - struct page **, unsigned long *, unsigned long *, - long, unsigned int, int *); void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, struct page *, zap_flags_t); @@ -305,15 +302,6 @@ static inline struct page *hugetlb_follow_page_mask( BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/ } -static inline long follow_hugetlb_page(struct mm_struct *mm, - struct vm_area_struct *vma, struct page **pages, - unsigned long *position, unsigned long *nr_pages, - long i, unsigned int flags, int *nonblocking) -{ - BUG(); - return 0; -} - static inline int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *dst_vma, -- cgit v1.2.3 From a524fcfe190da16bbf1311b6636f51d81f35d59a Mon Sep 17 00:00:00 2001 From: Bean Huo Date: Mon, 26 Jun 2023 07:55:18 +0200 Subject: fs: convert block_commit_write to return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit block_commit_write() always returns 0, this patch changes it to return void. Link: https://lkml.kernel.org/r/20230626055518.842392-3-beanhuo@iokpp.de Signed-off-by: Bean Huo Reviewed-by: Jan Kara Acked-by: Theodore Ts'o Reviewed-by: Christoph Hellwig Reviewed-by: Matthew Wilcox (Oracle) Cc: Al Viro Cc: Andreas Dilger Cc: Christian Brauner Cc: Joel Becker Cc: Joseph Qi Cc: Luís Henriques Cc: Mark Fasheh Signed-off-by: Andrew Morton --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 6cb3e9af78c9..a7377877ff4e 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -288,7 +288,7 @@ int cont_write_begin(struct file *, struct address_space *, loff_t, unsigned, struct page **, void **, get_block_t *, loff_t *); int generic_cont_expand_simple(struct inode *inode, loff_t size); -int block_commit_write(struct page *page, unsigned from, unsigned to); +void block_commit_write(struct page *page, unsigned int from, unsigned int to); int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, get_block_t get_block); /* Convert errno to return value from ->page_mkwrite() call */ -- cgit v1.2.3 From 3fade62b62e84dd8dbf6e92d494b0e7eca750c43 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Sun, 25 Jun 2023 10:13:23 +0800 Subject: mm/mm_init.c: remove obsolete macro HASH_SMALL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit HASH_SMALL only works when parameter numentries is 0. But the sole caller futex_init() never calls alloc_large_system_hash() with numentries set to 0. So HASH_SMALL is obsolete and remove it. Link: https://lkml.kernel.org/r/20230625021323.849147-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Mike Rapoport (IBM) Cc: André Almeida Cc: Darren Hart Cc: Davidlohr Bueso Cc: Ingo Molnar Cc: Miaohe Lin Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/memblock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memblock.h b/include/linux/memblock.h index f71ff9f0ec81..0d031fbfea25 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -581,9 +581,7 @@ extern void *alloc_large_system_hash(const char *tablename, unsigned long high_limit); #define HASH_EARLY 0x00000001 /* Allocating during early boot? */ -#define HASH_SMALL 0x00000002 /* sub-page allocation allowed, min - * shift passed via *_hash_shift */ -#define HASH_ZERO 0x00000004 /* Zero allocated hash table */ +#define HASH_ZERO 0x00000002 /* Zero allocated hash table */ /* Only NUMA needs hash distribution. 64bit NUMA architectures have * sufficient vmalloc space. -- cgit v1.2.3 From 79271476b3362a9e69adae949a520647f8af3559 Mon Sep 17 00:00:00 2001 From: xu xin Date: Tue, 13 Jun 2023 11:09:28 +0800 Subject: ksm: support unsharing KSM-placed zero pages Patch series "ksm: support tracking KSM-placed zero-pages", v10. The core idea of this patch set is to enable users to perceive the number of any pages merged by KSM, regardless of whether use_zero_page switch has been turned on, so that users can know how much free memory increase is really due to their madvise(MERGEABLE) actions. But the problem is, when enabling use_zero_pages, all empty pages will be merged with kernel zero pages instead of with each other as use_zero_pages is disabled, and then these zero-pages are no longer monitored by KSM. The motivations to do this is seen at: https://lore.kernel.org/lkml/202302100915227721315@zte.com.cn/ In one word, we hope to implement the support for KSM-placed zero pages tracking without affecting the feature of use_zero_pages, so that app developer can also benefit from knowing the actual KSM profit by getting KSM-placed zero pages to optimize applications eventually when /sys/kernel/mm/ksm/use_zero_pages is enabled. This patch (of 5): When use_zero_pages of ksm is enabled, madvise(addr, len, MADV_UNMERGEABLE) and other ways (like write 2 to /sys/kernel/mm/ksm/run) to trigger unsharing will *not* actually unshare the shared zeropage as placed by KSM (which is against the MADV_UNMERGEABLE documentation). As these KSM-placed zero pages are out of the control of KSM, the related counts of ksm pages don't expose how many zero pages are placed by KSM (these special zero pages are different from those initially mapped zero pages, because the zero pages mapped to MADV_UNMERGEABLE areas are expected to be a complete and unshared page). To not blindly unshare all shared zero_pages in applicable VMAs, the patch use pte_mkdirty (related with architecture) to mark KSM-placed zero pages. Thus, MADV_UNMERGEABLE will only unshare those KSM-placed zero pages. In addition, we'll reuse this mechanism to reliably identify KSM-placed ZeroPages to properly account for them (e.g., calculating the KSM profit that includes zeropages) in the latter patches. The patch will not degrade the performance of use_zero_pages as it doesn't change the way of merging empty pages in use_zero_pages's feature. Link: https://lkml.kernel.org/r/202306131104554703428@zte.com.cn Link: https://lkml.kernel.org/r/20230613030928.185882-1-yang.yang29@zte.com.cn Signed-off-by: xu xin Acked-by: David Hildenbrand Cc: Claudio Imbrenda Cc: Xuexin Jiang Reviewed-by: Xiaokai Ran Reviewed-by: Yang Yang Signed-off-by: Andrew Morton --- include/linux/ksm.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 899a314bc487..98878107244f 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -26,6 +26,12 @@ int ksm_disable(struct mm_struct *mm); int __ksm_enter(struct mm_struct *mm); void __ksm_exit(struct mm_struct *mm); +/* + * To identify zeropages that were mapped by KSM, we reuse the dirty bit + * in the PTE. If the PTE is dirty, the zeropage was mapped by KSM when + * deduplicating memory. + */ +#define is_ksm_zero_pte(pte) (is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte)) static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm) { -- cgit v1.2.3 From e2942062e01df85b4692460fe5b48ab0c90fdb95 Mon Sep 17 00:00:00 2001 From: xu xin Date: Tue, 13 Jun 2023 11:09:34 +0800 Subject: ksm: count all zero pages placed by KSM As pages_sharing and pages_shared don't include the number of zero pages merged by KSM, we cannot know how many pages are zero pages placed by KSM when enabling use_zero_pages, which leads to KSM not being transparent with all actual merged pages by KSM. In the early days of use_zero_pages, zero-pages was unable to get unshared by the ways like MADV_UNMERGEABLE so it's hard to count how many times one of those zeropages was then unmerged. But now, unsharing KSM-placed zero page accurately has been achieved, so we can easily count both how many times a page full of zeroes was merged with zero-page and how many times one of those pages was then unmerged. and so, it helps to estimate memory demands when each and every shared page could get unshared. So we add ksm_zero_pages under /sys/kernel/mm/ksm/ to show the number of all zero pages placed by KSM. Meanwhile, we update the Documentation. Link: https://lkml.kernel.org/r/20230613030934.185944-1-yang.yang29@zte.com.cn Signed-off-by: xu xin Acked-by: David Hildenbrand Cc: Claudio Imbrenda Cc: Xuexin Jiang Reviewed-by: Xiaokai Ran Reviewed-by: Yang Yang Signed-off-by: Andrew Morton --- include/linux/ksm.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 98878107244f..e80aa49009b2 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -33,6 +33,14 @@ void __ksm_exit(struct mm_struct *mm); */ #define is_ksm_zero_pte(pte) (is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte)) +extern unsigned long ksm_zero_pages; + +static inline void ksm_might_unmap_zero_page(pte_t pte) +{ + if (is_ksm_zero_pte(pte)) + ksm_zero_pages--; +} + static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm) { int ret; @@ -101,6 +109,10 @@ static inline void ksm_exit(struct mm_struct *mm) { } +static inline void ksm_might_unmap_zero_page(pte_t pte) +{ +} + #ifdef CONFIG_MEMORY_FAILURE static inline void collect_procs_ksm(struct page *page, struct list_head *to_kill, int force_early) -- cgit v1.2.3 From 6080d19f07043ade61094d0f58b14c05e1694a39 Mon Sep 17 00:00:00 2001 From: xu xin Date: Tue, 13 Jun 2023 11:09:38 +0800 Subject: ksm: add ksm zero pages for each process As the number of ksm zero pages is not included in ksm_merging_pages per process when enabling use_zero_pages, it's unclear of how many actual pages are merged by KSM. To let users accurately estimate their memory demands when unsharing KSM zero-pages, it's necessary to show KSM zero- pages per process. In addition, it help users to know the actual KSM profit because KSM-placed zero pages are also benefit from KSM. since unsharing zero pages placed by KSM accurately is achieved, then tracking empty pages merging and unmerging is not a difficult thing any longer. Since we already have /proc//ksm_stat, just add the information of 'ksm_zero_pages' in it. Link: https://lkml.kernel.org/r/20230613030938.185993-1-yang.yang29@zte.com.cn Signed-off-by: xu xin Acked-by: David Hildenbrand Reviewed-by: Xiaokai Ran Reviewed-by: Yang Yang Cc: Claudio Imbrenda Cc: Xuexin Jiang Signed-off-by: Andrew Morton --- include/linux/ksm.h | 8 +++++--- include/linux/mm_types.h | 9 +++++++-- 2 files changed, 12 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index e80aa49009b2..c2dd786a30e1 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -35,10 +35,12 @@ void __ksm_exit(struct mm_struct *mm); extern unsigned long ksm_zero_pages; -static inline void ksm_might_unmap_zero_page(pte_t pte) +static inline void ksm_might_unmap_zero_page(struct mm_struct *mm, pte_t pte) { - if (is_ksm_zero_pte(pte)) + if (is_ksm_zero_pte(pte)) { ksm_zero_pages--; + mm->ksm_zero_pages--; + } } static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm) @@ -109,7 +111,7 @@ static inline void ksm_exit(struct mm_struct *mm) { } -static inline void ksm_might_unmap_zero_page(pte_t pte) +static inline void ksm_might_unmap_zero_page(struct mm_struct *mm, pte_t pte) { } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5e74ce4a28cd..51d04c1847c1 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -812,7 +812,7 @@ struct mm_struct { #ifdef CONFIG_KSM /* * Represent how many pages of this process are involved in KSM - * merging. + * merging (not including ksm_zero_pages). */ unsigned long ksm_merging_pages; /* @@ -820,7 +820,12 @@ struct mm_struct { * including merged and not merged. */ unsigned long ksm_rmap_items; -#endif + /* + * Represent how many empty pages are merged with kernel zero + * pages when enabling KSM use_zero_pages. + */ + unsigned long ksm_zero_pages; +#endif /* CONFIG_KSM */ #ifdef CONFIG_LRU_GEN struct { /* this mm_struct is on lru_gen_mm_list */ -- cgit v1.2.3 From bded67f81ec47e6054ad24c1c7992a6523a9b2c6 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 6 Jul 2023 14:39:05 +0800 Subject: memory tier: rename destroy_memory_type() to put_memory_type() It appears that destroy_memory_type() isn't a very good name because we usually will not free the memory_type here. So rename it to a more appropriate name i.e. put_memory_type(). Link: https://lkml.kernel.org/r/20230706063905.543800-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Suggested-by: Huang, Ying Reviewed-by: "Huang, Ying" Reviewed-by: Xiao Yang Cc: Dan Williams Cc: Dave Jiang Cc: Vishal Verma Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index fc9647b1b4f9..437441cdf78f 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -33,7 +33,7 @@ struct memory_dev_type { #ifdef CONFIG_NUMA extern bool numa_demotion_enabled; struct memory_dev_type *alloc_memory_type(int adistance); -void destroy_memory_type(struct memory_dev_type *memtype); +void put_memory_type(struct memory_dev_type *memtype); void init_node_memory_type(int node, struct memory_dev_type *default_type); void clear_node_memory_type(int node, struct memory_dev_type *memtype); #ifdef CONFIG_MIGRATION @@ -68,7 +68,7 @@ static inline struct memory_dev_type *alloc_memory_type(int adistance) return NULL; } -static inline void destroy_memory_type(struct memory_dev_type *memtype) +static inline void put_memory_type(struct memory_dev_type *memtype) { } -- cgit v1.2.3 From 8f21912a4bf854e51b3ba69298f559f976d63685 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 6 Jul 2023 17:24:41 +0800 Subject: mm: remove obsolete comment above struct per_cpu_pages Since commit 01b44456a7aa ("mm/page_alloc: replace local_lock with normal spinlock"), per_cpu_pages is protected by normal spinlock. Remove the obsolete comment as it's not that helpful. Link: https://lkml.kernel.org/r/20230706092441.1574950-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 5e50b78d58ea..4106fbc5b4b3 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -676,7 +676,6 @@ enum zone_watermarks { #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost) #define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost) -/* Fields and list protected by pagesets local_lock in page_alloc.c */ struct per_cpu_pages { spinlock_t lock; /* Protects lists field */ int count; /* number of pages in the list */ -- cgit v1.2.3 From b4fa966f03b7401ceacd4ffd7227197afb2b8376 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 28 Jun 2023 11:48:52 +0100 Subject: mm, netfs, fscache: stop read optimisation when folio removed from pagecache Fscache has an optimisation by which reads from the cache are skipped until we know that (a) there's data there to be read and (b) that data isn't entirely covered by pages resident in the netfs pagecache. This is done with two flags manipulated by fscache_note_page_release(): if (... test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) && test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate that netfslib should download from the server or clear the page instead. The fscache_note_page_release() function is intended to be called from ->releasepage() - but that only gets called if PG_private or PG_private_2 is set - and currently the former is at the discretion of the network filesystem and the latter is only set whilst a page is being written to the cache, so sometimes we miss clearing the optimisation. Fix this by following Willy's suggestion[1] and adding an address_space flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call ->release_folio() if it's set, even if PG_private or PG_private_2 aren't set. Note that this would require folio_test_private() and page_has_private() to become more complicated. To avoid that, in the places[*] where these are used to conditionalise calls to filemap_release_folio() and try_to_release_page(), the tests are removed the those functions just jumped to unconditionally and the test is performed there. [*] There are some exceptions in vmscan.c where the check guards more than just a call to the releaser. I've added a function, folio_needs_release() to wrap all the checks for that. AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from fscache and cleared in ->evict_inode() before truncate_inode_pages_final() is called. Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared and the optimisation cancelled if a cachefiles object already contains data when we open it. [dwysocha@redhat.com: call folio_mapping() inside folio_needs_release()] Link: https://github.com/DaveWysochanskiRH/kernel/commit/902c990e311120179fa5de99d68364b2947b79ec Link: https://lkml.kernel.org/r/20230628104852.3391651-3-dhowells@redhat.com Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page") Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines") Signed-off-by: David Howells Signed-off-by: Dave Wysochanski Reported-by: Rohith Surabattula Suggested-by: Matthew Wilcox Tested-by: SeongJae Park Cc: Daire Byrne Cc: Matthew Wilcox Cc: Linus Torvalds Cc: Steve French Cc: Shyam Prasad N Cc: Rohith Surabattula Cc: Dave Wysochanski Cc: Dominique Martinet Cc: Ilya Dryomov Cc: Andreas Dilger Cc: Jingbo Xu Cc: "Theodore Ts'o" Cc: Xiubo Li Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 716953ee1ebd..0ab0f2362b9b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -203,6 +203,7 @@ enum mapping_flags { /* writeback related tags are not used */ AS_NO_WRITEBACK_TAGS = 5, AS_LARGE_FOLIO_SUPPORT = 6, + AS_RELEASE_ALWAYS, /* Call ->release_folio(), even if no private data */ }; /** @@ -273,6 +274,21 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping) return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags); } +static inline bool mapping_release_always(const struct address_space *mapping) +{ + return test_bit(AS_RELEASE_ALWAYS, &mapping->flags); +} + +static inline void mapping_set_release_always(struct address_space *mapping) +{ + set_bit(AS_RELEASE_ALWAYS, &mapping->flags); +} + +static inline void mapping_clear_release_always(struct address_space *mapping) +{ + clear_bit(AS_RELEASE_ALWAYS, &mapping->flags); +} + static inline gfp_t mapping_gfp_mask(struct address_space * mapping) { return mapping->gfp_mask; -- cgit v1.2.3 From 60b1e24ce8c3334d9204d6229356b750632136be Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Sat, 8 Jul 2023 10:33:04 +0800 Subject: mm/memcg: minor cleanup for MEM_CGROUP_ID_MAX MEM_CGROUP_ID_MAX is only used when CONFIG_MEMCG is configured. So remove unneeded !CONFIG_MEMCG variant. Also it's only used in mem_cgroup_alloc(), so move it from memcontrol.h to memcontrol.c. And further define it as: #define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1) so if someone changes MEM_CGROUP_ID_SHIFT in the future, then MEM_CGROUP_ID_MAX will be updated accordingly, as suggested by Muchun. Link: https://lkml.kernel.org/r/20230708023304.1184111-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Cc: Michal Hocko Cc: Roman Gushchin Cc: Johannes Weiner Cc: Shakeel Butt Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 5818af8eca5a..58eb7ca65699 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -61,7 +61,6 @@ struct mem_cgroup_reclaim_cookie { #ifdef CONFIG_MEMCG #define MEM_CGROUP_ID_SHIFT 16 -#define MEM_CGROUP_ID_MAX USHRT_MAX struct mem_cgroup_id { int id; @@ -1158,7 +1157,6 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, #else /* CONFIG_MEMCG */ #define MEM_CGROUP_ID_SHIFT 0 -#define MEM_CGROUP_ID_MAX 0 static inline struct mem_cgroup *folio_memcg(struct folio *folio) { -- cgit v1.2.3 From af19487f00f34ff8643921d7909dbb3fedc7e329 Mon Sep 17 00:00:00 2001 From: Axel Rasmussen Date: Fri, 7 Jul 2023 14:55:33 -0700 Subject: mm: make PTE_MARKER_SWAPIN_ERROR more general Patch series "add UFFDIO_POISON to simulate memory poisoning with UFFD", v4. This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4 for a detailed description of the feature. This patch (of 8): Future patches will reuse PTE_MARKER_SWAPIN_ERROR to implement UFFDIO_POISON, so make some various preparations for that: First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be confusing since we're going to re-use it for something not really related to swap. This can be particularly confusing for things like hugetlbfs, which doesn't support swap whatsoever. Also rename some various helper functions. Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap. But, since we're going to re-use it, we want it to go ahead and copy it just like non-hugetlbfs memory does today. Since the code to do this is more complicated now, pull it out into a helper which can be re-used in both places. While we're at it, also make it slightly more explicit in its handling of e.g. uffd wp markers. For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for an error entry, return VM_FAULT_HWPOISON. For most cases this change doesn't matter, e.g. a userspace program would receive a SIGBUS either way. But for UFFDIO_POISON, this change will let KVM guests get an MCE out of the box, instead of giving a SIGBUS to the hypervisor and requiring it to somehow inject an MCE. Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen today because the lack of swap support means we'll never end up with such a PTE anyway, but this behavior will be needed once such entries *can* show up via UFFDIO_POISON. Link: https://lkml.kernel.org/r/20230707215540.2324998-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20230707215540.2324998-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen Acked-by: Peter Xu Cc: Al Viro Cc: Brian Geffon Cc: Christian Brauner Cc: David Hildenbrand Cc: Gaosheng Cui Cc: Huang, Ying Cc: Hugh Dickins Cc: James Houghton Cc: Jan Alexander Steffens (heftig) Cc: Jiaqi Yan Cc: Jonathan Corbet Cc: Kefeng Wang Cc: Liam R. Howlett Cc: Miaohe Lin Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Muchun Song Cc: Nadav Amit Cc: Naoya Horiguchi Cc: Ryan Roberts Cc: Shuah Khan Cc: Suleiman Souhlal Cc: Suren Baghdasaryan Cc: T.J. Alumbaugh Cc: Yu Zhao Cc: ZhangPeng Signed-off-by: Andrew Morton --- include/linux/mm_inline.h | 19 +++++++++++++++++++ include/linux/swapops.h | 15 ++++++++++----- 2 files changed, 29 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 21d6c72bcc71..a86c84600787 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm) return atomic_read(&mm->tlb_flush_pending) > 1; } +/* + * Computes the pte marker to copy from the given source entry into dst_vma. + * If no marker should be copied, returns 0. + * The caller should insert a new pte created with make_pte_marker(). + */ +static inline pte_marker copy_pte_marker( + swp_entry_t entry, struct vm_area_struct *dst_vma) +{ + pte_marker srcm = pte_marker_get(entry); + /* Always copy error entries. */ + pte_marker dstm = srcm & PTE_MARKER_POISONED; + + /* Only copy PTE markers if UFFD register matches. */ + if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma)) + dstm |= PTE_MARKER_UFFD_WP; + + return dstm; +} + /* * If this pte is wr-protected by uffd-wp in any form, arm the special pte to * replace a none pte. NOTE! This should only be called when *pte is already diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 4c932cb45e0b..bff1e8d97de0 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -393,7 +393,12 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry) typedef unsigned long pte_marker; #define PTE_MARKER_UFFD_WP BIT(0) -#define PTE_MARKER_SWAPIN_ERROR BIT(1) +/* + * "Poisoned" here is meant in the very general sense of "future accesses are + * invalid", instead of referring very specifically to hardware memory errors. + * This marker is meant to represent any of various different causes of this. + */ +#define PTE_MARKER_POISONED BIT(1) #define PTE_MARKER_MASK (BIT(2) - 1) static inline swp_entry_t make_pte_marker_entry(pte_marker marker) @@ -421,15 +426,15 @@ static inline pte_t make_pte_marker(pte_marker marker) return swp_entry_to_pte(make_pte_marker_entry(marker)); } -static inline swp_entry_t make_swapin_error_entry(void) +static inline swp_entry_t make_poisoned_swp_entry(void) { - return make_pte_marker_entry(PTE_MARKER_SWAPIN_ERROR); + return make_pte_marker_entry(PTE_MARKER_POISONED); } -static inline int is_swapin_error_entry(swp_entry_t entry) +static inline int is_poisoned_swp_entry(swp_entry_t entry) { return is_pte_marker_entry(entry) && - (pte_marker_get(entry) & PTE_MARKER_SWAPIN_ERROR); + (pte_marker_get(entry) & PTE_MARKER_POISONED); } /* -- cgit v1.2.3 From f92cedfa39ef208c9685d2fdd8215bb58178571b Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Sat, 8 Jul 2023 18:03:23 -0700 Subject: mm-make-pte_marker_swapin_error-more-general-fix fix CONFIG_MMU=n build Cc: Axel Rasmussen Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/mm_inline.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index a86c84600787..8148b30a9df1 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -523,6 +523,7 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm) return atomic_read(&mm->tlb_flush_pending) > 1; } +#ifdef CONFIG_MMU /* * Computes the pte marker to copy from the given source entry into dst_vma. * If no marker should be copied, returns 0. @@ -541,6 +542,7 @@ static inline pte_marker copy_pte_marker( return dstm; } +#endif /* * If this pte is wr-protected by uffd-wp in any form, arm the special pte to -- cgit v1.2.3 From fc71884a5f599a603fcc3c2b28b3872c09d19c18 Mon Sep 17 00:00:00 2001 From: Axel Rasmussen Date: Fri, 7 Jul 2023 14:55:36 -0700 Subject: mm: userfaultfd: add new UFFDIO_POISON ioctl The basic idea here is to "simulate" memory poisoning for VMs. A VM running on some host might encounter a memory error, after which some page(s) are poisoned (i.e., future accesses SIGBUS). They expect that once poisoned, pages can never become "un-poisoned". So, when we live migrate the VM, we need to preserve the poisoned status of these pages. When live migrating, we try to get the guest running on its new host as quickly as possible. So, we start it running before all memory has been copied, and before we're certain which pages should be poisoned or not. So the basic way to use this new feature is: - On the new host, the guest's memory is registered with userfaultfd, in either MISSING or MINOR mode (doesn't really matter for this purpose). - On any first access, we get a userfaultfd event. At this point we can communicate with the old host to find out if the page was poisoned. - If so, we can respond with a UFFDIO_POISON - this places a swap marker so any future accesses will SIGBUS. Because the pte is now "present", future accesses won't generate more userfaultfd events, they'll just SIGBUS directly. UFFDIO_POISON does not handle unmapping previously-present PTEs. This isn't needed, because during live migration we want to intercept all accesses with userfaultfd (not just writes, so WP mode isn't useful for this). So whether minor or missing mode is being used (or both), the PTE won't be present in any case, so handling that case isn't needed. Similarly, UFFDIO_POISON won't replace existing PTE markers. This might be okay to do, but it seems to be safer to just refuse to overwrite any existing entry (like a UFFD_WP PTE marker). Link: https://lkml.kernel.org/r/20230707215540.2324998-5-axelrasmussen@google.com Signed-off-by: Axel Rasmussen Acked-by: Peter Xu Cc: Al Viro Cc: Brian Geffon Cc: Christian Brauner Cc: David Hildenbrand Cc: Gaosheng Cui Cc: Huang, Ying Cc: Hugh Dickins Cc: James Houghton Cc: Jan Alexander Steffens (heftig) Cc: Jiaqi Yan Cc: Jonathan Corbet Cc: Kefeng Wang Cc: Liam R. Howlett Cc: Miaohe Lin Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Muchun Song Cc: Nadav Amit Cc: Naoya Horiguchi Cc: Ryan Roberts Cc: Shuah Khan Cc: Suleiman Souhlal Cc: Suren Baghdasaryan Cc: T.J. Alumbaugh Cc: Yu Zhao Cc: ZhangPeng Signed-off-by: Andrew Morton --- include/linux/userfaultfd_k.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index ac7b0c96d351..ac8c6854097c 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -46,6 +46,7 @@ enum mfill_atomic_mode { MFILL_ATOMIC_COPY, MFILL_ATOMIC_ZEROPAGE, MFILL_ATOMIC_CONTINUE, + MFILL_ATOMIC_POISON, NR_MFILL_ATOMIC_MODES, }; @@ -83,6 +84,9 @@ extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long len, atomic_t *mmap_changing, uffd_flags_t flags); +extern ssize_t mfill_atomic_poison(struct mm_struct *dst_mm, unsigned long start, + unsigned long len, atomic_t *mmap_changing, + uffd_flags_t flags); extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, unsigned long len, bool enable_wp, atomic_t *mmap_changing); -- cgit v1.2.3 From d695c30a8ca07ac7e2138435b461b36289d5656e Mon Sep 17 00:00:00 2001 From: Peng Zhang Date: Tue, 11 Jul 2023 11:54:38 +0800 Subject: maple_tree: don't use MAPLE_ARANGE64_META_MAX to indicate no gap Patch series "Improve the validation for maple tree and some cleanup", v2. This patch (of 7): Do not use a special offset to indicate that there is no gap. When there is no gap, offset can point to any valid slots because its gap is 0. Link: https://lkml.kernel.org/r/20230711035444.526-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230711035444.526-3-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang Reviewed-by: Liam R. Howlett Tested-by: Geert Uytterhoeven Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 6e5bd2c9875d..7769270b85e8 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -29,14 +29,12 @@ #define MAPLE_NODE_SLOTS 31 /* 256 bytes including ->parent */ #define MAPLE_RANGE64_SLOTS 16 /* 256 bytes */ #define MAPLE_ARANGE64_SLOTS 10 /* 240 bytes */ -#define MAPLE_ARANGE64_META_MAX 15 /* Out of range for metadata */ #define MAPLE_ALLOC_SLOTS (MAPLE_NODE_SLOTS - 1) #else /* 32bit sizes */ #define MAPLE_NODE_SLOTS 63 /* 256 bytes including ->parent */ #define MAPLE_RANGE64_SLOTS 32 /* 256 bytes */ #define MAPLE_ARANGE64_SLOTS 21 /* 240 bytes */ -#define MAPLE_ARANGE64_META_MAX 31 /* Out of range for metadata */ #define MAPLE_ALLOC_SLOTS (MAPLE_NODE_SLOTS - 2) #endif /* defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) */ -- cgit v1.2.3 From a349d72fd9efc87c8fd1d16d3164752d84a7275b Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 11 Jul 2023 21:30:40 -0700 Subject: mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "mm: free retracted page table by RCU", v3. Some mmap_lock avoidance i.e. latency reduction. Initially just for the case of collapsing shmem or file pages to THPs: the usefulness of MADV_COLLAPSE on shmem is being limited by that mmap_write_lock it currently requires. Likely to be relied upon later in other contexts e.g. freeing of empty page tables (but that's not work I'm doing). mmap_write_lock avoidance when collapsing to anon THPs? Perhaps, but again that's not work I've done: a quick attempt was not as easy as the shmem/file case. These changes (though of course not these exact patches) have been in Google's data centre kernel for three years now: we do rely upon them. This patch (of 13): Before putting them to use (several commits later), add rcu_read_lock() to pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a separate commit, since it risks exposing imbalances: prior commits have fixed all the known imbalances, but we may find some have been missed. Link: https://lkml.kernel.org/r/7cd843a9-aa80-14f-5eb2-33427363c20@google.com Link: https://lkml.kernel.org/r/d3b01da5-2a6-833c-6681-67a3e024a16f@google.com Signed-off-by: Hugh Dickins Cc: Alexander Gordeev Cc: Alistair Popple Cc: Aneesh Kumar K.V Cc: Anshuman Khandual Cc: Axel Rasmussen Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Christoph Hellwig Cc: Claudio Imbrenda Cc: David Hildenbrand Cc: "David S. Miller" Cc: Gerald Schaefer Cc: Heiko Carstens Cc: Huang, Ying Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Ellerman Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Pavel Tatashin Cc: Peter Xu Cc: Peter Zijlstra Cc: Qi Zheng Cc: Ralph Campbell Cc: Russell King Cc: SeongJae Park Cc: Song Liu Cc: Steven Price Cc: Suren Baghdasaryan Cc: Thomas Hellström Cc: Vasily Gorbik Cc: Vishal Moola (Oracle) Cc: Vlastimil Babka Cc: Will Deacon Cc: Yang Shi Cc: Yu Zhao Cc: Zack Rusin Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5063b482e34f..5134edcec668 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -99,7 +99,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address) ((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address))) #define pte_unmap(pte) do { \ kunmap_local((pte)); \ - /* rcu_read_unlock() to be added later */ \ + rcu_read_unlock(); \ } while (0) #else static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address) @@ -108,7 +108,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address) } static inline void pte_unmap(pte_t *pte) { - /* rcu_read_unlock() to be added later */ + rcu_read_unlock(); } #endif -- cgit v1.2.3 From 146b42e07494e45f7c7bcf2cbf7afd1424afd78e Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 11 Jul 2023 21:32:05 -0700 Subject: mm/pgtable: add PAE safety to __pte_offset_map() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is a faint risk that __pte_offset_map(), on a 32-bit architecture with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=y, would succeed on a pmdval assembled from a pmd_low and a pmd_high which never belonged together: their combination not pointing to a page table at all, perhaps not even a valid pfn. pmdp_get_lockless() is not enough to prevent that. Guard against that (on such configs) by local_irq_save() blocking TLB flush between present updates, as linux/pgtable.h suggests. It's only needed around the pmdp_get_lockless() in __pte_offset_map(): a race when __pte_offset_map_lock() repeats the pmdp_get_lockless() after getting the lock, would just send it back to __pte_offset_map() again. Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(), used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync() synonym for tlb_remove_table_sync_one(): to send the necessary interrupt at the right moment on those configs which do not already send it. CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and x86. It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is that Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for arm. It is not enabled by arc, but its pmd_t is 32-bit even when pte_t 64-bit. Limit the IRQ disablement to CONFIG_HIGHPTE? Perhaps, but would need a little more work, to retry if pmd_low good for page table, but pmd_high non-zero from THP (and that might be making x86-specific assumptions). Link: https://lkml.kernel.org/r/3adcd8f-9191-2df1-d7ea-c4877698aad@google.com Signed-off-by: Hugh Dickins Cc: Alexander Gordeev Cc: Alistair Popple Cc: Aneesh Kumar K.V Cc: Anshuman Khandual Cc: Axel Rasmussen Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Christoph Hellwig Cc: Claudio Imbrenda Cc: David Hildenbrand Cc: "David S. Miller" Cc: Gerald Schaefer Cc: Heiko Carstens Cc: Huang, Ying Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Ellerman Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Pavel Tatashin Cc: Peter Xu Cc: Peter Zijlstra Cc: Qi Zheng Cc: Ralph Campbell Cc: Russell King Cc: SeongJae Park Cc: Song Liu Cc: Steven Price Cc: Suren Baghdasaryan Cc: Thomas Hellström Cc: Vasily Gorbik Cc: Vishal Moola (Oracle) Cc: Vlastimil Babka Cc: Will Deacon Cc: Yang Shi Cc: Yu Zhao Cc: Zack Rusin Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5134edcec668..7f2db400f653 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -390,6 +390,7 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp) return pmd; } #define pmdp_get_lockless pmdp_get_lockless +#define pmdp_get_lockless_sync() tlb_remove_table_sync_one() #endif /* CONFIG_PGTABLE_LEVELS > 2 */ #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */ @@ -408,6 +409,9 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp) { return pmdp_get(pmdp); } +static inline void pmdp_get_lockless_sync(void) +{ +} #endif #ifdef CONFIG_TRANSPARENT_HUGEPAGE -- cgit v1.2.3 From 13cf577e6b66a148d6d63f5ef7801f4b61d5850f Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 11 Jul 2023 21:39:48 -0700 Subject: mm/pgtable: add pte_free_defer() for pgtable as page MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the generic pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This version suits all those architectures which use an unfragmented page for one page table (none of whose pte_free()s use the mm arg which was passed to it). Link: https://lkml.kernel.org/r/78e921b0-b681-a1b0-dc20-44c9efa4ef3c@google.com Signed-off-by: Hugh Dickins Cc: Alexander Gordeev Cc: Alistair Popple Cc: Aneesh Kumar K.V Cc: Anshuman Khandual Cc: Axel Rasmussen Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Christoph Hellwig Cc: Claudio Imbrenda Cc: David Hildenbrand Cc: "David S. Miller" Cc: Gerald Schaefer Cc: Heiko Carstens Cc: Huang, Ying Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Ellerman Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Pavel Tatashin Cc: Peter Xu Cc: Peter Zijlstra Cc: Qi Zheng Cc: Ralph Campbell Cc: Russell King Cc: SeongJae Park Cc: Song Liu Cc: Steven Price Cc: Suren Baghdasaryan Cc: Thomas Hellström Cc: Vasily Gorbik Cc: Vishal Moola (Oracle) Cc: Vlastimil Babka Cc: Will Deacon Cc: Yang Shi Cc: Yu Zhao Cc: Zack Rusin Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 4 ++++ include/linux/pgtable.h | 2 ++ 2 files changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 51d04c1847c1..1fc4b9c2c8a6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -144,6 +144,10 @@ struct page { struct { /* Page table pages */ unsigned long _pt_pad_1; /* compound_head */ pgtable_t pmd_huge_pte; /* protected by page->ptl */ + /* + * A PTE page table page might be freed by use of + * rcu_head: which overlays those two fields above. + */ unsigned long _pt_pad_2; /* mapping */ union { struct mm_struct *pt_mm; /* x86 pgds only */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 7f2db400f653..9fa34be65159 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte) } #endif +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + /* Find an entry in the second-level page table.. */ #ifndef pmd_offset static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) -- cgit v1.2.3 From cf95e337cb63cfbf5c9ea1a1f64f9818b979e3b3 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 11 Jul 2023 21:48:48 -0700 Subject: mm: delete mmap_write_trylock() and vma_try_start_write() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit mmap_write_trylock() and vma_try_start_write() were added just for khugepaged, but now it has no use for them: delete. Link: https://lkml.kernel.org/r/4e6db3d-e8e-73fb-1f2a-8de2dab2a87c@google.com Signed-off-by: Hugh Dickins Cc: Alexander Gordeev Cc: Alistair Popple Cc: Aneesh Kumar K.V Cc: Anshuman Khandual Cc: Axel Rasmussen Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Christoph Hellwig Cc: Claudio Imbrenda Cc: David Hildenbrand Cc: "David S. Miller" Cc: Gerald Schaefer Cc: Heiko Carstens Cc: Huang, Ying Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Ellerman Cc: Mike Kravetz Cc: Mike Rapoport (IBM) Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Pavel Tatashin Cc: Peter Xu Cc: Peter Zijlstra Cc: Qi Zheng Cc: Ralph Campbell Cc: Russell King Cc: SeongJae Park Cc: Song Liu Cc: Steven Price Cc: Suren Baghdasaryan Cc: Thomas Hellström Cc: Vasily Gorbik Cc: Vishal Moola (Oracle) Cc: Vlastimil Babka Cc: Will Deacon Cc: Yang Shi Cc: Yu Zhao Cc: Zack Rusin Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/mm.h | 17 ----------------- include/linux/mmap_lock.h | 10 ---------- 2 files changed, 27 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d1ad22980ebe..bfb46483108c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -709,21 +709,6 @@ static inline void vma_start_write(struct vm_area_struct *vma) up_write(&vma->vm_lock->lock); } -static inline bool vma_try_start_write(struct vm_area_struct *vma) -{ - int mm_lock_seq; - - if (__is_vma_write_locked(vma, &mm_lock_seq)) - return true; - - if (!down_write_trylock(&vma->vm_lock->lock)) - return false; - - WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); - up_write(&vma->vm_lock->lock); - return true; -} - static inline void vma_assert_write_locked(struct vm_area_struct *vma) { int mm_lock_seq; @@ -748,8 +733,6 @@ static inline bool vma_start_read(struct vm_area_struct *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} static inline void vma_start_write(struct vm_area_struct *vma) {} -static inline bool vma_try_start_write(struct vm_area_struct *vma) - { return true; } static inline void vma_assert_write_locked(struct vm_area_struct *vma) {} static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) {} diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index e05e167dbd16..a5c63b6d7d46 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -118,16 +118,6 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm) return ret; } -static inline bool mmap_write_trylock(struct mm_struct *mm) -{ - bool ret; - - __mmap_lock_trace_start_locking(mm, true); - ret = down_write_trylock(&mm->mmap_lock) != 0; - __mmap_lock_trace_acquire_returned(mm, true, ret); - return ret; -} - static inline void mmap_write_unlock(struct mm_struct *mm) { __mmap_lock_trace_released(mm, true); -- cgit v1.2.3 From 73e791d73877e90444afb6036d86ea37fd78584f Mon Sep 17 00:00:00 2001 From: Xueshi Hu Date: Wed, 12 Jul 2023 21:49:59 +0800 Subject: mm: remove clear_page_idle() All callers have now been converted to call folio_clear_idle(). Link: https://lkml.kernel.org/r/20230712134959.145373-1-xueshi.hu@smartx.com Signed-off-by: Xueshi Hu Reviewed-by: David Hildenbrand Cc: Charan Teja Kalla Cc: Michal Hocko Signed-off-by: Andrew Morton --- include/linux/page_idle.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 5cb7bd2078ec..d8f344840643 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -144,9 +144,4 @@ static inline void set_page_idle(struct page *page) { folio_set_idle(page_folio(page)); } - -static inline void clear_page_idle(struct page *page) -{ - folio_clear_idle(page_folio(page)); -} #endif /* _LINUX_MM_PAGE_IDLE_H */ -- cgit v1.2.3 From b79f8eb408d0468df0d6082ed958b67d94adce65 Mon Sep 17 00:00:00 2001 From: Jiaqi Yan Date: Thu, 13 Jul 2023 00:18:31 +0000 Subject: mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON Add the functionality, is_raw_hwpoison_page_in_hugepage, to tell if a raw page in a hugetlb folio is HWPOISON. This functionality relies on RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list becomes meaningless. is_raw_hwpoison_page_in_hugepage holds mf_mutex in order to synchronize with folio_set_hugetlb_hwpoison and folio_free_raw_hwp who iterate, insert, or delete entry in raw_hwp_list. llist itself doesn't ensure insertion and removal are synchornized with the llist_for_each_entry used by is_raw_hwpoison_page_in_hugepage (unless iterated entries are already deleted from the list). Caller can minimize the overhead of lock cycles by first checking HWPOISON flag of the folio. Exports this functionality to be immediately used in the read operation for hugetlbfs. Link: https://lkml.kernel.org/r/20230713001833.3778937-3-jiaqiyan@google.com Signed-off-by: Jiaqi Yan Reviewed-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Reviewed-by: Miaohe Lin Cc: James Houghton Cc: Muchun Song Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 9bc3c2d71b71..9f4bac3df59e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -997,6 +997,11 @@ void hugetlb_register_node(struct node *node); void hugetlb_unregister_node(struct node *node); #endif +/* + * Check if a given raw @page in a hugepage is HWPOISON. + */ +bool is_raw_hwpoison_page_in_hugepage(struct page *page); + #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; -- cgit v1.2.3 From aa232204c4689427cefa55fe975692b57291523a Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Fri, 14 Jul 2023 01:26:31 +0800 Subject: mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear Remove unused addr in page_table_check_pte_clear and __page_table_check_pte_clear. Link: https://lkml.kernel.org/r/20230713172636.1705415-4-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Cc: Pavel Tatashin Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 11 ++++------- include/linux/pgtable.h | 2 +- 2 files changed, 5 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 01e16c7696ec..35c53c4b94d3 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -14,8 +14,7 @@ extern struct static_key_true page_table_check_disabled; extern struct page_ext_operations page_table_check_ops; void __page_table_check_zero(struct page *page, unsigned int order); -void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr, - pte_t pte); +void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte); void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr, pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr, @@ -46,13 +45,12 @@ static inline void page_table_check_free(struct page *page, unsigned int order) __page_table_check_zero(page, order); } -static inline void page_table_check_pte_clear(struct mm_struct *mm, - unsigned long addr, pte_t pte) +static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pte_clear(mm, addr, pte); + __page_table_check_pte_clear(mm, pte); } static inline void page_table_check_pmd_clear(struct mm_struct *mm, @@ -123,8 +121,7 @@ static inline void page_table_check_free(struct page *page, unsigned int order) { } -static inline void page_table_check_pte_clear(struct mm_struct *mm, - unsigned long addr, pte_t pte) +static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte) { } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 9fa34be65159..a1ccb13c4853 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -322,7 +322,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, { pte_t pte = ptep_get(ptep); pte_clear(mm, address, ptep); - page_table_check_pte_clear(mm, address, pte); + page_table_check_pte_clear(mm, pte); return pte; } #endif -- cgit v1.2.3 From 1831414cd729a34af937d56ad684a66599de6344 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Fri, 14 Jul 2023 01:26:32 +0800 Subject: mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear Remove unused addr in page_table_check_pmd_clear and __page_table_check_pmd_clear. Link: https://lkml.kernel.org/r/20230713172636.1705415-5-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Cc: Pavel Tatashin Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 11 ++++------- include/linux/pgtable.h | 2 +- 2 files changed, 5 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 35c53c4b94d3..0f777bca5283 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -15,8 +15,7 @@ extern struct page_ext_operations page_table_check_ops; void __page_table_check_zero(struct page *page, unsigned int order); void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte); -void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr, - pmd_t pmd); +void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr, pud_t pud); void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr, @@ -53,13 +52,12 @@ static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte) __page_table_check_pte_clear(mm, pte); } -static inline void page_table_check_pmd_clear(struct mm_struct *mm, - unsigned long addr, pmd_t pmd) +static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pmd_clear(mm, addr, pmd); + __page_table_check_pmd_clear(mm, pmd); } static inline void page_table_check_pud_clear(struct mm_struct *mm, @@ -125,8 +123,7 @@ static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte) { } -static inline void page_table_check_pmd_clear(struct mm_struct *mm, - unsigned long addr, pmd_t pmd) +static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd) { } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a1ccb13c4853..3edef5ed008f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -425,7 +425,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, pmd_t pmd = *pmdp; pmd_clear(pmdp); - page_table_check_pmd_clear(mm, address, pmd); + page_table_check_pmd_clear(mm, pmd); return pmd; } -- cgit v1.2.3 From 931c38e16499a057e30a3033f4d6a9c242f0f156 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Fri, 14 Jul 2023 01:26:33 +0800 Subject: mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear Remove unused addr in __page_table_check_pud_clear and page_table_check_pud_clear. Link: https://lkml.kernel.org/r/20230713172636.1705415-6-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Cc: Pavel Tatashin Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 11 ++++------- include/linux/pgtable.h | 2 +- 2 files changed, 5 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 0f777bca5283..5c9dc848a1bc 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -16,8 +16,7 @@ extern struct page_ext_operations page_table_check_ops; void __page_table_check_zero(struct page *page, unsigned int order); void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte); void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd); -void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr, - pud_t pud); +void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud); void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte); void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr, @@ -60,13 +59,12 @@ static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd) __page_table_check_pmd_clear(mm, pmd); } -static inline void page_table_check_pud_clear(struct mm_struct *mm, - unsigned long addr, pud_t pud) +static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pud_clear(mm, addr, pud); + __page_table_check_pud_clear(mm, pud); } static inline void page_table_check_pte_set(struct mm_struct *mm, @@ -127,8 +125,7 @@ static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd) { } -static inline void page_table_check_pud_clear(struct mm_struct *mm, - unsigned long addr, pud_t pud) +static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud) { } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 3edef5ed008f..5f36c055794b 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -438,7 +438,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, pud_t pud = *pudp; pud_clear(pudp); - page_table_check_pud_clear(mm, address, pud); + page_table_check_pud_clear(mm, pud); return pud; } -- cgit v1.2.3 From 1066293d426d3000793c3c3b4276ef38b63ada4a Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Fri, 14 Jul 2023 01:26:34 +0800 Subject: mm/page_table_check: remove unused parameter in [__]page_table_check_pte_set Remove unused addr in __page_table_check_pte_set and page_table_check_pte_set. Link: https://lkml.kernel.org/r/20230713172636.1705415-7-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Cc: Pavel Tatashin Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 5c9dc848a1bc..63ebd9fcf28b 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -17,8 +17,7 @@ void __page_table_check_zero(struct page *page, unsigned int order); void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte); void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud); -void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte); +void __page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, pte_t pte); void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd); void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr, @@ -67,14 +66,13 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud) __page_table_check_pud_clear(mm, pud); } -static inline void page_table_check_pte_set(struct mm_struct *mm, - unsigned long addr, pte_t *ptep, +static inline void page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, pte_t pte) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pte_set(mm, addr, ptep, pte); + __page_table_check_pte_set(mm, ptep, pte); } static inline void page_table_check_pmd_set(struct mm_struct *mm, @@ -129,8 +127,7 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud) { } -static inline void page_table_check_pte_set(struct mm_struct *mm, - unsigned long addr, pte_t *ptep, +static inline void page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, pte_t pte) { } -- cgit v1.2.3 From a3b837130b5865521fa8662aceaa6ebc8d29389a Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Fri, 14 Jul 2023 01:26:35 +0800 Subject: mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set Remove unused addr in __page_table_check_pmd_set and page_table_check_pmd_set. Link: https://lkml.kernel.org/r/20230713172636.1705415-8-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Cc: Pavel Tatashin Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 63ebd9fcf28b..dd58dfb0e643 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -18,8 +18,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte); void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud); void __page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, pte_t pte); -void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr, - pmd_t *pmdp, pmd_t pmd); +void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd); void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr, pud_t *pudp, pud_t pud); void __page_table_check_pte_clear_range(struct mm_struct *mm, @@ -75,14 +74,13 @@ static inline void page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, __page_table_check_pte_set(mm, ptep, pte); } -static inline void page_table_check_pmd_set(struct mm_struct *mm, - unsigned long addr, pmd_t *pmdp, +static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pmd_set(mm, addr, pmdp, pmd); + __page_table_check_pmd_set(mm, pmdp, pmd); } static inline void page_table_check_pud_set(struct mm_struct *mm, @@ -132,8 +130,7 @@ static inline void page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, { } -static inline void page_table_check_pmd_set(struct mm_struct *mm, - unsigned long addr, pmd_t *pmdp, +static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd) { } -- cgit v1.2.3 From 6d144436d954311f2dbacb5bf7b084042448d83e Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Fri, 14 Jul 2023 01:26:36 +0800 Subject: mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set Remove unused addr in __page_table_check_pud_set and page_table_check_pud_set. Link: https://lkml.kernel.org/r/20230713172636.1705415-9-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Cc: Pavel Tatashin Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index dd58dfb0e643..7f6b9bf926c5 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -19,8 +19,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud); void __page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, pte_t pte); void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd); -void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr, - pud_t *pudp, pud_t pud); +void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud); void __page_table_check_pte_clear_range(struct mm_struct *mm, unsigned long addr, pmd_t pmd); @@ -83,14 +82,13 @@ static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, __page_table_check_pmd_set(mm, pmdp, pmd); } -static inline void page_table_check_pud_set(struct mm_struct *mm, - unsigned long addr, pud_t *pudp, +static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pud_set(mm, addr, pudp, pud); + __page_table_check_pud_set(mm, pudp, pud); } static inline void page_table_check_pte_clear_range(struct mm_struct *mm, @@ -135,8 +133,7 @@ static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, { } -static inline void page_table_check_pud_set(struct mm_struct *mm, - unsigned long addr, pud_t *pudp, +static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud) { } -- cgit v1.2.3 From b23d03ef7af567929bcd3fca7eea6f4f347387d3 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 13 Jul 2023 04:55:06 +0100 Subject: highmem: add memcpy_to_folio() and memcpy_from_folio() Patch series "More filesystem folio conversions for 6.6". Remove the only spots in affs which actually use a struct page; there are a few places where one is mentioned, but it's part of the interface. The rest of this is removing the remaining calls to set_bh_page(), and then removing the function before any new users show up. This patch (of 7): These are the folio equivalent of memcpy_to_page() and memcpy_from_page(). [agruenba@redhat.com: use correct chunk size in memcpy()] Link: https://lkml.kernel.org/r/20230802144354.1023099-1-agruenba@redhat.com Link: https://lkml.kernel.org/r/20230713035512.4139457-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230713035512.4139457-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Sterba Cc: Jan Kara Cc: Konstantin Komarov Cc: Pankaj Raghav Cc: "Theodore Ts'o" Cc: Alexander Viro Cc: Christian Brauner Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Tom Rix Cc: Andreas Gruenbacher Signed-off-by: Andrew Morton --- include/linux/highmem.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 68da30625a6c..99c474de800d 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -439,6 +439,50 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len) kunmap_local(addr); } +static inline void memcpy_from_folio(char *to, struct folio *folio, + size_t offset, size_t len) +{ + VM_BUG_ON(offset + len > folio_size(folio)); + + do { + const char *from = kmap_local_folio(folio, offset); + size_t chunk = len; + + if (folio_test_highmem(folio) && + chunk > PAGE_SIZE - offset_in_page(offset)) + chunk = PAGE_SIZE - offset_in_page(offset); + memcpy(to, from, chunk); + kunmap_local(from); + + from += chunk; + offset += chunk; + len -= chunk; + } while (len > 0); +} + +static inline void memcpy_to_folio(struct folio *folio, size_t offset, + const char *from, size_t len) +{ + VM_BUG_ON(offset + len > folio_size(folio)); + + do { + char *to = kmap_local_folio(folio, offset); + size_t chunk = len; + + if (folio_test_highmem(folio) && + chunk > PAGE_SIZE - offset_in_page(offset)) + chunk = PAGE_SIZE - offset_in_page(offset); + memcpy(to, from, chunk); + kunmap_local(to); + + from += chunk; + offset += chunk; + len -= chunk; + } while (len > 0); + + flush_dcache_folio(folio); +} + /** * memcpy_from_file_folio - Copy some bytes from a file folio. * @to: The destination buffer. -- cgit v1.2.3 From 5f6d28622ffc7fa356b2745b088c831ebb8546b0 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 13 Jul 2023 04:55:12 +0100 Subject: buffer: remove set_bh_page() With all users converted to folio_set_bh(), remove this function. Link: https://lkml.kernel.org/r/20230713035512.4139457-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Jan Kara Cc: Alexander Viro Cc: Christian Brauner Cc: David Sterba Cc: Konstantin Komarov Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Pankaj Raghav Cc: "Theodore Ts'o" Cc: Tom Rix Signed-off-by: Andrew Morton --- include/linux/buffer_head.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index a7377877ff4e..06566aee94ca 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -194,8 +194,6 @@ void buffer_check_dirty_writeback(struct folio *folio, void mark_buffer_dirty(struct buffer_head *bh); void mark_buffer_write_io_error(struct buffer_head *bh); void touch_buffer(struct buffer_head *bh); -void set_bh_page(struct buffer_head *bh, - struct page *page, unsigned long offset); void folio_set_bh(struct buffer_head *bh, struct folio *folio, unsigned long offset); bool try_to_free_buffers(struct folio *); -- cgit v1.2.3 From 016fec91013cfb27c9f5a101a87ae8266537fed1 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Thu, 6 Jul 2023 23:45:17 +0800 Subject: mm: move is_ioremap_addr() into new header file Now is_ioremap_addr() is only used in kernel/iomem.c and gonna be used in mm/ioremap.c. Move it into its own new header file linux/ioremap.h. Link: https://lkml.kernel.org/r/20230706154520.11257-17-bhe@redhat.com Suggested-by: Christoph Hellwig Signed-off-by: Baoquan He Reviewed-by: Christoph Hellwig Cc: Alexander Gordeev Cc: Arnd Bergmann Cc: Brian Cain Cc: Catalin Marinas Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Chris Zankel Cc: David Laight Cc: Geert Uytterhoeven Cc: Gerald Schaefer Cc: Heiko Carstens Cc: Helge Deller Cc: "James E.J. Bottomley" Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Kefeng Wang Cc: Matthew Wilcox Cc: Max Filippov Cc: Michael Ellerman Cc: Mike Rapoport (IBM) Cc: Nathan Chancellor Cc: Nicholas Piggin Cc: Niklas Schnelle Cc: Rich Felker Cc: Stafford Horne Cc: Stefan Kristiansson Cc: Sven Schnelle Cc: Vasily Gorbik Cc: Vineet Gupta Cc: Will Deacon Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/ioremap.h | 30 ++++++++++++++++++++++++++++++ include/linux/mm.h | 5 ----- 2 files changed, 30 insertions(+), 5 deletions(-) create mode 100644 include/linux/ioremap.h (limited to 'include/linux') diff --git a/include/linux/ioremap.h b/include/linux/ioremap.h new file mode 100644 index 000000000000..f0e99fc7dd8b --- /dev/null +++ b/include/linux/ioremap.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_IOREMAP_H +#define _LINUX_IOREMAP_H + +#include +#include + +#if defined(CONFIG_HAS_IOMEM) || defined(CONFIG_GENERIC_IOREMAP) +/* + * Ioremap often, but not always uses the generic vmalloc area. E.g on + * Power ARCH, it could have different ioremap space. + */ +#ifndef IOREMAP_START +#define IOREMAP_START VMALLOC_START +#define IOREMAP_END VMALLOC_END +#endif +static inline bool is_ioremap_addr(const void *x) +{ + unsigned long addr = (unsigned long)kasan_reset_tag(x); + + return addr >= IOREMAP_START && addr < IOREMAP_END; +} +#else +static inline bool is_ioremap_addr(const void *x) +{ + return false; +} +#endif + +#endif /* _LINUX_IOREMAP_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index bfb46483108c..0ae5654f665b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1055,11 +1055,6 @@ unsigned long vmalloc_to_pfn(const void *addr); * On nommu, vmalloc/vfree wrap through kmalloc/kfree directly, so there * is no special casing required. */ - -#ifndef is_ioremap_addr -#define is_ioremap_addr(x) is_vmalloc_addr(x) -#endif - #ifdef CONFIG_MMU extern bool is_vmalloc_addr(const void *x); extern int is_vmalloc_or_module_addr(const void *x); -- cgit v1.2.3 From f73419bb89d606de9be2043febf0957d56627a5b Mon Sep 17 00:00:00 2001 From: Barry Song Date: Mon, 17 Jul 2023 21:10:02 +0800 Subject: mm/tlbbatch: rename and extend some functions This patch does some preparation works to extend batched TLB flush to arm64. Including: - Extend set_tlb_ubc_flush_pending() and arch_tlbbatch_add_mm() to accept an additional argument for address, architectures like arm64 may need this for tlbi. - Rename arch_tlbbatch_add_mm() to arch_tlbbatch_add_pending() to match its current function since we don't need to handle mm on architectures like arm64 and add_mm is not proper, add_pending will make sense to both as on x86 we're pending the TLB flush operations while on arm64 we're pending the synchronize operations. This intends no functional changes on x86. Link: https://lkml.kernel.org/r/20230717131004.12662-3-yangyicong@huawei.com Tested-by: Yicong Yang Tested-by: Xin Hao Tested-by: Punit Agrawal Signed-off-by: Barry Song Signed-off-by: Yicong Yang Reviewed-by: Kefeng Wang Reviewed-by: Xin Hao Reviewed-by: Anshuman Khandual Reviewed-by: Catalin Marinas Cc: Jonathan Corbet Cc: Nadav Amit Cc: Mel Gorman Cc: Anshuman Khandual Cc: Arnd Bergmann Cc: Barry Song Cc: Darren Hart Cc: Jonathan Cameron Cc: lipeifeng Cc: Mark Rutland Cc: Peter Zijlstra Cc: Ryan Roberts Cc: Steven Miao Cc: Will Deacon Cc: Zeng Tao Signed-off-by: Andrew Morton --- include/linux/mm_types_task.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index 5414b5c6a103..aa44fff8bb9d 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -52,8 +52,8 @@ struct tlbflush_unmap_batch { #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* * The arch code makes the following promise: generic code can modify a - * PTE, then call arch_tlbbatch_add_mm() (which internally provides all - * needed barriers), then call arch_tlbbatch_flush(), and the entries + * PTE, then call arch_tlbbatch_add_pending() (which internally provides + * all needed barriers), then call arch_tlbbatch_flush(), and the entries * will be flushed on all CPUs by the time that arch_tlbbatch_flush() * returns. */ -- cgit v1.2.3 From aee79d4e5271cee4ffa89ed830189929a6272eb8 Mon Sep 17 00:00:00 2001 From: "Zhu, Lipeng" Date: Sun, 16 Jul 2023 22:56:54 +0800 Subject: fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing. When running UnixBench/Shell Scripts, we observed high false sharing for accessing i_mmap against i_mmap_rwsem. UnixBench/Shell Scripts are typical load/execute command test scenarios, which concurrently launch->execute->exit a lot of shell commands. A lot of processes invoke vma_interval_tree_remove which touch "i_mmap", the call stack: ----vma_interval_tree_remove |----unlink_file_vma | free_pgtables | |----exit_mmap | | mmput | | |----begin_new_exec | | | load_elf_binary | | | bprm_execve Meanwhile, there are a lot of processes touch 'i_mmap_rwsem' to acquire the semaphore in order to access 'i_mmap'. In existing 'address_space' layout, 'i_mmap' and 'i_mmap_rwsem' are in the same cacheline. The patch places the i_mmap and i_mmap_rwsem in separate cache lines to avoid this false sharing problem. With this patch, based on kernel v6.4.0, on Intel Sapphire Rapids 112C/224T platform, the score improves by ~5.3%. And perf c2c tool shows the false sharing is resolved as expected, the symbol vma_interval_tree_remove disappeared in cache line 0 after this change. Baseline: ================================================= Shared Cache Line Distribution Pareto ================================================= ------------------------------------------------------------- 0 3729 5791 0 0 0xff19b3818445c740 ------------------------------------------------------------- 3.27% 3.02% 0.00% 0.00% 0x18 0 1 0xffffffffa194403b 604 483 389 692 203 [k] vma_interval_tree_insert [kernel.kallsyms] vma_interval_tree_insert+75 0 1 4.13% 3.63% 0.00% 0.00% 0x20 0 1 0xffffffffa19440a2 553 413 415 962 215 [k] vma_interval_tree_remove [kernel.kallsyms] vma_interval_tree_remove+18 0 1 2.04% 1.35% 0.00% 0.00% 0x28 0 1 0xffffffffa219a1d6 1210 855 460 1229 222 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1 0.62% 1.85% 0.00% 0.00% 0x28 0 1 0xffffffffa219a1bf 762 329 577 527 198 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1 0.48% 0.31% 0.00% 0.00% 0x28 0 1 0xffffffffa219a58c 1677 1476 733 1544 224 [k] down_write [kernel.kallsyms] down_write+28 0 1 0.05% 0.07% 0.00% 0.00% 0x28 0 1 0xffffffffa219a21d 1040 819 689 33 27 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+749 0 1 0.00% 0.05% 0.00% 0.00% 0x28 0 1 0xffffffffa17707db 0 1005 786 1373 223 [k] up_write [kernel.kallsyms] up_write+27 0 1 0.00% 0.02% 0.00% 0.00% 0x28 0 1 0xffffffffa219a064 0 233 778 32 30 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1 33.82% 34.10% 0.00% 0.00% 0x30 0 1 0xffffffffa1770945 779 495 534 6011 224 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1 17.06% 15.28% 0.00% 0.00% 0x30 0 1 0xffffffffa1770915 593 438 468 2715 224 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1 3.54% 3.52% 0.00% 0.00% 0x30 0 1 0xffffffffa2199f84 881 601 583 1421 223 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+84 0 1 With this change: ------------------------------------------------------------- 0 556 838 0 0 0xff2780d7965d2780 ------------------------------------------------------------- 0.18% 0.60% 0.00% 0.00% 0x8 0 1 0xffffffffafff27b8 503 453 569 14 13 [k] do_dentry_open [kernel.kallsyms] do_dentry_open+456 0 1 0.54% 0.12% 0.00% 0.00% 0x8 0 1 0xffffffffaffc51ac 510 199 428 15 12 [k] hugepage_vma_check [kernel.kallsyms] hugepage_vma_check+252 0 1 1.80% 2.15% 0.00% 0.00% 0x18 0 1 0xffffffffb079a1d6 1778 799 343 215 136 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1 0.54% 1.31% 0.00% 0.00% 0x18 0 1 0xffffffffb079a1bf 547 296 528 91 71 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1 0.72% 0.72% 0.00% 0.00% 0x18 0 1 0xffffffffb079a58c 1479 1534 676 288 163 [k] down_write [kernel.kallsyms] down_write+28 0 1 0.00% 0.12% 0.00% 0.00% 0x18 0 1 0xffffffffafd707db 0 2381 744 282 158 [k] up_write [kernel.kallsyms] up_write+27 0 1 0.00% 0.12% 0.00% 0.00% 0x18 0 1 0xffffffffb079a064 0 239 518 6 6 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1 46.58% 47.02% 0.00% 0.00% 0x20 0 1 0xffffffffafd70945 704 403 499 1137 219 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1 23.92% 25.78% 0.00% 0.00% 0x20 0 1 0xffffffffafd70915 558 413 500 542 185 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1 v1->v2: change padding to exchange fields. Link: https://lkml.kernel.org/r/20230716145653.20122-1-lipeng.zhu@intel.com Signed-off-by: Lipeng Zhu Reviewed-by: Tim Chen Cc: Alexander Viro Cc: Christian Brauner Cc: Yu Ma Signed-off-by: Andrew Morton --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..bb00d37a6ca0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -447,11 +447,11 @@ struct address_space { atomic_t nr_thps; #endif struct rb_root_cached i_mmap; - struct rw_semaphore i_mmap_rwsem; unsigned long nrpages; pgoff_t writeback_index; const struct address_space_operations *a_ops; unsigned long flags; + struct rw_semaphore i_mmap_rwsem; errseq_t wb_err; spinlock_t private_lock; struct list_head private_list; -- cgit v1.2.3 From cabdf74e6b319c989eb8e812f1854291ae0af1c0 Mon Sep 17 00:00:00 2001 From: Peng Zhang Date: Tue, 18 Jul 2023 15:30:19 +0800 Subject: mm: kfence: allocate kfence_metadata at runtime kfence_metadata is currently a static array. For the purpose of allocating scalable __kfence_pool, we first change it to runtime allocation of metadata. Since the size of an object of kfence_metadata is 1160 bytes, we can save at least 72 pages (with default 256 objects) without enabling kfence. [akpm@linux-foundation.org: restore newline, per Marco] Link: https://lkml.kernel.org/r/20230718073019.52513-1-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang Reviewed-by: Marco Elver Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/kfence.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kfence.h b/include/linux/kfence.h index 726857a4b680..401af4757514 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -59,15 +59,16 @@ static __always_inline bool is_kfence_address(const void *addr) } /** - * kfence_alloc_pool() - allocate the KFENCE pool via memblock + * kfence_alloc_pool_and_metadata() - allocate the KFENCE pool and KFENCE + * metadata via memblock */ -void __init kfence_alloc_pool(void); +void __init kfence_alloc_pool_and_metadata(void); /** * kfence_init() - perform KFENCE initialization at boot time * - * Requires that kfence_alloc_pool() was called before. This sets up the - * allocation gate timer, and requires that workqueues are available. + * Requires that kfence_alloc_pool_and_metadata() was called before. This sets + * up the allocation gate timer, and requires that workqueues are available. */ void __init kfence_init(void); @@ -223,7 +224,7 @@ bool __kfence_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *sla #else /* CONFIG_KFENCE */ static inline bool is_kfence_address(const void *addr) { return false; } -static inline void kfence_alloc_pool(void) { } +static inline void kfence_alloc_pool_and_metadata(void) { } static inline void kfence_init(void) { } static inline void kfence_shutdown_cache(struct kmem_cache *s) { } static inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { return NULL; } -- cgit v1.2.3 From affd26b1fbd67fceea70d9ceac40ff4815afbeb5 Mon Sep 17 00:00:00 2001 From: Sidhartha Kumar Date: Wed, 19 Jul 2023 11:41:45 -0700 Subject: mm/hugetlb: get rid of page_hstate() Convert the last page_hstate() user to use folio_hstate() so page_hstate() can be safely removed. Link: https://lkml.kernel.org/r/20230719184145.301911-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar Reviewed-by: Mike Kravetz Cc: Matthew Wilcox (Oracle) Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 9f4bac3df59e..0a393bc02f25 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -841,11 +841,6 @@ static inline struct hstate *folio_hstate(struct folio *folio) return size_to_hstate(folio_size(folio)); } -static inline struct hstate *page_hstate(struct page *page) -{ - return folio_hstate(page_folio(page)); -} - static inline unsigned hstate_index_to_shift(unsigned index) { return hstates[index].order + PAGE_SHIFT; @@ -1062,11 +1057,6 @@ static inline struct hstate *folio_hstate(struct folio *folio) return NULL; } -static inline struct hstate *page_hstate(struct page *page) -{ - return NULL; -} - static inline struct hstate *size_to_hstate(unsigned long size) { return NULL; -- cgit v1.2.3 From 134d153c9346fc1b2842cacec8720da3a9667a11 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Fri, 14 Jul 2023 15:55:49 -0400 Subject: maple_tree: relax lockdep checks for on-stack trees To support early release of the maple tree locks, do not lockdep check the lock if it is set to NULL. This is intended for the special case on-stack use of tracking entries and not for general use. Link: https://lkml.kernel.org/r/20230714195551.894800-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Cc: Linus Torvalds Cc: Oliver Sang Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 7769270b85e8..6618c1512886 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -182,7 +182,9 @@ enum maple_type { #ifdef CONFIG_LOCKDEP typedef struct lockdep_map *lockdep_map_p; -#define mt_lock_is_held(mt) lock_is_held(mt->ma_external_lock) +#define mt_lock_is_held(mt) \ + (!(mt)->ma_external_lock || lock_is_held((mt)->ma_external_lock)) + #define mt_set_external_lock(mt, lock) \ (mt)->ma_external_lock = &(lock)->dep_map #else -- cgit v1.2.3 From 02fdb25fb41c563a3afbd4a97469c527a6c5abbf Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Wed, 5 Jul 2023 14:47:49 -0400 Subject: mm/mmap: change detached vma locking scheme Don't set the lock to the mm lock so that the detached VMA tree does not complain about being unlocked when the mmap_lock is dropped prior to freeing the tree. Introduce mt_on_stack() for setting the external lock to NULL only when LOCKDEP is used. Move the destroying of the detached tree outside the mmap lock all together. Link: https://lkml.kernel.org/r/20230719183142.ktgcmuj2pnlr3h3s@revolver Signed-off-by: Liam R. Howlett Cc: Linus Torvalds Cc: Oliver Sang Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 6618c1512886..e278b9598428 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -187,10 +187,13 @@ typedef struct lockdep_map *lockdep_map_p; #define mt_set_external_lock(mt, lock) \ (mt)->ma_external_lock = &(lock)->dep_map + +#define mt_on_stack(mt) (mt).ma_external_lock = NULL #else typedef struct { /* nothing */ } lockdep_map_p; #define mt_lock_is_held(mt) 1 #define mt_set_external_lock(mt, lock) do { } while (0) +#define mt_on_stack(mt) do { } while (0) #endif /* -- cgit v1.2.3 From 19a462f06eb5a78e0c3ebe4fd4fbdc71620b8788 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Fri, 14 Jul 2023 15:55:51 -0400 Subject: maple_tree: Be more strict about locking Use lockdep to check the write path in the maple tree holds the lock in write mode. Introduce mt_write_lock_is_held() to check if the lock is held for writing. Update the necessary checks for rcu_dereference_protected() to use the new write lock check. Link: https://lkml.kernel.org/r/20230714195551.894800-5-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Cc: Linus Torvalds Cc: Oliver Sang Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index e278b9598428..949f911bf955 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -185,13 +185,18 @@ typedef struct lockdep_map *lockdep_map_p; #define mt_lock_is_held(mt) \ (!(mt)->ma_external_lock || lock_is_held((mt)->ma_external_lock)) +#define mt_write_lock_is_held(mt) \ + (!(mt)->ma_external_lock || \ + lock_is_held_type((mt)->ma_external_lock, 0)) + #define mt_set_external_lock(mt, lock) \ (mt)->ma_external_lock = &(lock)->dep_map #define mt_on_stack(mt) (mt).ma_external_lock = NULL #else typedef struct { /* nothing */ } lockdep_map_p; -#define mt_lock_is_held(mt) 1 +#define mt_lock_is_held(mt) 1 +#define mt_write_lock_is_held(mt) 1 #define mt_set_external_lock(mt, lock) do { } while (0) #define mt_on_stack(mt) do { } while (0) #endif -- cgit v1.2.3 From ec8832d007cb7b50229ad5745eec35b847cc9120 Mon Sep 17 00:00:00 2001 From: Alistair Popple Date: Tue, 25 Jul 2023 23:42:06 +1000 Subject: mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end() Secondary TLBs are now invalidated from the architecture specific TLB invalidation functions. Therefore there is no need to explicitly notify or invalidate as part of the range end functions. This means we can remove mmu_notifier_invalidate_range_end_only() and some of the ptep_*_notify() functions. Link: https://lkml.kernel.org/r/90d749d03cbab256ca0edeb5287069599566d783.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple Reviewed-by: Jason Gunthorpe Cc: Andrew Donnellan Cc: Catalin Marinas Cc: Chaitanya Kumar Borah Cc: Frederic Barrat Cc: Jason Gunthorpe Cc: John Hubbard Cc: Kevin Tian Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nicolin Chen Cc: Robin Murphy Cc: Sean Christopherson Cc: SeongJae Park Cc: Tvrtko Ursulin Cc: Will Deacon Cc: Zhi Wang Signed-off-by: Andrew Morton --- include/linux/mmu_notifier.h | 56 ++------------------------------------------ 1 file changed, 2 insertions(+), 54 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 64a3e051c3c4..f2e9edc6aa43 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -395,8 +395,7 @@ extern int __mmu_notifier_test_young(struct mm_struct *mm, extern void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte); extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r); -extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r, - bool only_end); +extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r); extern void __mmu_notifier_invalidate_range(struct mm_struct *mm, unsigned long start, unsigned long end); extern bool @@ -481,14 +480,7 @@ mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range) might_sleep(); if (mm_has_notifiers(range->mm)) - __mmu_notifier_invalidate_range_end(range, false); -} - -static inline void -mmu_notifier_invalidate_range_only_end(struct mmu_notifier_range *range) -{ - if (mm_has_notifiers(range->mm)) - __mmu_notifier_invalidate_range_end(range, true); + __mmu_notifier_invalidate_range_end(range); } static inline void mmu_notifier_invalidate_range(struct mm_struct *mm, @@ -582,45 +574,6 @@ static inline void mmu_notifier_range_init_owner( __young; \ }) -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - unsigned long ___addr = __address & PAGE_MASK; \ - struct mm_struct *___mm = (__vma)->vm_mm; \ - pte_t ___pte; \ - \ - ___pte = ptep_clear_flush(__vma, __address, __ptep); \ - mmu_notifier_invalidate_range(___mm, ___addr, \ - ___addr + PAGE_SIZE); \ - \ - ___pte; \ -}) - -#define pmdp_huge_clear_flush_notify(__vma, __haddr, __pmd) \ -({ \ - unsigned long ___haddr = __haddr & HPAGE_PMD_MASK; \ - struct mm_struct *___mm = (__vma)->vm_mm; \ - pmd_t ___pmd; \ - \ - ___pmd = pmdp_huge_clear_flush(__vma, __haddr, __pmd); \ - mmu_notifier_invalidate_range(___mm, ___haddr, \ - ___haddr + HPAGE_PMD_SIZE); \ - \ - ___pmd; \ -}) - -#define pudp_huge_clear_flush_notify(__vma, __haddr, __pud) \ -({ \ - unsigned long ___haddr = __haddr & HPAGE_PUD_MASK; \ - struct mm_struct *___mm = (__vma)->vm_mm; \ - pud_t ___pud; \ - \ - ___pud = pudp_huge_clear_flush(__vma, __haddr, __pud); \ - mmu_notifier_invalidate_range(___mm, ___haddr, \ - ___haddr + HPAGE_PUD_SIZE); \ - \ - ___pud; \ -}) - /* * set_pte_at_notify() sets the pte _after_ running the notifier. * This is safe to start by updating the secondary MMUs, because the primary MMU @@ -711,11 +664,6 @@ void mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range) { } -static inline void -mmu_notifier_invalidate_range_only_end(struct mmu_notifier_range *range) -{ -} - static inline void mmu_notifier_invalidate_range(struct mm_struct *mm, unsigned long start, unsigned long end) { -- cgit v1.2.3 From 1af5a8109904b7f00828e7f9f63f5695b42f8215 Mon Sep 17 00:00:00 2001 From: Alistair Popple Date: Tue, 25 Jul 2023 23:42:07 +1000 Subject: mmu_notifiers: rename invalidate_range notifier There are two main use cases for mmu notifiers. One is by KVM which uses mmu_notifier_invalidate_range_start()/end() to manage a software TLB. The other is to manage hardware TLBs which need to use the invalidate_range() callback because HW can establish new TLB entries at any time. Hence using start/end() can lead to memory corruption as these callbacks happen too soon/late during page unmap. mmu notifier users should therefore either use the start()/end() callbacks or the invalidate_range() callbacks. To make this usage clearer rename the invalidate_range() callback to arch_invalidate_secondary_tlbs() and update documention. Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple Suggested-by: Jason Gunthorpe Acked-by: Catalin Marinas Reviewed-by: Jason Gunthorpe Cc: Andrew Donnellan Cc: Chaitanya Kumar Borah Cc: Frederic Barrat Cc: Jason Gunthorpe Cc: John Hubbard Cc: Kevin Tian Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nicolin Chen Cc: Robin Murphy Cc: Sean Christopherson Cc: SeongJae Park Cc: Tvrtko Ursulin Cc: Will Deacon Cc: Zhi Wang Signed-off-by: Andrew Morton --- include/linux/mmu_notifier.h | 48 ++++++++++++++++++++++---------------------- 1 file changed, 24 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index f2e9edc6aa43..6e3c857606f1 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -187,27 +187,27 @@ struct mmu_notifier_ops { const struct mmu_notifier_range *range); /* - * invalidate_range() is either called between - * invalidate_range_start() and invalidate_range_end() when the - * VM has to free pages that where unmapped, but before the - * pages are actually freed, or outside of _start()/_end() when - * a (remote) TLB is necessary. + * arch_invalidate_secondary_tlbs() is used to manage a non-CPU TLB + * which shares page-tables with the CPU. The + * invalidate_range_start()/end() callbacks should not be implemented as + * invalidate_secondary_tlbs() already catches the points in time when + * an external TLB needs to be flushed. * - * If invalidate_range() is used to manage a non-CPU TLB with - * shared page-tables, it not necessary to implement the - * invalidate_range_start()/end() notifiers, as - * invalidate_range() already catches the points in time when an - * external TLB range needs to be flushed. For more in depth - * discussion on this see Documentation/mm/mmu_notifier.rst + * This requires arch_invalidate_secondary_tlbs() to be called while + * holding the ptl spin-lock and therefore this callback is not allowed + * to sleep. * - * Note that this function might be called with just a sub-range - * of what was passed to invalidate_range_start()/end(), if - * called between those functions. + * This is called by architecture code whenever invalidating a TLB + * entry. It is assumed that any secondary TLB has the same rules for + * when invalidations are required. If this is not the case architecture + * code will need to call this explicitly when required for secondary + * TLB invalidation. */ - void (*invalidate_range)(struct mmu_notifier *subscription, - struct mm_struct *mm, - unsigned long start, - unsigned long end); + void (*arch_invalidate_secondary_tlbs)( + struct mmu_notifier *subscription, + struct mm_struct *mm, + unsigned long start, + unsigned long end); /* * These callbacks are used with the get/put interface to manage the @@ -396,8 +396,8 @@ extern void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address, pte_t pte); extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r); extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r); -extern void __mmu_notifier_invalidate_range(struct mm_struct *mm, - unsigned long start, unsigned long end); +extern void __mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm, + unsigned long start, unsigned long end); extern bool mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range); @@ -483,11 +483,11 @@ mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range) __mmu_notifier_invalidate_range_end(range); } -static inline void mmu_notifier_invalidate_range(struct mm_struct *mm, - unsigned long start, unsigned long end) +static inline void mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm, + unsigned long start, unsigned long end) { if (mm_has_notifiers(mm)) - __mmu_notifier_invalidate_range(mm, start, end); + __mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end); } static inline void mmu_notifier_subscriptions_init(struct mm_struct *mm) @@ -664,7 +664,7 @@ void mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range) { } -static inline void mmu_notifier_invalidate_range(struct mm_struct *mm, +static inline void mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm, unsigned long start, unsigned long end) { } -- cgit v1.2.3 From ea09800bf17561a0d20498923d766468c0d7a4a7 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 20 Jul 2023 19:28:06 +0800 Subject: mm: fix obsolete function name above debug_pagealloc_enabled_static() Since commit 04013513cc84 ("mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters"), init_debug_pagealloc() is converted to init_mem_debugging_and_hardening(). Later it's renamed to mem_debugging_and_hardening_init() via commit f2fc4b44ec2b ("mm: move init_mem_debugging_and_hardening() to mm/mm_init.c"). Link: https://lkml.kernel.org/r/20230720112806.3851893-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0ae5654f665b..84988d4ff8fb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3465,8 +3465,8 @@ static inline bool debug_pagealloc_enabled(void) } /* - * For use in fast paths after init_debug_pagealloc() has run, or when a - * false negative result is not harmful when called too early. + * For use in fast paths after mem_debugging_and_hardening_init() has run, + * or when a false negative result is not harmful when called too early. */ static inline bool debug_pagealloc_enabled_static(void) { -- cgit v1.2.3 From 6d2790d95d7cffaf0def36270032ce5228dd43a5 Mon Sep 17 00:00:00 2001 From: ZhangPeng Date: Fri, 21 Jul 2023 11:44:44 +0800 Subject: mm/page_io: introduce bio_first_folio_all() Introduce bio_first_folio_all() to return a folio, which makes it easier to use. Link: https://lkml.kernel.org/r/20230721034451.16412-4-zhangpeng362@huawei.com Signed-off-by: ZhangPeng Suggested-by: Matthew Wilcox (Oracle) Cc: Christoph Hellwig Cc: Kefeng Wang Cc: Nanyong Sun Cc: Sidhartha Kumar Signed-off-by: Andrew Morton --- include/linux/bio.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index c4f5b5228105..027ff9ab5d12 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -253,6 +253,11 @@ static inline struct page *bio_first_page_all(struct bio *bio) return bio_first_bvec_all(bio)->bv_page; } +static inline struct folio *bio_first_folio_all(struct bio *bio) +{ + return page_folio(bio_first_page_all(bio)); +} + static inline struct bio_vec *bio_last_bvec_all(struct bio *bio) { WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); -- cgit v1.2.3 From 90717566f8f6b6761494ccfff43ea62af443fc9b Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Thu, 20 Jul 2023 21:34:36 +0200 Subject: mm: don't drop VMA locks in mm_drop_all_locks() Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap lock is held write-locked by the caller, and the caller is responsible for dropping the mmap lock at a later point (which will also release the VMA locks). Calling vma_end_write_all() here is dangerous because the caller might have write-locked a VMA with the expectation that it will stay write-locked until the mmap_lock is released, as usual. This _almost_ becomes a problem in the following scenario: An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. Userspace calls munmap() on a range starting at the start address of A and ending in the middle of B. Hypothetical call graph with additional notes in brackets: do_vmi_align_munmap [begin first for_each_vma_range loop] vma_start_write [on VMA A] vma_mark_detached [on VMA A] __split_vma [on VMA B] sgx_vma_open [== new->vm_ops->open] sgx_encl_mm_add __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] mm_take_all_locks mm_drop_all_locks vma_end_write_all [drops VMA lock taken on VMA A before] vma_start_write [on VMA B] vma_mark_detached [on VMA B] [end first for_each_vma_range loop] vma_iter_clear_gfp [removes VMAs from maple tree] mmap_write_downgrade unmap_region mmap_read_unlock In this hypothetical scenario, while do_vmi_align_munmap() thinks it still holds a VMA write lock on VMA A, the VMA write lock has actually been invalidated inside __split_vma(). The call from sgx_encl_mm_add() to __mmu_notifier_register() can't actually happen here, as far as I understand, because we are duplicating an existing SGX VMA, but sgx_encl_mm_add() only calls __mmu_notifier_register() for the first SGX VMA created in a given process. So this could only happen in fork(), not on munmap(). But in my view it is just pure luck that this can't happen. Also, we wouldn't actually have any bad consequences from this in do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, we've already marked VMA A as detached, which makes it completely ineligible for any VMA-locked page faults. But again, that's just pure luck. So remove the vma_end_write_all(), so that VMA write locks are only ever released on mmap_write_unlock() or mmap_write_downgrade(). Also add comments to document the locking rules established by this patch. Link: https://lkml.kernel.org/r/20230720193436.454247-1-jannh@google.com Fixes: eeff9a5d47f8 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration") Signed-off-by: Jann Horn Reviewed-by: Suren Baghdasaryan Signed-off-by: Andrew Morton --- include/linux/mm.h | 5 +++++ include/linux/mmap_lock.h | 8 ++++++++ 2 files changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 84988d4ff8fb..93eb291181f7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -691,6 +691,11 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq) return (vma->vm_lock_seq == *mm_lock_seq); } +/* + * Begin writing to a VMA. + * Exclude concurrent readers under the per-VMA lock until the currently + * write-locked mmap_lock is dropped or downgraded. + */ static inline void vma_start_write(struct vm_area_struct *vma) { int mm_lock_seq; diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index a5c63b6d7d46..8d38dcb6d044 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -73,6 +73,14 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm) } #ifdef CONFIG_PER_VMA_LOCK +/* + * Drop all currently-held per-VMA locks. + * This is called from the mmap_lock implementation directly before releasing + * a write-locked mmap_lock (or downgrading it to read-locked). + * This should normally NOT be called manually from other places. + * If you want to call this manually anyway, keep in mind that this will release + * *all* VMA write locks, including ones from further up the stack. + */ static inline void vma_end_write_all(struct mm_struct *mm) { mmap_assert_write_locked(mm); -- cgit v1.2.3 From fd892593d44d8b649caf30a67f0c7696d976d901 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Mon, 24 Jul 2023 14:31:45 -0400 Subject: mm: change do_vmi_align_munmap() tracking of VMAs to remove The majority of the calls to munmap a vm range is within a single vma. The maple tree is able to store a single entry at 0, with a size of 1 as a pointer and avoid any allocations. Change do_vmi_align_munmap() to store the VMAs being munmap()'ed into a tree indexed by the count. This will leverage the ability to store the first entry without a node allocation. Storing the entries into a tree by the count and not the vma start and end means changing the functions which iterate over the entries. Update unmap_vmas() and free_pgtables() to take a maple state and a tree end address to support this functionality. Passing through the same maple state to unmap_vmas() and free_pgtables() means the state needs to be reset between calls. This happens in the static unmap_region() and exit_mmap(). Link: https://lkml.kernel.org/r/20230724183157.3939892-4-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Cc: Peng Zhang Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- include/linux/mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 93eb291181f7..ded514ee2588 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2287,9 +2287,9 @@ static inline void zap_vma_pages(struct vm_area_struct *vma) zap_page_range_single(vma, vma->vm_start, vma->vm_end - vma->vm_start, NULL); } -void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt, +void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas, struct vm_area_struct *start_vma, unsigned long start, - unsigned long end, bool mm_wr_locked); + unsigned long end, unsigned long tree_end, bool mm_wr_locked); struct mmu_notifier_range; -- cgit v1.2.3 From c1297987cc2ada57a7faea7985c2334548d110f9 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Mon, 24 Jul 2023 14:31:47 -0400 Subject: maple_tree: introduce __mas_set_range() mas_set_range() resets the node to MAS_START, which will cause a re-walk of the tree to the range. This is unnecessary when the maple state is already at the correct location of the write. Add a function that only sets the range to avoid unnecessary re-walking of the tree. Link: https://lkml.kernel.org/r/20230724183157.3939892-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Cc: Peng Zhang Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 949f911bf955..e10db656e31c 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -539,6 +539,22 @@ static inline void mas_reset(struct ma_state *mas) */ #define mas_for_each(__mas, __entry, __max) \ while (((__entry) = mas_find((__mas), (__max))) != NULL) +/** + * __mas_set_range() - Set up Maple Tree operation state to a sub-range of the + * current location. + * @mas: Maple Tree operation state. + * @start: New start of range in the Maple Tree. + * @last: New end of range in the Maple Tree. + * + * set the internal maple state values to a sub-range. + * Please use mas_set_range() if you do not know where you are in the tree. + */ +static inline void __mas_set_range(struct ma_state *mas, unsigned long start, + unsigned long last) +{ + mas->index = start; + mas->last = last; +} /** * mas_set_range() - Set up Maple Tree operation state for a different index. @@ -553,9 +569,8 @@ static inline void mas_reset(struct ma_state *mas) static inline void mas_set_range(struct ma_state *mas, unsigned long start, unsigned long last) { - mas->index = start; - mas->last = last; - mas->node = MAS_START; + __mas_set_range(mas, start, last); + mas->node = MAS_START; } /** -- cgit v1.2.3 From da0892547b101df6e13255b378380d077975368d Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Mon, 24 Jul 2023 14:31:49 -0400 Subject: maple_tree: re-introduce entry to mas_preallocate() arguments The current preallocation strategy is to preallocate the absolute worst-case allocation for a tree modification. The entry (or NULL) is needed to know how many nodes are needed to write to the tree. Start by adding the argument to the mas_preallocate() definition. Link: https://lkml.kernel.org/r/20230724183157.3939892-8-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Cc: Peng Zhang Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index e10db656e31c..c962af188681 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -466,7 +466,7 @@ void *mas_find(struct ma_state *mas, unsigned long max); void *mas_find_range(struct ma_state *mas, unsigned long max); void *mas_find_rev(struct ma_state *mas, unsigned long min); void *mas_find_range_rev(struct ma_state *mas, unsigned long max); -int mas_preallocate(struct ma_state *mas, gfp_t gfp); +int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp); bool mas_is_err(struct ma_state *mas); bool mas_nomem(struct ma_state *mas, gfp_t gfp); -- cgit v1.2.3 From 284e05920498788c5df1a7dd6424adb426498e1c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 24 Jul 2023 19:54:01 +0100 Subject: mm: remove CONFIG_PER_VMA_LOCK ifdefs Patch series "Handle most file-backed faults under the VMA lock", v3. This patchset adds the ability to handle page faults on parts of files which are already in the page cache without taking the mmap lock. This patch (of 10): Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to eliminate ifdefs in the users. Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Suren Baghdasaryan Cc: Punit Agrawal Cc: Arjun Roy Cc: Eric Dumazet Signed-off-by: Andrew Morton --- include/linux/mm.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index ded514ee2588..21299a0cfbca 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -742,6 +742,12 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {} static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) {} +static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, + unsigned long address) +{ + return NULL; +} + #endif /* CONFIG_PER_VMA_LOCK */ /* -- cgit v1.2.3 From 350f6bbca1de515cd7519a33661cefc93ea06054 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 24 Jul 2023 19:54:02 +0100 Subject: mm: allow per-VMA locks on file-backed VMAs Remove the TCP layering violation by allowing per-VMA locks on all VMAs. The fault path will immediately fail in handle_mm_fault(). There may be a small performance reduction from this patch as a little unnecessary work will be done on each page fault. See later patches for the improvement. Link: https://lkml.kernel.org/r/20230724185410.1124082-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Suren Baghdasaryan Cc: Arjun Roy Cc: Eric Dumazet Cc: Punit Agrawal Signed-off-by: Andrew Morton --- include/linux/net_mm.h | 17 ----------------- 1 file changed, 17 deletions(-) delete mode 100644 include/linux/net_mm.h (limited to 'include/linux') diff --git a/include/linux/net_mm.h b/include/linux/net_mm.h deleted file mode 100644 index b298998bd5a0..000000000000 --- a/include/linux/net_mm.h +++ /dev/null @@ -1,17 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -#ifdef CONFIG_MMU - -#ifdef CONFIG_INET -extern const struct vm_operations_struct tcp_vm_ops; -static inline bool vma_is_tcp(const struct vm_area_struct *vma) -{ - return vma->vm_ops == &tcp_vm_ops; -} -#else -static inline bool vma_is_tcp(const struct vm_area_struct *vma) -{ - return false; -} -#endif /* CONFIG_INET*/ - -#endif /* CONFIG_MMU */ -- cgit v1.2.3 From 348ad1606f4c09e3dc28092baac474e10a252471 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 25 Jul 2023 00:37:47 +0530 Subject: mm/hugepage pud: allow arch-specific helper function to check huge page pud support Patch series "Add support for DAX vmemmap optimization for ppc64", v6. This patch series implements changes required to support DAX vmemmap optimization for ppc64. The vmemmap optimization is only enabled with radix MMU translation and 1GB PUD mapping with 64K page size. The patch series also splits the hugetlb vmemmap optimization as a separate Kconfig variable so that architectures can enable DAX vmemmap optimization without enabling hugetlb vmemmap optimization. This should enable architectures like arm64 to enable DAX vmemmap optimization while they can't enable hugetlb vmemmap optimization. More details of the same are in patch "mm/vmemmap optimization: Split hugetlb and devdax vmemmap optimization". With 64K page size for 16384 pages added (1G) we save 14 pages With 4K page size for 262144 pages added (1G) we save 4094 pages With 4K page size for 512 pages added (2M) we save 6 pages This patch (of 13): Architectures like powerpc would like to enable transparent huge page pud support only with radix translation. To support that add has_transparent_pud_hugepage() helper that architectures can override. [aneesh.kumar@linux.ibm.com: use the new has_transparent_pud_hugepage()] Link: https://lkml.kernel.org/r/87tttrvtaj.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20230724190759.483013-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20230724190759.483013-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: Christophe Leroy Cc: Catalin Marinas Cc: Dan Williams Cc: Joao Martins Cc: Michael Ellerman Cc: Mike Kravetz Cc: Muchun Song Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5f36c055794b..5eb6bdf30c62 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1505,6 +1505,9 @@ typedef unsigned int pgtbl_mod_mask; #define has_transparent_hugepage() IS_BUILTIN(CONFIG_TRANSPARENT_HUGEPAGE) #endif +#ifndef has_transparent_pud_hugepage +#define has_transparent_pud_hugepage() IS_BUILTIN(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) +#endif /* * On some architectures it depends on the mm if the p4d/pud or pmd * layer of the page table hierarchy is folded or not. -- cgit v1.2.3 From f32928ab6fe5abac5a270b6c0bffc4ce77ee8c42 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 25 Jul 2023 00:37:48 +0530 Subject: mm: change pudp_huge_get_and_clear_full take vm_area_struct as arg We will use this in a later patch to do tlb flush when clearing pud entries on powerpc. This is similar to commit 93a98695f2f9 ("mm: change pmdp_huge_get_and_clear_full take vm_area_struct as arg") Link: https://lkml.kernel.org/r/20230724190759.483013-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: Christophe Leroy Cc: Catalin Marinas Cc: Dan Williams Cc: Joao Martins Cc: Michael Ellerman Cc: Mike Kravetz Cc: Muchun Song Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5eb6bdf30c62..124427ece520 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -456,11 +456,11 @@ static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, #endif #ifndef __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL -static inline pud_t pudp_huge_get_and_clear_full(struct mm_struct *mm, +static inline pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pud_t *pudp, int full) { - return pudp_huge_get_and_clear(mm, address, pudp); + return pudp_huge_get_and_clear(vma->vm_mm, address, pudp); } #endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- cgit v1.2.3 From c1a6c536fb088c01d6bdce77731d89ad5e1734c6 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 25 Jul 2023 00:37:49 +0530 Subject: mm/vmemmap: improve vmemmap_can_optimize and allow architectures to override dax vmemmap optimization requires a minimum of 2 PAGE_SIZE area within vmemmap such that tail page mapping can point to the second PAGE_SIZE area. Enforce that in vmemmap_can_optimize() function. Architectures like powerpc also want to enable vmemmap optimization conditionally (only with radix MMU translation). Hence allow architecture override. Link: https://lkml.kernel.org/r/20230724190759.483013-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: Christophe Leroy Cc: Catalin Marinas Cc: Dan Williams Cc: Joao Martins Cc: Michael Ellerman Cc: Mike Kravetz Cc: Muchun Song Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 21299a0cfbca..d4ce73c20dcc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3632,13 +3632,32 @@ void vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap); #endif +#define VMEMMAP_RESERVE_NR 2 #ifdef CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP -static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap, - struct dev_pagemap *pgmap) +static inline bool __vmemmap_can_optimize(struct vmem_altmap *altmap, + struct dev_pagemap *pgmap) { - return is_power_of_2(sizeof(struct page)) && - pgmap && (pgmap_vmemmap_nr(pgmap) > 1) && !altmap; + unsigned long nr_pages; + unsigned long nr_vmemmap_pages; + + if (!pgmap || !is_power_of_2(sizeof(struct page))) + return false; + + nr_pages = pgmap_vmemmap_nr(pgmap); + nr_vmemmap_pages = ((nr_pages * sizeof(struct page)) >> PAGE_SHIFT); + /* + * For vmemmap optimization with DAX we need minimum 2 vmemmap + * pages. See layout diagram in Documentation/mm/vmemmap_dedup.rst + */ + return !altmap && (nr_vmemmap_pages > VMEMMAP_RESERVE_NR); } +/* + * If we don't have an architecture override, use the generic rule + */ +#ifndef vmemmap_can_optimize +#define vmemmap_can_optimize __vmemmap_can_optimize +#endif + #else static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap) -- cgit v1.2.3 From 973bf6800cf37802ca48292378d574f2a689de9b Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 25 Jul 2023 00:37:51 +0530 Subject: mm: add pud_same similar to __HAVE_ARCH_P4D_SAME This helps architectures to override pmd_same and pud_same independently. Link: https://lkml.kernel.org/r/20230724190759.483013-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Christophe Leroy Cc: Dan Williams Cc: Joao Martins Cc: Michael Ellerman Cc: Mike Kravetz Cc: Muchun Song Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 124427ece520..0af8bc4ce258 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -699,11 +699,14 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) { return pmd_val(pmd_a) == pmd_val(pmd_b); } +#endif +#ifndef pud_same static inline int pud_same(pud_t pud_a, pud_t pud_b) { return pud_val(pud_a) == pud_val(pud_b); } +#define pud_same pud_same #endif #ifndef __HAVE_ARCH_P4D_SAME -- cgit v1.2.3 From 54a948a1e97a9d19a0a4bed63a2d4caef30c5d17 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 25 Jul 2023 00:37:52 +0530 Subject: mm/huge pud: use transparent huge pud helpers only with CONFIG_TRANSPARENT_HUGEPAGE pudp_set_wrprotect and move_huge_pud helpers are only used when CONFIG_TRANSPARENT_HUGEPAGE is enabled. Similar to pmdp_set_wrprotect and move_huge_pmd_helpers use architecture override only if CONFIG_TRANSPARENT_HUGEPAGE is set Link: https://lkml.kernel.org/r/20230724190759.483013-7-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: Christophe Leroy Cc: Catalin Marinas Cc: Dan Williams Cc: Joao Martins Cc: Michael Ellerman Cc: Mike Kravetz Cc: Muchun Song Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0af8bc4ce258..f34e0f2cb4d8 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -564,6 +564,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, #endif #ifndef __HAVE_ARCH_PUDP_SET_WRPROTECT #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +#ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline void pudp_set_wrprotect(struct mm_struct *mm, unsigned long address, pud_t *pudp) { @@ -577,6 +578,7 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm, { BUILD_BUG(); } +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ #endif -- cgit v1.2.3 From 0b6f15824cc7e431a9706c78bfb9cb3011477ad3 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 25 Jul 2023 00:37:53 +0530 Subject: mm/vmemmap optimization: split hugetlb and devdax vmemmap optimization Arm disabled hugetlb vmemmap optimization [1] because hugetlb vmemmap optimization includes an update of both the permissions (writeable to read-only) and the output address (pfn) of the vmemmap ptes. That is not supported without unmapping of pte(marking it invalid) by some architectures. With DAX vmemmap optimization we don't require such pte updates and architectures can enable DAX vmemmap optimization while having hugetlb vmemmap optimization disabled. Hence split DAX optimization support into a different config. s390, loongarch and riscv don't have devdax support. So the DAX config is not enabled for them. With this change, arm64 should be able to select DAX optimization [1] commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP") Link: https://lkml.kernel.org/r/20230724190759.483013-8-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Christophe Leroy Cc: Dan Williams Cc: Joao Martins Cc: Michael Ellerman Cc: Mike Kravetz Cc: Muchun Song Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d4ce73c20dcc..b2520dd555f9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3633,7 +3633,7 @@ void vmemmap_free(unsigned long start, unsigned long end, #endif #define VMEMMAP_RESERVE_NR 2 -#ifdef CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP +#ifdef CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP static inline bool __vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap) { -- cgit v1.2.3 From b229baa374dbb1bb47dccad2cc8887d9438ca3e7 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 19 Jul 2023 00:11:44 +0300 Subject: kernel.h: split out COUNT_ARGS() and CONCATENATE() to args.h Patch series "kernel.h: Split out a couple of macros to args.h", v4. There are macros in kernel.h that can be used outside of that header. Split them to args.h and replace open coded variants. This patch (of 4): kernel.h is being used as a dump for all kinds of stuff for a long time. The COUNT_ARGS() and CONCATENATE() macros may be used in some places without need of the full kernel.h dependency train with it. Here is the attempt on cleaning it up by splitting out these macros(). While at it, include new header where it's being used. Link: https://lkml.kernel.org/r/20230718211147.18647-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20230718211147.18647-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Acked-by: Steven Rostedt (Google) Cc: Bjorn Helgaas Acked-by: Bjorn Helgaas [PCI] Cc: Brendan Higgins Cc: Daniel Latypov Cc: Dave Hansen Cc: David Gow Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Lorenzo Pieralisi Cc: Mark Rutland Cc: Masami Hiramatsu (Google) Cc: Shuah Khan Cc: Sudeep Holla Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/args.h | 28 ++++++++++++++++++++++++++++ include/linux/kernel.h | 7 ------- include/linux/pci.h | 2 +- 3 files changed, 29 insertions(+), 8 deletions(-) create mode 100644 include/linux/args.h (limited to 'include/linux') diff --git a/include/linux/args.h b/include/linux/args.h new file mode 100644 index 000000000000..8ff60a54eb7d --- /dev/null +++ b/include/linux/args.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_ARGS_H +#define _LINUX_ARGS_H + +/* + * How do these macros work? + * + * In __COUNT_ARGS() _0 to _12 are just placeholders from the start + * in order to make sure _n is positioned over the correct number + * from 12 to 0 (depending on X, which is a variadic argument list). + * They serve no purpose other than occupying a position. Since each + * macro parameter must have a distinct identifier, those identifiers + * are as good as any. + * + * In COUNT_ARGS() we use actual integers, so __COUNT_ARGS() returns + * that as _n. + */ + +/* This counts to 12. Any more, it will return 13th argument. */ +#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n +#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0) + +/* Concatenate two parameters, but allow them to be expanded beforehand. */ +#define __CONCAT(a, b) a ## b +#define CONCATENATE(a, b) __CONCAT(a, b) + +#endif /* _LINUX_ARGS_H */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 0d91e0af0125..b9e76f717a7e 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -457,13 +457,6 @@ ftrace_vprintk(const char *fmt, va_list ap) static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { } #endif /* CONFIG_TRACING */ -/* This counts to 12. Any more, it will return 13th argument. */ -#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n -#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0) - -#define __CONCAT(a, b) a ## b -#define CONCATENATE(a, b) __CONCAT(a, b) - /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */ #ifdef CONFIG_FTRACE_MCOUNT_RECORD # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD diff --git a/include/linux/pci.h b/include/linux/pci.h index c69a2cc1f412..9c856e9aada5 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -23,7 +23,7 @@ #ifndef LINUX_PCI_H #define LINUX_PCI_H - +#include #include #include -- cgit v1.2.3 From 90e3e18548e6a8e9e66fe2b4f4ad62bd9e8f0cda Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 19 Jul 2023 00:11:46 +0300 Subject: arm64: smccc: replace custom COUNT_ARGS() & CONCATENATE() implementations Replace custom implementation of the macros from args.h. Link: https://lkml.kernel.org/r/20230718211147.18647-4-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Bjorn Helgaas Cc: Borislav Petkov (AMD) Cc: Brendan Higgins Cc: Daniel Latypov Cc: Dave Hansen Cc: David Gow Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Lorenzo Pieralisi Cc: Mark Rutland Cc: Masami Hiramatsu (Google) Cc: Shuah Khan Cc: Steven Rostedt (Google) Cc: Sudeep Holla Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/arm-smccc.h | 69 ++++++++++++++++++++++------------------------- 1 file changed, 32 insertions(+), 37 deletions(-) (limited to 'include/linux') diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index f196c19f8e55..7c67c17321d4 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -5,6 +5,7 @@ #ifndef __LINUX_ARM_SMCCC_H #define __LINUX_ARM_SMCCC_H +#include #include #include @@ -413,31 +414,26 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, #endif -#define ___count_args(_0, _1, _2, _3, _4, _5, _6, _7, _8, x, ...) x +#define __constraint_read_2 "r" (arg0) +#define __constraint_read_3 __constraint_read_2, "r" (arg1) +#define __constraint_read_4 __constraint_read_3, "r" (arg2) +#define __constraint_read_5 __constraint_read_4, "r" (arg3) +#define __constraint_read_6 __constraint_read_5, "r" (arg4) +#define __constraint_read_7 __constraint_read_6, "r" (arg5) +#define __constraint_read_8 __constraint_read_7, "r" (arg6) +#define __constraint_read_9 __constraint_read_8, "r" (arg7) -#define __count_args(...) \ - ___count_args(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0) - -#define __constraint_read_0 "r" (arg0) -#define __constraint_read_1 __constraint_read_0, "r" (arg1) -#define __constraint_read_2 __constraint_read_1, "r" (arg2) -#define __constraint_read_3 __constraint_read_2, "r" (arg3) -#define __constraint_read_4 __constraint_read_3, "r" (arg4) -#define __constraint_read_5 __constraint_read_4, "r" (arg5) -#define __constraint_read_6 __constraint_read_5, "r" (arg6) -#define __constraint_read_7 __constraint_read_6, "r" (arg7) - -#define __declare_arg_0(a0, res) \ +#define __declare_arg_2(a0, res) \ struct arm_smccc_res *___res = res; \ register unsigned long arg0 asm("r0") = (u32)a0 -#define __declare_arg_1(a0, a1, res) \ +#define __declare_arg_3(a0, a1, res) \ typeof(a1) __a1 = a1; \ struct arm_smccc_res *___res = res; \ register unsigned long arg0 asm("r0") = (u32)a0; \ register typeof(a1) arg1 asm("r1") = __a1 -#define __declare_arg_2(a0, a1, a2, res) \ +#define __declare_arg_4(a0, a1, a2, res) \ typeof(a1) __a1 = a1; \ typeof(a2) __a2 = a2; \ struct arm_smccc_res *___res = res; \ @@ -445,7 +441,7 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, register typeof(a1) arg1 asm("r1") = __a1; \ register typeof(a2) arg2 asm("r2") = __a2 -#define __declare_arg_3(a0, a1, a2, a3, res) \ +#define __declare_arg_5(a0, a1, a2, a3, res) \ typeof(a1) __a1 = a1; \ typeof(a2) __a2 = a2; \ typeof(a3) __a3 = a3; \ @@ -455,34 +451,26 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, register typeof(a2) arg2 asm("r2") = __a2; \ register typeof(a3) arg3 asm("r3") = __a3 -#define __declare_arg_4(a0, a1, a2, a3, a4, res) \ +#define __declare_arg_6(a0, a1, a2, a3, a4, res) \ typeof(a4) __a4 = a4; \ - __declare_arg_3(a0, a1, a2, a3, res); \ + __declare_arg_5(a0, a1, a2, a3, res); \ register typeof(a4) arg4 asm("r4") = __a4 -#define __declare_arg_5(a0, a1, a2, a3, a4, a5, res) \ +#define __declare_arg_7(a0, a1, a2, a3, a4, a5, res) \ typeof(a5) __a5 = a5; \ - __declare_arg_4(a0, a1, a2, a3, a4, res); \ + __declare_arg_6(a0, a1, a2, a3, a4, res); \ register typeof(a5) arg5 asm("r5") = __a5 -#define __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res) \ +#define __declare_arg_8(a0, a1, a2, a3, a4, a5, a6, res) \ typeof(a6) __a6 = a6; \ - __declare_arg_5(a0, a1, a2, a3, a4, a5, res); \ + __declare_arg_7(a0, a1, a2, a3, a4, a5, res); \ register typeof(a6) arg6 asm("r6") = __a6 -#define __declare_arg_7(a0, a1, a2, a3, a4, a5, a6, a7, res) \ +#define __declare_arg_9(a0, a1, a2, a3, a4, a5, a6, a7, res) \ typeof(a7) __a7 = a7; \ - __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res); \ + __declare_arg_8(a0, a1, a2, a3, a4, a5, a6, res); \ register typeof(a7) arg7 asm("r7") = __a7 -#define ___declare_args(count, ...) __declare_arg_ ## count(__VA_ARGS__) -#define __declare_args(count, ...) ___declare_args(count, __VA_ARGS__) - -#define ___constraints(count) \ - : __constraint_read_ ## count \ - : smccc_sve_clobbers "memory" -#define __constraints(count) ___constraints(count) - /* * We have an output list that is not necessarily used, and GCC feels * entitled to optimise the whole sequence away. "volatile" is what @@ -494,11 +482,14 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, register unsigned long r1 asm("r1"); \ register unsigned long r2 asm("r2"); \ register unsigned long r3 asm("r3"); \ - __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \ + CONCATENATE(__declare_arg_, \ + COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__); \ asm volatile(SMCCC_SVE_CHECK \ inst "\n" : \ "=r" (r0), "=r" (r1), "=r" (r2), "=r" (r3) \ - __constraints(__count_args(__VA_ARGS__))); \ + : CONCATENATE(__constraint_read_, \ + COUNT_ARGS(__VA_ARGS__)) \ + : smccc_sve_clobbers "memory"); \ if (___res) \ *___res = (typeof(*___res)){r0, r1, r2, r3}; \ } while (0) @@ -542,8 +533,12 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, */ #define __fail_smccc_1_1(...) \ do { \ - __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \ - asm ("" : __constraints(__count_args(__VA_ARGS__))); \ + CONCATENATE(__declare_arg_, \ + COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__); \ + asm ("" : \ + : CONCATENATE(__constraint_read_, \ + COUNT_ARGS(__VA_ARGS__)) \ + : smccc_sve_clobbers "memory"); \ if (___res) \ ___res->a0 = SMCCC_RET_NOT_SUPPORTED; \ } while (0) -- cgit v1.2.3 From 2e106e564372a28f6dc9c8db411cb9424e78e743 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 19 Jul 2023 00:11:47 +0300 Subject: genetlink: replace custom CONCATENATE() implementation Replace custom implementation of the macros from args.h. Link: https://lkml.kernel.org/r/20230718211147.18647-5-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Bjorn Helgaas Cc: Borislav Petkov (AMD) Cc: Brendan Higgins Cc: "H. Peter Anvin" Cc: Daniel Latypov Cc: Dave Hansen Cc: David Gow Cc: Ingo Molnar Cc: Lorenzo Pieralisi Cc: Mark Rutland Cc: Masami Hiramatsu (Google) Cc: Shuah Khan Cc: Steven Rostedt (Google) Cc: Sudeep Holla Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/genl_magic_func.h | 27 ++++++++++++++------------- include/linux/genl_magic_struct.h | 8 +++----- 2 files changed, 17 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/genl_magic_func.h b/include/linux/genl_magic_func.h index 2984b0cb24b1..d4da060b7532 100644 --- a/include/linux/genl_magic_func.h +++ b/include/linux/genl_magic_func.h @@ -2,6 +2,7 @@ #ifndef GENL_MAGIC_FUNC_H #define GENL_MAGIC_FUNC_H +#include #include #include @@ -23,7 +24,7 @@ #define GENL_struct(tag_name, tag_number, s_name, s_fields) \ [tag_name] = { .type = NLA_NESTED }, -static struct nla_policy CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy)[] = { +static struct nla_policy CONCATENATE(GENL_MAGIC_FAMILY, _tla_nl_policy)[] = { #include GENL_MAGIC_INCLUDE_FILE }; @@ -209,7 +210,7 @@ static int s_name ## _from_attrs_for_change(struct s_name *s, \ * Magic: define op number to op name mapping {{{1 * {{{2 */ -static const char *CONCAT_(GENL_MAGIC_FAMILY, _genl_cmd_to_str)(__u8 cmd) +static const char *CONCATENATE(GENL_MAGIC_FAMILY, _genl_cmd_to_str)(__u8 cmd) { switch (cmd) { #undef GENL_op @@ -235,7 +236,7 @@ static const char *CONCAT_(GENL_MAGIC_FAMILY, _genl_cmd_to_str)(__u8 cmd) .cmd = op_name, \ }, -#define ZZZ_genl_ops CONCAT_(GENL_MAGIC_FAMILY, _genl_ops) +#define ZZZ_genl_ops CONCATENATE(GENL_MAGIC_FAMILY, _genl_ops) static struct genl_ops ZZZ_genl_ops[] __read_mostly = { #include GENL_MAGIC_INCLUDE_FILE }; @@ -248,32 +249,32 @@ static struct genl_ops ZZZ_genl_ops[] __read_mostly = { * and provide register/unregister functions. * {{{2 */ -#define ZZZ_genl_family CONCAT_(GENL_MAGIC_FAMILY, _genl_family) +#define ZZZ_genl_family CONCATENATE(GENL_MAGIC_FAMILY, _genl_family) static struct genl_family ZZZ_genl_family; /* * Magic: define multicast groups * Magic: define multicast group registration helper */ -#define ZZZ_genl_mcgrps CONCAT_(GENL_MAGIC_FAMILY, _genl_mcgrps) +#define ZZZ_genl_mcgrps CONCATENATE(GENL_MAGIC_FAMILY, _genl_mcgrps) static const struct genl_multicast_group ZZZ_genl_mcgrps[] = { #undef GENL_mc_group #define GENL_mc_group(group) { .name = #group, }, #include GENL_MAGIC_INCLUDE_FILE }; -enum CONCAT_(GENL_MAGIC_FAMILY, group_ids) { +enum CONCATENATE(GENL_MAGIC_FAMILY, group_ids) { #undef GENL_mc_group -#define GENL_mc_group(group) CONCAT_(GENL_MAGIC_FAMILY, _group_ ## group), +#define GENL_mc_group(group) CONCATENATE(GENL_MAGIC_FAMILY, _group_ ## group), #include GENL_MAGIC_INCLUDE_FILE }; #undef GENL_mc_group #define GENL_mc_group(group) \ -static int CONCAT_(GENL_MAGIC_FAMILY, _genl_multicast_ ## group)( \ +static int CONCATENATE(GENL_MAGIC_FAMILY, _genl_multicast_ ## group)( \ struct sk_buff *skb, gfp_t flags) \ { \ unsigned int group_id = \ - CONCAT_(GENL_MAGIC_FAMILY, _group_ ## group); \ + CONCATENATE(GENL_MAGIC_FAMILY, _group_ ## group); \ return genlmsg_multicast(&ZZZ_genl_family, skb, 0, \ group_id, flags); \ } @@ -289,8 +290,8 @@ static struct genl_family ZZZ_genl_family __ro_after_init = { #ifdef GENL_MAGIC_FAMILY_HDRSZ .hdrsize = NLA_ALIGN(GENL_MAGIC_FAMILY_HDRSZ), #endif - .maxattr = ARRAY_SIZE(CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy))-1, - .policy = CONCAT_(GENL_MAGIC_FAMILY, _tla_nl_policy), + .maxattr = ARRAY_SIZE(CONCATENATE(GENL_MAGIC_FAMILY, _tla_nl_policy))-1, + .policy = CONCATENATE(GENL_MAGIC_FAMILY, _tla_nl_policy), .ops = ZZZ_genl_ops, .n_ops = ARRAY_SIZE(ZZZ_genl_ops), .mcgrps = ZZZ_genl_mcgrps, @@ -299,12 +300,12 @@ static struct genl_family ZZZ_genl_family __ro_after_init = { .module = THIS_MODULE, }; -int CONCAT_(GENL_MAGIC_FAMILY, _genl_register)(void) +int CONCATENATE(GENL_MAGIC_FAMILY, _genl_register)(void) { return genl_register_family(&ZZZ_genl_family); } -void CONCAT_(GENL_MAGIC_FAMILY, _genl_unregister)(void) +void CONCATENATE(GENL_MAGIC_FAMILY, _genl_unregister)(void) { genl_unregister_family(&ZZZ_genl_family); } diff --git a/include/linux/genl_magic_struct.h b/include/linux/genl_magic_struct.h index f81d48987528..a419d93789ff 100644 --- a/include/linux/genl_magic_struct.h +++ b/include/linux/genl_magic_struct.h @@ -14,14 +14,12 @@ # error "you need to define GENL_MAGIC_INCLUDE_FILE before inclusion" #endif +#include #include #include -#define CONCAT__(a,b) a ## b -#define CONCAT_(a,b) CONCAT__(a,b) - -extern int CONCAT_(GENL_MAGIC_FAMILY, _genl_register)(void); -extern void CONCAT_(GENL_MAGIC_FAMILY, _genl_unregister)(void); +extern int CONCATENATE(GENL_MAGIC_FAMILY, _genl_register)(void); +extern void CONCATENATE(GENL_MAGIC_FAMILY, _genl_unregister)(void); /* * Extension of genl attribute validation policies {{{2 -- cgit v1.2.3 From 46f12960aad2d48fc6742470f718e147e5c85d06 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 3 Aug 2023 16:19:18 +0300 Subject: drm/i915: Move abs_diff() to math.h abs_diff() belongs to math.h. Move it there. This will allow others to use it. [andriy.shevchenko@linux.intel.com: add abs_diff() documentation] Link: https://lkml.kernel.org/r/20230804050934.83223-1-andriy.shevchenko@linux.intel.com [akpm@linux-foundation.org: fix comment, per Randy] Link: https://lkml.kernel.org/r/20230803131918.53727-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Reviewed-by: Jiri Slaby # tty/serial Acked-by: Jani Nikula Acked-by: Greg Kroah-Hartman Reviewed-by: Andi Shyti Reviewed-by: Philipp Zabel # gpu/ipu-v3 Cc: Alexey Dobriyan Cc: Daniel Vetter Cc: David Airlie Cc: Helge Deller Cc: Imre Deak Cc: Jani Nikula Cc: Joonas Lahtinen Cc: Rasmus Villemoes Cc: Rodrigo Vivi Cc: Tvrtko Ursulin Cc: Randy Dunlap Signed-off-by: Andrew Morton --- include/linux/math.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/math.h b/include/linux/math.h index 2d388650c556..dd4152711de7 100644 --- a/include/linux/math.h +++ b/include/linux/math.h @@ -155,6 +155,25 @@ __STRUCT_FRACT(u32) __builtin_types_compatible_p(typeof(x), unsigned type), \ ({ signed type __x = (x); __x < 0 ? -__x : __x; }), other) +/** + * abs_diff - return absolute value of the difference between the arguments + * @a: the first argument + * @b: the second argument + * + * @a and @b have to be of the same type. With this restriction we compare + * signed to signed and unsigned to unsigned. The result is the subtraction + * the smaller of the two from the bigger, hence result is always a positive + * value. + * + * Return: an absolute value of the difference between the @a and @b. + */ +#define abs_diff(a, b) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + (void)(&__a == &__b); \ + __a > __b ? (__a - __b) : (__b - __a); \ +}) + /** * reciprocal_scale - "scale" a value into range [0, ep_ro) * @val: value -- cgit v1.2.3 From be33db21427c33a4217bd6b805944d2cf84faf25 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 4 Aug 2023 08:41:51 +0200 Subject: kthread: unexport __kthread_should_park() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are no in-kernel users of __kthread_should_park() so mark it as static and do not export it. Link: https://lkml.kernel.org/r/2023080450-handcuff-stump-1d6e@gregkh Signed-off-by: Greg Kroah-Hartman Reviewed-by: Kees Cook Cc: John Stultz Cc: "Peter Zijlstra (Intel)" Cc: "Arve Hjønnevåg" Cc: Valentin Schneider Cc: Andrew Morton Cc: "Christian Brauner (Microsoft)" Cc: Mike Christie Cc: Nicholas Piggin Cc: Zqiang Cc: Prathu Baronia Cc: Sami Tolvanen Signed-off-by: Andrew Morton --- include/linux/kthread.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kthread.h b/include/linux/kthread.h index f1f95a71a4bc..2c30ade43bc8 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -88,7 +88,6 @@ void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask); int kthread_stop(struct task_struct *k); bool kthread_should_stop(void); bool kthread_should_park(void); -bool __kthread_should_park(struct task_struct *k); bool kthread_should_stop_or_park(void); bool kthread_freezable_should_stop(bool *was_frozen); void *kthread_func(struct task_struct *k); -- cgit v1.2.3 From 3c9d017cc283df0dabf6b16a6cb4574426cbd53b Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 4 Aug 2023 09:46:36 +0300 Subject: range.h: Move resource API and constant to respective files range.h works with struct range data type. The resource_size_t is an alien here. (1) Move cap_resource() implementation into its only user, and (2) rename and move RESOURCE_SIZE_MAX to limits.h. Link: https://lkml.kernel.org/r/20230804064636.15368-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Acked-by: Bjorn Helgaas Cc: Alexander Sverdlin Cc: Borislav Petkov (AMD) Cc: Dave Hansen Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Rasmus Villemoes Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/limits.h | 2 ++ include/linux/range.h | 8 -------- 2 files changed, 2 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/limits.h b/include/linux/limits.h index f6bcc9369010..38eb7f6f7e88 100644 --- a/include/linux/limits.h +++ b/include/linux/limits.h @@ -10,6 +10,8 @@ #define SSIZE_MAX ((ssize_t)(SIZE_MAX >> 1)) #define PHYS_ADDR_MAX (~(phys_addr_t)0) +#define RESOURCE_SIZE_MAX ((resource_size_t)~0) + #define U8_MAX ((u8)~0U) #define S8_MAX ((s8)(U8_MAX >> 1)) #define S8_MIN ((s8)(-S8_MAX - 1)) diff --git a/include/linux/range.h b/include/linux/range.h index 7efb6a9b069b..6ad0b73cb7ad 100644 --- a/include/linux/range.h +++ b/include/linux/range.h @@ -31,12 +31,4 @@ int clean_sort_range(struct range *range, int az); void sort_range(struct range *range, int nr_range); -#define MAX_RESOURCE ((resource_size_t)~0) -static inline resource_size_t cap_resource(u64 val) -{ - if (val > MAX_RESOURCE) - return MAX_RESOURCE; - - return val; -} #endif -- cgit v1.2.3 From 8d539b84f1e3478436f978ceaf55a0b6cab497b5 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 4 Aug 2023 07:00:42 -0700 Subject: nmi_backtrace: allow excluding an arbitrary CPU The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case. Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask. Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change. [akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub] Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid Signed-off-by: Douglas Anderson Acked-by: Michal Hocko Cc: kernel test robot Cc: Lecopzer Chen Cc: Petr Mladek Cc: Pingfan Liu Signed-off-by: Andrew Morton --- include/linux/nmi.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nmi.h b/include/linux/nmi.h index e3e6a64b98e0..e92e378df000 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -157,31 +157,31 @@ static inline void touch_nmi_watchdog(void) #ifdef arch_trigger_cpumask_backtrace static inline bool trigger_all_cpu_backtrace(void) { - arch_trigger_cpumask_backtrace(cpu_online_mask, false); + arch_trigger_cpumask_backtrace(cpu_online_mask, -1); return true; } -static inline bool trigger_allbutself_cpu_backtrace(void) +static inline bool trigger_allbutcpu_cpu_backtrace(int exclude_cpu) { - arch_trigger_cpumask_backtrace(cpu_online_mask, true); + arch_trigger_cpumask_backtrace(cpu_online_mask, exclude_cpu); return true; } static inline bool trigger_cpumask_backtrace(struct cpumask *mask) { - arch_trigger_cpumask_backtrace(mask, false); + arch_trigger_cpumask_backtrace(mask, -1); return true; } static inline bool trigger_single_cpu_backtrace(int cpu) { - arch_trigger_cpumask_backtrace(cpumask_of(cpu), false); + arch_trigger_cpumask_backtrace(cpumask_of(cpu), -1); return true; } /* generic implementation */ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask, - bool exclude_self, + int exclude_cpu, void (*raise)(cpumask_t *mask)); bool nmi_cpu_backtrace(struct pt_regs *regs); @@ -190,7 +190,7 @@ static inline bool trigger_all_cpu_backtrace(void) { return false; } -static inline bool trigger_allbutself_cpu_backtrace(void) +static inline bool trigger_allbutcpu_cpu_backtrace(int exclude_cpu) { return false; } -- cgit v1.2.3 From c01467355f8eb126cab0ef28b66bb506fe6a2e21 Mon Sep 17 00:00:00 2001 From: Andre Werner Date: Fri, 18 Aug 2023 10:37:21 +0200 Subject: mfd: tps65086: Read DEVICE ID register 1 from device This commit prepares a following commit for the regulator part of the MFD. The driver should support different device chips that differ in their register definitions, for instance to control LDOA1 and SWB2. So it is necessary to use a dedicated regulator description for a specific device variant. Thus, the content from DEVICEID Register 1 is used to choose a dedicated configuration between the different device variants. Signed-off-by: Andre Werner Link: https://lore.kernel.org/r/20230818083721.29790-2-andre.werner@systec-electronic.com Signed-off-by: Lee Jones --- include/linux/mfd/tps65086.h | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/tps65086.h b/include/linux/mfd/tps65086.h index 16f87cccc003..87e590de6ca5 100644 --- a/include/linux/mfd/tps65086.h +++ b/include/linux/mfd/tps65086.h @@ -13,8 +13,9 @@ #include /* List of registers for TPS65086 */ -#define TPS65086_DEVICEID 0x01 -#define TPS65086_IRQ 0x02 +#define TPS65086_DEVICEID1 0x00 +#define TPS65086_DEVICEID2 0x01 +#define TPS65086_IRQ 0x02 #define TPS65086_IRQ_MASK 0x03 #define TPS65086_PMICSTAT 0x04 #define TPS65086_SHUTDNSRC 0x05 @@ -75,10 +76,16 @@ #define TPS65086_IRQ_SHUTDN_MASK BIT(3) #define TPS65086_IRQ_FAULT_MASK BIT(7) -/* DEVICEID Register field definitions */ -#define TPS65086_DEVICEID_PART_MASK GENMASK(3, 0) -#define TPS65086_DEVICEID_OTP_MASK GENMASK(5, 4) -#define TPS65086_DEVICEID_REV_MASK GENMASK(7, 6) +/* DEVICEID1 Register field definitions */ +#define TPS6508640_ID 0x00 +#define TPS65086401_ID 0x01 +#define TPS6508641_ID 0x10 +#define TPS65086470_ID 0x70 + +/* DEVICEID2 Register field definitions */ +#define TPS65086_DEVICEID2_PART_MASK GENMASK(3, 0) +#define TPS65086_DEVICEID2_OTP_MASK GENMASK(5, 4) +#define TPS65086_DEVICEID2_REV_MASK GENMASK(7, 6) /* VID Masks */ #define BUCK_VID_MASK GENMASK(7, 1) @@ -100,6 +107,7 @@ enum tps65086_irqs { struct tps65086 { struct device *dev; struct regmap *regmap; + unsigned int chip_id; /* IRQ Data */ int irq; -- cgit v1.2.3 From 2459f4dfe5529f8b847f452473e2da08a8fc9fe5 Mon Sep 17 00:00:00 2001 From: Yangtao Li Date: Thu, 6 Jul 2023 19:39:39 +0800 Subject: mfd: hi655x-pmic: Convert to devm_platform_ioremap_resource() Use devm_platform_ioremap_resource() to simplify code. Signed-off-by: Yangtao Li Link: https://lore.kernel.org/r/20230706113939.1178-7-frank.li@vivo.com Signed-off-by: Lee Jones --- include/linux/mfd/hi655x-pmic.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/hi655x-pmic.h b/include/linux/mfd/hi655x-pmic.h index 6a012784dd1b..194556851ccf 100644 --- a/include/linux/mfd/hi655x-pmic.h +++ b/include/linux/mfd/hi655x-pmic.h @@ -52,7 +52,6 @@ #define OTMP_D1R_INT_MASK BIT(OTMP_D1R_INT) struct hi655x_pmic { - struct resource *res; struct device *dev; struct regmap *regmap; struct gpio_desc *gpio; -- cgit v1.2.3 From 2dfe293bcde2d302c045d0cd536ccb15f86385d2 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 20 Jul 2023 22:37:38 +0800 Subject: mfd: db8500-prcmu: Remove unused inline functions Since commit b0e846248de5 ("mfd: db8500-prcmu: Remove dead code for a non-existing config") these inline helpers also no need any more. Signed-off-by: YueHaibing Link: https://lore.kernel.org/r/20230720143738.13996-1-yuehaibing@huawei.com Signed-off-by: Lee Jones --- include/linux/mfd/dbx500-prcmu.h | 21 --------------------- 1 file changed, 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/dbx500-prcmu.h b/include/linux/mfd/dbx500-prcmu.h index e7a7e70fdb38..dd0fc891b228 100644 --- a/include/linux/mfd/dbx500-prcmu.h +++ b/include/linux/mfd/dbx500-prcmu.h @@ -556,16 +556,6 @@ static inline void prcmu_clear(unsigned int reg, u32 bits) #define PRCMU_QOS_ARM_OPP 3 #define PRCMU_QOS_DEFAULT_VALUE -1 -static inline unsigned long prcmu_qos_get_cpufreq_opp_delay(void) -{ - return 0; -} - -static inline int prcmu_qos_requirement(int prcmu_qos_class) -{ - return 0; -} - static inline int prcmu_qos_add_requirement(int prcmu_qos_class, char *name, s32 value) { @@ -582,15 +572,4 @@ static inline void prcmu_qos_remove_requirement(int prcmu_qos_class, char *name) { } -static inline int prcmu_qos_add_notifier(int prcmu_qos_class, - struct notifier_block *notifier) -{ - return 0; -} -static inline int prcmu_qos_remove_notifier(int prcmu_qos_class, - struct notifier_block *notifier) -{ - return 0; -} - #endif /* __MACH_PRCMU_H */ -- cgit v1.2.3 From 10d3340441bd0db857fc7fcb1733a800acf47a3d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Jul 2023 11:02:23 +0200 Subject: mfd: rz-mtu3: Link time dependencies The new set of drivers for RZ/G2L MTU3a tries to enable compile-testing the individual client drivers even when the MFD portion is disabled but gets it wrong, causing a link failure when the core is in a loadable module but the other drivers are built-in: x86_64-linux-ld: drivers/pwm/pwm-rz-mtu3.o: in function `rz_mtu3_pwm_apply': pwm-rz-mtu3.c:(.text+0x4bf): undefined reference to `rz_mtu3_8bit_ch_write' x86_64-linux-ld: pwm-rz-mtu3.c:(.text+0x509): undefined reference to `rz_mtu3_disable' arm-linux-gnueabi-ld: drivers/counter/rz-mtu3-cnt.o: in function `rz_mtu3_cascade_counts_enable_get': rz-mtu3-cnt.c:(.text+0xbec): undefined reference to `rz_mtu3_shared_reg_read' It seems better not to add the extra complexity here but instead just use a normal hard dependency, so remove the #else portion in the header along with the "|| COMPILE_TEST". This could also be fixed by having slightly more elaborate Kconfig dependencies or using the cursed 'IS_REACHABLE()' helper, but in practice it's already possible to compile-test all these drivers by enabling the mtd portion. Fixes: 254d3a727421c ("pwm: Add Renesas RZ/G2L MTU3a PWM driver") Fixes: 0be8907359df4 ("counter: Add Renesas RZ/G2L MTU3a counter driver") Fixes: 654c293e1687b ("mfd: Add Renesas RZ/G2L MTU3a core driver") Signed-off-by: Arnd Bergmann Acked-by: Thierry Reding Reviewed-by: Biju Das Link: https://lore.kernel.org/r/20230719090430.1925182-1-arnd@kernel.org Signed-off-by: Lee Jones --- include/linux/mfd/rz-mtu3.h | 66 --------------------------------------------- 1 file changed, 66 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/rz-mtu3.h b/include/linux/mfd/rz-mtu3.h index c5173bc06270..8421d49500bf 100644 --- a/include/linux/mfd/rz-mtu3.h +++ b/include/linux/mfd/rz-mtu3.h @@ -151,7 +151,6 @@ struct rz_mtu3 { void *priv_data; }; -#if IS_ENABLED(CONFIG_RZ_MTU3) static inline bool rz_mtu3_request_channel(struct rz_mtu3_channel *ch) { mutex_lock(&ch->lock); @@ -188,70 +187,5 @@ void rz_mtu3_32bit_ch_write(struct rz_mtu3_channel *ch, u16 off, u32 val); void rz_mtu3_shared_reg_write(struct rz_mtu3_channel *ch, u16 off, u16 val); void rz_mtu3_shared_reg_update_bit(struct rz_mtu3_channel *ch, u16 off, u16 pos, u8 val); -#else -static inline bool rz_mtu3_request_channel(struct rz_mtu3_channel *ch) -{ - return false; -} - -static inline void rz_mtu3_release_channel(struct rz_mtu3_channel *ch) -{ -} - -static inline bool rz_mtu3_is_enabled(struct rz_mtu3_channel *ch) -{ - return false; -} - -static inline void rz_mtu3_disable(struct rz_mtu3_channel *ch) -{ -} - -static inline int rz_mtu3_enable(struct rz_mtu3_channel *ch) -{ - return 0; -} - -static inline u8 rz_mtu3_8bit_ch_read(struct rz_mtu3_channel *ch, u16 off) -{ - return 0; -} - -static inline u16 rz_mtu3_16bit_ch_read(struct rz_mtu3_channel *ch, u16 off) -{ - return 0; -} - -static inline u32 rz_mtu3_32bit_ch_read(struct rz_mtu3_channel *ch, u16 off) -{ - return 0; -} - -static inline u16 rz_mtu3_shared_reg_read(struct rz_mtu3_channel *ch, u16 off) -{ - return 0; -} - -static inline void rz_mtu3_8bit_ch_write(struct rz_mtu3_channel *ch, u16 off, u8 val) -{ -} - -static inline void rz_mtu3_16bit_ch_write(struct rz_mtu3_channel *ch, u16 off, u16 val) -{ -} - -static inline void rz_mtu3_32bit_ch_write(struct rz_mtu3_channel *ch, u16 off, u32 val) -{ -} - -static inline void rz_mtu3_shared_reg_write(struct rz_mtu3_channel *ch, u16 off, u16 val) -{ -} - -static inline void rz_mtu3_shared_reg_update_bit(struct rz_mtu3_channel *ch, - u16 off, u16 pos, u8 val) -{ -} -#endif #endif /* __MFD_RZ_MTU3_H__ */ -- cgit v1.2.3 From e0d77323824054f15f3137e890f7addc48198a3e Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 28 Jul 2023 21:27:09 +0800 Subject: mfd: max77686: Remove unused extern declarations max77686_irq_init() and max77686_irq_exit() are not used since commit 6f1c1e71d933 ("mfd: max77686: Convert to use regmap_irq"). And max77686_irq_resume() never be implemented since introduced in commit dae8a969d512 ("mfd: Add Maxim 77686 driver"). Signed-off-by: Yue Haibing Reviewed-by: Krzysztof Kozlowski Reviewed-by: Chanwoo Choi Link: https://lore.kernel.org/r/20230728132709.27052-1-yuehaibing@huawei.com Signed-off-by: Lee Jones --- include/linux/mfd/max77686-private.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/max77686-private.h b/include/linux/mfd/max77686-private.h index 3acceeedbaba..ea635d12a741 100644 --- a/include/linux/mfd/max77686-private.h +++ b/include/linux/mfd/max77686-private.h @@ -441,8 +441,4 @@ enum max77686_types { TYPE_MAX77802, }; -extern int max77686_irq_init(struct max77686_dev *max77686); -extern void max77686_irq_exit(struct max77686_dev *max77686); -extern int max77686_irq_resume(struct max77686_dev *max77686); - #endif /* __LINUX_MFD_MAX77686_PRIV_H */ -- cgit v1.2.3 From 733e2e9a28e6fa109e51e0b77901552f69df0ef1 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 28 Jul 2023 21:24:39 +0800 Subject: mfd: ab8500: Remove unused extern declarations commit d28f1db8187d ("mfd: Remove confusing ab8500-i2c file and merge into ab8500-core") left behind this. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230728132439.31568-1-yuehaibing@huawei.com Signed-off-by: Lee Jones --- include/linux/mfd/abx500/ab8500.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/abx500/ab8500.h b/include/linux/mfd/abx500/ab8500.h index 302a330c5c84..09fb3c56e7d7 100644 --- a/include/linux/mfd/abx500/ab8500.h +++ b/include/linux/mfd/abx500/ab8500.h @@ -382,10 +382,6 @@ struct ab8500_platform_data { struct ab8500_sysctrl_platform_data *sysctrl; }; -extern int ab8500_init(struct ab8500 *ab8500, - enum ab8500_version version); -extern int ab8500_exit(struct ab8500 *ab8500); - extern int ab8500_suspend(struct ab8500 *ab8500); static inline int is_ab8500(struct ab8500 *ab) -- cgit v1.2.3 From 54ab43a957bcb2643c13df5ab71de9dc3f72e5a6 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 28 Jul 2023 21:28:41 +0800 Subject: mfd: 88pm860x: Remove unused extern declarations commit 260a127bfbeb ("mfd: 88pm860x-i2c: Purge unused functions") left behind this. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230728132841.10648-1-yuehaibing@huawei.com Signed-off-by: Lee Jones --- include/linux/mfd/88pm860x.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/88pm860x.h b/include/linux/mfd/88pm860x.h index 473545a2c425..6fa21791fc85 100644 --- a/include/linux/mfd/88pm860x.h +++ b/include/linux/mfd/88pm860x.h @@ -472,13 +472,7 @@ extern int pm860x_bulk_read(struct i2c_client *, int, int, unsigned char *); extern int pm860x_bulk_write(struct i2c_client *, int, int, unsigned char *); extern int pm860x_set_bits(struct i2c_client *, int, unsigned char, unsigned char); -extern int pm860x_page_reg_read(struct i2c_client *, int); extern int pm860x_page_reg_write(struct i2c_client *, int, unsigned char); extern int pm860x_page_bulk_read(struct i2c_client *, int, int, unsigned char *); -extern int pm860x_page_bulk_write(struct i2c_client *, int, int, - unsigned char *); -extern int pm860x_page_set_bits(struct i2c_client *, int, unsigned char, - unsigned char); - #endif /* __LINUX_MFD_88PM860X_H */ -- cgit v1.2.3 From 3a5e6e49855661ad39b8fbb1dbd041178af98e00 Mon Sep 17 00:00:00 2001 From: Andre Werner Date: Fri, 18 Aug 2023 10:37:24 +0200 Subject: regulator: tps65086: Select dedicated regulator config for chip variant Some configurations differ between chip variants, e,g. the register to control the on of state of LDOA1 and SWB2. Thus, it is necessary to choose the correct configuration for a dedicated device. If the wrong configuration was used, the LDOA1 output that was disabled by the bootloader was enabled in Kernel again. Each chip variant gets its dedicated configuration selected by the chip ID previously collected from MFD probe function. The VTT enum value (tps65086_regulators) is shifted because not all chip variants have a separate SWB2 switch. Sometimes they are merged. So the configuration possibilities differ, thus the regulator configuration arrays have a different length. Signed-off-by: Andre Werner Link: https://lore.kernel.org/r/20230818083721.29790-5-andre.werner@systec-electronic.com Signed-off-by: Mark Brown --- include/linux/mfd/tps65086.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/tps65086.h b/include/linux/mfd/tps65086.h index 87e590de6ca5..9185b5cd8371 100644 --- a/include/linux/mfd/tps65086.h +++ b/include/linux/mfd/tps65086.h @@ -99,6 +99,8 @@ enum tps65086_irqs { TPS65086_IRQ_FAULT, }; +struct tps65086_regulator_config; + /** * struct tps65086 - state holder for the tps65086 driver * @@ -108,6 +110,7 @@ struct tps65086 { struct device *dev; struct regmap *regmap; unsigned int chip_id; + const struct tps65086_regulator_config *reg_config; /* IRQ Data */ int irq; -- cgit v1.2.3 From 4025d3e73abde4f65f4b04d4b1d8449b00e31473 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 18 Aug 2023 09:40:39 +0000 Subject: net: add skb_queue_purge_reason and __skb_queue_purge_reason skb_queue_purge() and __skb_queue_purge() become wrappers around the new generic functions. New SKB_DROP_REASON_QUEUE_PURGE drop reason is added, but users can start adding more specific reasons. Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/skbuff.h | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index aa57e2eca33b..9aec136bc690 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3149,20 +3149,35 @@ static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask) } /** - * __skb_queue_purge - empty a list + * __skb_queue_purge_reason - empty a list * @list: list to empty + * @reason: drop reason * * Delete all buffers on an &sk_buff list. Each buffer is removed from * the list and one reference dropped. This function does not take the * list lock and the caller must hold the relevant locks to use it. */ -static inline void __skb_queue_purge(struct sk_buff_head *list) +static inline void __skb_queue_purge_reason(struct sk_buff_head *list, + enum skb_drop_reason reason) { struct sk_buff *skb; + while ((skb = __skb_dequeue(list)) != NULL) - kfree_skb(skb); + kfree_skb_reason(skb, reason); +} + +static inline void __skb_queue_purge(struct sk_buff_head *list) +{ + __skb_queue_purge_reason(list, SKB_DROP_REASON_QUEUE_PURGE); +} + +void skb_queue_purge_reason(struct sk_buff_head *list, + enum skb_drop_reason reason); + +static inline void skb_queue_purge(struct sk_buff_head *list) +{ + skb_queue_purge_reason(list, SKB_DROP_REASON_QUEUE_PURGE); } -void skb_queue_purge(struct sk_buff_head *list); unsigned int skb_rbtree_purge(struct rb_root *root); -- cgit v1.2.3 From 2e92f669b86d70a26a77b088c74e1cb6d26322e1 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Tue, 6 Dec 2022 14:33:49 +0200 Subject: net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core Since now MACsec flow steering (macsec_fs) and MACsec statistics (stats) are maintained by the core driver, move their data as well to be saved inside core structures instead of staying part of ethernet MACsec database. In addition cleanup all MACsec stats functions from the ethernet MACsec code and move what's needed to be part of macsec_fs instead. Signed-off-by: Patrisious Haddad Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 25d0528f9219..0e80241c270f 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -805,6 +805,9 @@ struct mlx5_core_dev { u32 vsc_addr; struct mlx5_hv_vhca *hv_vhca; struct mlx5_thermal *thermal; +#ifdef CONFIG_MLX5_MACSEC + struct mlx5_macsec_fs *macsec_fs; +#endif }; struct mlx5_db { -- cgit v1.2.3 From 758ce14aee825f8f3ca8f76c9991c108094cae8b Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Tue, 3 May 2022 08:37:48 +0300 Subject: RDMA/mlx5: Implement MACsec gid addition and deletion Handle MACsec IP ambiguity issue, since mlx5 hw can't support programming both the MACsec and the physical gid when they have the same IP address, because it wouldn't know to whom to steer the traffic. Hence in such case we delete the physical gid from the hw gid table, which would then cause all traffic sent over it to fail, and we'll only be able to send traffic over the MACsec gid. Signed-off-by: Patrisious Haddad Reviewed-by: Raed Salem Reviewed-by: Mark Zhang Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 0e80241c270f..21954e7aeeba 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1323,6 +1323,50 @@ static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev) return mlx5_is_roce_on(dev); } +static inline bool mlx5e_is_macsec_device(const struct mlx5_core_dev *mdev) +{ + if (!(MLX5_CAP_GEN_64(mdev, general_obj_types) & + MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD)) + return false; + + if (!MLX5_CAP_GEN(mdev, log_max_dek)) + return false; + + if (!MLX5_CAP_MACSEC(mdev, log_max_macsec_offload)) + return false; + + if (!MLX5_CAP_FLOWTABLE_NIC_RX(mdev, macsec_decrypt) || + !MLX5_CAP_FLOWTABLE_NIC_RX(mdev, reformat_remove_macsec)) + return false; + + if (!MLX5_CAP_FLOWTABLE_NIC_TX(mdev, macsec_encrypt) || + !MLX5_CAP_FLOWTABLE_NIC_TX(mdev, reformat_add_macsec)) + return false; + + if (!MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_128_encrypt) && + !MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_256_encrypt)) + return false; + + if (!MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_128_decrypt) && + !MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_256_decrypt)) + return false; + + return true; +} + +#define NIC_RDMA_BOTH_DIRS_CAPS (MLX5_FT_NIC_RX_2_NIC_RX_RDMA | MLX5_FT_NIC_TX_RDMA_2_NIC_TX) + +static inline bool mlx5_is_macsec_roce_supported(struct mlx5_core_dev *mdev) +{ + if (((MLX5_CAP_GEN_2(mdev, flow_table_type_2_type) & + NIC_RDMA_BOTH_DIRS_CAPS) != NIC_RDMA_BOTH_DIRS_CAPS) || + !MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, max_modify_header_actions) || + !mlx5e_is_macsec_device(mdev)) + return false; + + return true; +} + enum { MLX5_OCTWORD = 16, }; -- cgit v1.2.3 From afcb21d5a89b40c3062aa48d39ab5340abf7dcd8 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Thu, 11 Aug 2022 05:20:02 +0300 Subject: net/mlx5: Add MACsec priorities in RDMA namespaces Add MACsec flow steering priorities in RDMA namespaces. This allows adding tables/rules to forward RoCEv2 traffic to the MACsec crypto tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain. Signed-off-by: Patrisious Haddad Reviewed-by: Maor Gottlieb Signed-off-by: Leon Romanovsky --- include/linux/mlx5/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 2cb404c7ea13..091ff1240959 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -105,6 +105,8 @@ enum mlx5_flow_namespace_type { MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS, MLX5_FLOW_NAMESPACE_RDMA_RX_IPSEC, MLX5_FLOW_NAMESPACE_RDMA_TX_IPSEC, + MLX5_FLOW_NAMESPACE_RDMA_RX_MACSEC, + MLX5_FLOW_NAMESPACE_RDMA_TX_MACSEC, }; enum { -- cgit v1.2.3 From ac7ea1c78f0edb910af1c17a3fb01345944bcb39 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Thu, 13 Apr 2023 12:04:42 +0300 Subject: net/mlx5: Add RoCE MACsec steering infrastructure in core Adds all the core steering helper functions that are needed in order to setup RoCE steering rules which includes both the RX and TX rules addition and deletion. As well as exporting the function to be ready to use from the IB driver where we expose functions to allow deletion of all rules, which is needed when a GID is deleted, or a deletion of a specific rule when an SA is deleted, and a similar manner for the rules addition. These functions are used in a later patch by IB driver to trigger the rules addition/deletion when needed. Signed-off-by: Patrisious Haddad Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 2 ++ include/linux/mlx5/driver.h | 2 ++ include/linux/mlx5/macsec.h | 32 ++++++++++++++++++++++++++++++++ 3 files changed, 36 insertions(+) create mode 100644 include/linux/mlx5/macsec.h (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 80cc12a9a531..ca93a5ef9dac 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -364,6 +364,8 @@ enum mlx5_event { enum mlx5_driver_event { MLX5_DRIVER_EVENT_TYPE_TRAP = 0, MLX5_DRIVER_EVENT_UPLINK_NETDEV, + MLX5_DRIVER_EVENT_MACSEC_SA_ADDED, + MLX5_DRIVER_EVENT_MACSEC_SA_DELETED, }; enum { diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 21954e7aeeba..21014bc34516 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -807,6 +807,8 @@ struct mlx5_core_dev { struct mlx5_thermal *thermal; #ifdef CONFIG_MLX5_MACSEC struct mlx5_macsec_fs *macsec_fs; + /* MACsec notifier chain to sync MACsec core and IB database */ + struct blocking_notifier_head macsec_nh; #endif }; diff --git a/include/linux/mlx5/macsec.h b/include/linux/mlx5/macsec.h new file mode 100644 index 000000000000..f7ff4c2a95d0 --- /dev/null +++ b/include/linux/mlx5/macsec.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. */ + +#ifndef MLX5_MACSEC_H +#define MLX5_MACSEC_H + +#ifdef CONFIG_MLX5_MACSEC +struct mlx5_macsec_event_data { + struct mlx5_macsec_fs *macsec_fs; + void *macdev; + u32 fs_id; + bool is_tx; +}; + +int mlx5_macsec_add_roce_rule(void *macdev, const struct sockaddr *addr, u16 gid_idx, + struct list_head *tx_rules_list, struct list_head *rx_rules_list, + struct mlx5_macsec_fs *macsec_fs); + +void mlx5_macsec_del_roce_rule(u16 gid_idx, struct mlx5_macsec_fs *macsec_fs, + struct list_head *tx_rules_list, struct list_head *rx_rules_list); + +void mlx5_macsec_add_roce_sa_rules(u32 fs_id, const struct sockaddr *addr, u16 gid_idx, + struct list_head *tx_rules_list, + struct list_head *rx_rules_list, + struct mlx5_macsec_fs *macsec_fs, bool is_tx); + +void mlx5_macsec_del_roce_sa_rules(u32 fs_id, struct mlx5_macsec_fs *macsec_fs, + struct list_head *tx_rules_list, + struct list_head *rx_rules_list, bool is_tx); + +#endif +#endif /* MLX5_MACSEC_H */ -- cgit v1.2.3 From 58dbd6428a6819e55a3c52ec60126b5d00804a38 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Thu, 13 Apr 2023 12:04:59 +0300 Subject: RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion Add RoCE MACsec rules when a gid is added for the MACsec netdevice and handle their cleanup when the gid is removed or the MACsec SA is deleted. Also support alias IP for the MACsec device, as long as we don't have more ips than what the gid table can hold. In addition handle the case where a gid is added but there are still no SAs added for the MACsec device, so the rules are added later on when the SAs are added. Signed-off-by: Patrisious Haddad Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 21014bc34516..728bcd6d184c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1325,6 +1325,7 @@ static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev) return mlx5_is_roce_on(dev); } +#ifdef CONFIG_MLX5_MACSEC static inline bool mlx5e_is_macsec_device(const struct mlx5_core_dev *mdev) { if (!(MLX5_CAP_GEN_64(mdev, general_obj_types) & @@ -1363,11 +1364,12 @@ static inline bool mlx5_is_macsec_roce_supported(struct mlx5_core_dev *mdev) if (((MLX5_CAP_GEN_2(mdev, flow_table_type_2_type) & NIC_RDMA_BOTH_DIRS_CAPS) != NIC_RDMA_BOTH_DIRS_CAPS) || !MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, max_modify_header_actions) || - !mlx5e_is_macsec_device(mdev)) + !mlx5e_is_macsec_device(mdev) || !mdev->macsec_fs) return false; return true; } +#endif enum { MLX5_OCTWORD = 16, -- cgit v1.2.3 From 0f158b32a9b146c9b86783efccfd9ed02c744623 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 18 Aug 2023 17:41:45 +0000 Subject: net: selectively purge error queue in IP_RECVERR / IPV6_RECVERR Setting IP_RECVERR and IPV6_RECVERR options to zero currently purges the socket error queue, which was probably not expected for zerocopy and tx_timestamp users. I discovered this issue while preparing commit 6b5f43ea0815 ("inet: move inet->recverr to inet->inet_flags"), I presume this change does not need to be backported to stable kernels. Add skb_errqueue_purge() helper to purge error messages only. Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Cc: Soheil Hassas Yeganeh Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9aec136bc690..4174c4b82d13 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3180,6 +3180,7 @@ static inline void skb_queue_purge(struct sk_buff_head *list) } unsigned int skb_rbtree_purge(struct rb_root *root); +void skb_errqueue_purge(struct sk_buff_head *list); void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask); -- cgit v1.2.3 From ab6860f62bfe329e11e5b5b7295b673c9c3a62d0 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 11 Aug 2023 12:08:19 +0200 Subject: block: simplify the disk_force_media_change interface Hard code the events to DISK_EVENT_MEDIA_CHANGE as that is the only useful use case, and drop the superfluous return value. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik Message-Id: <20230811100828.1897174-9-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 83262702eea7..c8eab6effc22 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -750,7 +750,7 @@ static inline int bdev_read_only(struct block_device *bdev) } bool set_capacity_and_notify(struct gendisk *disk, sector_t size); -bool disk_force_media_change(struct gendisk *disk, unsigned int events); +void disk_force_media_change(struct gendisk *disk); void add_disk_randomness(struct gendisk *disk) __latent_entropy; void rand_initialize_disk(struct gendisk *disk); -- cgit v1.2.3 From 560e20e4bf6484a0c12f9f3c7a1aa55056948e1e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 11 Aug 2023 12:08:24 +0200 Subject: block: consolidate __invalidate_device and fsync_bdev We currently have two interfaces that take a block_devices and the find a mounted file systems to flush or invaldidate data on it. Both are a bit problematic because they only work for the "main" block devices that is used as s_dev for the super_block, and because they don't call into the file system at all. Merge the two into a new bdev_mark_dead helper that does both the syncing and invalidation and which is properly documented. This is in preparation of merging the functionality into the ->mark_dead holder operation so that it will work on additional block devices used by a file systems and give us a single entry point for invalidation of dead devices or media. Note that a single standalone fsync_bdev call for an obscure ioctl remains for now, but that one will also be deal with in a bit. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik Message-Id: <20230811100828.1897174-14-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c8eab6effc22..6721595b9f97 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -751,6 +751,7 @@ static inline int bdev_read_only(struct block_device *bdev) bool set_capacity_and_notify(struct gendisk *disk, sector_t size); void disk_force_media_change(struct gendisk *disk); +void bdev_mark_dead(struct block_device *bdev, bool surprise); void add_disk_randomness(struct gendisk *disk) __latent_entropy; void rand_initialize_disk(struct gendisk *disk); @@ -809,7 +810,6 @@ int __register_blkdev(unsigned int major, const char *name, void unregister_blkdev(unsigned int major, const char *name); bool disk_check_media_change(struct gendisk *disk); -int __invalidate_device(struct block_device *bdev, bool kill_dirty); void set_capacity(struct gendisk *disk, sector_t size); #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED -- cgit v1.2.3 From d8530de5a6e82be0ce17a5fdf727a394bcf6444c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 11 Aug 2023 12:08:25 +0200 Subject: block: call into the file system for bdev_mark_dead Combine the newly merged bdev_mark_dead helper with the existing mark_dead holder operation so that all operations that invalidate a device that is dead or being removed now go through the holder ops. This allows file systems to explicitly shutdown either ASAP (for a surprise removal) or after writing back data (for an orderly removal), and do so not only for the main device. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik Message-Id: <20230811100828.1897174-15-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 6721595b9f97..cdd03c612d39 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1461,7 +1461,7 @@ void blkdev_show(struct seq_file *seqf, off_t offset); #endif struct blk_holder_ops { - void (*mark_dead)(struct block_device *bdev); + void (*mark_dead)(struct block_device *bdev, bool surprise); }; extern const struct blk_holder_ops fs_holder_ops; -- cgit v1.2.3 From 2142b88c37a3e49fbca4a36b8674626917d9bf40 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 11 Aug 2023 12:08:26 +0200 Subject: block: call into the file system for ioctl BLKFLSBUF BLKFLSBUF is a historic ioctl that is called on a file handle to a block device and syncs either the file system mounted on that block device if there is one, or otherwise the just the data on the block device. Replace the get_super based syncing with a holder operation to remove the last usage of get_super, and to also support syncing the file system if the block device is not the main block device stored in s_dev. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik Message-Id: <20230811100828.1897174-16-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/blkdev.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index cdd03c612d39..fa462d48ee96 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1462,6 +1462,11 @@ void blkdev_show(struct seq_file *seqf, off_t offset); struct blk_holder_ops { void (*mark_dead)(struct block_device *bdev, bool surprise); + + /* + * Sync the file system mounted on the block device. + */ + void (*sync)(struct block_device *bdev); }; extern const struct blk_holder_ops fs_holder_ops; @@ -1524,8 +1529,6 @@ static inline int early_lookup_bdev(const char *pathname, dev_t *dev) } #endif /* CONFIG_BLOCK */ -int fsync_bdev(struct block_device *bdev); - int freeze_bdev(struct block_device *bdev); int thaw_bdev(struct block_device *bdev); -- cgit v1.2.3 From 38bcdd38935350abceace901313e007e84d84456 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 11 Aug 2023 12:08:27 +0200 Subject: fs: remove get_super get_super is unused now, remove it. Signed-off-by: Christoph Hellwig Reviewed-by: Christian Brauner Reviewed-by: Josef Bacik Message-Id: <20230811100828.1897174-17-hch@lst.de> Signed-off-by: Christian Brauner --- include/linux/fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..14b5777a24a0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2913,7 +2913,6 @@ extern int vfs_readlink(struct dentry *, char __user *, int); extern struct file_system_type *get_filesystem(struct file_system_type *fs); extern void put_filesystem(struct file_system_type *fs); extern struct file_system_type *get_fs_type(const char *name); -extern struct super_block *get_super(struct block_device *); extern struct super_block *get_active_super(struct block_device *bdev); extern void drop_super(struct super_block *sb); extern void drop_super_exclusive(struct super_block *sb); -- cgit v1.2.3 From ed2da9246f324ae88a2dcae629fc2008632ff151 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 28 Jun 2023 17:31:44 +0200 Subject: mm: remove folio_account_redirty Fold folio_account_redirty into folio_redirty_for_writepage now that all other users except for the also unused account_page_redirty wrapper are gone. Reviewed-by: Josef Bacik Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/writeback.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/writeback.h b/include/linux/writeback.h index fba937999fbf..083387c00f0c 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -375,11 +375,6 @@ void tag_pages_for_writeback(struct address_space *mapping, pgoff_t start, pgoff_t end); bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio); -void folio_account_redirty(struct folio *folio); -static inline void account_page_redirty(struct page *page) -{ - folio_account_redirty(page_folio(page)); -} bool folio_redirty_for_writepage(struct writeback_control *, struct folio *); bool redirty_page_for_writepage(struct writeback_control *, struct page *); -- cgit v1.2.3 From 60ba4431c8e8c137f92d050b9f0bfc8e8c26a364 Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Fri, 18 Aug 2023 17:31:38 +0800 Subject: spi: pxa2xx: switch to use modern name Change legacy name master/slave to modern name host/target or controller. No functional changed. Signed-off-by: Yang Yingliang Link: https://lore.kernel.org/r/20230818093154.1183529-8-yangyingliang@huawei.com Signed-off-by: Mark Brown --- include/linux/spi/pxa2xx_spi.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/pxa2xx_spi.h b/include/linux/spi/pxa2xx_spi.h index 4658e7801b42..0916cb9bcb0a 100644 --- a/include/linux/spi/pxa2xx_spi.h +++ b/include/linux/spi/pxa2xx_spi.h @@ -19,7 +19,7 @@ struct pxa2xx_spi_controller { u16 num_chipselect; u8 enable_dma; u8 dma_burst_size; - bool is_slave; + bool is_target; /* DMA engine specific config */ bool (*dma_filter)(struct dma_chan *chan, void *param); @@ -31,7 +31,7 @@ struct pxa2xx_spi_controller { }; /* - * The controller specific data for SPI slave devices + * The controller specific data for SPI target devices * (resides in spi_board_info.controller_data), * copied to spi_device.platform_data ... mostly for * DMA tuning. -- cgit v1.2.3 From 1cb3ebc417fe6cc5bea1b37682c0e860ea9ffb2d Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Fri, 18 Aug 2023 17:31:50 +0800 Subject: spi: sh-msiof: switch to use modern name Change legacy name master/slave to modern name host/target. No functional changed. Signed-off-by: Yang Yingliang Reviewed-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20230818093154.1183529-20-yangyingliang@huawei.com Signed-off-by: Mark Brown --- include/linux/spi/sh_msiof.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/sh_msiof.h b/include/linux/spi/sh_msiof.h index dc2a0cbd210d..f950d280461b 100644 --- a/include/linux/spi/sh_msiof.h +++ b/include/linux/spi/sh_msiof.h @@ -3,8 +3,8 @@ #define __SPI_SH_MSIOF_H__ enum { - MSIOF_SPI_MASTER, - MSIOF_SPI_SLAVE, + MSIOF_SPI_HOST, + MSIOF_SPI_TARGET, }; struct sh_msiof_spi_info { -- cgit v1.2.3 From f6c05b9e5d9bb8f54c6d00b8070fa93619f62a5e Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 17 Aug 2023 17:13:32 +0300 Subject: fs: add kerneldoc to file_{start,end}_write() helpers and use sb_end_write() instead of open coded version. Signed-off-by: Amir Goldstein Reviewed-by: Jan Kara Reviewed-by: Jens Axboe Message-Id: <20230817141337.1025891-3-amir73il@gmail.com> Signed-off-by: Christian Brauner --- include/linux/fs.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 57e2a0a9eea1..660ff7178caa 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2539,6 +2539,13 @@ static inline bool inode_wrong_type(const struct inode *inode, umode_t mode) return (inode->i_mode ^ mode) & S_IFMT; } +/** + * file_start_write - get write access to a superblock for regular file io + * @file: the file we want to write to + * + * This is a variant of sb_start_write() which is a noop on non-regualr file. + * Should be matched with a call to file_end_write(). + */ static inline void file_start_write(struct file *file) { if (!S_ISREG(file_inode(file)->i_mode)) @@ -2553,11 +2560,17 @@ static inline bool file_start_write_trylock(struct file *file) return sb_start_write_trylock(file_inode(file)->i_sb); } +/** + * file_end_write - drop write access to a superblock of a regular file + * @file: the file we wrote to + * + * Should be matched with a call to file_start_write(). + */ static inline void file_end_write(struct file *file) { if (!S_ISREG(file_inode(file)->i_mode)) return; - __sb_end_write(file_inode(file)->i_sb, SB_FREEZE_WRITE); + sb_end_write(file_inode(file)->i_sb); } /* -- cgit v1.2.3 From ed0360bbab72b829437b67ebb2f9cfac19f59dfe Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 17 Aug 2023 17:13:33 +0300 Subject: fs: create kiocb_{start,end}_write() helpers aio, io_uring, cachefiles and overlayfs, all open code an ugly variant of file_{start,end}_write() to silence lockdep warnings. Create helpers for this lockdep dance so we can use the helpers in all the callers. Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Reviewed-by: Jan Kara Reviewed-by: Jens Axboe Message-Id: <20230817141337.1025891-4-amir73il@gmail.com> Signed-off-by: Christian Brauner --- include/linux/fs.h | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 660ff7178caa..1eac9545a7b4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2573,6 +2573,42 @@ static inline void file_end_write(struct file *file) sb_end_write(file_inode(file)->i_sb); } +/** + * kiocb_start_write - get write access to a superblock for async file io + * @iocb: the io context we want to submit the write with + * + * This is a variant of sb_start_write() for async io submission. + * Should be matched with a call to kiocb_end_write(). + */ +static inline void kiocb_start_write(struct kiocb *iocb) +{ + struct inode *inode = file_inode(iocb->ki_filp); + + sb_start_write(inode->i_sb); + /* + * Fool lockdep by telling it the lock got released so that it + * doesn't complain about the held lock when we return to userspace. + */ + __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE); +} + +/** + * kiocb_end_write - drop write access to a superblock after async file io + * @iocb: the io context we sumbitted the write with + * + * Should be matched with a call to kiocb_start_write(). + */ +static inline void kiocb_end_write(struct kiocb *iocb) +{ + struct inode *inode = file_inode(iocb->ki_filp); + + /* + * Tell lockdep we inherited freeze protection from submission thread. + */ + __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); + sb_end_write(inode->i_sb); +} + /* * This is used for regular files where some users -- especially the * currently executed binary in a process, previously handled via -- cgit v1.2.3 From 2a6c0b4777ae51222b6d9d5d5687bd6cbf9ed4f8 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Jul 2023 20:28:12 +0300 Subject: pm: Introduce DEFINE_NOIRQ_DEV_PM_OPS() helper _DEFINE_DEV_PM_OPS() helps to define PM operations for the system sleep and/or runtime PM cases. Some of the existing users want to have _noirq() variants to be set. For that purpose introduce a new helper which sets up _noirq() callbacks to be assigned and struct dev_pm_ops be provided. Acked-by: "Rafael J. Wysocki" Reviewed-by: Jonathan Cameron Reviewed-by: Paul Cercueil Reviewed-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20230717172821.62827-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko --- include/linux/pm.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm.h b/include/linux/pm.h index badad7d11f4f..1400c37b29c7 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -448,6 +448,15 @@ const struct dev_pm_ops __maybe_unused name = { \ SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \ } +/* + * Use this if you want to have the suspend and resume callbacks be called + * with IRQs disabled. + */ +#define DEFINE_NOIRQ_DEV_PM_OPS(name, suspend_fn, resume_fn) \ +const struct dev_pm_ops name = { \ + NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ +} + #define pm_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM), (_ptr)) #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr)) -- cgit v1.2.3 From c7c7bce093c883739d6735d68604055131acfbea Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sun, 2 Jul 2023 15:15:18 +0200 Subject: efi/runtime-wrappers: Use type safe encapsulation of call arguments The current code that marshalls the EFI runtime call arguments to hand them off to a async helper does so in a type unsafe and slightly messy manner - everything is cast to void* except for some integral types that are passed by reference and dereferenced on the receiver end. Let's clean this up a bit, and record the arguments of each runtime service invocation exactly as they are issued, in a manner that permits the compiler to check the types of the arguments at both ends. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index e9004358f7bd..9d09dba116ce 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1264,23 +1264,21 @@ enum efi_rts_ids { EFI_QUERY_CAPSULE_CAPS, }; +union efi_rts_args; + /* * efi_runtime_work: Details of EFI Runtime Service work - * @arg<1-5>: EFI Runtime Service function arguments + * @args: Pointer to union describing the arguments * @status: Status of executing EFI Runtime Service * @efi_rts_id: EFI Runtime Service function identifier * @efi_rts_comp: Struct used for handling completions */ struct efi_runtime_work { - void *arg1; - void *arg2; - void *arg3; - void *arg4; - void *arg5; - efi_status_t status; - struct work_struct work; - enum efi_rts_ids efi_rts_id; - struct completion efi_rts_comp; + union efi_rts_args *args; + efi_status_t status; + struct work_struct work; + enum efi_rts_ids efi_rts_id; + struct completion efi_rts_comp; }; extern struct efi_runtime_work efi_rts_work; -- cgit v1.2.3 From 5e87491415217d6bec0bcae08a3156622be2b177 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 18 Aug 2023 16:00:50 +0200 Subject: super: wait for nascent superblocks Recent patches experiment with making it possible to allocate a new superblock before opening the relevant block device. Naturally this has intricate side-effects that we get to learn about while developing this. Superblock allocators such as sget{_fc}() return with s_umount of the new superblock held and lock ordering currently requires that block level locks such as bdev_lock and open_mutex rank above s_umount. Before aca740cecbe5 ("fs: open block device after superblock creation") ordering was guaranteed to be correct as block devices were opened prior to superblock allocation and thus s_umount wasn't held. But now s_umount must be dropped before opening block devices to avoid locking violations. This has consequences. The main one being that iterators over @super_blocks and @fs_supers that grab a temporary reference to the superblock can now also grab s_umount before the caller has managed to open block devices and called fill_super(). So whereas before such iterators or concurrent mounts would have simply slept on s_umount until SB_BORN was set or the superblock was discard due to initalization failure they can now needlessly spin through sget{_fc}(). If the caller is sleeping on bdev_lock or open_mutex one caller waiting on SB_BORN will always spin somewhere and potentially this can go on for quite a while. It should be possible to drop s_umount while allowing iterators to wait on a nascent superblock to either be born or discarded. This patch implements a wait_var_event() mechanism allowing iterators to sleep until they are woken when the superblock is born or discarded. This also allows us to avoid relooping through @fs_supers and @super_blocks if a superblock isn't yet born or dying. Link: aca740cecbe5 ("fs: open block device after superblock creation") Reviewed-by: Jan Kara Message-Id: <20230818-vfs-super-fixes-v3-v3-3-9f0b1876e46b@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 14b5777a24a0..173672645156 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1095,6 +1095,7 @@ extern int send_sigurg(struct fown_struct *fown); #define SB_LAZYTIME BIT(25) /* Update the on-disk [acm]times lazily */ /* These sb flags are internal to the kernel */ +#define SB_DYING BIT(24) #define SB_SUBMOUNT BIT(26) #define SB_FORCE BIT(27) #define SB_NOSEC BIT(28) -- cgit v1.2.3 From 2c18a63b760a0f68f14cb8bb4c3840bb0b63b73e Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 18 Aug 2023 16:00:51 +0200 Subject: super: wait until we passed kill super Recent rework moved block device closing out of sb->put_super() and into sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and blkdev_put() can end up taking s_umount again. That means we need to move the removal of the superblock from @fs_supers out of generic_shutdown_super() and into deactivate_locked_super() to ensure that concurrent mounters don't fail to open block devices that are still in use because blkdev_put() in sb->kill_sb() hasn't been called yet. We can now do this as we can make iterators through @fs_super and @super_blocks wait without holding s_umount. Concurrent mounts will wait until a dying superblock is fully dead so until sb->kill_sb() has been called and SB_DEAD been set. Concurrent iterators can already discard any SB_DYING superblock. Reviewed-by: Jan Kara Message-Id: <20230818-vfs-super-fixes-v3-v3-4-9f0b1876e46b@kernel.org> Signed-off-by: Christian Brauner --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 173672645156..a63da68305e9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1095,6 +1095,7 @@ extern int send_sigurg(struct fown_struct *fown); #define SB_LAZYTIME BIT(25) /* Update the on-disk [acm]times lazily */ /* These sb flags are internal to the kernel */ +#define SB_DEAD BIT(21) #define SB_DYING BIT(24) #define SB_SUBMOUNT BIT(26) #define SB_FORCE BIT(27) -- cgit v1.2.3 From 3ac9b733723e4a09ad9125ceb51898d904cd5169 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 18 Aug 2023 14:40:01 -0500 Subject: ACPI: Adjust #ifdef for *_lps0_dev use The `#ifdef` for acpi_register_lps0_dev() currently is guarded against `CONFIG_X86`, but actually the functions contained in the block are specifically sleep related functions. Adjust the guard to also check for `CONFIG_SUSPEND`. Signed-off-by: Mario Limonciello Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 641dc4843987..6b4966e5d7b1 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1100,7 +1100,7 @@ void acpi_os_set_prepare_extended_sleep(int (*func)(u8 sleep_state, acpi_status acpi_os_prepare_extended_sleep(u8 sleep_state, u32 val_a, u32 val_b); -#ifdef CONFIG_X86 +#if defined(CONFIG_SUSPEND) && defined(CONFIG_X86) struct acpi_s2idle_dev_ops { struct list_head list_node; void (*prepare)(void); @@ -1109,7 +1109,7 @@ struct acpi_s2idle_dev_ops { }; int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg); void acpi_unregister_lps0_dev(struct acpi_s2idle_dev_ops *arg); -#endif /* CONFIG_X86 */ +#endif /* CONFIG_SUSPEND && CONFIG_X86 */ #ifndef CONFIG_IA64 void arch_reserve_mem_area(acpi_physical_address addr, size_t size); #else -- cgit v1.2.3 From 1c2a66d47de36608551d73562d01bb4afcf1d61a Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 18 Aug 2023 14:40:07 -0500 Subject: ACPI: x86: s2idle: Add a function to get LPS0 constraint for a device Other parts of the kernel may use LPS0 constraints information to make decisions on what power state to put a device into. Signed-off-by: Mario Limonciello [ rjw: Rewrite kerneldoc, rearrange if () statement, edit subject and changelog ] Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6b4966e5d7b1..9e2a15dd33e4 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1109,6 +1109,12 @@ struct acpi_s2idle_dev_ops { }; int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg); void acpi_unregister_lps0_dev(struct acpi_s2idle_dev_ops *arg); +int acpi_get_lps0_constraint(struct acpi_device *adev); +#else /* CONFIG_SUSPEND && CONFIG_X86 */ +static inline int acpi_get_lps0_constraint(struct device *dev) +{ + return ACPI_STATE_UNKNOWN; +} #endif /* CONFIG_SUSPEND && CONFIG_X86 */ #ifndef CONFIG_IA64 void arch_reserve_mem_area(acpi_physical_address addr, size_t size); -- cgit v1.2.3 From 2ddd3cac1fa988323684cee567356e970e6750bd Mon Sep 17 00:00:00 2001 From: Elena Reshetova Date: Thu, 17 Aug 2023 21:13:32 -0700 Subject: nsproxy: Convert nsproxy.count to refcount_t atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsproxy.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in refcount.h have different memory ordering guarantees than their atomic counterparts. Please check Documentation/core-api/refcount-vs-atomic.rst for more information. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsproxy.count it might make a difference in following places: - put_nsproxy() and switch_task_namespaces(): decrement in refcount_dec_and_test() only provides RELEASE ordering and ACQUIRE ordering on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook Signed-off-by: Elena Reshetova Reviewed-by: David Windsor Reviewed-by: Hans Liljestrand Reviewed-by: Christian Brauner Link: https://lore.kernel.org/r/20230818041327.gonna.210-kees@kernel.org Signed-off-by: Kees Cook --- include/linux/nsproxy.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h index fee881cded01..771cb0285872 100644 --- a/include/linux/nsproxy.h +++ b/include/linux/nsproxy.h @@ -29,7 +29,7 @@ struct fs_struct; * nsproxy is copied. */ struct nsproxy { - atomic_t count; + refcount_t count; struct uts_namespace *uts_ns; struct ipc_namespace *ipc_ns; struct mnt_namespace *mnt_ns; @@ -102,14 +102,13 @@ int __init nsproxy_cache_init(void); static inline void put_nsproxy(struct nsproxy *ns) { - if (atomic_dec_and_test(&ns->count)) { + if (refcount_dec_and_test(&ns->count)) free_nsproxy(ns); - } } static inline void get_nsproxy(struct nsproxy *ns) { - atomic_inc(&ns->count); + refcount_inc(&ns->count); } #endif -- cgit v1.2.3 From d74943a2f3cdade34e471b36f55f7979be656867 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 3 Aug 2023 16:32:02 +0200 Subject: mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") missed that follow_page() and follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting or due to inaccessible (PROT_NONE) VMAs. As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from gup/gup_fast"): "Other follow_page callers like KSM should not use FOLL_NUMA, or they would fail to get the pages if they use follow_page instead of get_user_pages." liubo reported [1] that smaps_rollup results are imprecise, because they miss accounting of pages that are mapped PROT_NONE. Further, it's easy to reproduce that KSM no longer works on inaccessible VMAs on x86-64, because pte_protnone()/pmd_protnone() also indictaes "true" in inaccessible VMAs, and follow_page() refuses to return such pages right now. As KVM really depends on these NUMA hinting faults, removing the pte_protnone()/pmd_protnone() handling in GUP code completely is not really an option. To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT to restore the original behavior for now and add better comments. Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in is_valid_gup_args(), to add that flag for all external GUP users. Note that there are three GUP-internal __get_user_pages() users that don't end up calling is_valid_gup_args() and consequently won't get FOLL_HONOR_NUMA_FAULT set. 1) get_dump_page(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE and wouldn't have honored NUMA hinting faults already. 2) populate_vma_page_range(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have honored NUMA hinting faults already. 3) faultin_vma_page_range(): we similarly don't want to handle NUMA hinting faults. To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in inaccessible VMAs properly, we have to perform VMA accessibility checks in gup_can_follow_protnone(). As GUP-fast should reject such pages either way in pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and arm64 that both implement pte_protnone() -- let's just always fallback to ordinary GUP when stumbling over pte_protnone()/pmd_protnone(). As Linus notes [2], honoring NUMA faults might only make sense for selected GUP users. So we should really see if we can instead let relevant GUP callers specify it manually, and not trigger NUMA hinting faults from GUP as default. Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and adding appropriate documenation. While at it, remove a stale comment from follow_trans_huge_pmd(): That comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa: serialise parallel get_user_page against THP migration"), which noted: THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages Nowadays, we do have PMD migration entries, so the comment no longer applies. Let's drop it. [1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com [2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") Signed-off-by: David Hildenbrand Reported-by: liubo Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com Reported-by: Peter Xu Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/ Acked-by: Mel Gorman Acked-by: Peter Xu Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: John Hubbard Cc: Linus Torvalds Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Paolo Bonzini Cc: Shuah Khan Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 21 +++++++++++++++------ include/linux/mm_types.h | 9 +++++++++ 2 files changed, 24 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 406ab9ea818f..34f9dba17c1a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3421,15 +3421,24 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) * Indicates whether GUP can follow a PROT_NONE mapped page, or whether * a (NUMA hinting) fault is required. */ -static inline bool gup_can_follow_protnone(unsigned int flags) +static inline bool gup_can_follow_protnone(struct vm_area_struct *vma, + unsigned int flags) { /* - * FOLL_FORCE has to be able to make progress even if the VMA is - * inaccessible. Further, FOLL_FORCE access usually does not represent - * application behaviour and we should avoid triggering NUMA hinting - * faults. + * If callers don't want to honor NUMA hinting faults, no need to + * determine if we would actually have to trigger a NUMA hinting fault. */ - return flags & FOLL_FORCE; + if (!(flags & FOLL_HONOR_NUMA_FAULT)) + return true; + + /* + * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs. + * + * Requiring a fault here even for inaccessible VMAs would mean that + * FOLL_FORCE cannot make any progress, because handle_mm_fault() + * refuses to process NUMA hinting faults in inaccessible VMAs. + */ + return !vma_is_accessible(vma); } typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5e74ce4a28cd..7d30dc4ff0ff 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1286,6 +1286,15 @@ enum { FOLL_PCI_P2PDMA = 1 << 10, /* allow interrupts from generic signals */ FOLL_INTERRUPTIBLE = 1 << 11, + /* + * Always honor (trigger) NUMA hinting faults. + * + * FOLL_WRITE implicitly honors NUMA hinting faults because a + * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE + * apply). get_user_pages_fast_only() always implicitly honors NUMA + * hinting faults. + */ + FOLL_HONOR_NUMA_FAULT = 1 << 12, /* See also internal only FOLL flags in mm/internal.h */ }; -- cgit v1.2.3 From 8b9c1cc0418a43196477083e7082568e7a4c9418 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 3 Aug 2023 16:32:03 +0200 Subject: smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd() We shouldn't be using a GUP-internal helper if it can be avoided. Similar to smaps_pte_entry() that uses vm_normal_page(), let's use vm_normal_page_pmd() that similarly refuses to return the huge zeropage. In contrast to follow_trans_huge_pmd(), vm_normal_page_pmd(): (1) Will always return the head page, not a tail page of a THP. If we'd ever call smaps_account with a tail page while setting "compound = true", we could be in trouble, because smaps_account() would look at the memmap of unrelated pages. If we're unlucky, that memmap does not exist at all. Before we removed PG_doublemap, we could have triggered something similar as in commit 24d7275ce279 ("fs/proc: task_mmu.c: don't read mapcount for migration entry"). This can theoretically happen ever since commit ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock"): (a) We're in show_smaps_rollup() and processed a VMA (b) We release the mmap lock in show_smaps_rollup() because it is contended (c) We merged that VMA with another VMA (d) We collapsed a THP in that merged VMA at that position If the end address of the original VMA falls into the middle of a THP area, we would call smap_gather_stats() with a start address that falls into a PMD-mapped THP. It's probably very rare to trigger when not really forced. (2) Will succeed on a is_pci_p2pdma_page(), like vm_normal_page() Treat such PMDs here just like smaps_pte_entry() would treat such PTEs. If such pages would be anonymous, we most certainly would want to account them. (3) Will skip over pmd_devmap(), like vm_normal_page() for pte_devmap() As noted in vm_normal_page(), that is only for handling legacy ZONE_DEVICE pages. So just like smaps_pte_entry(), we'll now also ignore such PMD entries. Especially, follow_pmd_mask() never ends up calling follow_trans_huge_pmd() on pmd_devmap(). Instead it calls follow_devmap_pmd() -- which will fail if neither FOLL_GET nor FOLL_PIN is set. So skipping pmd_devmap() pages seems to be the right thing to do. (4) Will properly handle VM_MIXEDMAP/VM_PFNMAP, like vm_normal_page() We won't be returning a memmap that should be ignored by core-mm, or worse, a memmap that does not even exist. Note that while walk_page_range() will skip VM_PFNMAP mappings, walk_page_vma() won't. Most probably this case doesn't currently really happen on the PMD level, otherwise we'd already be able to trigger kernel crashes when reading smaps / smaps_rollup. So most probably only (1) is relevant in practice as of now, but could only cause trouble in extreme corner cases. Let's move follow_trans_huge_pmd() to mm/internal.h to discourage future reuse in wrong context. Link: https://lkml.kernel.org/r/20230803143208.383663-3-david@redhat.com Fixes: ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock") Signed-off-by: David Hildenbrand Acked-by: Mel Gorman Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: John Hubbard Cc: Linus Torvalds Cc: liubo Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Paolo Bonzini Cc: Peter Xu Cc: Shuah Khan Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 20284387b841..e718dbe928ba 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -25,9 +25,6 @@ static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) #endif vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf); -struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmd, - unsigned int flags); bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long next); int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, -- cgit v1.2.3 From 49b0638502da097c15d46cd4e871dbaa022caf7c Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 4 Aug 2023 08:27:19 -0700 Subject: mm: enable page walking API to lock vmas during the walk walk_page_range() and friends often operate under write-locked mmap_lock. With introduction of vma locks, the vmas have to be locked as well during such walks to prevent concurrent page faults in these areas. Add an additional member to mm_walk_ops to indicate locking requirements for the walk. The change ensures that page walks which prevent concurrent page faults by write-locking mmap_lock, operate correctly after introduction of per-vma locks. With per-vma locks page faults can be handled under vma lock without taking mmap_lock at all, so write locking mmap_lock would not stop them. The change ensures vmas are properly locked during such walks. A sample issue this solves is do_mbind() performing queue_pages_range() to queue pages for migration. Without this change a concurrent page can be faulted into the area and be left out of migration. Link: https://lkml.kernel.org/r/20230804152724.3090321-2-surenb@google.com Signed-off-by: Suren Baghdasaryan Suggested-by: Linus Torvalds Suggested-by: Jann Horn Cc: David Hildenbrand Cc: Davidlohr Bueso Cc: Hugh Dickins Cc: Johannes Weiner Cc: Laurent Dufour Cc: Liam Howlett Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Michel Lespinasse Cc: Peter Xu Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton --- include/linux/pagewalk.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 27a6df448ee5..27cd1e59ccf7 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -6,6 +6,16 @@ struct mm_walk; +/* Locking requirement during a page walk. */ +enum page_walk_lock { + /* mmap_lock should be locked for read to stabilize the vma tree */ + PGWALK_RDLOCK = 0, + /* vma will be write-locked during the walk */ + PGWALK_WRLOCK = 1, + /* vma is expected to be already write-locked during the walk */ + PGWALK_WRLOCK_VERIFY = 2, +}; + /** * struct mm_walk_ops - callbacks for walk_page_range * @pgd_entry: if set, called for each non-empty PGD (top-level) entry @@ -66,6 +76,7 @@ struct mm_walk_ops { int (*pre_vma)(unsigned long start, unsigned long end, struct mm_walk *walk); void (*post_vma)(struct mm_walk *walk); + enum page_walk_lock walk_lock; }; /* -- cgit v1.2.3 From 42c06a0e8ebe95b81e5fb41c6556ff22d9255b0c Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 17 Jul 2023 12:02:27 -0400 Subject: mm: kill frontswap The only user of frontswap is zswap, and has been for a long time. Have swap call into zswap directly and remove the indirection. [hannes@cmpxchg.org: remove obsolete comment, per Yosry] Link: https://lkml.kernel.org/r/20230719142832.GA932528@cmpxchg.org [fengwei.yin@intel.com: don't warn if none swapcache folio is passed to zswap_load] Link: https://lkml.kernel.org/r/20230810095652.3905184-1-fengwei.yin@intel.com Link: https://lkml.kernel.org/r/20230717160227.GA867137@cmpxchg.org Signed-off-by: Johannes Weiner Signed-off-by: Yin Fengwei Acked-by: Konrad Rzeszutek Wilk Acked-by: Nhat Pham Acked-by: Yosry Ahmed Acked-by: Christoph Hellwig Cc: Domenico Cerasuolo Cc: Matthew Wilcox (Oracle) Cc: Vitaly Wool Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/frontswap.h | 91 ----------------------------------------------- include/linux/swap.h | 9 ----- include/linux/swapfile.h | 5 --- include/linux/zswap.h | 37 +++++++++++++++++++ 4 files changed, 37 insertions(+), 105 deletions(-) delete mode 100644 include/linux/frontswap.h create mode 100644 include/linux/zswap.h (limited to 'include/linux') diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h deleted file mode 100644 index eaa0ac5f9003..000000000000 --- a/include/linux/frontswap.h +++ /dev/null @@ -1,91 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_FRONTSWAP_H -#define _LINUX_FRONTSWAP_H - -#include -#include -#include -#include - -struct frontswap_ops { - void (*init)(unsigned); /* this swap type was just swapon'ed */ - int (*store)(unsigned, pgoff_t, struct page *); /* store a page */ - int (*load)(unsigned, pgoff_t, struct page *, bool *); /* load a page */ - void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */ - void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */ -}; - -int frontswap_register_ops(const struct frontswap_ops *ops); - -extern void frontswap_init(unsigned type, unsigned long *map); -extern int __frontswap_store(struct page *page); -extern int __frontswap_load(struct page *page); -extern void __frontswap_invalidate_page(unsigned, pgoff_t); -extern void __frontswap_invalidate_area(unsigned); - -#ifdef CONFIG_FRONTSWAP -extern struct static_key_false frontswap_enabled_key; - -static inline bool frontswap_enabled(void) -{ - return static_branch_unlikely(&frontswap_enabled_key); -} - -static inline void frontswap_map_set(struct swap_info_struct *p, - unsigned long *map) -{ - p->frontswap_map = map; -} - -static inline unsigned long *frontswap_map_get(struct swap_info_struct *p) -{ - return p->frontswap_map; -} -#else -/* all inline routines become no-ops and all externs are ignored */ - -static inline bool frontswap_enabled(void) -{ - return false; -} - -static inline void frontswap_map_set(struct swap_info_struct *p, - unsigned long *map) -{ -} - -static inline unsigned long *frontswap_map_get(struct swap_info_struct *p) -{ - return NULL; -} -#endif - -static inline int frontswap_store(struct page *page) -{ - if (frontswap_enabled()) - return __frontswap_store(page); - - return -1; -} - -static inline int frontswap_load(struct page *page) -{ - if (frontswap_enabled()) - return __frontswap_load(page); - - return -1; -} - -static inline void frontswap_invalidate_page(unsigned type, pgoff_t offset) -{ - if (frontswap_enabled()) - __frontswap_invalidate_page(type, offset); -} - -static inline void frontswap_invalidate_area(unsigned type) -{ - if (frontswap_enabled()) - __frontswap_invalidate_area(type); -} - -#endif /* _LINUX_FRONTSWAP_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 456546443f1f..bb5adc604144 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -302,10 +302,6 @@ struct swap_info_struct { struct file *swap_file; /* seldom referenced */ unsigned int old_block_size; /* seldom referenced */ struct completion comp; /* seldom referenced */ -#ifdef CONFIG_FRONTSWAP - unsigned long *frontswap_map; /* frontswap in-use, one bit per page */ - atomic_t frontswap_pages; /* frontswap pages in-use counter */ -#endif spinlock_t lock; /* * protect map scan related fields like * swap_map, lowest_bit, highest_bit, @@ -630,11 +626,6 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem) } #endif -#ifdef CONFIG_ZSWAP -extern u64 zswap_pool_total_size; -extern atomic_t zswap_stored_pages; -#endif - #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP) void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp); static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp) diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h index 7ed529a77c5b..99e3ed469e88 100644 --- a/include/linux/swapfile.h +++ b/include/linux/swapfile.h @@ -2,11 +2,6 @@ #ifndef _LINUX_SWAPFILE_H #define _LINUX_SWAPFILE_H -/* - * these were static in swapfile.c but frontswap.c needs them and we don't - * want to expose them to the dozens of source files that include swap.h - */ -extern struct swap_info_struct *swap_info[]; extern unsigned long generic_max_swapfile_size(void); unsigned long arch_max_swapfile_size(void); diff --git a/include/linux/zswap.h b/include/linux/zswap.h new file mode 100644 index 000000000000..850c377d9b6d --- /dev/null +++ b/include/linux/zswap.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_ZSWAP_H +#define _LINUX_ZSWAP_H + +#include +#include + +extern u64 zswap_pool_total_size; +extern atomic_t zswap_stored_pages; + +#ifdef CONFIG_ZSWAP + +bool zswap_store(struct page *page); +bool zswap_load(struct page *page); +void zswap_invalidate(int type, pgoff_t offset); +void zswap_swapon(int type); +void zswap_swapoff(int type); + +#else + +static inline bool zswap_store(struct page *page) +{ + return false; +} + +static inline bool zswap_load(struct page *page) +{ + return false; +} + +static inline void zswap_invalidate(int type, pgoff_t offset) {} +static inline void zswap_swapon(int type) {} +static inline void zswap_swapoff(int type) {} + +#endif + +#endif /* _LINUX_ZSWAP_H */ -- cgit v1.2.3 From 34f4c198bfbe86612c368eb122002787acecaa93 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 15 Jul 2023 05:23:40 +0100 Subject: zswap: make zswap_store() take a folio Patch series "Followup folio conversions for zswap". With frontswap killed, it's worth converting the zswap_load() and zswap_store() functions to take a folio instead of a page pointer. They aren't converted to support large folios, but there are a lot of unnecessary calls to compound_head() that are removed by these patches. This patch (of 4): Only convert a few easy parts of this function to use the folio passed in; convert back to struct page for the majority of it. This does remove a few hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20230715042343.434588-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230715042343.434588-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: Christoph Hellwig Cc: Domenico Cerasuolo Cc: Johannes Weiner Cc: Matthew Wilcox (Oracle) Cc: Nhat Pham Cc: Vitaly Wool Cc: Yosry Ahmed Signed-off-by: Andrew Morton --- include/linux/zswap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/zswap.h b/include/linux/zswap.h index 850c377d9b6d..9f318c8bc367 100644 --- a/include/linux/zswap.h +++ b/include/linux/zswap.h @@ -10,7 +10,7 @@ extern atomic_t zswap_stored_pages; #ifdef CONFIG_ZSWAP -bool zswap_store(struct page *page); +bool zswap_store(struct folio *folio); bool zswap_load(struct page *page); void zswap_invalidate(int type, pgoff_t offset); void zswap_swapon(int type); @@ -18,7 +18,7 @@ void zswap_swapoff(int type); #else -static inline bool zswap_store(struct page *page) +static inline bool zswap_store(struct folio *folio) { return false; } -- cgit v1.2.3 From 074e3e262adb02a7a42e32e0aadfb6b6cf416854 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 15 Jul 2023 05:23:41 +0100 Subject: memcg: convert get_obj_cgroup_from_page to get_obj_cgroup_from_folio As the one caller now has a folio, pass it in and use it. Removes three calls to compound_head(). Link: https://lkml.kernel.org/r/20230715042343.434588-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: Christoph Hellwig Cc: Domenico Cerasuolo Cc: Johannes Weiner Cc: Nhat Pham Cc: Vitaly Wool Cc: Yosry Ahmed Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 58eb7ca65699..058fb748e128 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1759,7 +1759,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void __memcg_kmem_uncharge_page(struct page *page, int order); struct obj_cgroup *get_obj_cgroup_from_current(void); -struct obj_cgroup *get_obj_cgroup_from_page(struct page *page); +struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio); int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size); void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); @@ -1843,7 +1843,7 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order) { } -static inline struct obj_cgroup *get_obj_cgroup_from_page(struct page *page) +static inline struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio) { return NULL; } -- cgit v1.2.3 From ca54f6d89d60abf3e7dea68c95dfd442eeece212 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 15 Jul 2023 05:23:43 +0100 Subject: zswap: make zswap_load() take a folio Only convert a few easy parts of this function to use the folio passed in; convert back to struct page for the majority of it. Removes three hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20230715042343.434588-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: Christoph Hellwig Cc: Domenico Cerasuolo Cc: Johannes Weiner Cc: Nhat Pham Cc: Vitaly Wool Cc: Yosry Ahmed Signed-off-by: Andrew Morton --- include/linux/zswap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/zswap.h b/include/linux/zswap.h index 9f318c8bc367..2a60ce39cfde 100644 --- a/include/linux/zswap.h +++ b/include/linux/zswap.h @@ -11,7 +11,7 @@ extern atomic_t zswap_stored_pages; #ifdef CONFIG_ZSWAP bool zswap_store(struct folio *folio); -bool zswap_load(struct page *page); +bool zswap_load(struct folio *folio); void zswap_invalidate(int type, pgoff_t offset); void zswap_swapon(int type); void zswap_swapoff(int type); @@ -23,7 +23,7 @@ static inline bool zswap_store(struct folio *folio) return false; } -static inline bool zswap_load(struct page *page) +static inline bool zswap_load(struct folio *folio) { return false; } -- cgit v1.2.3 From c0a5d93a885b42c22085ad0f39233795f4a578e0 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Tue, 18 Jul 2023 22:58:10 +0800 Subject: mm/page_ext: add common function to get client data from page_ext Patch series "add page_ext_data to get client data in page_ext". Current clients get data from page_ext by adding offset which is auto generated in page_ext core and exposes the data layout design inside page_ext core. This series adds a page_ext_data() to hide this from clients. Benefits include: 1. Future clients can call page_ext_data directly instead of defining a new function like get_page_owner to get the data. 2. There is no change to clients if the layout of page_ext data changes. This patch (of 3): Add common page_ext_data function to get client data. This could hide offset which is auto generated in page_ext core and expose the desgin of page_ext data layout. Link: https://lkml.kernel.org/r/20230718145812.1991717-1-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20230718145812.1991717-2-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Reviewed-by: Andrew Morton Acked-by: Mike Rapoport (IBM) Signed-off-by: Andrew Morton --- include/linux/page_ext.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 67314f648aeb..658d8d39a10e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -82,6 +82,12 @@ static inline void page_ext_init(void) extern struct page_ext *page_ext_get(struct page *page); extern void page_ext_put(struct page_ext *page_ext); +static inline void *page_ext_data(struct page_ext *page_ext, + struct page_ext_operations *ops) +{ + return (void *)(page_ext) + ops->offset; +} + static inline struct page_ext *page_ext_next(struct page_ext *curr) { void *next = curr; -- cgit v1.2.3 From 5d241789dfe1c0e5c9b4eb4ae6e48590964b4976 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 27 Jul 2023 19:59:34 +0800 Subject: mm/memcg: fix obsolete function name in mem_cgroup_protection() Commit 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks") changed the function name but not the corresponding comment. Link: https://lkml.kernel.org/r/20230727115934.657787-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: Michal Hocko Cc: Roman Gushchin Cc: Johannes Weiner Cc: Shakeel Butt Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 058fb748e128..419e001a02e4 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -582,7 +582,7 @@ static inline void mem_cgroup_protection(struct mem_cgroup *root, /* * There is no reclaim protection applied to a targeted reclaim. * We are special casing this specific case here because - * mem_cgroup_protected calculation is not robust enough to keep + * mem_cgroup_calculate_protection is not robust enough to keep * the protection invariant for calculated effective values for * parallel reclaimers with different reclaim target. This is * especially a problem for tail memcgs (as they have pages on LRU) -- cgit v1.2.3 From 67311a36e5e1e56dfa7264c93db759b18217e114 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Mon, 17 Jul 2023 19:32:27 +0800 Subject: mm/page_ext: move page_ext_operations definition under CONFIG_PAGE_EXTENSION page_ext_operations should only be defined when CONFIG_PAGE_EXTENSION is enabled. Besides, this may detect missing reliance on CONFIG_PAGE_EXTENSION from future Page Extension clients at compile time. Link: https://lkml.kernel.org/r/20230717113227.1897173-4-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi Signed-off-by: Andrew Morton --- include/linux/page_ext.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 658d8d39a10e..be98564191e6 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -8,6 +8,7 @@ struct pglist_data; +#ifdef CONFIG_PAGE_EXTENSION /** * struct page_ext_operations - per page_ext client operations * @offset: Offset to the client's data within page_ext. Offset is returned to @@ -29,8 +30,6 @@ struct page_ext_operations { bool need_shared_flags; }; -#ifdef CONFIG_PAGE_EXTENSION - /* * The page_ext_flags users must set need_shared_flags to true. */ -- cgit v1.2.3 From 11250fd12eb8a58205e69ea36f19fa8c084afb62 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Fri, 28 Jul 2023 13:00:40 +0800 Subject: mm: factor out VMA stack and heap checks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "mm: convert to vma_is_initial_heap/stack()", v3. Add vma_is_initial_stack() and vma_is_initial_heap() helpers and use them to simplify code. This patch (of 4): Factor out VMA stack and heap checks and name them vma_is_initial_stack() and vma_is_initial_heap() for general use. Link: https://lkml.kernel.org/r/20230728050043.59880-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20230728050043.59880-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: David Hildenbrand Acked-by: Peter Zijlstra (Intel) Cc: Christian Göttsche Cc: Alex Deucher Cc: Arnaldo Carvalho de Melo Cc: Christian Göttsche Cc: Christian König Cc: Daniel Vetter Cc: David Airlie Cc: Eric Paris Cc: Felix Kuehling Cc: "Pan, Xinhui" Cc: Paul Moore Cc: Stephen Smalley Signed-off-by: Andrew Morton --- include/linux/mm.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index b2520dd555f9..f64d1de3af09 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -833,6 +833,31 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma) return !vma->vm_ops; } +/* + * Indicate if the VMA is a heap for the given task; for + * /proc/PID/maps that is the heap of the main task. + */ +static inline bool vma_is_initial_heap(const struct vm_area_struct *vma) +{ + return vma->vm_start <= vma->vm_mm->brk && + vma->vm_end >= vma->vm_mm->start_brk; +} + +/* + * Indicate if the VMA is a stack for the given task; for + * /proc/PID/maps that is the stack of the main task. + */ +static inline bool vma_is_initial_stack(const struct vm_area_struct *vma) +{ + /* + * We make no effort to guess what a given thread considers to be + * its "stack". It's not even well-defined for programs written + * languages like Go. + */ + return vma->vm_start <= vma->vm_mm->start_stack && + vma->vm_end >= vma->vm_mm->start_stack; +} + static inline bool vma_is_temporary_stack(struct vm_area_struct *vma) { int maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP); -- cgit v1.2.3 From ca39c5e7d10f09983293f064caa447690cb3ec92 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Tue, 1 Aug 2023 20:43:59 +0800 Subject: mm/memcg: update obsolete comment above parent_mem_cgroup() Since commit bef8620cd8e0 ("mm: memcg: deprecate the non-hierarchical mode"), use_hierarchy is already deprecated. And it's further removed via commit 9d9d341df4d5 ("cgroup: remove obsoleted broken_hierarchy and warned_broken_hierarchy"). Update corresponding comment. Link: https://lkml.kernel.org/r/20230801124359.2266860-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: Michal Hocko Cc: Roman Gushchin Cc: Johannes Weiner Cc: Shakeel Butt Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 419e001a02e4..163004ae3349 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -860,8 +860,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec) * parent_mem_cgroup - find the accounting parent of a memcg * @memcg: memcg whose parent to find * - * Returns the parent memcg, or NULL if this is the root or the memory - * controller is in legacy no-hierarchy mode. + * Returns the parent memcg, or NULL if this is the root. */ static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { -- cgit v1.2.3 From ab9bda001b681c293fb72ef21f083adfbcd78028 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Wed, 2 Aug 2023 21:43:00 +0000 Subject: mm/damon/core: introduce address range type damos filter Patch series "Extend DAMOS filters for address ranges and DAMON monitoring targets" There are use cases that need to apply DAMOS schemes to specific address ranges or DAMON monitoring targets. NUMA nodes in the physical address space, special memory objects in the virtual address space, and monitoring target specific efficient monitoring results snapshot retrieval could be examples of such use cases. This patchset extends DAMOS filters feature for such cases, by implementing two more filter types, namely address ranges and DAMON monitoring types. Patches sequence ---------------- The first seven patches are for the address ranges based DAMOS filter. The first patch implements the filter feature and expose it via DAMON kernel API. The second patch further expose the feature to users via DAMON sysfs interface. The third and fourth patches implement unit tests and selftests for the feature. Three patches (fifth to seventh) updating the documents follow. The following six patches are for the DAMON monitoring target based DAMOS filter. The eighth patch implements the feature in the core layer and expose it via DAMON's kernel API. The ninth patch further expose it to users via DAMON sysfs interface. Tenth patch add a selftest, and two patches (eleventh and twelfth) update documents. [1] https://lore.kernel.org/damon/20230728203444.70703-1-sj@kernel.org/ This patch (of 13): Users can know special characteristic of specific address ranges. NUMA nodes or special objects or buffers in virtual address space could be such examples. For such cases, DAMOS schemes could required to be applied to only specific address ranges. Implement yet another type of DAMOS filter for the purpose. Note that the existing filter types, namely anon pages and memcg DAMOS filters needed page level type check. Because such check can be done efficiently in the opertions set layer, those filters are handled in operations set layer. Specifically, only paddr operations set implementation supports these filters. Also, because statistics counting is done in the DAMON core layer, the regions that filtered out by these filters are counted as tried but failed to the statistics. Unlike those, address range based filters can efficiently handled in the core layer. Hence, do the handling in the layer, and count the regions that filtered out by those as the scheme has not tried for the region. This difference should clearly documented. Link: https://lkml.kernel.org/r/20230802214312.110532-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230802214312.110532-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Jonathan Corbet Cc: Shuah Khan Signed-off-by: Andrew Morton --- include/linux/damon.h | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index d5d4d19928e0..476f37a883a4 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -226,16 +226,24 @@ struct damos_stat { * enum damos_filter_type - Type of memory for &struct damos_filter * @DAMOS_FILTER_TYPE_ANON: Anonymous pages. * @DAMOS_FILTER_TYPE_MEMCG: Specific memcg's pages. + * @DAMOS_FILTER_TYPE_ADDR: Address range. * @NR_DAMOS_FILTER_TYPES: Number of filter types. * - * The support of each filter type is up to running &struct damon_operations. - * &enum DAMON_OPS_PADDR is supporting all filter types, while - * &enum DAMON_OPS_VADDR and &enum DAMON_OPS_FVADDR are not supporting any - * filter types. + * The anon pages type and memcg type filters are handled by underlying + * &struct damon_operations as a part of scheme action trying, and therefore + * accounted as 'tried'. In contrast, other types are handled by core layer + * before trying of the action and therefore not accounted as 'tried'. + * + * The support of the filters that handled by &struct damon_operations depend + * on the running &struct damon_operations. + * &enum DAMON_OPS_PADDR supports both anon pages type and memcg type filters, + * while &enum DAMON_OPS_VADDR and &enum DAMON_OPS_FVADDR don't support any of + * the two types. */ enum damos_filter_type { DAMOS_FILTER_TYPE_ANON, DAMOS_FILTER_TYPE_MEMCG, + DAMOS_FILTER_TYPE_ADDR, NR_DAMOS_FILTER_TYPES, }; @@ -244,18 +252,20 @@ enum damos_filter_type { * @type: Type of the page. * @matching: If the matching page should filtered out or in. * @memcg_id: Memcg id of the question if @type is DAMOS_FILTER_MEMCG. + * @addr_range: Address range if @type is DAMOS_FILTER_TYPE_ADDR. * @list: List head for siblings. * * Before applying the &damos->action to a memory region, DAMOS checks if each * page of the region matches to this and avoid applying the action if so. - * Note that the check support is up to &struct damon_operations - * implementation. + * Support of each filter type depends on the running &struct damon_operations + * and the type. Refer to &enum damos_filter_type for more detai. */ struct damos_filter { enum damos_filter_type type; bool matching; union { unsigned short memcg_id; + struct damon_addr_range addr_range; }; struct list_head list; }; -- cgit v1.2.3 From 17e7c724d3c2e622c4d9969b7a473e8ed1d14ff0 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Wed, 2 Aug 2023 21:43:07 +0000 Subject: mm/damon/core: implement target type damos filter One DAMON context can have multiple monitoring targets, and DAMOS schemes are applied to all targets. In some cases, users need to apply different scheme to different targets. Retrieving monitoring results via DAMON sysfs interface' 'tried_regions' directory could be one good example. Also, there could be cases that cgroup DAMOS filter is not enough. All such use cases can be worked around by having multiple DAMON contexts having only single target, but it is inefficient in terms of resource usage, thogh the overhead is not estimated to be huge. Implement DAMON monitoring target based DAMOS filter for the case. Like address range target DAMOS filter, handle these filters in the DAMON core layer, since it is more efficient than doing in operations set layer. This also means that regions that filtered out by monitoring target type DAMOS filters are counted as not tried by the scheme. Hence, target granularity monitoring results retrieval via DAMON sysfs interface becomes available. Link: https://lkml.kernel.org/r/20230802214312.110532-9-sj@kernel.org Signed-off-by: SeongJae Park Cc: Brendan Higgins Cc: Jonathan Corbet Cc: Shuah Khan Signed-off-by: Andrew Morton --- include/linux/damon.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 476f37a883a4..ae2664d1d5f1 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -227,6 +227,7 @@ struct damos_stat { * @DAMOS_FILTER_TYPE_ANON: Anonymous pages. * @DAMOS_FILTER_TYPE_MEMCG: Specific memcg's pages. * @DAMOS_FILTER_TYPE_ADDR: Address range. + * @DAMOS_FILTER_TYPE_TARGET: Data Access Monitoring target. * @NR_DAMOS_FILTER_TYPES: Number of filter types. * * The anon pages type and memcg type filters are handled by underlying @@ -244,6 +245,7 @@ enum damos_filter_type { DAMOS_FILTER_TYPE_ANON, DAMOS_FILTER_TYPE_MEMCG, DAMOS_FILTER_TYPE_ADDR, + DAMOS_FILTER_TYPE_TARGET, NR_DAMOS_FILTER_TYPES, }; @@ -253,6 +255,9 @@ enum damos_filter_type { * @matching: If the matching page should filtered out or in. * @memcg_id: Memcg id of the question if @type is DAMOS_FILTER_MEMCG. * @addr_range: Address range if @type is DAMOS_FILTER_TYPE_ADDR. + * @target_idx: Index of the &struct damon_target of + * &damon_ctx->adaptive_targets if @type is + * DAMOS_FILTER_TYPE_TARGET. * @list: List head for siblings. * * Before applying the &damos->action to a memory region, DAMOS checks if each @@ -266,6 +271,7 @@ struct damos_filter { union { unsigned short memcg_id; struct damon_addr_range addr_range; + int target_idx; }; struct list_head list; }; -- cgit v1.2.3 From ce2fc5fffdfa9fc1412aff108afa102ddf82fd2b Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 4 Aug 2023 08:27:20 -0700 Subject: mm: for !CONFIG_PER_VMA_LOCK equate write lock assertion for vma and mmap When CONFIG_PER_VMA_LOCK=n, vma_assert_write_locked() should be equivalent to mmap_assert_write_locked(). Link: https://lkml.kernel.org/r/20230804152724.3090321-3-surenb@google.com Suggested-by: Jann Horn Signed-off-by: Suren Baghdasaryan Reviewed-by: Liam R. Howlett Cc: Linus Torvalds Signed-off-by: Andrew Morton --- include/linux/mm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index f64d1de3af09..49eafc62b4e6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -738,7 +738,8 @@ static inline bool vma_start_read(struct vm_area_struct *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} static inline void vma_start_write(struct vm_area_struct *vma) {} -static inline void vma_assert_write_locked(struct vm_area_struct *vma) {} +static inline void vma_assert_write_locked(struct vm_area_struct *vma) + { mmap_assert_write_locked(vma->vm_mm); } static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) {} -- cgit v1.2.3 From 60081bf19b0ec8fa40c589bd361fa2bc763f1050 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 4 Aug 2023 08:27:22 -0700 Subject: mm: lock vma explicitly before doing vm_flags_reset and vm_flags_reset_once Implicit vma locking inside vm_flags_reset() and vm_flags_reset_once() is not obvious and makes it hard to understand where vma locking is happening. Also in some cases (like in dup_userfaultfd()) vma should be locked earlier than vma_flags modification. To make locking more visible, change these functions to assert that the vma write lock is taken and explicitly lock the vma beforehand. Fix userfaultfd functions which should lock the vma earlier. Link: https://lkml.kernel.org/r/20230804152724.3090321-5-surenb@google.com Suggested-by: Linus Torvalds Signed-off-by: Suren Baghdasaryan Cc: Jann Horn Cc: Liam R. Howlett Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 49eafc62b4e6..5a6ff9140090 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -774,18 +774,22 @@ static inline void vm_flags_init(struct vm_area_struct *vma, ACCESS_PRIVATE(vma, __vm_flags) = flags; } -/* Use when VMA is part of the VMA tree and modifications need coordination */ +/* + * Use when VMA is part of the VMA tree and modifications need coordination + * Note: vm_flags_reset and vm_flags_reset_once do not lock the vma and + * it should be locked explicitly beforehand. + */ static inline void vm_flags_reset(struct vm_area_struct *vma, vm_flags_t flags) { - vma_start_write(vma); + vma_assert_write_locked(vma); vm_flags_init(vma, flags); } static inline void vm_flags_reset_once(struct vm_area_struct *vma, vm_flags_t flags) { - vma_start_write(vma); + vma_assert_write_locked(vma); WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags); } -- cgit v1.2.3 From 9a9d0b829901125553c36b9512b2a5da4505be31 Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Mon, 7 Aug 2023 01:16:11 +0200 Subject: mm: move dummy_vm_ops out of a header Otherwise the kernel ends up with multiple copies: $ nm vmlinux | grep dummy_vm_ops ffffffff81e4ea00 d dummy_vm_ops.2 ffffffff81e11760 d dummy_vm_ops.254 ffffffff81e406e0 d dummy_vm_ops.4 ffffffff81e3c780 d dummy_vm_ops.7 While here prefix it with vma_. Link: https://lkml.kernel.org/r/20230806231611.1395735-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 5a6ff9140090..c63ec57a54dc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -751,17 +751,17 @@ static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, #endif /* CONFIG_PER_VMA_LOCK */ +extern const struct vm_operations_struct vma_dummy_vm_ops; + /* * WARNING: vma_init does not initialize vma->vm_lock. * Use vm_area_alloc()/vm_area_free() if vma needs locking. */ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm) { - static const struct vm_operations_struct dummy_vm_ops = {}; - memset(vma, 0, sizeof(*vma)); vma->vm_mm = mm; - vma->vm_ops = &dummy_vm_ops; + vma->vm_ops = &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); vma_mark_detached(vma, false); vma_numab_state_init(vma); -- cgit v1.2.3 From 3f32c49ed6f15c8412a8abc93a92c4b37e6c4592 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Tue, 8 Aug 2023 11:33:59 +0800 Subject: mm: memtest: convert to memtest_report_meminfo() It is better to not expose too many internal variables of memtest, add a helper memtest_report_meminfo() to show memtest results. Link: https://lkml.kernel.org/r/20230808033359.174986-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: Mike Rapoport (IBM) Cc: Matthew Wilcox Cc: Tomas Mudrunka Signed-off-by: Andrew Morton --- include/linux/memblock.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 0d031fbfea25..1c1072e3ca06 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -594,13 +594,11 @@ extern int hashdist; /* Distribute hashes across NUMA nodes? */ #endif #ifdef CONFIG_MEMTEST -extern phys_addr_t early_memtest_bad_size; /* Size of faulty ram found by memtest */ -extern bool early_memtest_done; /* Was early memtest done? */ -extern void early_memtest(phys_addr_t start, phys_addr_t end); +void early_memtest(phys_addr_t start, phys_addr_t end); +void memtest_report_meminfo(struct seq_file *m); #else -static inline void early_memtest(phys_addr_t start, phys_addr_t end) -{ -} +static inline void early_memtest(phys_addr_t start, phys_addr_t end) { } +static inline void memtest_report_meminfo(struct seq_file *m) { } #endif -- cgit v1.2.3 From e3c2bfdd33a30b34674fb8839f5476ab2702c1c1 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 8 Aug 2023 14:44:57 +0530 Subject: mm/memory_hotplug: allow memmap on memory hotplug request to fallback If not supported, fallback to not using memap on memmory. This avoids the need for callers to do the fallback. Link: https://lkml.kernel.org/r/20230808091501.287660-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Acked-by: Michal Hocko Acked-by: David Hildenbrand Cc: Christophe Leroy Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Vishal Verma Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 013c69753c91..7d2076583494 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -97,6 +97,8 @@ typedef int __bitwise mhp_t; * To do so, we will use the beginning of the hot-added range to build * the page tables for the memmap array that describes the entire range. * Only selected architectures support it with SPARSE_VMEMMAP. + * This is only a hint, the core kernel can decide to not do this based on + * different alignment checks. */ #define MHP_MEMMAP_ON_MEMORY ((__force mhp_t)BIT(1)) /* @@ -354,7 +356,6 @@ extern struct zone *zone_for_pfn_range(int online_type, int nid, extern int arch_create_linear_mapping(int nid, u64 start, u64 size, struct mhp_params *params); void arch_remove_linear_mapping(u64 start, u64 size); -extern bool mhp_supports_memmap_on_memory(unsigned long size); #endif /* CONFIG_MEMORY_HOTPLUG */ #endif /* __LINUX_MEMORY_HOTPLUG_H */ -- cgit v1.2.3 From 1a8c64e110435e44e71bcd50a75663174b575f22 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 8 Aug 2023 14:45:01 +0530 Subject: mm/memory_hotplug: embed vmem_altmap details in memory block With memmap on memory, some architecture needs more details w.r.t altmap such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of computing them again when we remove a memory block, embed vmem_altmap details in struct memory_block if we are using memmap on memory block feature. [yangyingliang@huawei.com: fix error return code in add_memory_resource()] Link: https://lkml.kernel.org/r/20230809081552.1351184-1-yangyingliang@huawei.com Link: https://lkml.kernel.org/r/20230808091501.287660-7-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Signed-off-by: Yang Yingliang Acked-by: Michal Hocko Acked-by: David Hildenbrand Cc: Christophe Leroy Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Oscar Salvador Cc: Vishal Verma Signed-off-by: Andrew Morton --- include/linux/memory.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory.h b/include/linux/memory.h index 31343566c221..f53cfdaaaa41 100644 --- a/include/linux/memory.h +++ b/include/linux/memory.h @@ -77,11 +77,7 @@ struct memory_block { */ struct zone *zone; struct device dev; - /* - * Number of vmemmap pages. These pages - * lay at the beginning of the memory block. - */ - unsigned long nr_vmemmap_pages; + struct vmem_altmap *altmap; struct memory_group *group; /* group (if any) for this block */ struct list_head group_next; /* next block inside memory group */ #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG) @@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri) extern int register_memory_notifier(struct notifier_block *nb); extern void unregister_memory_notifier(struct notifier_block *nb); int create_memory_block_devices(unsigned long start, unsigned long size, - unsigned long vmemmap_pages, + struct vmem_altmap *altmap, struct memory_group *group); void remove_memory_block_devices(unsigned long start, unsigned long size); extern void memory_dev_init(void); -- cgit v1.2.3 From f7bda0d85dd7733143b5ea66987cb9b102bd0189 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:43 -0700 Subject: mm: add PAGE_TYPE_OP folio functions Patch series "Split ptdesc from struct page", v9. The MM subsystem is trying to shrink struct page. This patchset introduces a memory descriptor for page table tracking - struct ptdesc. This patchset introduces ptdesc, splits ptdesc from struct page, and converts many callers of page table constructor/destructors to use ptdescs. Ptdesc is a foundation to further standardize page tables, and eventually allow for dynamic allocation of page tables independent of struct page. However, the use of pages for page table tracking is quite deeply ingrained and varied across archictectures, so there is still a lot of work to be done before that can happen. This patch (of 31): No folio equivalents for page type operations have been defined, so define them for later folio conversions. Also changes the Page##uname macros to take in const struct page* since we only read the memory here. Link: https://lkml.kernel.org/r/20230807230513.102486-1-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20230807230513.102486-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Huacai Chen Cc: Hugh Dickins Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Cc: Geert Uytterhoeven Cc: Guo Ren Cc: John Paul Adrian Glaubitz Cc: Palmer Dabbelt Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 92a2063a0a23..9218028caf33 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -908,6 +908,8 @@ static inline bool is_page_hwpoison(struct page *page) #define PageType(page, flag) \ ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE) +#define folio_test_type(folio, flag) \ + ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE) static inline int page_type_has_type(unsigned int page_type) { @@ -919,27 +921,41 @@ static inline int page_has_type(struct page *page) return page_type_has_type(page->page_type); } -#define PAGE_TYPE_OPS(uname, lname) \ -static __always_inline int Page##uname(struct page *page) \ +#define PAGE_TYPE_OPS(uname, lname, fname) \ +static __always_inline int Page##uname(const struct page *page) \ { \ return PageType(page, PG_##lname); \ } \ +static __always_inline int folio_test_##fname(const struct folio *folio)\ +{ \ + return folio_test_type(folio, PG_##lname); \ +} \ static __always_inline void __SetPage##uname(struct page *page) \ { \ VM_BUG_ON_PAGE(!PageType(page, 0), page); \ page->page_type &= ~PG_##lname; \ } \ +static __always_inline void __folio_set_##fname(struct folio *folio) \ +{ \ + VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \ + folio->page.page_type &= ~PG_##lname; \ +} \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type |= PG_##lname; \ -} +} \ +static __always_inline void __folio_clear_##fname(struct folio *folio) \ +{ \ + VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ + folio->page.page_type |= PG_##lname; \ +} \ /* * PageBuddy() indicates that the page is free and in the buddy system * (see mm/page_alloc.c). */ -PAGE_TYPE_OPS(Buddy, buddy) +PAGE_TYPE_OPS(Buddy, buddy, buddy) /* * PageOffline() indicates that the page is logically offline although the @@ -963,7 +979,7 @@ PAGE_TYPE_OPS(Buddy, buddy) * pages should check PageOffline() and synchronize with such drivers using * page_offline_freeze()/page_offline_thaw(). */ -PAGE_TYPE_OPS(Offline, offline) +PAGE_TYPE_OPS(Offline, offline, offline) extern void page_offline_freeze(void); extern void page_offline_thaw(void); @@ -973,12 +989,12 @@ extern void page_offline_end(void); /* * Marks pages in use as page tables. */ -PAGE_TYPE_OPS(Table, table) +PAGE_TYPE_OPS(Table, table, pgtable) /* * Marks guardpages used with debug_pagealloc. */ -PAGE_TYPE_OPS(Guard, guard) +PAGE_TYPE_OPS(Guard, guard, guard) extern bool is_free_buddy_page(struct page *page); -- cgit v1.2.3 From 9a35de4ffc209bd7956c4811ad17c4883791db43 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:44 -0700 Subject: pgtable: create struct ptdesc Currently, page table information is stored within struct page. As part of simplifying struct page, create struct ptdesc for page table information. Link: https://lkml.kernel.org/r/20230807230513.102486-3-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 70 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 1fc4b9c2c8a6..d4ebcefe482e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -397,6 +397,76 @@ FOLIO_MATCH(flags, _flags_2); FOLIO_MATCH(compound_head, _head_2); #undef FOLIO_MATCH +/** + * struct ptdesc - Memory descriptor for page tables. + * @__page_flags: Same as page flags. Unused for page tables. + * @pt_rcu_head: For freeing page table pages. + * @pt_list: List of used page tables. Used for s390 and x86. + * @_pt_pad_1: Padding that aliases with page's compound head. + * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs. + * @__page_mapping: Aliases with page->mapping. Unused for page tables. + * @pt_mm: Used for x86 pgds. + * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only. + * @_pt_pad_2: Padding to ensure proper alignment. + * @ptl: Lock for the page table. + * @__page_type: Same as page->page_type. Unused for page tables. + * @_refcount: Same as page refcount. Used for s390 page tables. + * @pt_memcg_data: Memcg data. Tracked for page tables here. + * + * This struct overlays struct page for now. Do not modify without a good + * understanding of the issues. + */ +struct ptdesc { + unsigned long __page_flags; + + union { + struct rcu_head pt_rcu_head; + struct list_head pt_list; + struct { + unsigned long _pt_pad_1; + pgtable_t pmd_huge_pte; + }; + }; + unsigned long __page_mapping; + + union { + struct mm_struct *pt_mm; + atomic_t pt_frag_refcount; + }; + + union { + unsigned long _pt_pad_2; +#if ALLOC_SPLIT_PTLOCKS + spinlock_t *ptl; +#else + spinlock_t ptl; +#endif + }; + unsigned int __page_type; + atomic_t _refcount; +#ifdef CONFIG_MEMCG + unsigned long pt_memcg_data; +#endif +}; + +#define TABLE_MATCH(pg, pt) \ + static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt)) +TABLE_MATCH(flags, __page_flags); +TABLE_MATCH(compound_head, pt_list); +TABLE_MATCH(compound_head, _pt_pad_1); +TABLE_MATCH(pmd_huge_pte, pmd_huge_pte); +TABLE_MATCH(mapping, __page_mapping); +TABLE_MATCH(pt_mm, pt_mm); +TABLE_MATCH(ptl, ptl); +TABLE_MATCH(rcu_head, pt_rcu_head); +TABLE_MATCH(page_type, __page_type); +TABLE_MATCH(_refcount, _refcount); +#ifdef CONFIG_MEMCG +TABLE_MATCH(memcg_data, pt_memcg_data); +#endif +#undef TABLE_MATCH +static_assert(sizeof(struct ptdesc) <= sizeof(struct page)); + /* * Used for sizing the vmemmap region on some architectures */ -- cgit v1.2.3 From bf2d4334f72e4e033166c5a3bf1331a7238eab9d Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:45 -0700 Subject: mm: add utility functions for ptdesc Introduce utility functions setting the foundation for ptdescs. These will also assist in the splitting out of ptdesc from struct page. Functions that focus on the descriptor are prefixed with ptdesc_* while functions that focus on the pagetable are prefixed with pagetable_*. pagetable_alloc() is defined to allocate new ptdesc pages as compound pages. This is to standardize ptdescs by allowing for one allocation and one free function, in contrast to 2 allocation and 2 free functions. Link: https://lkml.kernel.org/r/20230807230513.102486-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Mike Rapoport (IBM) Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/mm_types.h | 12 ++++++++++ 2 files changed, 73 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index c63ec57a54dc..f75058783359 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2772,6 +2772,57 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a } #endif /* CONFIG_MMU */ +static inline struct ptdesc *virt_to_ptdesc(const void *x) +{ + return page_ptdesc(virt_to_page(x)); +} + +static inline void *ptdesc_to_virt(const struct ptdesc *pt) +{ + return page_to_virt(ptdesc_page(pt)); +} + +static inline void *ptdesc_address(const struct ptdesc *pt) +{ + return folio_address(ptdesc_folio(pt)); +} + +static inline bool pagetable_is_reserved(struct ptdesc *pt) +{ + return folio_test_reserved(ptdesc_folio(pt)); +} + +/** + * pagetable_alloc - Allocate pagetables + * @gfp: GFP flags + * @order: desired pagetable order + * + * pagetable_alloc allocates memory for page tables as well as a page table + * descriptor to describe that memory. + * + * Return: The ptdesc describing the allocated page tables. + */ +static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order) +{ + struct page *page = alloc_pages(gfp | __GFP_COMP, order); + + return page_ptdesc(page); +} + +/** + * pagetable_free - Free pagetables + * @pt: The page table descriptor + * + * pagetable_free frees the memory of all page tables described by a page + * table descriptor and the memory for the descriptor itself. + */ +static inline void pagetable_free(struct ptdesc *pt) +{ + struct page *page = ptdesc_page(pt); + + __free_pages(page, compound_order(page)); +} + #if USE_SPLIT_PTE_PTLOCKS #if ALLOC_SPLIT_PTLOCKS void __init ptlock_cache_init(void); @@ -2898,6 +2949,11 @@ static inline struct page *pmd_pgtable_page(pmd_t *pmd) return virt_to_page((void *)((unsigned long) pmd & mask)); } +static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd) +{ + return page_ptdesc(pmd_pgtable_page(pmd)); +} + static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd) { return ptlock_ptr(pmd_pgtable_page(pmd)); @@ -3010,6 +3066,11 @@ static inline void mark_page_reserved(struct page *page) adjust_managed_page_count(page, -1); } +static inline void free_reserved_ptdesc(struct ptdesc *pt) +{ + free_reserved_page(ptdesc_page(pt)); +} + /* * Default method to free all the __init memory into the buddy system. * The freed pages will be poisoned with pattern "poison" if it's within diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d4ebcefe482e..c57e940e60d0 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -467,6 +467,18 @@ TABLE_MATCH(memcg_data, pt_memcg_data); #undef TABLE_MATCH static_assert(sizeof(struct ptdesc) <= sizeof(struct page)); +#define ptdesc_page(pt) (_Generic((pt), \ + const struct ptdesc *: (const struct page *)(pt), \ + struct ptdesc *: (struct page *)(pt))) + +#define ptdesc_folio(pt) (_Generic((pt), \ + const struct ptdesc *: (const struct folio *)(pt), \ + struct ptdesc *: (struct folio *)(pt))) + +#define page_ptdesc(p) (_Generic((p), \ + const struct page *: (const struct ptdesc *)(p), \ + struct page *: (struct ptdesc *)(p))) + /* * Used for sizing the vmemmap region on some architectures */ -- cgit v1.2.3 From f8546d8494ca22fe530c8ba79beaf1509b47f6d1 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:46 -0700 Subject: mm: convert pmd_pgtable_page() callers to use pmd_ptdesc() Converts internal pmd_pgtable_page() callers to use pmd_ptdesc(). This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index f75058783359..6fee233dfccc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2956,7 +2956,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd) static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd) { - return ptlock_ptr(pmd_pgtable_page(pmd)); + return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd))); } static inline bool pmd_ptlock_init(struct page *page) @@ -2975,7 +2975,7 @@ static inline void pmd_ptlock_free(struct page *page) ptlock_free(page); } -#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte) +#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte) #else -- cgit v1.2.3 From f5ecca06b3a5d0371ee27ee08aa06c686407a8af Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:47 -0700 Subject: mm: convert ptlock_alloc() to use ptdescs This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-6-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 6fee233dfccc..ccea0665247c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2826,7 +2826,7 @@ static inline void pagetable_free(struct ptdesc *pt) #if USE_SPLIT_PTE_PTLOCKS #if ALLOC_SPLIT_PTLOCKS void __init ptlock_cache_init(void); -extern bool ptlock_alloc(struct page *page); +bool ptlock_alloc(struct ptdesc *ptdesc); extern void ptlock_free(struct page *page); static inline spinlock_t *ptlock_ptr(struct page *page) @@ -2838,7 +2838,7 @@ static inline void ptlock_cache_init(void) { } -static inline bool ptlock_alloc(struct page *page) +static inline bool ptlock_alloc(struct ptdesc *ptdesc) { return true; } @@ -2868,7 +2868,7 @@ static inline bool ptlock_init(struct page *page) * slab code uses page->slab_cache, which share storage with page->ptl. */ VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page); - if (!ptlock_alloc(page)) + if (!ptlock_alloc(page_ptdesc(page))) return false; spin_lock_init(ptlock_ptr(page)); return true; -- cgit v1.2.3 From 1865484af6b2ced2f367c1042b5f3816c11db148 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:48 -0700 Subject: mm: convert ptlock_ptr() to use ptdescs This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-7-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index ccea0665247c..7860529737d4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2829,9 +2829,9 @@ void __init ptlock_cache_init(void); bool ptlock_alloc(struct ptdesc *ptdesc); extern void ptlock_free(struct page *page); -static inline spinlock_t *ptlock_ptr(struct page *page) +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) { - return page->ptl; + return ptdesc->ptl; } #else /* ALLOC_SPLIT_PTLOCKS */ static inline void ptlock_cache_init(void) @@ -2847,15 +2847,15 @@ static inline void ptlock_free(struct page *page) { } -static inline spinlock_t *ptlock_ptr(struct page *page) +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) { - return &page->ptl; + return &ptdesc->ptl; } #endif /* ALLOC_SPLIT_PTLOCKS */ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) { - return ptlock_ptr(pmd_page(*pmd)); + return ptlock_ptr(page_ptdesc(pmd_page(*pmd))); } static inline bool ptlock_init(struct page *page) @@ -2870,7 +2870,7 @@ static inline bool ptlock_init(struct page *page) VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page); if (!ptlock_alloc(page_ptdesc(page))) return false; - spin_lock_init(ptlock_ptr(page)); + spin_lock_init(ptlock_ptr(page_ptdesc(page))); return true; } @@ -2956,7 +2956,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd) static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd) { - return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd))); + return ptlock_ptr(pmd_ptdesc(pmd)); } static inline bool pmd_ptlock_init(struct page *page) -- cgit v1.2.3 From edbaefe53c6418ea11372e6b3ce952ab8caa1f78 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:49 -0700 Subject: mm: convert pmd_ptlock_init() to use ptdescs This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-8-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 7860529737d4..d70366169c3d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2959,12 +2959,12 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd) return ptlock_ptr(pmd_ptdesc(pmd)); } -static inline bool pmd_ptlock_init(struct page *page) +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE - page->pmd_huge_pte = NULL; + ptdesc->pmd_huge_pte = NULL; #endif - return ptlock_init(page); + return ptlock_init(ptdesc_page(ptdesc)); } static inline void pmd_ptlock_free(struct page *page) @@ -2984,7 +2984,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd) return &mm->page_table_lock; } -static inline bool pmd_ptlock_init(struct page *page) { return true; } +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void pmd_ptlock_free(struct page *page) {} #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte) @@ -3000,7 +3000,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd) static inline bool pgtable_pmd_page_ctor(struct page *page) { - if (!pmd_ptlock_init(page)) + if (!pmd_ptlock_init(page_ptdesc(page))) return false; __SetPageTable(page); inc_lruvec_page_state(page, NR_PAGETABLE); -- cgit v1.2.3 From 75b25d49ca6638f9a4eac47cff508b174743d907 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:50 -0700 Subject: mm: convert ptlock_init() to use ptdescs This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-9-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d70366169c3d..e53ef2eb95bb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2858,7 +2858,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) return ptlock_ptr(page_ptdesc(pmd_page(*pmd))); } -static inline bool ptlock_init(struct page *page) +static inline bool ptlock_init(struct ptdesc *ptdesc) { /* * prep_new_page() initialize page->private (and therefore page->ptl) @@ -2867,10 +2867,10 @@ static inline bool ptlock_init(struct page *page) * It can happen if arch try to use slab for page table allocation: * slab code uses page->slab_cache, which share storage with page->ptl. */ - VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page); - if (!ptlock_alloc(page_ptdesc(page))) + VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc)); + if (!ptlock_alloc(ptdesc)) return false; - spin_lock_init(ptlock_ptr(page_ptdesc(page))); + spin_lock_init(ptlock_ptr(ptdesc)); return true; } @@ -2883,13 +2883,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) return &mm->page_table_lock; } static inline void ptlock_cache_init(void) {} -static inline bool ptlock_init(struct page *page) { return true; } +static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void ptlock_free(struct page *page) {} #endif /* USE_SPLIT_PTE_PTLOCKS */ static inline bool pgtable_pte_page_ctor(struct page *page) { - if (!ptlock_init(page)) + if (!ptlock_init(page_ptdesc(page))) return false; __SetPageTable(page); inc_lruvec_page_state(page, NR_PAGETABLE); @@ -2964,7 +2964,7 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) #ifdef CONFIG_TRANSPARENT_HUGEPAGE ptdesc->pmd_huge_pte = NULL; #endif - return ptlock_init(ptdesc_page(ptdesc)); + return ptlock_init(ptdesc); } static inline void pmd_ptlock_free(struct page *page) -- cgit v1.2.3 From 7e5f42ae3413785c68c383acb787f9ce8f243096 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:51 -0700 Subject: mm: convert pmd_ptlock_free() to use ptdescs This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-10-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index e53ef2eb95bb..8884c700dfc6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2967,12 +2967,12 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) return ptlock_init(ptdesc); } -static inline void pmd_ptlock_free(struct page *page) +static inline void pmd_ptlock_free(struct ptdesc *ptdesc) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE - VM_BUG_ON_PAGE(page->pmd_huge_pte, page); + VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc)); #endif - ptlock_free(page); + ptlock_free(ptdesc_page(ptdesc)); } #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte) @@ -2985,7 +2985,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd) } static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; } -static inline void pmd_ptlock_free(struct page *page) {} +static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {} #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte) @@ -3009,7 +3009,7 @@ static inline bool pgtable_pmd_page_ctor(struct page *page) static inline void pgtable_pmd_page_dtor(struct page *page) { - pmd_ptlock_free(page); + pmd_ptlock_free(page_ptdesc(page)); __ClearPageTable(page); dec_lruvec_page_state(page, NR_PAGETABLE); } -- cgit v1.2.3 From 6ed1b8a09deb0b99fd3b54e11535c80284689555 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:52 -0700 Subject: mm: convert ptlock_free() to use ptdescs This removes some direct accesses to struct page, working towards splitting out struct ptdesc from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-11-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 8884c700dfc6..d0fb31bcd482 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2827,7 +2827,7 @@ static inline void pagetable_free(struct ptdesc *pt) #if ALLOC_SPLIT_PTLOCKS void __init ptlock_cache_init(void); bool ptlock_alloc(struct ptdesc *ptdesc); -extern void ptlock_free(struct page *page); +void ptlock_free(struct ptdesc *ptdesc); static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc) { @@ -2843,7 +2843,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc) return true; } -static inline void ptlock_free(struct page *page) +static inline void ptlock_free(struct ptdesc *ptdesc) { } @@ -2884,7 +2884,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) } static inline void ptlock_cache_init(void) {} static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } -static inline void ptlock_free(struct page *page) {} +static inline void ptlock_free(struct ptdesc *ptdesc) {} #endif /* USE_SPLIT_PTE_PTLOCKS */ static inline bool pgtable_pte_page_ctor(struct page *page) @@ -2898,7 +2898,7 @@ static inline bool pgtable_pte_page_ctor(struct page *page) static inline void pgtable_pte_page_dtor(struct page *page) { - ptlock_free(page); + ptlock_free(page_ptdesc(page)); __ClearPageTable(page); dec_lruvec_page_state(page, NR_PAGETABLE); } @@ -2972,7 +2972,7 @@ static inline void pmd_ptlock_free(struct ptdesc *ptdesc) #ifdef CONFIG_TRANSPARENT_HUGEPAGE VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc)); #endif - ptlock_free(ptdesc_page(ptdesc)); + ptlock_free(ptdesc); } #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte) -- cgit v1.2.3 From 7e11dca14b27e11596a0c49c7d20bc7816cb0508 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:53 -0700 Subject: mm: create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor} Create pagetable_pte_ctor(), pagetable_pmd_ctor(), pagetable_pte_dtor(), and pagetable_pmd_dtor() and make the original pgtable constructor/destructors wrappers. Link: https://lkml.kernel.org/r/20230807230513.102486-12-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 56 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 42 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d0fb31bcd482..6fdc294ada0d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2887,20 +2887,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void ptlock_free(struct ptdesc *ptdesc) {} #endif /* USE_SPLIT_PTE_PTLOCKS */ -static inline bool pgtable_pte_page_ctor(struct page *page) +static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc) { - if (!ptlock_init(page_ptdesc(page))) + struct folio *folio = ptdesc_folio(ptdesc); + + if (!ptlock_init(ptdesc)) return false; - __SetPageTable(page); - inc_lruvec_page_state(page, NR_PAGETABLE); + __folio_set_pgtable(folio); + lruvec_stat_add_folio(folio, NR_PAGETABLE); return true; } +static inline bool pgtable_pte_page_ctor(struct page *page) +{ + return pagetable_pte_ctor(page_ptdesc(page)); +} + +static inline void pagetable_pte_dtor(struct ptdesc *ptdesc) +{ + struct folio *folio = ptdesc_folio(ptdesc); + + ptlock_free(ptdesc); + __folio_clear_pgtable(folio); + lruvec_stat_sub_folio(folio, NR_PAGETABLE); +} + static inline void pgtable_pte_page_dtor(struct page *page) { - ptlock_free(page_ptdesc(page)); - __ClearPageTable(page); - dec_lruvec_page_state(page, NR_PAGETABLE); + pagetable_pte_dtor(page_ptdesc(page)); } pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp); @@ -2998,20 +3012,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd) return ptl; } -static inline bool pgtable_pmd_page_ctor(struct page *page) +static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc) { - if (!pmd_ptlock_init(page_ptdesc(page))) + struct folio *folio = ptdesc_folio(ptdesc); + + if (!pmd_ptlock_init(ptdesc)) return false; - __SetPageTable(page); - inc_lruvec_page_state(page, NR_PAGETABLE); + __folio_set_pgtable(folio); + lruvec_stat_add_folio(folio, NR_PAGETABLE); return true; } +static inline bool pgtable_pmd_page_ctor(struct page *page) +{ + return pagetable_pmd_ctor(page_ptdesc(page)); +} + +static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc) +{ + struct folio *folio = ptdesc_folio(ptdesc); + + pmd_ptlock_free(ptdesc); + __folio_clear_pgtable(folio); + lruvec_stat_sub_folio(folio, NR_PAGETABLE); +} + static inline void pgtable_pmd_page_dtor(struct page *page) { - pmd_ptlock_free(page_ptdesc(page)); - __ClearPageTable(page); - dec_lruvec_page_state(page, NR_PAGETABLE); + pagetable_pmd_dtor(page_ptdesc(page)); } /* -- cgit v1.2.3 From 4f054c28f425b2f1623c9fdc2c2f69719a22190b Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:04:57 -0700 Subject: mm: remove page table members from struct page The page table members are now split out into their own ptdesc struct. Remove them from struct page. Link: https://lkml.kernel.org/r/20230807230513.102486-16-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 21 --------------------- 1 file changed, 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c57e940e60d0..369b7fd35d03 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -141,24 +141,6 @@ struct page { struct { /* Tail pages of compound page */ unsigned long compound_head; /* Bit zero is set */ }; - struct { /* Page table pages */ - unsigned long _pt_pad_1; /* compound_head */ - pgtable_t pmd_huge_pte; /* protected by page->ptl */ - /* - * A PTE page table page might be freed by use of - * rcu_head: which overlays those two fields above. - */ - unsigned long _pt_pad_2; /* mapping */ - union { - struct mm_struct *pt_mm; /* x86 pgds only */ - atomic_t pt_frag_refcount; /* powerpc */ - }; -#if ALLOC_SPLIT_PTLOCKS - spinlock_t *ptl; -#else - spinlock_t ptl; -#endif - }; struct { /* ZONE_DEVICE pages */ /** @pgmap: Points to the hosting device page map. */ struct dev_pagemap *pgmap; @@ -454,10 +436,7 @@ struct ptdesc { TABLE_MATCH(flags, __page_flags); TABLE_MATCH(compound_head, pt_list); TABLE_MATCH(compound_head, _pt_pad_1); -TABLE_MATCH(pmd_huge_pte, pmd_huge_pte); TABLE_MATCH(mapping, __page_mapping); -TABLE_MATCH(pt_mm, pt_mm); -TABLE_MATCH(ptl, ptl); TABLE_MATCH(rcu_head, pt_rcu_head); TABLE_MATCH(page_type, __page_type); TABLE_MATCH(_refcount, _refcount); -- cgit v1.2.3 From 9a4bbd8d975e01a777005e00c0e26d72bb6cc15a Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Mon, 7 Aug 2023 16:05:13 -0700 Subject: mm: remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers These functions are no longer necessary. Remove them and cleanup Documentation referencing them. Link: https://lkml.kernel.org/r/20230807230513.102486-32-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Acked-by: Mike Rapoport (IBM) Cc: Arnd Bergmann Cc: Catalin Marinas Cc: Christophe Leroy Cc: Claudio Imbrenda Cc: Dave Hansen Cc: David Hildenbrand Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Huacai Chen Cc: Hugh Dickins Cc: John Paul Adrian Glaubitz Cc: Jonas Bonn Cc: Matthew Wilcox Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Weinberger Cc: Thomas Bogendoerfer Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 20 -------------------- 1 file changed, 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 6fdc294ada0d..88ac4cf9d3ec 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2898,11 +2898,6 @@ static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc) return true; } -static inline bool pgtable_pte_page_ctor(struct page *page) -{ - return pagetable_pte_ctor(page_ptdesc(page)); -} - static inline void pagetable_pte_dtor(struct ptdesc *ptdesc) { struct folio *folio = ptdesc_folio(ptdesc); @@ -2912,11 +2907,6 @@ static inline void pagetable_pte_dtor(struct ptdesc *ptdesc) lruvec_stat_sub_folio(folio, NR_PAGETABLE); } -static inline void pgtable_pte_page_dtor(struct page *page) -{ - pagetable_pte_dtor(page_ptdesc(page)); -} - pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp); static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr) { @@ -3023,11 +3013,6 @@ static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc) return true; } -static inline bool pgtable_pmd_page_ctor(struct page *page) -{ - return pagetable_pmd_ctor(page_ptdesc(page)); -} - static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc) { struct folio *folio = ptdesc_folio(ptdesc); @@ -3037,11 +3022,6 @@ static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc) lruvec_stat_sub_folio(folio, NR_PAGETABLE); } -static inline void pgtable_pmd_page_dtor(struct page *page) -{ - pagetable_pmd_dtor(page_ptdesc(page)); -} - /* * No scalability reason to split PUD locks yet, but follow the same pattern * as the PMD locks to make it easier if we decide to. The VM should not be -- cgit v1.2.3 From 202e14222fadb246dfdf182e67de1518e86a1e20 Mon Sep 17 00:00:00 2001 From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:40:58 +1000 Subject: memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2 Given the difficulty of auditing all of userspace to figure out whether every memfd_create() user has switched to passing MFD_EXEC and MFD_NOEXEC_SEAL flags, it seems far less distruptive to make it possible for older programs that don't make use of executable memfds to run under vm.memfd_noexec=2. Otherwise, a small dependency change can result in spurious errors. For programs that don't use executable memfds, passing MFD_NOEXEC_SEAL is functionally a no-op and thus having the same In addition, every failure under vm.memfd_noexec=2 needs to print to the kernel log so that userspace can figure out where the error came from. The concerns about pr_warn_ratelimited() spam that caused the switch to pr_warn_once()[1,2] do not apply to the vm.memfd_noexec=2 case. This is a user-visible API change, but as it allows programs to do something that would be blocked before, and the sysctl itself was broken and recently released, it seems unlikely this will cause any issues. [1]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/ [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/ Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-2-7ff9e3e10ba6@cyphar.com Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai Cc: Dominique Martinet Cc: Christian Brauner Cc: Daniel Verkamp Cc: Jeff Xu Cc: Kees Cook Cc: Shuah Khan Cc: Signed-off-by: Andrew Morton --- include/linux/pid_namespace.h | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index c758809d5bcf..53974d79d98e 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -17,18 +17,10 @@ struct fs_pin; #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) -/* - * sysctl for vm.memfd_noexec - * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL - * acts like MFD_EXEC was set. - * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL - * acts like MFD_NOEXEC_SEAL was set. - * 2: memfd_create() without MFD_NOEXEC_SEAL will be - * rejected. - */ -#define MEMFD_NOEXEC_SCOPE_EXEC 0 -#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1 -#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2 +/* modes for vm.memfd_noexec sysctl */ +#define MEMFD_NOEXEC_SCOPE_EXEC 0 /* MFD_EXEC implied if unset */ +#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1 /* MFD_NOEXEC_SEAL implied if unset */ +#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2 /* same as 1, except MFD_EXEC rejected */ #endif struct pid_namespace { -- cgit v1.2.3 From 9876cfe8ec1cb3c88de31f4d58d57b0e7e22bcc4 Mon Sep 17 00:00:00 2001 From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:41:00 +1000 Subject: memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy This sysctl has the very unusual behaviour of not allowing any user (even CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you were to set this sysctl to a more restrictive option in the host pidns you would need to reboot your machine in order to reset it. The justification given in [1] is that this is a security feature and thus it should not be possible to disable. Aside from the fact that we have plenty of security-related sysctls that can be disabled after being enabled (fs.protected_symlinks for instance), the protection provided by the sysctl is to stop users from being able to create a binary and then execute it. A user with CAP_SYS_ADMIN can trivially do this without memfd_create(2): % cat mount-memfd.c #include #include #include #include #include #include #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:" int main(void) { int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC); assert(fsfd >= 0); assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2)); int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0); assert(dfd >= 0); int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0782); assert(execfd >= 0); assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE)); assert(!close(execfd)); char *execpath = NULL; char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL }; execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC); assert(execfd >= 0); assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0); assert(!execve(execpath, argv, envp)); } % ./mount-memfd this file was executed from this totally private tmpfs: /proc/self/fd/5 % Given that it is possible for CAP_SYS_ADMIN users to create executable binaries without memfd_create(2) and without touching the host filesystem (not to mention the many other things a CAP_SYS_ADMIN process would be able to do that would be equivalent or worse), it seems strange to cause a fair amount of headache to admins when there doesn't appear to be an actual security benefit to blocking this. There appear to be concerns about confused-deputy-esque attacks[2] but a confused deputy that can write to arbitrary sysctls is a bigger security issue than executable memfds. /* New API */ The primary requirement from the original author appears to be more based on the need to be able to restrict an entire system in a hierarchical manner[3], such that child namespaces cannot re-enable executable memfds. So, implement that behaviour explicitly -- the vm.memfd_noexec scope is evaluated up the pidns tree to &init_pid_ns and you have the most restrictive value applied to you. The new lower limit you can set vm.memfd_noexec is whatever limit applies to your parent. Note that a pidns will inherit a copy of the parent pidns's effective vm.memfd_noexec setting at unshare() time. This matches the existing behaviour, and it also ensures that a pidns will never have its vm.memfd_noexec setting *lowered* behind its back (but it will be raised if the parent raises theirs). /* Backwards Compatibility */ As the previous version of the sysctl didn't allow you to lower the setting at all, there are no backwards compatibility issues with this aspect of the change. However it should be noted that now that the setting is completely hierarchical. Previously, a cloned pidns would just copy the current pidns setting, meaning that if the parent's vm.memfd_noexec was changed it wouldn't propoagate to existing pid namespaces. Now, the restriction applies recursively. This is a uAPI change, however: * The sysctl is very new, having been merged in 6.3. * Several aspects of the sysctl were broken up until this patchset and the other patchset by Jeff Xu last month. And thus it seems incredibly unlikely that any real users would run into this issue. In the worst case, if this causes userspace isues we could make it so that modifying the setting follows the hierarchical rules but the restriction checking uses the cached copy. [1]: https://lore.kernel.org/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/ [2]: https://lore.kernel.org/CALmYWFs_dNCzw_pW1yRAo4bGCPEtykroEQaowNULp7svwMLjOg@mail.gmail.com/ [3]: https://lore.kernel.org/CALmYWFuahdUF7cT4cm7_TGLqPanuHXJ-hVSfZt7vpTnc18DPrw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e10ba6@cyphar.com Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai Cc: Dominique Martinet Cc: Christian Brauner Cc: Daniel Verkamp Cc: Jeff Xu Cc: Kees Cook Cc: Shuah Khan Cc: Signed-off-by: Andrew Morton --- include/linux/pid_namespace.h | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 53974d79d98e..f9f9931e02d6 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -39,7 +39,6 @@ struct pid_namespace { int reboot; /* group exit code if this pidns was rebooted */ struct ns_common ns; #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) - /* sysctl for vm.memfd_noexec */ int memfd_noexec_scope; #endif } __randomize_layout; @@ -56,6 +55,23 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) return ns; } +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) +static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns) +{ + int scope = MEMFD_NOEXEC_SCOPE_EXEC; + + for (; ns; ns = ns->parent) + scope = max(scope, READ_ONCE(ns->memfd_noexec_scope)); + + return scope; +} +#else +static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns) +{ + return 0; +} +#endif + extern struct pid_namespace *copy_pid_ns(unsigned long flags, struct user_namespace *user_ns, struct pid_namespace *ns); extern void zap_pid_ns_processes(struct pid_namespace *pid_ns); @@ -70,6 +86,11 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) return ns; } +static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns) +{ + return 0; +} + static inline struct pid_namespace *copy_pid_ns(unsigned long flags, struct user_namespace *user_ns, struct pid_namespace *ns) { -- cgit v1.2.3 From 1b6754fea43cebd72d969618219347c5ec01eb8d Mon Sep 17 00:00:00 2001 From: Xiu Jianfeng Date: Sat, 12 Aug 2023 11:01:28 +0000 Subject: writeback: remove unused delaration of bdi_async_bio_wq It seems it was introduced by commit d3f77dfdc718 ("blkcg: implement REQ_CGROUP_PUNT") unintentionally, but the definition does not exist, remove it. Link: https://lkml.kernel.org/r/20230812110128.482650-1-xiujianfeng@huaweicloud.com Signed-off-by: Xiu Jianfeng Acked-by: Tejun Heo Cc: Stefan Roesch Signed-off-by: Andrew Morton --- include/linux/backing-dev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index fbad4fcd408e..1a97277f99b1 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -46,7 +46,6 @@ extern spinlock_t bdi_lock; extern struct list_head bdi_list; extern struct workqueue_struct *bdi_wq; -extern struct workqueue_struct *bdi_async_bio_wq; static inline bool wb_has_dirty_io(struct bdi_writeback *wb) { -- cgit v1.2.3 From e45a2e947dfa6da2d73e2cf91ed6399c12522d4f Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Tue, 15 Aug 2023 11:06:09 +0800 Subject: pagemap: remove wait_on_page_locked_killable() There is no users of wait_on_page_locked_killable(), remove it. Link: https://lkml.kernel.org/r/20230815030609.39313-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 0ab0f2362b9b..f4f24b594cd7 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1060,11 +1060,6 @@ static inline void wait_on_page_locked(struct page *page) folio_wait_locked(page_folio(page)); } -static inline int wait_on_page_locked_killable(struct page *page) -{ - return folio_wait_locked_killable(page_folio(page)); -} - void wait_on_page_writeback(struct page *page); void folio_wait_writeback(struct folio *folio); int folio_wait_writeback_killable(struct folio *folio); -- cgit v1.2.3 From 39ced19b9e60f6f5b80db6b72965a8cf1982439d Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 14 Aug 2023 19:33:43 +0300 Subject: lib/vsprintf: split out sprintf() and friends Patch series "lib/vsprintf: Rework header inclusions", v3. Some patches that reduce the mess with the header inclusions related to vsprintf.c module. Each patch has its own description, and has no dependencies to each other, except the collisions over modifications of the same places. Hence the series. This patch (of 2): kernel.h is being used as a dump for all kinds of stuff for a long time. sprintf() and friends are used in many drivers without need of the full kernel.h dependency train with it. Here is the attempt on cleaning it up by splitting out sprintf() and friends. Link: https://lkml.kernel.org/r/20230814163344.17429-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20230814163344.17429-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Reviewed-by: Petr Mladek Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Marco Elver Cc: Rasmus Villemoes Cc: Sergey Senozhatsky Cc: Steven Rostedt (Google) Signed-off-by: Andrew Morton --- include/linux/kernel.h | 30 +----------------------------- include/linux/sprintf.h | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 29 deletions(-) create mode 100644 include/linux/sprintf.h (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index b9e76f717a7e..cee8fe87e9f4 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -203,35 +204,6 @@ static inline void might_fault(void) { } void do_exit(long error_code) __noreturn; -extern int num_to_str(char *buf, int size, - unsigned long long num, unsigned int width); - -/* lib/printf utilities */ - -extern __printf(2, 3) int sprintf(char *buf, const char * fmt, ...); -extern __printf(2, 0) int vsprintf(char *buf, const char *, va_list); -extern __printf(3, 4) -int snprintf(char *buf, size_t size, const char *fmt, ...); -extern __printf(3, 0) -int vsnprintf(char *buf, size_t size, const char *fmt, va_list args); -extern __printf(3, 4) -int scnprintf(char *buf, size_t size, const char *fmt, ...); -extern __printf(3, 0) -int vscnprintf(char *buf, size_t size, const char *fmt, va_list args); -extern __printf(2, 3) __malloc -char *kasprintf(gfp_t gfp, const char *fmt, ...); -extern __printf(2, 0) __malloc -char *kvasprintf(gfp_t gfp, const char *fmt, va_list args); -extern __printf(2, 0) -const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args); - -extern __scanf(2, 3) -int sscanf(const char *, const char *, ...); -extern __scanf(2, 0) -int vsscanf(const char *, const char *, va_list); - -extern int no_hash_pointers_enable(char *str); - extern int get_option(char **str, int *pint); extern char *get_options(const char *str, int nints, int *ints); extern unsigned long long memparse(const char *ptr, char **retptr); diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h new file mode 100644 index 000000000000..9ca23bcf9f42 --- /dev/null +++ b/include/linux/sprintf.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_KERNEL_SPRINTF_H_ +#define _LINUX_KERNEL_SPRINTF_H_ + +#include +#include + +int num_to_str(char *buf, int size, unsigned long long num, unsigned int width); + +__printf(2, 3) int sprintf(char *buf, const char * fmt, ...); +__printf(2, 0) int vsprintf(char *buf, const char *, va_list); +__printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...); +__printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args); +__printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...); +__printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args); +__printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...); +__printf(2, 0) __malloc char *kvasprintf(gfp_t gfp, const char *fmt, va_list args); +__printf(2, 0) const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args); + +__scanf(2, 3) int sscanf(const char *, const char *, ...); +__scanf(2, 0) int vsscanf(const char *, const char *, va_list); + +int no_hash_pointers_enable(char *str); + +#endif /* _LINUX_KERNEL_SPRINTF_H */ -- cgit v1.2.3 From 665536092355f17f0e2ea291eec70f9787dccd32 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 14 Aug 2023 19:33:44 +0300 Subject: lib/vsprintf: declare no_hash_pointers in sprintf.h Sparse is not happy to see non-static variable without declaration: lib/vsprintf.c:61:6: warning: symbol 'no_hash_pointers' was not declared. Should it be static? Declare respective variable in the sprintf.h. With this, add a comment to discourage its use if no real need. Link: https://lkml.kernel.org/r/20230814163344.17429-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Acked-by: Marco Elver Reviewed-by: Petr Mladek Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Rasmus Villemoes Cc: Sergey Senozhatsky Cc: Steven Rostedt (Google) Signed-off-by: Andrew Morton --- include/linux/sprintf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h index 9ca23bcf9f42..33dcbec71925 100644 --- a/include/linux/sprintf.h +++ b/include/linux/sprintf.h @@ -20,6 +20,8 @@ __printf(2, 0) const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list __scanf(2, 3) int sscanf(const char *, const char *, ...); __scanf(2, 0) int vsscanf(const char *, const char *, va_list); +/* These are for specific cases, do not use without real need */ +extern bool no_hash_pointers; int no_hash_pointers_enable(char *str); #endif /* _LINUX_KERNEL_SPRINTF_H */ -- cgit v1.2.3 From 5ffd2c37cb7a53d52099e5ed1fd7ccbc9e358791 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 17 Aug 2023 18:37:08 +0200 Subject: kill do_each_thread() Eric has pointed out that we still have 3 users of do_each_thread(). Change them to use for_each_process_thread() and kill this helper. There is a subtle change, after do_each_thread/while_each_thread g == t == &init_task, while after for_each_process_thread() they both point to nowhere, but this doesn't matter. > Why is for_each_process_thread() better than do_each_thread()? Say, for_each_process_thread() is rcu safe, do_each_thread() is not. And certainly for_each_process_thread(p, t) { do_something(p, t); } looks better than do_each_thread(p, t) { do_something(p, t); } while_each_thread(p, t); And again, there are only 3 users of this awkward helper left. It should have been killed years ago and in fact I thought it had already been killed. It uses while_each_thread() which needs some changes. Link: https://lkml.kernel.org/r/20230817163708.GA8248@redhat.com Signed-off-by: Oleg Nesterov Reviewed-by: Kees Cook Cc: "Christian Brauner (Microsoft)" Cc: Eric W. Biederman Cc: Jiri Slaby # tty/serial Signed-off-by: Andrew Morton --- include/linux/sched/signal.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 669e8cff40c7..0deebe2ab07d 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -648,13 +648,6 @@ extern void flush_itimer_signals(void); extern bool current_is_single_threaded(void); -/* - * Careful: do_each_thread/while_each_thread is a double loop so - * 'break' will not work as expected - use goto instead. - */ -#define do_each_thread(g, t) \ - for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do - #define while_each_thread(g, t) \ while ((t = next_thread(t)) != g) -- cgit v1.2.3 From 14fb1fd751fa09440634b909e6f5f53e2cba9ae0 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 3 Aug 2023 16:32:06 +0200 Subject: pgtable: improve pte_protnone() comment Especially the "For PROT_NONE VMAs, the PTEs are not marked _PAGE_PROTNONE" part is wrong: doing an mprotect(PROT_NONE) will end up marking all PTEs on x86_64 as _PAGE_PROTNONE, making pte_protnone() indicate "yes". So let's improve the comment, so it's easier to grasp which semantics pte_protnone() actually has. Link: https://lkml.kernel.org/r/20230803143208.383663-6-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Mel Gorman Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: John Hubbard Cc: Linus Torvalds Cc: liubo Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Paolo Bonzini Cc: Peter Xu Cc: Shuah Khan Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index f34e0f2cb4d8..6064f454c8e3 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1333,12 +1333,16 @@ static inline int pud_trans_unstable(pud_t *pud) #ifndef CONFIG_NUMA_BALANCING /* - * Technically a PTE can be PROTNONE even when not doing NUMA balancing but - * the only case the kernel cares is for NUMA balancing and is only ever set - * when the VMA is accessible. For PROT_NONE VMAs, the PTEs are not marked - * _PAGE_PROTNONE so by default, implement the helper as "always no". It - * is the responsibility of the caller to distinguish between PROT_NONE - * protections and NUMA hinting fault protections. + * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It is + * perfectly valid to indicate "no" in that case, which is why our default + * implementation defaults to "always no". + * + * In an accessible VMA, however, pte_protnone() reliably indicates PROT_NONE + * page protection due to NUMA hinting. NUMA hinting faults only apply in + * accessible VMAs. + * + * So, to reliably identify PROT_NONE PTEs that require a NUMA hinting fault, + * looking at the VMA accessibility is sufficient. */ static inline int pte_protnone(pte_t pte) { -- cgit v1.2.3 From dd6fa0b61814492476463149c91110e529364e82 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:50 +0100 Subject: mm: call free_huge_page() directly Indirect calls are expensive, thanks to Spectre. Call free_huge_page() directly if the folio belongs to hugetlb. Link: https://lkml.kernel.org/r/20230816151201.3655946-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 0a393bc02f25..5a1dfaffbd80 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -26,6 +26,8 @@ typedef struct { unsigned long pd; } hugepd_t; #define __hugepd(x) ((hugepd_t) { (x) }) #endif +void free_huge_page(struct page *page); + #ifdef CONFIG_HUGETLB_PAGE #include @@ -165,7 +167,6 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags, bool *migratable_cleared); void folio_putback_active_hugetlb(struct folio *folio); void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason); -void free_huge_page(struct page *page); void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx); -- cgit v1.2.3 From 454a00c40a21c59e99c526fe8cc57bd029cf8f0e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:51 +0100 Subject: mm: convert free_huge_page() to free_huge_folio() Pass a folio instead of the head page to save a few instructions. Update the documentation, at least in English. Link: https://lkml.kernel.org/r/20230816151201.3655946-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Sidhartha Kumar Cc: Yanteng Si Cc: David Hildenbrand Cc: Jens Axboe Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5a1dfaffbd80..5b2626063f4f 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -26,7 +26,7 @@ typedef struct { unsigned long pd; } hugepd_t; #define __hugepd(x) ((hugepd_t) { (x) }) #endif -void free_huge_page(struct page *page); +void free_huge_folio(struct folio *folio); #ifdef CONFIG_HUGETLB_PAGE -- cgit v1.2.3 From 8dc4a8f1e038189cb575f89bcd23364698b88cc1 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:52 +0100 Subject: mm: convert free_transhuge_folio() to folio_undo_large_rmappable() Indirect calls are expensive, thanks to Spectre. Test for TRANSHUGE_PAGE_DTOR and destroy the folio appropriately. Move the free_compound_page() call into destroy_large_folio() to simplify later patches. Link: https://lkml.kernel.org/r/20230816151201.3655946-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 2 -- include/linux/mm.h | 2 -- 2 files changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e718dbe928ba..ceda26a20830 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -141,8 +141,6 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); void prep_transhuge_page(struct page *page); -void free_transhuge_page(struct page *page); - bool can_split_folio(struct folio *folio, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); static inline int split_huge_page(struct page *page) diff --git a/include/linux/mm.h b/include/linux/mm.h index 55eb2789794e..0d14e2045658 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1253,9 +1253,7 @@ enum compound_dtor_id { #ifdef CONFIG_HUGETLB_PAGE HUGETLB_PAGE_DTOR, #endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE TRANSHUGE_PAGE_DTOR, -#endif NR_COMPOUND_DTORS, }; -- cgit v1.2.3 From da6e7bf3a0315025e4199d599bd31763f0df3b4a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:53 +0100 Subject: mm: convert prep_transhuge_page() to folio_prep_large_rmappable() Match folio_undo_large_rmappable(), and move the casting from page to folio into the callers (which they were largely doing anyway). Link: https://lkml.kernel.org/r/20230816151201.3655946-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index ceda26a20830..fa0350b0812a 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -140,7 +140,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); -void prep_transhuge_page(struct page *page); +void folio_prep_large_rmappable(struct folio *folio); bool can_split_folio(struct folio *folio, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); static inline int split_huge_page(struct page *page) @@ -280,7 +280,7 @@ static inline bool hugepage_vma_check(struct vm_area_struct *vma, return false; } -static inline void prep_transhuge_page(struct page *page) {} +static inline void folio_prep_large_rmappable(struct folio *folio) {} #define transparent_hugepage_flags 0UL -- cgit v1.2.3 From 0f2f43fabb95192c73b19586ef7536d7ac7c2f8c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:54 +0100 Subject: mm: remove free_compound_page() and the compound_page_dtors array The only remaining destructor is free_compound_page(). Inline it into destroy_large_folio() and remove the array it used to live in. Link: https://lkml.kernel.org/r/20230816151201.3655946-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0d14e2045658..0955b6b13fd0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1239,14 +1239,6 @@ void folio_copy(struct folio *dst, struct folio *src); unsigned long nr_free_buffer_pages(void); -/* - * Compound pages have a destructor function. Provide a - * prototype for that function and accessor functions. - * These are _only_ valid on the head of a compound page. - */ -typedef void compound_page_dtor(struct page *); - -/* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */ enum compound_dtor_id { NULL_COMPOUND_DTOR, COMPOUND_PAGE_DTOR, @@ -1299,8 +1291,6 @@ static inline unsigned long thp_size(struct page *page) return PAGE_SIZE << thp_order(page); } -void free_compound_page(struct page *page); - #ifdef CONFIG_MMU /* * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when -- cgit v1.2.3 From 9c5ccf2db04b8d7c3df363fdd4856c2b79ab2c6a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:55 +0100 Subject: mm: remove HUGETLB_PAGE_DTOR We can use a bit in page[1].flags to indicate that this folio belongs to hugetlb instead of using a value in page[1].dtors. That lets folio_test_hugetlb() become an inline function like it should be. We can also get rid of NULL_COMPOUND_DTOR. Link: https://lkml.kernel.org/r/20230816151201.3655946-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/mm.h | 4 ---- include/linux/page-flags.h | 43 +++++++++++++++++++++++++++++++++---------- 2 files changed, 33 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0955b6b13fd0..e241f5ee4dc4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1240,11 +1240,7 @@ void folio_copy(struct folio *dst, struct folio *src); unsigned long nr_free_buffer_pages(void); enum compound_dtor_id { - NULL_COMPOUND_DTOR, COMPOUND_PAGE_DTOR, -#ifdef CONFIG_HUGETLB_PAGE - HUGETLB_PAGE_DTOR, -#endif TRANSHUGE_PAGE_DTOR, NR_COMPOUND_DTORS, }; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 9218028caf33..e9e7cc45352d 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -171,15 +171,6 @@ enum pageflags { /* Remapped by swiotlb-xen. */ PG_xen_remapped = PG_owner_priv_1, -#ifdef CONFIG_MEMORY_FAILURE - /* - * Compound pages. Stored in first tail page's flags. - * Indicates that at least one subpage is hwpoisoned in the - * THP. - */ - PG_has_hwpoisoned = PG_error, -#endif - /* non-lru isolated movable page */ PG_isolated = PG_reclaim, @@ -190,6 +181,15 @@ enum pageflags { /* For self-hosted memmap pages */ PG_vmemmap_self_hosted = PG_owner_priv_1, #endif + + /* + * Flags only valid for compound pages. Stored in first tail page's + * flags word. + */ + + /* At least one page in this folio has the hwpoison flag set */ + PG_has_hwpoisoned = PG_error, + PG_hugetlb = PG_active, }; #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) @@ -812,7 +812,23 @@ static inline void ClearPageCompound(struct page *page) #ifdef CONFIG_HUGETLB_PAGE int PageHuge(struct page *page); -bool folio_test_hugetlb(struct folio *folio); +SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND) +CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND) + +/** + * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs + * @folio: The folio to test. + * + * Context: Any context. Caller should have a reference on the folio to + * prevent it from being turned into a tail page. + * Return: True for hugetlbfs folios, false for anon folios or folios + * belonging to other filesystems. + */ +static inline bool folio_test_hugetlb(struct folio *folio) +{ + return folio_test_large(folio) && + test_bit(PG_hugetlb, folio_flags(folio, 1)); +} #else TESTPAGEFLAG_FALSE(Huge, hugetlb) #endif @@ -1056,6 +1072,13 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) #define PAGE_FLAGS_CHECK_AT_PREP \ ((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK) +/* + * Flags stored in the second page of a compound page. They may overlap + * the CHECK_AT_FREE flags above, so need to be cleared. + */ +#define PAGE_FLAGS_SECOND \ + (1UL << PG_has_hwpoisoned | 1UL << PG_hugetlb) + #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) /** -- cgit v1.2.3 From de53c05f2ae3d47d30db58e9c4e54e3bbc868377 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:56 +0100 Subject: mm: add large_rmappable page flag Stored in the first tail page's flags, this flag replaces the destructor. That removes the last of the destructors, so remove all references to folio_dtor and compound_dtor. Link: https://lkml.kernel.org/r/20230816151201.3655946-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/mm.h | 13 ------------- include/linux/mm_types.h | 2 -- include/linux/page-flags.h | 7 ++++++- 3 files changed, 6 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index e241f5ee4dc4..63d573781973 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1239,19 +1239,6 @@ void folio_copy(struct folio *dst, struct folio *src); unsigned long nr_free_buffer_pages(void); -enum compound_dtor_id { - COMPOUND_PAGE_DTOR, - TRANSHUGE_PAGE_DTOR, - NR_COMPOUND_DTORS, -}; - -static inline void folio_set_compound_dtor(struct folio *folio, - enum compound_dtor_id compound_dtor) -{ - VM_BUG_ON_FOLIO(compound_dtor >= NR_COMPOUND_DTORS, folio); - folio->_folio_dtor = compound_dtor; -} - void destroy_large_folio(struct folio *folio); /* Returns the number of bytes in this potentially compound page. */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index de5ac95572c8..1c5c2349c18e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -264,7 +264,6 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page) * @_refcount: Do not access this member directly. Use folio_ref_count() * to find how many references there are to this folio. * @memcg_data: Memory Control Group data. - * @_folio_dtor: Which destructor to use for this folio. * @_folio_order: Do not use directly, call folio_order(). * @_entire_mapcount: Do not use directly, call folio_entire_mapcount(). * @_nr_pages_mapped: Do not use directly, call folio_mapcount(). @@ -318,7 +317,6 @@ struct folio { unsigned long _flags_1; unsigned long _head_1; /* public: */ - unsigned char _folio_dtor; unsigned char _folio_order; atomic_t _entire_mapcount; atomic_t _nr_pages_mapped; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e9e7cc45352d..85d54b6c9e0b 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -190,6 +190,7 @@ enum pageflags { /* At least one page in this folio has the hwpoison flag set */ PG_has_hwpoisoned = PG_error, PG_hugetlb = PG_active, + PG_large_rmappable = PG_workingset, /* anon or file-backed */ }; #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) @@ -806,6 +807,9 @@ static inline void ClearPageCompound(struct page *page) BUG_ON(!PageHead(page)); ClearPageHead(page); } +PAGEFLAG(LargeRmappable, large_rmappable, PF_SECOND) +#else +TESTPAGEFLAG_FALSE(LargeRmappable, large_rmappable) #endif #define PG_head_mask ((1UL << PG_head)) @@ -1077,7 +1081,8 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) * the CHECK_AT_FREE flags above, so need to be cleared. */ #define PAGE_FLAGS_SECOND \ - (1UL << PG_has_hwpoisoned | 1UL << PG_hugetlb) + (1UL << PG_has_hwpoisoned | 1UL << PG_hugetlb | \ + 1UL << PG_large_rmappable) #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) -- cgit v1.2.3 From c704ae9797843402436190a6f094a035237fd012 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:57 +0100 Subject: mm: rearrange page flags Move PG_writeback into bottom byte so that it can use PG_waiters in a later patch. Move PG_head into bottom byte as well to match with where 'order' is moving next. PG_active and PG_workingset move into the second byte to make room for them. By putting PG_head in bit 6, we ensure that it is cleared by assigning the folio order to the bottom byte of the first tail page (since the order cannot be larger than 63). Link: https://lkml.kernel.org/r/20230816151201.3655946-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 85d54b6c9e0b..46fc05c648ff 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -99,13 +99,15 @@ */ enum pageflags { PG_locked, /* Page is locked. Don't touch. */ + PG_writeback, /* Page is under writeback */ PG_referenced, PG_uptodate, PG_dirty, PG_lru, + PG_head, /* Must be in bit 6 */ + PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */ PG_active, PG_workingset, - PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */ PG_error, PG_slab, PG_owner_priv_1, /* Owner use. If pagecache, fs may use*/ @@ -113,8 +115,6 @@ enum pageflags { PG_reserved, PG_private, /* If pagecache, has fs-private data */ PG_private_2, /* If pagecache, has fs aux data */ - PG_writeback, /* Page is under writeback */ - PG_head, /* A head page */ PG_mappedtodisk, /* Has blocks allocated on-disk */ PG_reclaim, /* To be reclaimed asap */ PG_swapbacked, /* Page is backed by RAM/swap */ -- cgit v1.2.3 From ebc1baf5c9b46c2240c580a2fd992b2e48606dfa Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:58 +0100 Subject: mm: free up a word in the first tail page Store the folio order in the low byte of the flags word in the first tail page. This frees up the word that was being used to store the order and dtor bytes previously. Link: https://lkml.kernel.org/r/20230816151201.3655946-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 +++++----- include/linux/mm_types.h | 3 +-- include/linux/page-flags.h | 7 ++++--- 3 files changed, 10 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 63d573781973..939386e0aeda 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1000,7 +1000,7 @@ struct inode; * compound_order() can be called without holding a reference, which means * that niceties like page_folio() don't work. These callers should be * prepared to handle wild return values. For example, PG_head may be - * set before _folio_order is initialised, or this may be a tail page. + * set before the order is initialised, or this may be a tail page. * See compaction.c for some good examples. */ static inline unsigned int compound_order(struct page *page) @@ -1009,7 +1009,7 @@ static inline unsigned int compound_order(struct page *page) if (!test_bit(PG_head, &folio->flags)) return 0; - return folio->_folio_order; + return folio->_flags_1 & 0xff; } /** @@ -1025,7 +1025,7 @@ static inline unsigned int folio_order(struct folio *folio) { if (!folio_test_large(folio)) return 0; - return folio->_folio_order; + return folio->_flags_1 & 0xff; } #include @@ -1996,7 +1996,7 @@ static inline long folio_nr_pages(struct folio *folio) #ifdef CONFIG_64BIT return folio->_folio_nr_pages; #else - return 1L << folio->_folio_order; + return 1L << (folio->_flags_1 & 0xff); #endif } @@ -2014,7 +2014,7 @@ static inline unsigned long compound_nr(struct page *page) #ifdef CONFIG_64BIT return folio->_folio_nr_pages; #else - return 1L << folio->_folio_order; + return 1L << (folio->_flags_1 & 0xff); #endif } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 1c5c2349c18e..40afb3bbc309 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -264,7 +264,6 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page) * @_refcount: Do not access this member directly. Use folio_ref_count() * to find how many references there are to this folio. * @memcg_data: Memory Control Group data. - * @_folio_order: Do not use directly, call folio_order(). * @_entire_mapcount: Do not use directly, call folio_entire_mapcount(). * @_nr_pages_mapped: Do not use directly, call folio_mapcount(). * @_pincount: Do not use directly, call folio_maybe_dma_pinned(). @@ -316,8 +315,8 @@ struct folio { struct { unsigned long _flags_1; unsigned long _head_1; + unsigned long _folio_avail; /* public: */ - unsigned char _folio_order; atomic_t _entire_mapcount; atomic_t _nr_pages_mapped; atomic_t _pincount; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 46fc05c648ff..638b0a96b4c5 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -184,7 +184,8 @@ enum pageflags { /* * Flags only valid for compound pages. Stored in first tail page's - * flags word. + * flags word. Cannot use the first 8 flags or any flag marked as + * PF_ANY. */ /* At least one page in this folio has the hwpoison flag set */ @@ -1081,8 +1082,8 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) * the CHECK_AT_FREE flags above, so need to be cleared. */ #define PAGE_FLAGS_SECOND \ - (1UL << PG_has_hwpoisoned | 1UL << PG_hugetlb | \ - 1UL << PG_large_rmappable) + (0xffUL /* order */ | 1UL << PG_has_hwpoisoned | \ + 1UL << PG_hugetlb | 1UL << PG_large_rmappable) #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) -- cgit v1.2.3 From 6199277baf73a4877b42991b97c40e173e530756 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:11:59 +0100 Subject: mm: remove folio_test_transhuge() This function is misleading; people think it means "Is this a THP", when all it actually does is check whether this is a large folio. Remove it; the one remaining user should have been checking to see whether the folio is PMD sized or not. Link: https://lkml.kernel.org/r/20230816151201.3655946-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 638b0a96b4c5..5c02720c53a5 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -853,11 +853,6 @@ static inline int PageTransHuge(struct page *page) return PageHead(page); } -static inline bool folio_test_transhuge(struct folio *folio) -{ - return folio_test_head(folio); -} - /* * PageTransCompound returns true for both transparent huge pages * and hugetlbfs pages, so it should only be called when it's known -- cgit v1.2.3 From b10ff04dc0ec7cc7dbb0eac98c4202ec8d28c21b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 16 Aug 2023 16:12:00 +0100 Subject: mm: add tail private fields to struct folio Because THP_SWAP uses page->private for each page, we must not use the space which overlaps that field for anything which would conflict with that. We avoid the conflict on 32-bit systems by disallowing THP_SWAP on 32-bit. Link: https://lkml.kernel.org/r/20230816151201.3655946-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Cc: David Hildenbrand Cc: Jens Axboe Cc: Sidhartha Kumar Cc: Yanteng Si Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 40afb3bbc309..06e9315bfe7c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -322,8 +322,11 @@ struct folio { atomic_t _pincount; #ifdef CONFIG_64BIT unsigned int _folio_nr_pages; -#endif + /* 4 byte gap here */ /* private: the union with struct page is transitional */ + /* Fix THP_SWAP to not use tail->private */ + unsigned long _private_1; +#endif }; struct page __page_1; }; @@ -344,6 +347,9 @@ struct folio { /* public: */ struct list_head _deferred_list; /* private: the union with struct page is transitional */ + unsigned long _avail_2a; + /* Fix THP_SWAP to not use tail->private */ + unsigned long _private_2a; }; struct page __page_2; }; @@ -368,12 +374,18 @@ FOLIO_MATCH(memcg_data, memcg_data); offsetof(struct page, pg) + sizeof(struct page)) FOLIO_MATCH(flags, _flags_1); FOLIO_MATCH(compound_head, _head_1); +#ifdef CONFIG_64BIT +FOLIO_MATCH(private, _private_1); +#endif #undef FOLIO_MATCH #define FOLIO_MATCH(pg, fl) \ static_assert(offsetof(struct folio, fl) == \ offsetof(struct page, pg) + 2 * sizeof(struct page)) FOLIO_MATCH(flags, _flags_2); FOLIO_MATCH(compound_head, _head_2); +FOLIO_MATCH(flags, _flags_2a); +FOLIO_MATCH(compound_head, _head_2a); +FOLIO_MATCH(private, _private_2a); #undef FOLIO_MATCH /** -- cgit v1.2.3 From 875386b98857822b77ac7f95bdf367b70af5b78c Mon Sep 17 00:00:00 2001 From: Manish Rangankar Date: Mon, 21 Aug 2023 18:30:37 +0530 Subject: scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe Introduce infrastructure in the driver to support the processing of unsolicited LS (Link Service) requests. This will involve the utilization of a new pass-up of unsolicited FC-NVMe request IOCB interface. Unsolicited requests will be submitted to the NVMe transport layer through nvme_fc_rcv_ls_req(). Any received LS responses, which are sent using xmt_ls_rsp(), will be forwarded to the firmware through the existing Pass-Through IOCB interface, responsible for sending FC-NVMe Link Service requests and responses. Signed-off-by: Manish Rangankar Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20230821130045.34850-2-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen --- include/linux/nvme-fc-driver.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h index 4109f1bd6128..f6ef8cf5d774 100644 --- a/include/linux/nvme-fc-driver.h +++ b/include/linux/nvme-fc-driver.h @@ -53,10 +53,10 @@ struct nvmefc_ls_req { void *rqstaddr; dma_addr_t rqstdma; - u32 rqstlen; + __le32 rqstlen; void *rspaddr; dma_addr_t rspdma; - u32 rsplen; + __le32 rsplen; u32 timeout; void *private; @@ -120,7 +120,7 @@ struct nvmefc_ls_req { struct nvmefc_ls_rsp { void *rspbuf; dma_addr_t rspdma; - u16 rsplen; + __le32 rsplen; void (*done)(struct nvmefc_ls_rsp *rsp); void *nvme_fc_private; /* LLDD is not to access !! */ -- cgit v1.2.3 From dd4904f3b924b489a0adbd88768fadaadab94156 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 13:42:15 -0700 Subject: interconnect: qcom: Annotate struct icc_onecell_data with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct icc_onecell_data. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Andy Gross Cc: Bjorn Andersson Cc: Konrad Dybcio Cc: Georgi Djakov Cc: linux-arm-msm@vger.kernel.org Cc: linux-pm@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. Silva Acked-by: Konrad Dybcio Link: https://lore.kernel.org/r/20230817204215.never.916-kees@kernel.org Signed-off-by: Georgi Djakov --- include/linux/interconnect-provider.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h index e6d8aca6886d..7ba183f221f1 100644 --- a/include/linux/interconnect-provider.h +++ b/include/linux/interconnect-provider.h @@ -33,7 +33,7 @@ struct icc_node_data { */ struct icc_onecell_data { unsigned int num_nodes; - struct icc_node *nodes[]; + struct icc_node *nodes[] __counted_by(num_nodes); }; struct icc_node *of_icc_xlate_onecell(struct of_phandle_args *spec, -- cgit v1.2.3 From 89ae89f53d201143560f1e9ed4bfa62eee34f88e Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 9 Aug 2023 10:34:15 +0200 Subject: bpf: Add multi uprobe link Adding new multi uprobe link that allows to attach bpf program to multiple uprobes. Uprobes to attach are specified via new link_create uprobe_multi union: struct { __aligned_u64 path; __aligned_u64 offsets; __aligned_u64 ref_ctr_offsets; __u32 cnt; __u32 flags; } uprobe_multi; Uprobes are defined for single binary specified in path and multiple calling sites specified in offsets array with optional reference counters specified in ref_ctr_offsets array. All specified arrays have length of 'cnt'. The 'flags' supports single bit for now that marks the uprobe as return probe. Acked-by: Andrii Nakryiko Acked-by: Yafang Shao Signed-off-by: Jiri Olsa Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/trace_events.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index e66d04dbe56a..5b85cf18c350 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -752,6 +752,7 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr); int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); #else static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx) { @@ -798,6 +799,11 @@ bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } +static inline int +bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -EOPNOTSUPP; +} #endif enum { -- cgit v1.2.3 From 8387ce06d70bbbb97a0c168a52b68268ae0da075 Mon Sep 17 00:00:00 2001 From: Tianyu Lan Date: Fri, 18 Aug 2023 06:29:12 -0400 Subject: x86/hyperv: Set Virtual Trust Level in VMBus init message SEV-SNP guests on Hyper-V can run at multiple Virtual Trust Levels (VTL). During boot, get the VTL at which we're running using the GET_VP_REGISTERs hypercall, and save the value for future use. Then during VMBus initialization, set the VTL with the saved value as required in the VMBus init message. Reviewed-by: Dexuan Cui Reviewed-by: Michael Kelley Signed-off-by: Tianyu Lan Signed-off-by: Wei Liu Link: https://lore.kernel.org/r/20230818102919.1318039-3-ltykernel@gmail.com --- include/linux/hyperv.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index bfbc37ce223b..1f2bfec4abde 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact { u64 interrupt_page; struct { u8 msg_sint; - u8 padding1[3]; - u32 padding2; + u8 msg_vtl; + u8 reserved[6]; }; }; u64 monitor_page1; -- cgit v1.2.3 From bb9b0e46b84c19d3dd7d453a2da71a0fdc172b31 Mon Sep 17 00:00:00 2001 From: Saurabh Sengar Date: Tue, 15 Aug 2023 21:34:38 -0700 Subject: hv: hyperv.h: Replace one-element array with flexible-array member One-element and zero-length arrays are deprecated. Replace one-element array in struct vmtransfer_page_packet_header with flexible-array member. This change fixes below warning: [ 2.593788] ================================================================================ [ 2.593908] UBSAN: array-index-out-of-bounds in drivers/net/hyperv/netvsc.c:1445:41 [ 2.593989] index 1 is out of range for type 'vmtransfer_page_range [1]' [ 2.594049] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.5.0-rc4-next-20230803+ #1 [ 2.594114] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 04/20/2023 [ 2.594121] Call Trace: [ 2.594126] [ 2.594133] dump_stack_lvl+0x4c/0x70 [ 2.594154] dump_stack+0x14/0x20 [ 2.594162] __ubsan_handle_out_of_bounds+0xa6/0xf0 [ 2.594224] netvsc_poll+0xc01/0xc90 [hv_netvsc] [ 2.594258] __napi_poll+0x30/0x1e0 [ 2.594320] net_rx_action+0x194/0x2f0 [ 2.594333] __do_softirq+0xde/0x31e [ 2.594345] __irq_exit_rcu+0x6b/0x90 [ 2.594357] irq_exit_rcu+0x12/0x20 [ 2.594366] sysvec_hyperv_callback+0x84/0x90 [ 2.594376] [ 2.594379] [ 2.594383] asm_sysvec_hyperv_callback+0x1f/0x30 [ 2.594394] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ 2.594452] Code: 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 05 35 3f 00 fb f4 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 [ 2.594459] RSP: 0018:ffffb841c00d3e88 EFLAGS: 00000256 [ 2.594469] RAX: ffff9d18c326f4a0 RBX: ffff9d18c031df40 RCX: 4000000000000000 [ 2.594475] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 00000000000268dc [ 2.594481] RBP: ffffb841c00d3e90 R08: 00000066a171109b R09: 00000000d33d2600 [ 2.594486] R10: 000000009a41bf00 R11: 0000000000000000 R12: 0000000000000001 [ 2.594491] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 2.594501] ? ct_kernel_exit.constprop.0+0x7d/0x90 [ 2.594513] ? default_idle+0xd/0x20 [ 2.594523] arch_cpu_idle+0xd/0x20 [ 2.594532] default_idle_call+0x30/0xe0 [ 2.594542] do_idle+0x200/0x240 [ 2.594553] ? complete+0x71/0x80 [ 2.594613] cpu_startup_entry+0x24/0x30 [ 2.594624] start_secondary+0x12d/0x160 [ 2.594634] secondary_startup_64_no_verify+0x17e/0x18b [ 2.594649] [ 2.594656] ================================================================================ With this change the structure size is reduced by 8 bytes, below is the pahole output. struct vmtransfer_page_packet_header { struct vmpacket_descriptor d; /* 0 16 */ u16 xfer_pageset_id; /* 16 2 */ u8 sender_owns_set; /* 18 1 */ u8 reserved; /* 19 1 */ u32 range_cnt; /* 20 4 */ struct vmtransfer_page_range ranges[]; /* 24 0 */ /* size: 24, cachelines: 1, members: 6 */ /* last cacheline: 24 bytes */ }; The validation code in the netvsc driver is affected by changing the struct size, but the effects have been examined and have been determined to be appropriate. Signed-off-by: Saurabh Sengar Reviewed-by: Michael Kelley Signed-off-by: Wei Liu Link: https://lore.kernel.org/r/1692160478-18469-1-git-send-email-ssengar@linux.microsoft.com --- include/linux/hyperv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 1f2bfec4abde..a922d4526de4 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -348,7 +348,7 @@ struct vmtransfer_page_packet_header { u8 sender_owns_set; u8 reserved; u32 range_cnt; - struct vmtransfer_page_range ranges[1]; + struct vmtransfer_page_range ranges[]; } __packed; struct vmgpadl_packet_header { -- cgit v1.2.3 From 08b8a0440eeec83f8330349f829908858fd52d31 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 14 Mar 2022 15:47:18 -0400 Subject: libceph: add spinlock around osd->o_requests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In a later patch, we're going to need to search for a request in the rbtree, but taking the o_mutex is inconvenient as we already hold the con mutex at the point where we need it. Add a new spinlock that we take when inserting and erasing entries from the o_requests tree. Search of the rbtree can be done with either the mutex or the spinlock, but insertion and removal requires both. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/osd_client.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index fb6be72104df..92addef18738 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -29,7 +29,12 @@ typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *); #define CEPH_HOMELESS_OSD -1 -/* a given osd we're communicating with */ +/* + * A given osd we're communicating with. + * + * Note that the o_requests tree can be searched while holding the "lock" mutex + * or the "o_requests_lock" spinlock. Insertion or removal requires both! + */ struct ceph_osd { refcount_t o_ref; struct ceph_osd_client *o_osdc; @@ -37,6 +42,7 @@ struct ceph_osd { int o_incarnation; struct rb_node o_node; struct ceph_connection o_con; + spinlock_t o_requests_lock; struct rb_root o_requests; struct rb_root o_linger_requests; struct rb_root o_backoff_mappings; -- cgit v1.2.3 From a679e50f728648f7b2f3b349e082448abd388038 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 16 Mar 2022 15:23:00 -0400 Subject: libceph: define struct ceph_sparse_extent and add some helpers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When the OSD sends back a sparse read reply, it contains an array of these structures. Define the structure and add a couple of helpers for dealing with them. Also add a place in struct ceph_osd_req_op to store the extent buffer, and code to free it if it's populated when the req is torn down. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/osd_client.h | 43 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 92addef18738..05da1e755b7b 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -29,6 +29,17 @@ typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *); #define CEPH_HOMELESS_OSD -1 +/* + * A single extent in a SPARSE_READ reply. + * + * Note that these come from the OSD as little-endian values. On BE arches, + * we convert them in-place after receipt. + */ +struct ceph_sparse_extent { + u64 off; + u64 len; +} __packed; + /* * A given osd we're communicating with. * @@ -104,6 +115,8 @@ struct ceph_osd_req_op { u64 offset, length; u64 truncate_size; u32 truncate_seq; + int sparse_ext_cnt; + struct ceph_sparse_extent *sparse_ext; struct ceph_osd_data osd_data; } extent; struct { @@ -510,6 +523,20 @@ extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *, u32 truncate_seq, u64 truncate_size, bool use_mempool); +int __ceph_alloc_sparse_ext_map(struct ceph_osd_req_op *op, int cnt); + +/* + * How big an extent array should we preallocate for a sparse read? This is + * just a starting value. If we get more than this back from the OSD, the + * receiver will reallocate. + */ +#define CEPH_SPARSE_EXT_ARRAY_INITIAL 16 + +static inline int ceph_alloc_sparse_ext_map(struct ceph_osd_req_op *op) +{ + return __ceph_alloc_sparse_ext_map(op, CEPH_SPARSE_EXT_ARRAY_INITIAL); +} + extern void ceph_osdc_get_request(struct ceph_osd_request *req); extern void ceph_osdc_put_request(struct ceph_osd_request *req); @@ -564,5 +591,19 @@ int ceph_osdc_list_watchers(struct ceph_osd_client *osdc, struct ceph_object_locator *oloc, struct ceph_watch_item **watchers, u32 *num_watchers); -#endif +/* Find offset into the buffer of the end of the extent map */ +static inline u64 ceph_sparse_ext_map_end(struct ceph_osd_req_op *op) +{ + struct ceph_sparse_extent *ext; + + /* No extents? No data */ + if (op->extent.sparse_ext_cnt == 0) + return 0; + + ext = &op->extent.sparse_ext[op->extent.sparse_ext_cnt - 1]; + + return ext->off + ext->len - op->extent.offset; +} + +#endif -- cgit v1.2.3 From ec3bc567eac12c557a2b99bd0b34b5dff12cab23 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 25 Jan 2022 08:26:31 -0500 Subject: libceph: new sparse_read op, support sparse reads on msgr2 crc codepath MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add support for a new sparse_read ceph_connection operation. The idea is that the client driver can define this operation use it to do special handling for incoming reads. The alloc_msg routine will look at the request and determine whether the reply is expected to be sparse. If it is, then we'll dispatch to a different set of state machine states that will repeatedly call the driver's sparse_read op to get length and placement info for reading the extent map, and the extents themselves. This necessitates adding some new field to some other structs: - The msg gets a new bool to track whether it's a sparse_read request. - A new field is added to the cursor to track the amount remaining in the current extent. This is used to cap the read from the socket into the msg_data - Handing a revoke with all of this is particularly difficult, so I've added a new data_len_remain field to the v2 connection info, and then use that to skip that much on a revoke. We may want to expand the use of that to the normal read path as well, just for consistency's sake. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/messenger.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 99c1726be6ee..8a6938fa324e 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -17,6 +17,7 @@ struct ceph_msg; struct ceph_connection; +struct ceph_msg_data_cursor; /* * Ceph defines these callbacks for handling connection events. @@ -70,6 +71,30 @@ struct ceph_connection_operations { int used_proto, int result, const int *allowed_protos, int proto_cnt, const int *allowed_modes, int mode_cnt); + + /** + * sparse_read: read sparse data + * @con: connection we're reading from + * @cursor: data cursor for reading extents + * @buf: optional buffer to read into + * + * This should be called more than once, each time setting up to + * receive an extent into the current cursor position, and zeroing + * the holes between them. + * + * Returns amount of data to be read (in bytes), 0 if reading is + * complete, or -errno if there was an error. + * + * If @buf is set on a >0 return, then the data should be read into + * the provided buffer. Otherwise, it should be read into the cursor. + * + * The sparse read operation is expected to initialize the cursor + * with a length covering up to the end of the last extent. + */ + int (*sparse_read)(struct ceph_connection *con, + struct ceph_msg_data_cursor *cursor, + char **buf); + }; /* use format string %s%lld */ @@ -207,6 +232,7 @@ struct ceph_msg_data_cursor { struct ceph_msg_data *data; /* current data item */ size_t resid; /* bytes not yet consumed */ + int sr_resid; /* residual sparse_read len */ bool need_crc; /* crc update needed */ union { #ifdef CONFIG_BLOCK @@ -251,6 +277,7 @@ struct ceph_msg { struct kref kref; bool more_to_follow; bool needs_out_seq; + bool sparse_read; int front_alloc_len; struct ceph_msgpool *pool; @@ -395,6 +422,7 @@ struct ceph_connection_v2_info { void *conn_bufs[16]; int conn_buf_cnt; + int data_len_remain; struct kvec in_sign_kvecs[8]; struct kvec out_sign_kvecs[8]; -- cgit v1.2.3 From d396f89db39a2f259e2125ca43b4c31bb65afcad Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 24 Mar 2022 13:33:06 -0400 Subject: libceph: add sparse read support to msgr1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add 2 new fields to ceph_connection_v1_info to track the necessary info in sparse reads. Skip initializing the cursor for a sparse read. Break out read_partial_message_section into a wrapper around a new read_partial_message_chunk function that doesn't zero out the crc first. Add new helper functions to drive receiving into the destinations provided by the sparse_read state machine. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/messenger.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 8a6938fa324e..9fd7255172ad 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -336,6 +336,10 @@ struct ceph_connection_v1_info { int in_base_pos; /* bytes read */ + /* sparse reads */ + struct kvec in_sr_kvec; /* current location to receive into */ + u64 in_sr_len; /* amount of data in this extent */ + /* message in temps */ u8 in_tag; /* protocol control byte */ struct ceph_msg_header in_hdr; -- cgit v1.2.3 From f628d799972799023d32c2542bb2639eb8c4f84e Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 11 Feb 2022 11:38:02 -0500 Subject: libceph: add sparse read support to OSD client MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Have get_reply check for the presence of sparse read ops in the request and set the sparse_read boolean in the msg. That will queue the messenger layer to use the sparse read codepath instead of the normal data receive. Add a new sparse_read operation for the OSD client, driven by its own state machine. The messenger will repeatedly call the sparse_read operation, and it will pass back the necessary info to set up to read the next extent of data, while zero-filling the sparse regions. The state machine will stop at the end of the last extent, and will attach the extent map buffer to the ceph_osd_req_op so that the caller can use it. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/osd_client.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 05da1e755b7b..bfa4813590da 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -40,6 +40,36 @@ struct ceph_sparse_extent { u64 len; } __packed; +/* Sparse read state machine state values */ +enum ceph_sparse_read_state { + CEPH_SPARSE_READ_HDR = 0, + CEPH_SPARSE_READ_EXTENTS, + CEPH_SPARSE_READ_DATA_LEN, + CEPH_SPARSE_READ_DATA, +}; + +/* + * A SPARSE_READ reply is a 32-bit count of extents, followed by an array of + * 64-bit offset/length pairs, and then all of the actual file data + * concatenated after it (sans holes). + * + * Unfortunately, we don't know how long the extent array is until we've + * started reading the data section of the reply. The caller should send down + * a destination buffer for the array, but we'll alloc one if it's too small + * or if the caller doesn't. + */ +struct ceph_sparse_read { + enum ceph_sparse_read_state sr_state; /* state machine state */ + u64 sr_req_off; /* orig request offset */ + u64 sr_req_len; /* orig request length */ + u64 sr_pos; /* current pos in buffer */ + int sr_index; /* current extent index */ + __le32 sr_datalen; /* length of actual data */ + u32 sr_count; /* extent count in reply */ + int sr_ext_len; /* length of extent array */ + struct ceph_sparse_extent *sr_extent; /* extent array */ +}; + /* * A given osd we're communicating with. * @@ -48,6 +78,7 @@ struct ceph_sparse_extent { */ struct ceph_osd { refcount_t o_ref; + int o_sparse_op_idx; struct ceph_osd_client *o_osdc; int o_osd; int o_incarnation; @@ -63,6 +94,7 @@ struct ceph_osd { unsigned long lru_ttl; struct list_head o_keepalive_item; struct mutex lock; + struct ceph_sparse_read o_sparse_read; }; #define CEPH_OSD_SLAB_OPS 2 -- cgit v1.2.3 From dee0c5f834605ce9b384ee8b9c7032ffd8db4eca Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 1 Jul 2022 06:30:12 -0400 Subject: libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_type MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor. Instead of requiring a list of pages or bvecs, we can just use an iov_iter directly, and avoid extra allocations. We assume that the pages represented by the iter are pinned such that they shouldn't incur page faults, which is the case for the iov_iters created by netfs. While working on this, Al Viro informed me that he was going to change iov_iter_get_pages to auto-advance the iterator as that pattern is more or less required for ITER_PIPE anyway. We emulate that here for now by advancing in the _next op and tracking that amount in the "lastlen" field. In the event that _next is called twice without an intervening _advance, we revert the iov_iter by the remaining lastlen before calling iov_iter_get_pages. Cc: Al Viro Cc: David Howells Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/messenger.h | 8 ++++++++ include/linux/ceph/osd_client.h | 4 ++++ 2 files changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 9fd7255172ad..2eaaabbe98cb 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -123,6 +123,7 @@ enum ceph_msg_data_type { CEPH_MSG_DATA_BIO, /* data source/destination is a bio list */ #endif /* CONFIG_BLOCK */ CEPH_MSG_DATA_BVECS, /* data source/destination is a bio_vec array */ + CEPH_MSG_DATA_ITER, /* data source/destination is an iov_iter */ }; #ifdef CONFIG_BLOCK @@ -224,6 +225,7 @@ struct ceph_msg_data { bool own_pages; }; struct ceph_pagelist *pagelist; + struct iov_iter iter; }; }; @@ -248,6 +250,10 @@ struct ceph_msg_data_cursor { struct page *page; /* page from list */ size_t offset; /* bytes from list */ }; + struct { + struct iov_iter iov_iter; + unsigned int lastlen; + }; }; }; @@ -605,6 +611,8 @@ void ceph_msg_data_add_bio(struct ceph_msg *msg, struct ceph_bio_iter *bio_pos, #endif /* CONFIG_BLOCK */ void ceph_msg_data_add_bvecs(struct ceph_msg *msg, struct ceph_bvec_iter *bvec_pos); +void ceph_msg_data_add_iter(struct ceph_msg *msg, + struct iov_iter *iter); struct ceph_msg *ceph_msg_new2(int type, int front_len, int max_data_items, gfp_t flags, bool can_fail); diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index bfa4813590da..8f5d2b5bbba2 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -108,6 +108,7 @@ enum ceph_osd_data_type { CEPH_OSD_DATA_TYPE_BIO, #endif /* CONFIG_BLOCK */ CEPH_OSD_DATA_TYPE_BVECS, + CEPH_OSD_DATA_TYPE_ITER, }; struct ceph_osd_data { @@ -131,6 +132,7 @@ struct ceph_osd_data { struct ceph_bvec_iter bvec_pos; u32 num_bvecs; }; + struct iov_iter iter; }; }; @@ -501,6 +503,8 @@ void osd_req_op_extent_osd_data_bvecs(struct ceph_osd_request *osd_req, void osd_req_op_extent_osd_data_bvec_pos(struct ceph_osd_request *osd_req, unsigned int which, struct ceph_bvec_iter *bvec_pos); +void osd_req_op_extent_osd_iter(struct ceph_osd_request *osd_req, + unsigned int which, struct iov_iter *iter); extern void osd_req_op_cls_request_data_pagelist(struct ceph_osd_request *, unsigned int which, -- cgit v1.2.3 From 2d332d5bc424404911540006a8bb450fbb96b178 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 27 Jul 2020 10:16:09 -0400 Subject: ceph: fscrypt_auth handling for ceph MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Most fscrypt-enabled filesystems store the crypto context in an xattr, but that's problematic for ceph as xatts are governed by the XATTR cap, but we really want the crypto context as part of the AUTH cap. Because of this, the MDS has added two new inode metadata fields: fscrypt_auth and fscrypt_file. The former is used to hold the crypto context, and the latter is used to track the real file size. Parse new fscrypt_auth and fscrypt_file fields in inode traces. For now, we don't use fscrypt_file, but fscrypt_auth is used to hold the fscrypt context. Allow the client to use a setattr request for setting the fscrypt_auth field. Since this is not a standard setattr request from the VFS, we add a new field to __ceph_setattr that carries ceph-specific inode attrs. Have the set_context op do a setattr that sets the fscrypt_auth value, and get_context just return the contents of that field (since it should always be available). Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 49586ff26152..45f8ce61e103 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -359,14 +359,19 @@ enum { extern const char *ceph_mds_op_name(int op); - -#define CEPH_SETATTR_MODE 1 -#define CEPH_SETATTR_UID 2 -#define CEPH_SETATTR_GID 4 -#define CEPH_SETATTR_MTIME 8 -#define CEPH_SETATTR_ATIME 16 -#define CEPH_SETATTR_SIZE 32 -#define CEPH_SETATTR_CTIME 64 +#define CEPH_SETATTR_MODE (1 << 0) +#define CEPH_SETATTR_UID (1 << 1) +#define CEPH_SETATTR_GID (1 << 2) +#define CEPH_SETATTR_MTIME (1 << 3) +#define CEPH_SETATTR_ATIME (1 << 4) +#define CEPH_SETATTR_SIZE (1 << 5) +#define CEPH_SETATTR_CTIME (1 << 6) +#define CEPH_SETATTR_MTIME_NOW (1 << 7) +#define CEPH_SETATTR_ATIME_NOW (1 << 8) +#define CEPH_SETATTR_BTIME (1 << 9) +#define CEPH_SETATTR_KILL_SGUID (1 << 10) +#define CEPH_SETATTR_FSCRYPT_AUTH (1 << 11) +#define CEPH_SETATTR_FSCRYPT_FILE (1 << 12) /* * Ceph setxattr request flags. -- cgit v1.2.3 From e38abdab441c0dcfb8f28dd0daf1dfa44b494492 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sun, 2 Jul 2023 18:40:39 +0200 Subject: efi/runtime-wrappers: Remove duplicated macro for service returning void __efi_call_virt() exists as an alternative for efi_call_virt() for the sole reason that ResetSystem() returns void, and so we cannot use a call to it in the RHS of an assignment. Given that there is only a single user, let's drop the macro, and expand it into the caller. That way, the remaining macro can be tightened somewhat in terms of type safety too. Note that the use of typeof() on the runtime service invocation does not result in an actual call being made, but it does require a few pointer types to be fixed up and converted into the proper function pointer prototypes. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 23 ++++------------------- 1 file changed, 4 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 9d09dba116ce..28ac4e4a725d 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1170,8 +1170,7 @@ static inline void efi_check_for_embedded_firmwares(void) { } #define arch_efi_call_virt(p, f, args...) ((p)->f(args)) /* - * Arch code can implement the following three template macros, avoiding - * reptition for the void/non-void return cases of {__,}efi_call_virt(): + * Arch code must implement the following three routines: * * * arch_efi_call_virt_setup() * @@ -1180,9 +1179,8 @@ static inline void efi_check_for_embedded_firmwares(void) { } * * * arch_efi_call_virt() * - * Performs the call. The last expression in the macro must be the call - * itself, allowing the logic to be shared by the void and non-void - * cases. + * Performs the call. This routine takes a variable number of arguments so + * it must be implemented as a variadic preprocessor macro. * * * arch_efi_call_virt_teardown() * @@ -1191,7 +1189,7 @@ static inline void efi_check_for_embedded_firmwares(void) { } #define efi_call_virt_pointer(p, f, args...) \ ({ \ - efi_status_t __s; \ + typeof((p)->f(args)) __s; \ unsigned long __flags; \ \ arch_efi_call_virt_setup(); \ @@ -1205,19 +1203,6 @@ static inline void efi_check_for_embedded_firmwares(void) { } __s; \ }) -#define __efi_call_virt_pointer(p, f, args...) \ -({ \ - unsigned long __flags; \ - \ - arch_efi_call_virt_setup(); \ - \ - __flags = efi_call_virt_save_flags(); \ - arch_efi_call_virt(p, f, args); \ - efi_call_virt_check_flags(__flags, __stringify(f)); \ - \ - arch_efi_call_virt_teardown(); \ -}) - #define EFI_RANDOM_SEED_SIZE 32U // BLAKE2S_HASH_SIZE struct linux_efi_random_seed { -- cgit v1.2.3 From 3c17ae4161093fa2263b70064e34a033dc16fef5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 8 Aug 2023 12:47:51 +0200 Subject: efi/runtime-wrappers: Don't duplicate setup/teardown code Avoid duplicating the EFI arch setup and teardown routine calls numerous times in efi_call_rts(). Instead, expand the efi_call_virt_pointer() macro into efi_call_rts(), taking the pre and post parts out of the switch. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 28ac4e4a725d..e0b346b014df 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1129,7 +1129,7 @@ extern bool efi_runtime_disabled(void); static inline bool efi_runtime_disabled(void) { return true; } #endif -extern void efi_call_virt_check_flags(unsigned long flags, const char *call); +extern void efi_call_virt_check_flags(unsigned long flags, const void *caller); extern unsigned long efi_call_virt_save_flags(void); enum efi_secureboot_mode { @@ -1196,7 +1196,7 @@ static inline void efi_check_for_embedded_firmwares(void) { } \ __flags = efi_call_virt_save_flags(); \ __s = arch_efi_call_virt(p, f, args); \ - efi_call_virt_check_flags(__flags, __stringify(f)); \ + efi_call_virt_check_flags(__flags, NULL); \ \ arch_efi_call_virt_teardown(); \ \ @@ -1257,6 +1257,7 @@ union efi_rts_args; * @status: Status of executing EFI Runtime Service * @efi_rts_id: EFI Runtime Service function identifier * @efi_rts_comp: Struct used for handling completions + * @caller: The caller of the runtime service */ struct efi_runtime_work { union efi_rts_args *args; @@ -1264,6 +1265,7 @@ struct efi_runtime_work { struct work_struct work; enum efi_rts_ids efi_rts_id; struct completion efi_rts_comp; + const void *caller; }; extern struct efi_runtime_work efi_rts_work; -- cgit v1.2.3 From 5894cf571e14fb393a4d0a82538de032127b9d8b Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sun, 2 Jul 2023 11:57:34 +0200 Subject: acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers Instead of bypassing the kernel's adaptation layer for performing EFI runtime calls, wire up ACPI PRM handling into it. This means these calls can no longer occur concurrently with EFI runtime calls, and will be made from the EFI runtime workqueue. It also means any page faults occurring during PRM handling will be identified correctly as originating in firmware code. Acked-by: Rafael J. Wysocki Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index e0b346b014df..5a1e39df8b26 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1228,6 +1228,10 @@ extern int efi_tpm_final_log_size; extern unsigned long rci2_table_phys; +efi_status_t +efi_call_acpi_prm_handler(efi_status_t (__efiapi *handler_addr)(u64, void *), + u64 param_buffer_addr, void *context); + /* * efi_runtime_service() function identifiers. * "NONE" is used by efi_recover_from_page_fault() to check if the page @@ -1247,6 +1251,7 @@ enum efi_rts_ids { EFI_RESET_SYSTEM, EFI_UPDATE_CAPSULE, EFI_QUERY_CAPSULE_CAPS, + EFI_ACPI_PRM_HANDLER, }; union efi_rts_args; -- cgit v1.2.3 From 39f7c41c908bc82947ce47cdd021520ff486c53f Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Fri, 7 Jul 2023 18:21:48 +0100 Subject: tracing/filters: Enable filtering a cpumask field by another cpumask The recently introduced ipi_send_cpumask trace event contains a cpumask field, but it currently cannot be used in filter expressions. Make event filtering aware of cpumask fields, and allow these to be filtered by a user-provided cpumask. The user-provided cpumask is to be given in cpulist format and wrapped as: "CPUS{$cpulist}". The use of curly braces instead of parentheses is to prevent predicate_parse() from parsing the contents of CPUS{...} as a full-fledged predicate subexpression. This enables e.g.: $ trace-cmd record -e 'ipi_send_cpumask' -f 'cpumask & CPUS{2,4,6,8-32}' Link: https://lkml.kernel.org/r/20230707172155.70873-3-vschneid@redhat.com Cc: Masami Hiramatsu Cc: Jonathan Corbet Cc: Juri Lelli Cc: Daniel Bristot de Oliveira Cc: Marcelo Tosatti Cc: Leonardo Bras Cc: Frederic Weisbecker Signed-off-by: Valentin Schneider Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index c17623c78029..1600aeb8e1a3 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -808,6 +808,7 @@ enum { FILTER_RDYN_STRING, FILTER_PTR_STRING, FILTER_TRACE_FN, + FILTER_CPUMASK, FILTER_COMM, FILTER_CPU, FILTER_STACKTRACE, -- cgit v1.2.3 From c8f05f2f41d7e4950d6767e6ebe9fba3dcaf45ff Mon Sep 17 00:00:00 2001 From: Zhang Zekun Date: Fri, 4 Aug 2023 09:36:36 +0800 Subject: ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon() The definition of ftrace_enable_daemon() and ftrace_disable_daemon() has been removed since commit cb7be3b2fc2c ("ftrace: remove daemon"), remain the declarations in the header files, so remove it. Link: https://lore.kernel.org/linux-trace-kernel/20230804013636.115940-1-zhangzekun11@huawei.com Cc: Cc: Signed-off-by: Zhang Zekun Signed-off-by: Steven Rostedt (Google) --- include/linux/ftrace.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index aad9cf8876b5..e8921871ef9a 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -862,13 +862,8 @@ extern int skip_trace(unsigned long ip); extern void ftrace_module_init(struct module *mod); extern void ftrace_module_enable(struct module *mod); extern void ftrace_release_mod(struct module *mod); - -extern void ftrace_disable_daemon(void); -extern void ftrace_enable_daemon(void); #else /* CONFIG_DYNAMIC_FTRACE */ static inline int skip_trace(unsigned long ip) { return 0; } -static inline void ftrace_disable_daemon(void) { } -static inline void ftrace_enable_daemon(void) { } static inline void ftrace_module_init(struct module *mod) { } static inline void ftrace_module_enable(struct module *mod) { } static inline void ftrace_release_mod(struct module *mod) { } -- cgit v1.2.3 From f23643306430f86e2f413ee2b986e0773e79da31 Mon Sep 17 00:00:00 2001 From: RD Babiera Date: Mon, 14 Aug 2023 18:05:59 +0000 Subject: usb: typec: bus: verify partner exists in typec_altmode_attention Some usb hubs will negotiate DisplayPort Alt mode with the device but will then negotiate a data role swap after entering the alt mode. The data role swap causes the device to unregister all alt modes, however the usb hub will still send Attention messages even after failing to reregister the Alt Mode. type_altmode_attention currently does not verify whether or not a device's altmode partner exists, which results in a NULL pointer error when dereferencing the typec_altmode and typec_altmode_ops belonging to the altmode partner. Verify the presence of a device's altmode partner before sending the Attention message to the Alt Mode driver. Fixes: 8a37d87d72f0 ("usb: typec: Bus type for alternate modes") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera Reviewed-by: Heikki Krogerus Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20230814180559.923475-1-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec_altmode.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb/typec_altmode.h b/include/linux/usb/typec_altmode.h index 350d49012659..28aeef8f9e7b 100644 --- a/include/linux/usb/typec_altmode.h +++ b/include/linux/usb/typec_altmode.h @@ -67,7 +67,7 @@ struct typec_altmode_ops { int typec_altmode_enter(struct typec_altmode *altmode, u32 *vdo); int typec_altmode_exit(struct typec_altmode *altmode); -void typec_altmode_attention(struct typec_altmode *altmode, u32 vdo); +int typec_altmode_attention(struct typec_altmode *altmode, u32 vdo); int typec_altmode_vdm(struct typec_altmode *altmode, const u32 header, const u32 *vdo, int count); int typec_altmode_notify(struct typec_altmode *altmode, unsigned long conf, -- cgit v1.2.3 From 23e60c8daf5ec2ab1b731310761b668745fcf6ed Mon Sep 17 00:00:00 2001 From: Marco Felsch Date: Wed, 16 Aug 2023 14:25:02 -0300 Subject: usb: typec: tcpci: clear the fault status bit According the "USB Type-C Port Controller Interface Specification v2.0" the TCPC sets the fault status register bit-7 (AllRegistersResetToDefault) once the registers have been reset to their default values. This triggers an alert(-irq) on PTN5110 devices albeit we do mask the fault-irq, which may cause a kernel hang. Fix this generically by writing a one to the corresponding bit-7. Cc: stable@vger.kernel.org Fixes: 74e656d6b055 ("staging: typec: Type-C Port Controller Interface driver (tcpci)") Reported-by: "Angus Ainslie (Purism)" Closes: https://lore.kernel.org/all/20190508002749.14816-2-angus@akkea.ca/ Reported-by: Christian Bach Closes: https://lore.kernel.org/regressions/ZR0P278MB07737E5F1D48632897D51AC3EB329@ZR0P278MB0773.CHEP278.PROD.OUTLOOK.COM/t/ Signed-off-by: Marco Felsch Signed-off-by: Fabio Estevam Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20230816172502.1155079-1-festevam@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/tcpci.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/tcpci.h b/include/linux/usb/tcpci.h index 85e95a3251d3..83376473ac76 100644 --- a/include/linux/usb/tcpci.h +++ b/include/linux/usb/tcpci.h @@ -103,6 +103,7 @@ #define TCPC_POWER_STATUS_SINKING_VBUS BIT(0) #define TCPC_FAULT_STATUS 0x1f +#define TCPC_FAULT_STATUS_ALL_REG_RST_TO_DEFAULT BIT(7) #define TCPC_ALERT_EXTENDED 0x21 -- cgit v1.2.3 From d4d13ff3ac78b41bfaefd5ad1efd72cfbf43f288 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Wed, 16 Aug 2023 12:55:21 +0200 Subject: tty: tty_buffer: switch data type to u8 There is no reason to have tty_buffer::data typed as unsigned long. Switch to u8, but preserve the ulong alignment using __aligned. This allows for the cast removal from char_buf_ptr(). And for use of struct_size() in the allocation in tty_buffer_alloc() -- in the next patch. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230816105530.3335-2-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_buffer.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_buffer.h b/include/linux/tty_buffer.h index e45cba81d0e9..31125e3be3c5 100644 --- a/include/linux/tty_buffer.h +++ b/include/linux/tty_buffer.h @@ -19,12 +19,12 @@ struct tty_buffer { unsigned int read; bool flags; /* Data points here */ - unsigned long data[]; + u8 data[] __aligned(sizeof(unsigned long)); }; static inline u8 *char_buf_ptr(struct tty_buffer *b, unsigned int ofs) { - return ((u8 *)b->data) + ofs; + return b->data + ofs; } static inline u8 *flag_buf_ptr(struct tty_buffer *b, unsigned int ofs) -- cgit v1.2.3 From c26405fd289b74225b111b8f23e9e7eae1a02513 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Wed, 16 Aug 2023 12:55:23 +0200 Subject: tty: tty_buffer: unify tty_insert_flip_string_{fixed_flag,flags}() They both do the same except for flags. One mem-copies the flags from the caller, the other mem-sets to one flag given by the caller. This can be unified with a simple if in the unified function. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230816105530.3335-4-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_flip.h | 46 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index d33aed2172c7..8b781f709605 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -10,14 +10,52 @@ struct tty_ldisc; int tty_buffer_set_limit(struct tty_port *port, int limit); unsigned int tty_buffer_space_avail(struct tty_port *port); int tty_buffer_request_room(struct tty_port *port, size_t size); -int tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, - const u8 *flags, size_t size); -int tty_insert_flip_string_fixed_flag(struct tty_port *port, const u8 *chars, - u8 flag, size_t size); +int __tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, + const u8 *flags, bool mutable_flags, + size_t size); int tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); void tty_flip_buffer_push(struct tty_port *port); int __tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag); +/** + * tty_insert_flip_string_fixed_flag - add characters to the tty buffer + * @port: tty port + * @chars: characters + * @flag: flag value for each character + * @size: size + * + * Queue a series of bytes to the tty buffering. All the characters passed are + * marked with the supplied flag. + * + * Returns: the number added. + */ +static inline int tty_insert_flip_string_fixed_flag(struct tty_port *port, + const u8 *chars, u8 flag, + size_t size) +{ + return __tty_insert_flip_string_flags(port, chars, &flag, false, size); +} + +/** + * tty_insert_flip_string_flags - add characters to the tty buffer + * @port: tty port + * @chars: characters + * @flags: flag bytes + * @size: size + * + * Queue a series of bytes to the tty buffering. For each character the flags + * array indicates the status of the character. + * + * Returns: the number added. + */ +static inline int tty_insert_flip_string_flags(struct tty_port *port, + const u8 *chars, const u8 *flags, + size_t size) +{ + return __tty_insert_flip_string_flags(port, chars, flags, true, size); +} + + static inline int tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) { struct tty_buffer *tb = port->buf.tail; -- cgit v1.2.3 From 6144922e17677204cbead4ee544699af69785a84 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Wed, 16 Aug 2023 12:55:25 +0200 Subject: tty: tty_buffer: switch insert functions to size_t All the functions accept size_t as a size argument. They finally return the same size (or less). It is quite unexpected that they return a signed value and can confuse users to check for negative values. Instead, return the same size_t as accepted to make clear we return values >= 0, where zero in fact means failure. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230816105530.3335-6-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_flip.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index 8b781f709605..569747364ae5 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -10,9 +10,9 @@ struct tty_ldisc; int tty_buffer_set_limit(struct tty_port *port, int limit); unsigned int tty_buffer_space_avail(struct tty_port *port); int tty_buffer_request_room(struct tty_port *port, size_t size); -int __tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, - const u8 *flags, bool mutable_flags, - size_t size); +size_t __tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, + const u8 *flags, bool mutable_flags, + size_t size); int tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); void tty_flip_buffer_push(struct tty_port *port); int __tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag); @@ -29,9 +29,9 @@ int __tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag); * * Returns: the number added. */ -static inline int tty_insert_flip_string_fixed_flag(struct tty_port *port, - const u8 *chars, u8 flag, - size_t size) +static inline size_t tty_insert_flip_string_fixed_flag(struct tty_port *port, + const u8 *chars, u8 flag, + size_t size) { return __tty_insert_flip_string_flags(port, chars, &flag, false, size); } @@ -48,15 +48,15 @@ static inline int tty_insert_flip_string_fixed_flag(struct tty_port *port, * * Returns: the number added. */ -static inline int tty_insert_flip_string_flags(struct tty_port *port, - const u8 *chars, const u8 *flags, - size_t size) +static inline size_t tty_insert_flip_string_flags(struct tty_port *port, + const u8 *chars, + const u8 *flags, size_t size) { return __tty_insert_flip_string_flags(port, chars, flags, true, size); } -static inline int tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) +static inline size_t tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) { struct tty_buffer *tb = port->buf.tail; int change; @@ -71,8 +71,8 @@ static inline int tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) return __tty_insert_flip_char(port, ch, flag); } -static inline int tty_insert_flip_string(struct tty_port *port, - const u8 *chars, size_t size) +static inline size_t tty_insert_flip_string(struct tty_port *port, + const u8 *chars, size_t size) { return tty_insert_flip_string_fixed_flag(port, chars, TTY_NORMAL, size); } -- cgit v1.2.3 From 2ce2983c24c11beaeb6ef6da93bf7228679bd08b Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Wed, 16 Aug 2023 12:55:26 +0200 Subject: tty: tty_buffer: let tty_prepare_flip_string() return size_t The same as in the previous patch, tty_prepare_flip_string() accepts size_t as an size argument. It returns the same size (or less). It is unexpected that it returns a signed value and can confuse users to check for negative values. Instead, return the same size_t as accepted to make clear we return values >= 0, where zero in fact means failure. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230816105530.3335-7-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_flip.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index 569747364ae5..efd03d9c11f8 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -13,7 +13,7 @@ int tty_buffer_request_room(struct tty_port *port, size_t size); size_t __tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, const u8 *flags, bool mutable_flags, size_t size); -int tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); +size_t tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); void tty_flip_buffer_push(struct tty_port *port); int __tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag); -- cgit v1.2.3 From b49a0ff7328f5f9acfd027906f4c7d313a225361 Mon Sep 17 00:00:00 2001 From: "Jiri Slaby (SUSE)" Date: Wed, 16 Aug 2023 12:55:27 +0200 Subject: tty: tty_buffer: use __tty_insert_flip_string_flags() in tty_insert_flip_char() Use __tty_insert_flip_string_flags() for the slow path of tty_insert_flip_char(). The former is generic enough, so there is no reason to reimplement the injection once again. So now we have a single function stuffing into tty buffers. Signed-off-by: "Jiri Slaby (SUSE)" Link: https://lore.kernel.org/r/20230816105530.3335-8-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_flip.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h index efd03d9c11f8..af4fce98f64e 100644 --- a/include/linux/tty_flip.h +++ b/include/linux/tty_flip.h @@ -15,7 +15,6 @@ size_t __tty_insert_flip_string_flags(struct tty_port *port, const u8 *chars, size_t size); size_t tty_prepare_flip_string(struct tty_port *port, u8 **chars, size_t size); void tty_flip_buffer_push(struct tty_port *port); -int __tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag); /** * tty_insert_flip_string_fixed_flag - add characters to the tty buffer @@ -55,7 +54,14 @@ static inline size_t tty_insert_flip_string_flags(struct tty_port *port, return __tty_insert_flip_string_flags(port, chars, flags, true, size); } - +/** + * tty_insert_flip_char - add one character to the tty buffer + * @port: tty port + * @ch: character + * @flag: flag byte + * + * Queue a single byte @ch to the tty buffering, with an optional flag. + */ static inline size_t tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) { struct tty_buffer *tb = port->buf.tail; @@ -68,7 +74,7 @@ static inline size_t tty_insert_flip_char(struct tty_port *port, u8 ch, u8 flag) *char_buf_ptr(tb, tb->used++) = ch; return 1; } - return __tty_insert_flip_char(port, ch, flag); + return __tty_insert_flip_string_flags(port, &ch, &flag, false, 1); } static inline size_t tty_insert_flip_string(struct tty_port *port, -- cgit v1.2.3 From e4711d131aac951c4fdb8ddbee7cfccb36ec4be0 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 18 Aug 2023 20:43:38 +0800 Subject: greybus: svc: Remove unused declarations Commit 84427943d2da ("greybus: svc: drop legacy-protocol dependency") removed these implementations but not the declarations. Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230818124338.37880-1-yuehaibing@huawei.com Signed-off-by: Greg Kroah-Hartman --- include/linux/greybus/svc.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/greybus/svc.h b/include/linux/greybus/svc.h index 5afaf5f06856..da547fb9071b 100644 --- a/include/linux/greybus/svc.h +++ b/include/linux/greybus/svc.h @@ -100,7 +100,4 @@ bool gb_svc_watchdog_enabled(struct gb_svc *svc); int gb_svc_watchdog_enable(struct gb_svc *svc); int gb_svc_watchdog_disable(struct gb_svc *svc); -int gb_svc_protocol_init(void); -void gb_svc_protocol_exit(void); - #endif /* __SVC_H */ -- cgit v1.2.3 From 9fb10726ecc5145550180aec4fd0adf0a7b1d634 Mon Sep 17 00:00:00 2001 From: Greg Joyce Date: Fri, 21 Jul 2023 16:15:32 -0500 Subject: block: sed-opal: Implement IOC_OPAL_DISCOVERY Add IOC_OPAL_DISCOVERY ioctl to return raw discovery data to a SED Opal application. This allows the application to display drive capabilities and state. Signed-off-by: Greg Joyce Reviewed-by: Christoph Hellwig Reviewed-by: Jonathan Derrick Acked-by: Jarkko Sakkinen Link: https://lore.kernel.org/r/20230721211534.3437070-2-gjoyce@linux.vnet.ibm.com Signed-off-by: Jens Axboe --- include/linux/sed-opal.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h index bbae1e52ab4f..ef65f589fbeb 100644 --- a/include/linux/sed-opal.h +++ b/include/linux/sed-opal.h @@ -47,6 +47,7 @@ static inline bool is_sed_ioctl(unsigned int cmd) case IOC_OPAL_GET_STATUS: case IOC_OPAL_GET_LR_STATUS: case IOC_OPAL_GET_GEOMETRY: + case IOC_OPAL_DISCOVERY: return true; } return false; -- cgit v1.2.3 From 5c82efc1aee8eb0919aa67a0d2559de5a326bd7c Mon Sep 17 00:00:00 2001 From: Greg Joyce Date: Fri, 21 Jul 2023 16:15:33 -0500 Subject: block: sed-opal: Implement IOC_OPAL_REVERT_LSP This is used in conjunction with IOC_OPAL_REVERT_TPR to return a drive to Original Factory State without erasing the data. If IOC_OPAL_REVERT_LSP is called with opal_revert_lsp.options bit OPAL_PRESERVE set prior to calling IOC_OPAL_REVERT_TPR, the drive global locking range will not be erased. Signed-off-by: Greg Joyce Reviewed-by: Christoph Hellwig Reviewed-by: Jonathan Derrick Acked-by: Jarkko Sakkinen Link: https://lore.kernel.org/r/20230721211534.3437070-3-gjoyce@linux.vnet.ibm.com Signed-off-by: Jens Axboe --- include/linux/sed-opal.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h index ef65f589fbeb..2f189546e133 100644 --- a/include/linux/sed-opal.h +++ b/include/linux/sed-opal.h @@ -48,6 +48,7 @@ static inline bool is_sed_ioctl(unsigned int cmd) case IOC_OPAL_GET_LR_STATUS: case IOC_OPAL_GET_GEOMETRY: case IOC_OPAL_DISCOVERY: + case IOC_OPAL_REVERT_LSP: return true; } return false; -- cgit v1.2.3 From 3bfeb61256643281ac4be5b8a57e9d9da3db4335 Mon Sep 17 00:00:00 2001 From: Greg Joyce Date: Fri, 21 Jul 2023 16:15:34 -0500 Subject: block: sed-opal: keyring support for SED keys Extend the SED block driver so it can alternatively obtain a key from a sed-opal kernel keyring. The SED ioctls will indicate the source of the key, either directly in the ioctl data or from the keyring. This allows the use of SED commands in scripts such as udev scripts so that drives may be automatically unlocked as they become available. Signed-off-by: Greg Joyce Reviewed-by: Jonathan Derrick Acked-by: Jarkko Sakkinen Link: https://lore.kernel.org/r/20230721211534.3437070-4-gjoyce@linux.vnet.ibm.com Signed-off-by: Jens Axboe --- include/linux/sed-opal.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h index 2f189546e133..2ac50822554e 100644 --- a/include/linux/sed-opal.h +++ b/include/linux/sed-opal.h @@ -25,6 +25,9 @@ bool opal_unlock_from_suspend(struct opal_dev *dev); struct opal_dev *init_opal_dev(void *data, sec_send_recv *send_recv); int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *ioctl_ptr); +#define OPAL_AUTH_KEY "opal-boot-pin" +#define OPAL_AUTH_KEY_PREV "opal-boot-pin-prev" + static inline bool is_sed_ioctl(unsigned int cmd) { switch (cmd) { -- cgit v1.2.3 From e26a99dd1522caefcde023aabc0f79d7e6a016bf Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Fri, 21 Jul 2023 20:12:16 +0800 Subject: PM: runtime: Remove unsued extern declaration of pm_runtime_update_max_time_suspended() Commit 76e267d822f2 ("PM / Runtime: Remove device fields related to suspend time, v2") left behind this. Signed-off-by: YueHaibing Signed-off-by: Rafael J. Wysocki --- include/linux/pm_runtime.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 9a8151a2bdea..7c9b35448563 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -85,8 +85,6 @@ extern void pm_runtime_irq_safe(struct device *dev); extern void __pm_runtime_use_autosuspend(struct device *dev, bool use); extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay); extern u64 pm_runtime_autosuspend_expiration(struct device *dev); -extern void pm_runtime_update_max_time_suspended(struct device *dev, - s64 delta_ns); extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable); extern void pm_runtime_get_suppliers(struct device *dev); extern void pm_runtime_put_suppliers(struct device *dev); -- cgit v1.2.3 From 9e261e6da0a814f4ee1856ab06b19c25190aeffb Mon Sep 17 00:00:00 2001 From: Jeff Johnson Date: Tue, 22 Aug 2023 09:30:17 -0700 Subject: wifi: Fix ieee80211.h kernel-doc issues The kernel-doc script identified multiple issues in ieee80211.h, so fix them. In the process update some references to the latest applicable specification. include/linux/ieee80211.h:848: warning: Function parameter or member 'count' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:848: warning: Function parameter or member 'period' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:848: warning: Function parameter or member 'duration' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:848: warning: Function parameter or member 'offset' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'token' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'mode' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'type' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'request' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:871: warning: Function parameter or member 'mode' not described in 'ieee80211_channel_sw_ie' include/linux/ieee80211.h:871: warning: Function parameter or member 'new_ch_num' not described in 'ieee80211_channel_sw_ie' include/linux/ieee80211.h:871: warning: Function parameter or member 'count' not described in 'ieee80211_channel_sw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'mode' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'new_operating_class' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'new_ch_num' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'count' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_ttl' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_flags' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_reason' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_pre_value' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:913: warning: Function parameter or member 'new_channel_width' not described in 'ieee80211_wide_bw_chansw_ie' include/linux/ieee80211.h:913: warning: Function parameter or member 'new_center_freq_seg0' not described in 'ieee80211_wide_bw_chansw_ie' include/linux/ieee80211.h:913: warning: Function parameter or member 'new_center_freq_seg1' not described in 'ieee80211_wide_bw_chansw_ie' include/linux/ieee80211.h:926: warning: expecting prototype for struct ieee80211_tim. Prototype was for struct ieee80211_tim_ie instead include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_psel' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_pmetric' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_congest' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_synch' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_auth' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_form' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_cap' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:964: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * mesh channel switch parameters element's flag indicator include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_flags' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_hopcount' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_ttl' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_addr' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_seq' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_interval' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_metric' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:1019: warning: expecting prototype for enum ieee80211_opmode_bits. Prototype was for enum ieee80211_vht_opmode_bits instead include/linux/ieee80211.h:1052: warning: Function parameter or member 'tx_power' not described in 'ieee80211_tpc_report_ie' include/linux/ieee80211.h:1052: warning: Function parameter or member 'link_margin' not described in 'ieee80211_tpc_report_ie' include/linux/ieee80211.h:1073: warning: Function parameter or member 'compat_info' not described in 'ieee80211_s1g_bcn_compat_ie' include/linux/ieee80211.h:1073: warning: Function parameter or member 'beacon_int' not described in 'ieee80211_s1g_bcn_compat_ie' include/linux/ieee80211.h:1073: warning: Function parameter or member 'tsf_completion' not described in 'ieee80211_s1g_bcn_compat_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'ch_width' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'oper_class' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'primary_ch' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'oper_ch' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'basic_mcs_nss' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1097: warning: Function parameter or member 'aid' not described in 'ieee80211_aid_response_ie' include/linux/ieee80211.h:1097: warning: Function parameter or member 'switch_count' not described in 'ieee80211_aid_response_ie' include/linux/ieee80211.h:1097: warning: Function parameter or member 'response_int' not described in 'ieee80211_aid_response_ie' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_STATUS' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_MINOR_REASON' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_CAPABILITY' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_DEVICE_ID' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GO_INTENT' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GO_CONFIG_TIMEOUT' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_LISTEN_CHANNEL' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GROUP_BSSID' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_EXT_LISTEN_TIMING' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_INTENDED_IFACE_ADDR' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_MANAGABILITY' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_CHANNEL_LIST' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_ABSENCE_NOTICE' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_DEVICE_INFO' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GROUP_INFO' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GROUP_ID' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_INTERFACE' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_OPER_CHANNEL' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_INVITE_FLAGS' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_VENDOR_SPECIFIC' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_MAX' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1554: warning: Function parameter or member 'frame_control' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'duration' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'ra' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'ta' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'control' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'start_seq_num' not described in 'ieee80211_bar' include/linux/ieee80211.h:1579: warning: Function parameter or member 'reserved' not described in 'ieee80211_mcs_info' include/linux/ieee80211.h:1618: warning: Function parameter or member 'cap_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'ampdu_params_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'mcs' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'extended_ht_cap_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'tx_BF_cap_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'antenna_selection_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1704: warning: Function parameter or member 'primary_chan' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'ht_param' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'operation_mode' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'stbc_param' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'basic_set' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1872: warning: Function parameter or member 'mac_cap_info' not described in 'ieee80211_he_cap_elem' include/linux/ieee80211.h:1872: warning: Function parameter or member 'phy_cap_info' not described in 'ieee80211_he_cap_elem' include/linux/ieee80211.h:1936: warning: Function parameter or member 'he_oper_params' not described in 'ieee80211_he_operation' include/linux/ieee80211.h:1936: warning: Function parameter or member 'he_mcs_nss_set' not described in 'ieee80211_he_operation' include/linux/ieee80211.h:1936: warning: Function parameter or member 'optional' not described in 'ieee80211_he_operation' include/linux/ieee80211.h:1948: warning: Function parameter or member 'he_sr_control' not described in 'ieee80211_he_spr' include/linux/ieee80211.h:1948: warning: Function parameter or member 'optional' not described in 'ieee80211_he_spr' include/linux/ieee80211.h:1960: warning: Function parameter or member 'aifsn' not described in 'ieee80211_he_mu_edca_param_ac_rec' include/linux/ieee80211.h:1960: warning: Function parameter or member 'ecw_min_max' not described in 'ieee80211_he_mu_edca_param_ac_rec' include/linux/ieee80211.h:1960: warning: Function parameter or member 'mu_edca_timer' not described in 'ieee80211_he_mu_edca_param_ac_rec' include/linux/ieee80211.h:1974: warning: Function parameter or member 'mu_qos_info' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_be' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_bk' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_vi' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_vo' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:2194: warning: Enum value 'IEEE80211_REG_LPI_AP' not described in enum 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Enum value 'IEEE80211_REG_SP_AP' not described in enum 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Enum value 'IEEE80211_REG_VLP_AP' not described in enum 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Excess enum value 'IEEE80211_REG_SP' description in 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Excess enum value 'IEEE80211_REG_VLP' description in 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Excess enum value 'IEEE80211_REG_LPI' description in 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2577: warning: cannot understand function prototype: 'struct ieee80211_he_6ghz_oper ' include/linux/ieee80211.h:2624: warning: Function parameter or member 'tx_power_info' not described in 'ieee80211_tx_pwr_env' include/linux/ieee80211.h:2624: warning: Function parameter or member 'tx_power' not described in 'ieee80211_tx_pwr_env' include/linux/ieee80211.h:4485: warning: expecting prototype for RSNX Capabilities(). Prototype was for WLAN_RSNX_CAPA_PROTECTED_TWT() instead include/linux/ieee80211.h:4734: warning: expecting prototype for ieee80211_mle_get_eml_sync_delay(). Prototype was for ieee80211_mle_get_eml_med_sync_delay() instead 117 warnings as Errors Signed-off-by: Jeff Johnson Link: https://lore.kernel.org/r/20230822-kerneldoc-v1-1-0d42ce5029bf@quicinc.com Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 235 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 177 insertions(+), 58 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 4b998090898e..bd2f6e19c357 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -836,9 +836,14 @@ enum ieee80211_preq_target_flags { }; /** - * struct ieee80211_quiet_ie + * struct ieee80211_quiet_ie - Quiet element + * @count: Quiet Count + * @period: Quiet Period + * @duration: Quiet Duration + * @offset: Quiet Offset * - * This structure refers to "Quiet information element" + * This structure represents the payload of the "Quiet element" as + * described in IEEE Std 802.11-2020 section 9.4.2.22. */ struct ieee80211_quiet_ie { u8 count; @@ -848,9 +853,15 @@ struct ieee80211_quiet_ie { } __packed; /** - * struct ieee80211_msrment_ie + * struct ieee80211_msrment_ie - Measurement element + * @token: Measurement Token + * @mode: Measurement Report Mode + * @type: Measurement Type + * @request: Measurement Request or Measurement Report * - * This structure refers to "Measurement Request/Report information element" + * This structure represents the payload of both the "Measurement + * Request element" and the "Measurement Report element" as described + * in IEEE Std 802.11-2020 sections 9.4.2.20 and 9.4.2.21. */ struct ieee80211_msrment_ie { u8 token; @@ -860,9 +871,14 @@ struct ieee80211_msrment_ie { } __packed; /** - * struct ieee80211_channel_sw_ie + * struct ieee80211_channel_sw_ie - Channel Switch Announcement element + * @mode: Channel Switch Mode + * @new_ch_num: New Channel Number + * @count: Channel Switch Count * - * This structure refers to "Channel Switch Announcement information element" + * This structure represents the payload of the "Channel Switch + * Announcement element" as described in IEEE Std 802.11-2020 section + * 9.4.2.18. */ struct ieee80211_channel_sw_ie { u8 mode; @@ -871,9 +887,14 @@ struct ieee80211_channel_sw_ie { } __packed; /** - * struct ieee80211_ext_chansw_ie + * struct ieee80211_ext_chansw_ie - Extended Channel Switch Announcement element + * @mode: Channel Switch Mode + * @new_operating_class: New Operating Class + * @new_ch_num: New Channel Number + * @count: Channel Switch Count * - * This structure represents the "Extended Channel Switch Announcement element" + * This structure represents the "Extended Channel Switch Announcement + * element" as described in IEEE Std 802.11-2020 section 9.4.2.52. */ struct ieee80211_ext_chansw_ie { u8 mode; @@ -894,8 +915,14 @@ struct ieee80211_sec_chan_offs_ie { /** * struct ieee80211_mesh_chansw_params_ie - mesh channel switch parameters IE + * @mesh_ttl: Time To Live + * @mesh_flags: Flags + * @mesh_reason: Reason Code + * @mesh_pre_value: Precedence Value * - * This structure represents the "Mesh Channel Switch Paramters element" + * This structure represents the payload of the "Mesh Channel Switch + * Parameters element" as described in IEEE Std 802.11-2020 section + * 9.4.2.102. */ struct ieee80211_mesh_chansw_params_ie { u8 mesh_ttl; @@ -906,6 +933,13 @@ struct ieee80211_mesh_chansw_params_ie { /** * struct ieee80211_wide_bw_chansw_ie - wide bandwidth channel switch IE + * @new_channel_width: New Channel Width + * @new_center_freq_seg0: New Channel Center Frequency Segment 0 + * @new_center_freq_seg1: New Channel Center Frequency Segment 1 + * + * This structure represents the payload of the "Wide Bandwidth + * Channel Switch element" as described in IEEE Std 802.11-2020 + * section 9.4.2.160. */ struct ieee80211_wide_bw_chansw_ie { u8 new_channel_width; @@ -913,9 +947,14 @@ struct ieee80211_wide_bw_chansw_ie { } __packed; /** - * struct ieee80211_tim + * struct ieee80211_tim_ie - Traffic Indication Map information element + * @dtim_count: DTIM Count + * @dtim_period: DTIM Period + * @bitmap_ctrl: Bitmap Control + * @virtual_map: Partial Virtual Bitmap * - * This structure refers to "Traffic Indication Map information element" + * This structure represents the payload of the "TIM element" as + * described in IEEE Std 802.11-2020 section 9.4.2.5. */ struct ieee80211_tim_ie { u8 dtim_count; @@ -926,9 +965,17 @@ struct ieee80211_tim_ie { } __packed; /** - * struct ieee80211_meshconf_ie + * struct ieee80211_meshconf_ie - Mesh Configuration element + * @meshconf_psel: Active Path Selection Protocol Identifier + * @meshconf_pmetric: Active Path Selection Metric Identifier + * @meshconf_congest: Congestion Control Mode Identifier + * @meshconf_synch: Synchronization Method Identifier + * @meshconf_auth: Authentication Protocol Identifier + * @meshconf_form: Mesh Formation Info + * @meshconf_cap: Mesh Capability (see &enum mesh_config_capab_flags) * - * This structure refers to "Mesh Configuration information element" + * This structure represents the payload of the "Mesh Configuration + * element" as described in IEEE Std 802.11-2020 section 9.4.2.97. */ struct ieee80211_meshconf_ie { u8 meshconf_psel; @@ -950,6 +997,9 @@ struct ieee80211_meshconf_ie { * is ongoing * @IEEE80211_MESHCONF_CAPAB_POWER_SAVE_LEVEL: STA is in deep sleep mode or has * neighbors in deep sleep mode + * + * Enumerates the "Mesh Capability" as described in IEEE Std + * 802.11-2020 section 9.4.2.97.7. */ enum mesh_config_capab_flags { IEEE80211_MESHCONF_CAPAB_ACCEPT_PLINKS = 0x01, @@ -960,7 +1010,7 @@ enum mesh_config_capab_flags { #define IEEE80211_MESHCONF_FORM_CONNECTED_TO_GATE 0x1 -/** +/* * mesh channel switch parameters element's flag indicator * */ @@ -969,9 +1019,17 @@ enum mesh_config_capab_flags { #define WLAN_EID_CHAN_SWITCH_PARAM_REASON BIT(2) /** - * struct ieee80211_rann_ie + * struct ieee80211_rann_ie - RANN (root announcement) element + * @rann_flags: Flags + * @rann_hopcount: Hop Count + * @rann_ttl: Element TTL + * @rann_addr: Root Mesh STA Address + * @rann_seq: HWMP Sequence Number + * @rann_interval: Interval + * @rann_metric: Metric * - * This structure refers to "Root Announcement information element" + * This structure represents the payload of the "RANN element" as + * described in IEEE Std 802.11-2020 section 9.4.2.111. */ struct ieee80211_rann_ie { u8 rann_flags; @@ -993,7 +1051,7 @@ enum ieee80211_ht_chanwidth_values { }; /** - * enum ieee80211_opmode_bits - VHT operating mode field bits + * enum ieee80211_vht_opmode_bits - VHT operating mode field bits * @IEEE80211_OPMODE_NOTIF_CHANWIDTH_MASK: channel width mask * @IEEE80211_OPMODE_NOTIF_CHANWIDTH_20MHZ: 20 MHz channel width * @IEEE80211_OPMODE_NOTIF_CHANWIDTH_40MHZ: 40 MHz channel width @@ -1042,9 +1100,12 @@ enum ieee80211_s1g_chanwidth { #define WLAN_USER_POSITION_LEN 16 /** - * struct ieee80211_tpc_report_ie + * struct ieee80211_tpc_report_ie - TPC Report element + * @tx_power: Transmit Power + * @link_margin: Link Margin * - * This structure refers to "TPC Report element" + * This structure represents the payload of the "TPC Report element" as + * described in IEEE Std 802.11-2020 section 9.4.2.16. */ struct ieee80211_tpc_report_ie { u8 tx_power; @@ -1062,9 +1123,14 @@ struct ieee80211_addba_ext_ie { } __packed; /** - * struct ieee80211_s1g_bcn_compat_ie + * struct ieee80211_s1g_bcn_compat_ie - S1G Beacon Compatibility element + * @compat_info: Compatibility Information + * @beacon_int: Beacon Interval + * @tsf_completion: TSF Completion * - * S1G Beacon Compatibility element + * This structure represents the payload of the "S1G Beacon + * Compatibility element" as described in IEEE Std 802.11-2020 section + * 9.4.2.196. */ struct ieee80211_s1g_bcn_compat_ie { __le16 compat_info; @@ -1073,9 +1139,15 @@ struct ieee80211_s1g_bcn_compat_ie { } __packed; /** - * struct ieee80211_s1g_oper_ie + * struct ieee80211_s1g_oper_ie - S1G Operation element + * @ch_width: S1G Operation Information Channel Width + * @oper_class: S1G Operation Information Operating Class + * @primary_ch: S1G Operation Information Primary Channel Number + * @oper_ch: S1G Operation Information Channel Center Frequency + * @basic_mcs_nss: Basic S1G-MCS and NSS Set * - * S1G Operation element + * This structure represents the payload of the "S1G Operation + * element" as described in IEEE Std 802.11-2020 section 9.4.2.212. */ struct ieee80211_s1g_oper_ie { u8 ch_width; @@ -1086,9 +1158,13 @@ struct ieee80211_s1g_oper_ie { } __packed; /** - * struct ieee80211_aid_response_ie + * struct ieee80211_aid_response_ie - AID Response element + * @aid: AID/Group AID + * @switch_count: AID Switch Count + * @response_int: AID Response Interval * - * AID Response element + * This structure represents the payload of the "AID Response element" + * as described in IEEE Std 802.11-2020 section 9.4.2.194. */ struct ieee80211_aid_response_ie { __le16 aid; @@ -1489,7 +1565,7 @@ struct ieee80211_tdls_data { /* * Peer-to-Peer IE attribute related definitions. */ -/** +/* * enum ieee80211_p2p_attr_id - identifies type of peer-to-peer attribute. */ enum ieee80211_p2p_attr_id { @@ -1539,11 +1615,17 @@ struct ieee80211_p2p_noa_attr { #define IEEE80211_P2P_OPPPS_CTWINDOW_MASK 0x7F /** - * struct ieee80211_bar - HT Block Ack Request + * struct ieee80211_bar - Block Ack Request frame format + * @frame_control: Frame Control + * @duration: Duration + * @ra: RA + * @ta: TA + * @control: BAR Control + * @start_seq_num: Starting Sequence Number (see Figure 9-37) * - * This structure refers to "HT BlockAckReq" as - * described in 802.11n draft section 7.2.1.7.1 - */ + * This structure represents the "BlockAckReq frame format" + * as described in IEEE Std 802.11-2020 section 9.3.1.7. +*/ struct ieee80211_bar { __le16 frame_control; __le16 duration; @@ -1563,13 +1645,17 @@ struct ieee80211_bar { #define IEEE80211_HT_MCS_MASK_LEN 10 /** - * struct ieee80211_mcs_info - MCS information + * struct ieee80211_mcs_info - Supported MCS Set field * @rx_mask: RX mask * @rx_highest: highest supported RX rate. If set represents * the highest supported RX data rate in units of 1 Mbps. * If this field is 0 this value should not be used to * consider the highest RX data rate supported. * @tx_params: TX parameters + * @reserved: Reserved bits + * + * This structure represents the "Supported MCS Set field" as + * described in IEEE Std 802.11-2020 section 9.4.2.55.4. */ struct ieee80211_mcs_info { u8 rx_mask[IEEE80211_HT_MCS_MASK_LEN]; @@ -1600,10 +1686,16 @@ struct ieee80211_mcs_info { (IEEE80211_HT_MCS_UNEQUAL_MODULATION_START / 8) /** - * struct ieee80211_ht_cap - HT capabilities + * struct ieee80211_ht_cap - HT capabilities element + * @cap_info: HT Capability Information + * @ampdu_params_info: A-MPDU Parameters + * @mcs: Supported MCS Set + * @extended_ht_cap_info: HT Extended Capabilities + * @tx_BF_cap_info: Transmit Beamforming Capabilities + * @antenna_selection_info: ASEL Capability * - * This structure is the "HT capabilities element" as - * described in 802.11n D5.0 7.3.2.57 + * This structure represents the payload of the "HT Capabilities + * element" as described in IEEE Std 802.11-2020 section 9.4.2.55. */ struct ieee80211_ht_cap { __le16 cap_info; @@ -1691,9 +1783,14 @@ enum ieee80211_min_mpdu_spacing { /** * struct ieee80211_ht_operation - HT operation IE + * @primary_chan: Primary Channel + * @ht_param: HT Operation Information parameters + * @operation_mode: HT Operation Information operation mode + * @stbc_param: HT Operation Information STBC params + * @basic_set: Basic HT-MCS Set * - * This structure is the "HT operation element" as - * described in 802.11n-2009 7.3.2.57 + * This structure represents the payload of the "HT Operation + * element" as described in IEEE Std 802.11-2020 section 9.4.2.56. */ struct ieee80211_ht_operation { u8 primary_chan; @@ -1862,9 +1959,12 @@ struct ieee80211_vht_operation { /** * struct ieee80211_he_cap_elem - HE capabilities element + * @mac_cap_info: HE MAC Capabilities Information + * @phy_cap_info: HE PHY Capabilities Information * - * This structure is the "HE capabilities element" fixed fields as - * described in P802.11ax_D4.0 section 9.4.2.242.2 and 9.4.2.242.3 + * This structure represents the fixed fields of the payload of the + * "HE capabilities element" as described in IEEE Std 802.11ax-2021 + * sections 9.4.2.248.2 and 9.4.2.248.3. */ struct ieee80211_he_cap_elem { u8 mac_cap_info[6]; @@ -1923,35 +2023,45 @@ struct ieee80211_he_mcs_nss_supp { } __packed; /** - * struct ieee80211_he_operation - HE capabilities element + * struct ieee80211_he_operation - HE Operation element + * @he_oper_params: HE Operation Parameters + BSS Color Information + * @he_mcs_nss_set: Basic HE-MCS And NSS Set + * @optional: Optional fields VHT Operation Information, Max Co-Hosted + * BSSID Indicator, and 6 GHz Operation Information * - * This structure is the "HE operation element" fields as - * described in P802.11ax_D4.0 section 9.4.2.243 + * This structure represents the payload of the "HE Operation + * element" as described in IEEE Std 802.11ax-2021 section 9.4.2.249. */ struct ieee80211_he_operation { __le32 he_oper_params; __le16 he_mcs_nss_set; - /* Optional 0,1,3,4,5,7 or 8 bytes: depends on @he_oper_params */ u8 optional[]; } __packed; /** - * struct ieee80211_he_spr - HE spatial reuse element + * struct ieee80211_he_spr - Spatial Reuse Parameter Set element + * @he_sr_control: SR Control + * @optional: Optional fields Non-SRG OBSS PD Max Offset, SRG OBSS PD + * Min Offset, SRG OBSS PD Max Offset, SRG BSS Color + * Bitmap, and SRG Partial BSSID Bitmap * - * This structure is the "HE spatial reuse element" element as - * described in P802.11ax_D4.0 section 9.4.2.241 + * This structure represents the payload of the "Spatial Reuse + * Parameter Set element" as described in IEEE Std 802.11ax-2021 + * section 9.4.2.252. */ struct ieee80211_he_spr { u8 he_sr_control; - /* Optional 0 to 19 bytes: depends on @he_sr_control */ u8 optional[]; } __packed; /** * struct ieee80211_he_mu_edca_param_ac_rec - MU AC Parameter Record field + * @aifsn: ACI/AIFSN + * @ecw_min_max: ECWmin/ECWmax + * @mu_edca_timer: MU EDCA Timer * - * This structure is the "MU AC Parameter Record" fields as - * described in P802.11ax_D4.0 section 9.4.2.245 + * This structure represents the "MU AC Parameter Record" as described + * in IEEE Std 802.11ax-2021 section 9.4.2.251, Figure 9-788p. */ struct ieee80211_he_mu_edca_param_ac_rec { u8 aifsn; @@ -1961,9 +2071,14 @@ struct ieee80211_he_mu_edca_param_ac_rec { /** * struct ieee80211_mu_edca_param_set - MU EDCA Parameter Set element + * @mu_qos_info: QoS Info + * @ac_be: MU AC_BE Parameter Record + * @ac_bk: MU AC_BK Parameter Record + * @ac_vi: MU AC_VI Parameter Record + * @ac_vo: MU AC_VO Parameter Record * - * This structure is the "MU EDCA Parameter Set element" fields as - * described in P802.11ax_D4.0 section 9.4.2.245 + * This structure represents the payload of the "MU EDCA Parameter Set + * element" as described in IEEE Std 802.11ax-2021 section 9.4.2.251. */ struct ieee80211_mu_edca_param_set { u8 mu_qos_info; @@ -2177,9 +2292,9 @@ int ieee80211_get_vht_max_nss(struct ieee80211_vht_cap *cap, * enum ieee80211_ap_reg_power - regulatory power for a Access Point * * @IEEE80211_REG_UNSET_AP: Access Point has no regulatory power mode - * @IEEE80211_REG_LPI: Indoor Access Point - * @IEEE80211_REG_SP: Standard power Access Point - * @IEEE80211_REG_VLP: Very low power Access Point + * @IEEE80211_REG_LPI_AP: Indoor Access Point + * @IEEE80211_REG_SP_AP: Standard power Access Point + * @IEEE80211_REG_VLP_AP: Very low power Access Point * @IEEE80211_REG_AP_POWER_AFTER_LAST: internal * @IEEE80211_REG_AP_POWER_MAX: maximum value */ @@ -2567,7 +2682,7 @@ static inline bool ieee80211_he_capa_size_ok(const u8 *data, u8 len) #define IEEE80211_6GHZ_CTRL_REG_SP_AP 1 /** - * ieee80211_he_6ghz_oper - HE 6 GHz operation Information field + * struct ieee80211_he_6ghz_oper - HE 6 GHz operation Information field * @primary: primary channel * @control: control flags * @ccfs0: channel center frequency segment 0 @@ -2614,9 +2729,13 @@ enum ieee80211_tx_power_intrpt_type { }; /** - * struct ieee80211_tx_pwr_env + * struct ieee80211_tx_pwr_env - Transmit Power Envelope + * @tx_power_info: Transmit Power Information field + * @tx_power: Maximum Transmit Power field * - * This structure represents the "Transmit Power Envelope element" + * This structure represents the payload of the "Transmit Power + * Envelope element" as described in IEEE Std 802.11ax-2021 section + * 9.4.2.161 */ struct ieee80211_tx_pwr_env { u8 tx_power_info; @@ -4478,7 +4597,7 @@ static inline bool for_each_element_completed(const struct element *element, return (const u8 *)element == (const u8 *)data + datalen; } -/** +/* * RSNX Capabilities: * bits 0-3: Field length (n-1) */ @@ -4721,7 +4840,7 @@ ieee80211_mle_get_bss_param_ch_cnt(const struct ieee80211_multi_link_elem *mle) } /** - * ieee80211_mle_get_eml_sync_delay - returns the medium sync delay + * ieee80211_mle_get_eml_med_sync_delay - returns the medium sync delay * @data: pointer to the multi link EHT IE * * The element is assumed to be of the correct type (BASIC) and big enough, -- cgit v1.2.3 From b544fc2b8606d718d0cc788ff2ea2492871df488 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Tue, 15 Aug 2023 10:19:56 -0700 Subject: of: dynamic: Add interfaces for creating device node dynamically MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit of_changeset_create_node() creates device node dynamically and attaches the newly created node to a changeset. Expand of_changeset APIs to handle specific types of properties. of_changeset_add_prop_string() of_changeset_add_prop_string_array() of_changeset_add_prop_u32_array() Signed-off-by: Clément Léger Signed-off-by: Lizhi Hou Link: https://lore.kernel.org/r/1692120000-46900-2-git-send-email-lizhi.hou@amd.com Signed-off-by: Rob Herring --- include/linux/of.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index 7656cc637746..bc2e37f016ec 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1579,6 +1579,29 @@ static inline int of_changeset_update_property(struct of_changeset *ocs, { return of_changeset_action(ocs, OF_RECONFIG_UPDATE_PROPERTY, np, prop); } + +struct device_node *of_changeset_create_node(struct of_changeset *ocs, + struct device_node *parent, + const char *full_name); +int of_changeset_add_prop_string(struct of_changeset *ocs, + struct device_node *np, + const char *prop_name, const char *str); +int of_changeset_add_prop_string_array(struct of_changeset *ocs, + struct device_node *np, + const char *prop_name, + const char **str_array, size_t sz); +int of_changeset_add_prop_u32_array(struct of_changeset *ocs, + struct device_node *np, + const char *prop_name, + const u32 *array, size_t sz); +static inline int of_changeset_add_prop_u32(struct of_changeset *ocs, + struct device_node *np, + const char *prop_name, + const u32 val) +{ + return of_changeset_add_prop_u32_array(ocs, np, prop_name, &val, 1); +} + #else /* CONFIG_OF_DYNAMIC */ static inline int of_reconfig_notifier_register(struct notifier_block *nb) { -- cgit v1.2.3 From 47284862bfc7fd5672e731e827f43f26bdbd155c Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Tue, 15 Aug 2023 10:19:59 -0700 Subject: of: overlay: Extend of_overlay_fdt_apply() to specify the target node Currently, in an overlay fdt fragment, it needs to specify the exact location in base DT. In another word, when the fdt fragment is generated, the base DT location for the fragment is already known. There is new use case that the base DT location is unknown when fdt fragment is generated. For example, the add-on device provide a fdt overlay with its firmware to describe its downstream devices. Because it is add-on device which can be plugged to different systems, its firmware will not be able to know the overlay location in base DT. Instead, the device driver will load the overlay fdt and apply it to base DT at runtime. In this case, of_overlay_fdt_apply() needs to be extended to specify the target node for device driver to apply overlay fdt. int overlay_fdt_apply(..., struct device_node *base); Signed-off-by: Lizhi Hou Link: https://lore.kernel.org/r/1692120000-46900-5-git-send-email-lizhi.hou@amd.com Signed-off-by: Rob Herring --- include/linux/of.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index bc2e37f016ec..ed679819c279 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1667,7 +1667,7 @@ struct of_overlay_notify_data { #ifdef CONFIG_OF_OVERLAY int of_overlay_fdt_apply(const void *overlay_fdt, u32 overlay_fdt_size, - int *ovcs_id); + int *ovcs_id, struct device_node *target_base); int of_overlay_remove(int *ovcs_id); int of_overlay_remove_all(void); -- cgit v1.2.3 From 10bb4e4ab7dd3898f413b5b4a3b81f6e9b1f6bf5 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Thu, 10 Aug 2023 18:21:19 +0200 Subject: PM: sleep: Add helpers to allow a device to remain powered-on On some platforms a device and its corresponding PM domain, may need to remain powered-on during system wide suspend, to support various use cases. For example, when the console_suspend_enabled flag is unset for a serial controller, the corresponding device may need to remain powered on. Other use cases exists too. In fact, we already have the mechanism in the PM core to deal with these kind of use cases. However, the current naming of the corresponding functions/flags clearly suggests these should be use for system wakeup. See device_wakeup_path(), device_set_wakeup_path and dev->power.wakeup_path. As a way to extend the use of the existing mechanism, let's introduce two new helpers functions, device_awake_path() and device_set_awake_path(). At this point, let them act as wrappers of the existing functions. Ideally, when all users have been converted to use the new helpers, we may decide to drop the old ones and rename the flag. Signed-off-by: Ulf Hansson Reviewed-by: Peng Fan Signed-off-by: Rafael J. Wysocki --- include/linux/pm_wakeup.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h index 77f4849e3418..6eb9adaef52b 100644 --- a/include/linux/pm_wakeup.h +++ b/include/linux/pm_wakeup.h @@ -194,6 +194,16 @@ static inline void pm_wakeup_dev_event(struct device *dev, unsigned int msec, #endif /* !CONFIG_PM_SLEEP */ +static inline bool device_awake_path(struct device *dev) +{ + return device_wakeup_path(dev); +} + +static inline void device_set_awake_path(struct device *dev) +{ + device_set_wakeup_path(dev); +} + static inline void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec) { return pm_wakeup_ws_event(ws, msec, false); -- cgit v1.2.3 From a436ae9434ec491eefb4ed4c77bdb9c762925976 Mon Sep 17 00:00:00 2001 From: Liao Chang Date: Wed, 16 Aug 2023 01:58:53 +0000 Subject: cpufreq: Use clamp() helper macro to improve the code readability The valid values of policy.{min, max} should be between 'min' and 'max', so use clamp() helper macro to makes cpufreq_verify_within_limits() easier to follow. Signed-off-by: Liao Chang [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki --- include/linux/cpufreq.h | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 172ff51c1b2a..9bf94ae08158 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -19,6 +19,7 @@ #include #include #include +#include /********************************************************************* * CPUFREQ INTERFACE * @@ -467,17 +468,8 @@ static inline void cpufreq_verify_within_limits(struct cpufreq_policy_data *poli unsigned int min, unsigned int max) { - if (policy->min < min) - policy->min = min; - if (policy->max < min) - policy->max = min; - if (policy->min > max) - policy->min = max; - if (policy->max > max) - policy->max = max; - if (policy->min > policy->max) - policy->min = policy->max; - return; + policy->max = clamp(policy->max, min, max); + policy->min = clamp(policy->min, min, policy->max); } static inline void -- cgit v1.2.3 From f316cdff8d677db9ad9c90acb44c4cd535b0ee27 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 13:30:22 -0700 Subject: clk: Annotate struct clk_hw_onecell_data with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct clk_hw_onecell_data. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Michael Turquette Cc: Stephen Boyd Cc: Joel Stanley Cc: Andrew Jeffery Cc: Taichi Sugaya Cc: Takao Orito Cc: Qin Jian Cc: Andrew Lunn Cc: Gregory Clement Cc: Sebastian Hesselbarth Cc: Andy Gross Cc: Bjorn Andersson Cc: Konrad Dybcio Cc: Sergio Paracuellos Cc: Matthias Brugger Cc: AngeloGioacchino Del Regno Cc: Maxime Ripard Cc: Chen-Yu Tsai Cc: Jernej Skrabec Cc: David Airlie Cc: Daniel Vetter Cc: Samuel Holland Cc: Vinod Koul Cc: Kishon Vijay Abraham I Cc: linux-clk@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-aspeed@lists.ozlabs.org Cc: linux-arm-msm@vger.kernel.org Cc: linux-mediatek@lists.infradead.org Cc: dri-devel@lists.freedesktop.org Cc: linux-sunxi@lists.linux.dev Cc: linux-phy@lists.infradead.org Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20230817203019.never.795-kees@kernel.org Reviewed-by: Gustavo A. R. Silva Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 0f0cd01906b4..ec32ec58c59f 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1379,7 +1379,7 @@ struct clk_onecell_data { struct clk_hw_onecell_data { unsigned int num; - struct clk_hw *hws[]; + struct clk_hw *hws[] __counted_by(num); }; #define CLK_OF_DECLARE(name, compat, fn) \ -- cgit v1.2.3 From 979663c3d273a1b36362186607acfc311b521848 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Duje=20Mihanovi=C4=87?= Date: Fri, 4 Aug 2023 15:49:32 +0200 Subject: clk: mmp: Remove old non-OF clock drivers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are no MMP2, PXA168 or PXA910 boards still using board files which would use these drivers, so remove them. Signed-off-by: Duje Mihanović Link: https://lore.kernel.org/r/20230804-drop-old-mmp-clk-v1-1-0c07db6cee90@skole.hr Signed-off-by: Stephen Boyd --- include/linux/clk/mmp.h | 18 ------------------ 1 file changed, 18 deletions(-) delete mode 100644 include/linux/clk/mmp.h (limited to 'include/linux') diff --git a/include/linux/clk/mmp.h b/include/linux/clk/mmp.h deleted file mode 100644 index 445130460380..000000000000 --- a/include/linux/clk/mmp.h +++ /dev/null @@ -1,18 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __CLK_MMP_H -#define __CLK_MMP_H - -#include - -extern void pxa168_clk_init(phys_addr_t mpmu_phys, - phys_addr_t apmu_phys, - phys_addr_t apbc_phys); -extern void pxa910_clk_init(phys_addr_t mpmu_phys, - phys_addr_t apmu_phys, - phys_addr_t apbc_phys, - phys_addr_t apbcp_phys); -extern void mmp2_clk_init(phys_addr_t mpmu_phys, - phys_addr_t apmu_phys, - phys_addr_t apbc_phys); - -#endif -- cgit v1.2.3 From b1d1e90490b671444ebf66292201572c1059d323 Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" Date: Wed, 23 Aug 2023 01:25:42 +0900 Subject: tracing/probes: Support BTF argument on module functions Since the btf returned from bpf_get_btf_vmlinux() only covers functions in the vmlinux, BTF argument is not available on the functions in the modules. Use bpf_find_btf_id() instead of bpf_get_btf_vmlinux()+btf_find_name_kind() so that BTF argument can find the correct struct btf and btf_type in it. With this fix, fprobe events can use `$arg*` on module functions as below # grep nf_log_ip_packet /proc/kallsyms ffffffffa0005c00 t nf_log_ip_packet [nf_log_syslog] ffffffffa0005bf0 t __pfx_nf_log_ip_packet [nf_log_syslog] # echo 'f nf_log_ip_packet $arg*' > dynamic_events # cat dynamic_events f:fprobes/nf_log_ip_packet__entry nf_log_ip_packet net=net pf=pf hooknum=hooknum skb=skb in=in out=out loginfo=loginfo prefix=prefix To support the module's btf which is removable, the struct btf needs to be ref-counted. So this also records the btf in the traceprobe_parse_context and returns the refcount when the parse has done. Link: https://lore.kernel.org/all/169272154223.160970.3507930084247934031.stgit@devnote2/ Suggested-by: Alexei Starovoitov Signed-off-by: Masami Hiramatsu (Google) Acked-by: Steven Rostedt (Google) --- include/linux/btf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index cac9f304e27a..dbfe41a09c4b 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -211,6 +211,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec); bool btf_type_is_void(const struct btf_type *t); s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind); +s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p); const struct btf_type *btf_type_skip_modifiers(const struct btf *btf, u32 id, u32 *res_id); const struct btf_type *btf_type_resolve_ptr(const struct btf *btf, -- cgit v1.2.3 From eb6603246ab9a3787e9603f0fd6a321b46623e46 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 21 Aug 2023 21:00:02 +0800 Subject: qed/qede: Remove unused declarations Commit 8cd160a29415 ("qede: convert to new udp_tunnel_nic infra") removed qede_udp_tunnel_{add,del}() but not the declarations. Commit 0ebcebbef1cc ("qed: Read device port count from the shmem") removed qed_device_num_engines() but not its declaration. Commit 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") declared but never implemented qed_fcoe_set_pf_params(). Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/qed/qed_fcoe_if.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/qed/qed_fcoe_if.h b/include/linux/qed/qed_fcoe_if.h index 90e3045b2dcb..0d3b6ed21628 100644 --- a/include/linux/qed/qed_fcoe_if.h +++ b/include/linux/qed/qed_fcoe_if.h @@ -67,9 +67,6 @@ struct qed_fcoe_cb_ops { u32 (*get_login_failures)(void *cookie); }; -void qed_fcoe_set_pf_params(struct qed_dev *cdev, - struct qed_fcoe_pf_params *params); - /** * struct qed_fcoe_ops - qed FCoE operations. * @common: common operations pointer -- cgit v1.2.3 From 71ab55a9af80fcffe1f42b9b8dba4d0e3dd0c351 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:15 +0200 Subject: mlx4: Get rid of the mlx4_interface.get_dev callback Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev() and the associated mlx4_interface.get_dev callbacks. This is done in preparation to use an auxiliary bus to model the mlx4 driver structure. The change is motivated by the following situation: * The mlx4_en interface is being initialized by mlx4_en_add() and mlx4_en_activate(). * The latter activate function calls mlx4_en_init_netdev() -> register_netdev() to register a new net_device. * A netdev event NETDEV_REGISTER is raised for the device. * The netdev notififier mlx4_ib_netdev_event() is called and it invokes mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() -> mlx4_en_get_netdev() [via mlx4_interface.get_dev]. This chain creates a problem when mlx4_en gets switched to be an auxiliary driver. It contains two device calls which would both need to take a respective device lock. Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer call mlx4_get_protocol_dev() but instead to utilize the information passed in net_device.parent and net_device.dev_port. This data is sufficient to determine that an updated port is one that the mlx4_ib driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to date. Following that, update mlx4_ib_get_netdev() to also not call mlx4_get_protocol_dev() and instead scan all current netdevs to find find a matching one. Note that mlx4_ib_get_netdev() is called early from ib_register_device() and cannot use data tracked in mlx4_ib_dev.iboe.netdevs which is not at that point yet set. Finally, remove function mlx4_get_protocol_dev() and the mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they became unused. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 1834c8fad12e..923951e19300 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -59,7 +59,6 @@ struct mlx4_interface { void (*remove)(struct mlx4_dev *dev, void *context); void (*event) (struct mlx4_dev *dev, void *context, enum mlx4_dev_event event, unsigned long param); - void * (*get_dev)(struct mlx4_dev *dev, void *context, u8 port); void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; @@ -88,8 +87,6 @@ struct mlx4_port_map { int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p); -void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port); - struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port); #endif /* MLX4_DRIVER_H */ -- cgit v1.2.3 From 7ba189ac52acab44129f09b302069e5a86fd92c2 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:17 +0200 Subject: mlx4: Use 'void *' as the event param of mlx4_dispatch_event() Function mlx4_dispatch_event() takes an 'unsigned long' as its event parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR), a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit integer (remaining events). In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device, the mlx4_interface.event callback is replaced with a notifier and function mlx4_dispatch_event() gets updated to invoke atomic_notifier_call_chain(). This requires forwarding the input 'param' value from the former function to the latter. A problem is that the notifier call takes 'void *' as its 'param' value, compared to 'unsigned long' used by mlx4_dispatch_event(). Re-passing the value would need either punning it to 'void *' or passing down the address of the input 'param'. Both approaches create a number of unnecessary casts. Change instead the input 'param' of mlx4_dispatch_event() from 'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly, callers using an int value are adjusted to pass its address. Signed-off-by: Petr Pavlu Reviewed-by: Leon Romanovsky Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 923951e19300..032d7f5bfef6 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -58,7 +58,7 @@ struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); void (*event) (struct mlx4_dev *dev, void *context, - enum mlx4_dev_event event, unsigned long param); + enum mlx4_dev_event event, void *param); void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; -- cgit v1.2.3 From 73d68002a02efd370dba6b8fc570427326e36d1a Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:18 +0200 Subject: mlx4: Replace the mlx4_interface.event callback with a notifier Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Acked-by: Tariq Toukan Reviewed-by: Leon Romanovsky Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 032d7f5bfef6..228da8ed7e75 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -34,6 +34,7 @@ #define MLX4_DRIVER_H #include +#include #include struct mlx4_dev; @@ -57,8 +58,6 @@ enum { struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); - void (*event) (struct mlx4_dev *dev, void *context, - enum mlx4_dev_event event, void *param); void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; @@ -87,6 +86,11 @@ struct mlx4_port_map { int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p); +int mlx4_register_event_notifier(struct mlx4_dev *dev, + struct notifier_block *nb); +int mlx4_unregister_event_notifier(struct mlx4_dev *dev, + struct notifier_block *nb); + struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port); #endif /* MLX4_DRIVER_H */ -- cgit v1.2.3 From 13f857111cb23f2a8dbcd0271c3ff1824913d980 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:19 +0200 Subject: mlx4: Get rid of the mlx4_interface.activate callback The mlx4_interface.activate callback was introduced in commit 79857cd31fe7 ("net/mlx4: Postpone the registration of net_device"). It dealt with a situation when a netdev notifier received a NETDEV_REGISTER event for a new net_device created by mlx4_en but the same device was not yet visible to mlx4_get_protocol_dev(). The callback can be removed now that mlx4_get_protocol_dev() is gone. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 228da8ed7e75..0f8c9ba4c574 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -58,7 +58,6 @@ enum { struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); - void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; int flags; -- cgit v1.2.3 From e2fb47d4eb5cd245c38c8c57d969ac6b12efc764 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:20 +0200 Subject: mlx4: Move the bond work to the core driver Function mlx4_en_queue_bond_work() is used in mlx4_en to start a bond reconfiguration. It gathers data about a new port map setting, takes a reference on the netdev that triggered the change and queues a work object on mlx4_en_priv.mdev.workqueue to perform the operation. The scheduled work is mlx4_en_bond_work() which calls mlx4_bond()/mlx4_unbond() and consequently mlx4_do_bond(). At the same time, function mlx4_change_port_types() in mlx4_core might be invoked to change the port type configuration. As part of its logic, it re-registers the whole device by calling mlx4_unregister_device(), followed by mlx4_register_device(). The two operations can result in concurrent access to the data about currently active interfaces on the device. Functions mlx4_register_device() and mlx4_unregister_device() lock the intf_mutex to gain exclusive access to this data. The current implementation of mlx4_do_bond() doesn't do that which could result in an unexpected behavior. An updated version of mlx4_do_bond() for use with an auxiliary bus goes and locks the intf_mutex when accessing a new auxiliary device array. However, doing so can then result in the following deadlock: * A two-port mlx4 device is configured as an Ethernet bond. * One of the ports is changed from eth to ib, for instance, by writing into a mlx4_port sysfs attribute file. * mlx4_change_port_types() is called to update port types. It invokes mlx4_unregister_device() to unregister the device which locks the intf_mutex and starts removing all associated interfaces. * Function mlx4_en_remove() gets invoked and starts destroying its first netdev. This triggers mlx4_en_netdev_event() which recognizes that the configured bond is broken. It runs mlx4_en_queue_bond_work() which takes a reference on the netdev. Removing the netdev now cannot proceed until the work is completed. * Work function mlx4_en_bond_work() gets scheduled. It calls mlx4_unbond() -> mlx4_do_bond(). The latter function tries to lock the intf_mutex but that is not possible because it is held already by mlx4_unregister_device(). This particular case could be possibly solved by unregistering the mlx4_en_netdev_event() notifier in mlx4_en_remove() earlier, but it seems better to decouple mlx4_en more and break this reference order. Avoid then this scenario by recognizing that the bond reconfiguration operates only on a mlx4_dev. The logic to queue and execute the bond work can be moved into the mlx4_core driver. Only a reference on the respective mlx4_dev object is needed to be taken during the work's lifetime. This removes a call from mlx4_en that can directly result in needing to lock the intf_mutex, it remains a privilege of the core driver. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. Miller --- include/linux/mlx4/device.h | 13 +++++++++++++ include/linux/mlx4/driver.h | 19 ------------------- 2 files changed, 13 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 6646634a0b9d..049d8a4b044d 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -1087,6 +1087,19 @@ static inline void *mlx4_buf_offset(struct mlx4_buf *buf, int offset) (offset & (PAGE_SIZE - 1)); } +static inline int mlx4_is_bonded(struct mlx4_dev *dev) +{ + return !!(dev->flags & MLX4_FLAG_BONDED); +} + +static inline int mlx4_is_mf_bonded(struct mlx4_dev *dev) +{ + return (mlx4_is_bonded(dev) && mlx4_is_mfunc(dev)); +} + +int mlx4_queue_bond_work(struct mlx4_dev *dev, int is_bonded, u8 v2p_p1, + u8 v2p_p2); + int mlx4_pd_alloc(struct mlx4_dev *dev, u32 *pdn); void mlx4_pd_free(struct mlx4_dev *dev, u32 pdn); int mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn); diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 0f8c9ba4c574..781d5a0c2faa 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -66,25 +66,6 @@ struct mlx4_interface { int mlx4_register_interface(struct mlx4_interface *intf); void mlx4_unregister_interface(struct mlx4_interface *intf); -int mlx4_bond(struct mlx4_dev *dev); -int mlx4_unbond(struct mlx4_dev *dev); -static inline int mlx4_is_bonded(struct mlx4_dev *dev) -{ - return !!(dev->flags & MLX4_FLAG_BONDED); -} - -static inline int mlx4_is_mf_bonded(struct mlx4_dev *dev) -{ - return (mlx4_is_bonded(dev) && mlx4_is_mfunc(dev)); -} - -struct mlx4_port_map { - u8 port1; - u8 port2; -}; - -int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p); - int mlx4_register_event_notifier(struct mlx4_dev *dev, struct notifier_block *nb); int mlx4_unregister_event_notifier(struct mlx4_dev *dev, -- cgit v1.2.3 From 8c2d2b87719bad9c1db4a7919e74f7818a8ca3de Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:22 +0200 Subject: mlx4: Register mlx4 devices to an auxiliary virtual bus Add an auxiliary virtual bus to model the mlx4 driver structure. The code is added along the current custom device management logic. Subsequent patches switch mlx4_en and mlx4_ib to the auxiliary bus and the old interface is then removed. Structure mlx4_priv gains a new adev dynamic array to keep track of its auxiliary devices. Access to the array is protected by the global mlx4_intf mutex. Functions mlx4_register_device() and mlx4_unregister_device() are updated to expose auxiliary devices on the bus in order to load mlx4_en and/or mlx4_ib. Functions mlx4_register_auxiliary_driver() and mlx4_unregister_auxiliary_driver() are added to substitute mlx4_register_interface() and mlx4_unregister_interface(), respectively. Function mlx4_do_bond() is adjusted to walk over the adev array and re-adds a specific auxiliary device if its driver sets the MLX4_INTFF_BONDING flag. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. Miller --- include/linux/mlx4/device.h | 7 +++++++ include/linux/mlx4/driver.h | 11 +++++++++++ 2 files changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 049d8a4b044d..27f42f713c89 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -33,6 +33,7 @@ #ifndef MLX4_DEVICE_H #define MLX4_DEVICE_H +#include #include #include #include @@ -889,6 +890,12 @@ struct mlx4_dev { u8 uar_page_shift; }; +struct mlx4_adev { + struct auxiliary_device adev; + struct mlx4_dev *mdev; + int idx; +}; + struct mlx4_clock_params { u64 offset; u8 bar; diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 781d5a0c2faa..9cf157d381c6 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -34,9 +34,12 @@ #define MLX4_DRIVER_H #include +#include #include #include +#define MLX4_ADEV_NAME "mlx4_core" + struct mlx4_dev; #define MLX4_MAC_MASK 0xffffffffffffULL @@ -63,8 +66,16 @@ struct mlx4_interface { int flags; }; +struct mlx4_adrv { + struct auxiliary_driver adrv; + enum mlx4_protocol protocol; + int flags; +}; + int mlx4_register_interface(struct mlx4_interface *intf); void mlx4_unregister_interface(struct mlx4_interface *intf); +int mlx4_register_auxiliary_driver(struct mlx4_adrv *madrv); +void mlx4_unregister_auxiliary_driver(struct mlx4_adrv *madrv); int mlx4_register_event_notifier(struct mlx4_dev *dev, struct notifier_block *nb); -- cgit v1.2.3 From c138cdb89a14ce00d45e77d3db18263e4b9f9465 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:25 +0200 Subject: mlx4: Delete custom device management logic After the conversion to use the auxiliary bus, the custom device management is not needed anymore and can be deleted. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 9cf157d381c6..69825223081f 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -58,22 +58,12 @@ enum { MLX4_INTFF_BONDING = 1 << 0 }; -struct mlx4_interface { - void * (*add) (struct mlx4_dev *dev); - void (*remove)(struct mlx4_dev *dev, void *context); - struct list_head list; - enum mlx4_protocol protocol; - int flags; -}; - struct mlx4_adrv { struct auxiliary_driver adrv; enum mlx4_protocol protocol; int flags; }; -int mlx4_register_interface(struct mlx4_interface *intf); -void mlx4_unregister_interface(struct mlx4_interface *intf); int mlx4_register_auxiliary_driver(struct mlx4_adrv *madrv); void mlx4_unregister_auxiliary_driver(struct mlx4_adrv *madrv); -- cgit v1.2.3 From 1dfe3a5a7cefbe2162cecb759f3933baea22c393 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 21 Aug 2023 17:35:26 +0100 Subject: entry: Remove empty addr_limit_user_check() Back when set_fs() was a generic API for altering the address limit, addr_limit_user_check() was a safety measure to prevent userspace being able to issue syscalls with an unbound limit. With the the removal of set_fs() as a generic API, the last user of addr_limit_user_check() was removed in commit: b5a5a01d8e9a44ec ("arm64: uaccess: remove addr_limit_user_check()") ... as since that commit, no architecture defines TIF_FSCHECK, and hence addr_limit_user_check() always expands to nothing. Remove addr_limit_user_check(), updating the comment in exit_to_user_mode_prepare() to no longer refer to it. At the same time, the comment is reworded to be a little more generic so as to cover kmap_assert_nomap() in addition to lockdep_sys_exit(). No functional change. Signed-off-by: Mark Rutland Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20230821163526.2319443-1-mark.rutland@arm.com --- include/linux/syscalls.h | 16 ---------------- 1 file changed, 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 03e3d0121d5e..c4b9b66b0d05 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -283,22 +283,6 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) #define SYSCALL32_DEFINE6 SYSCALL_DEFINE6 #endif -/* - * Called before coming back to user-mode. Returning to user-mode with an - * address limit different than USER_DS can allow to overwrite kernel memory. - */ -static inline void addr_limit_user_check(void) -{ -#ifdef TIF_FSCHECK - if (!test_thread_flag(TIF_FSCHECK)) - return; -#endif - -#ifdef TIF_FSCHECK - clear_thread_flag(TIF_FSCHECK); -#endif -} - /* * These syscall function prototypes are kept in the same order as * include/uapi/asm-generic/unistd.h. Architecture specific entries go below, -- cgit v1.2.3 From 81e1d9a39569d315f747c2af19ce502cd08645ed Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 23 Aug 2023 14:27:42 +0100 Subject: nvmem: core: Return NULL when no nvmem layout is found Currently, of_nvmem_layout_get_container() returns NULL on error, or an error pointer if either CONFIG_NVMEM or CONFIG_OF is turned off. We should likely avoid this kind of mix for two reasons: to clarify the intend and anyway fix the !CONFIG_OF which will likely always if we use this helper somewhere else. Let's just return NULL when no layout is found, we don't need an error value here. Link: https://staticthinking.wordpress.com/2022/08/01/mixing-error-pointers-and-null/ Fixes: 266570f496b9 ("nvmem: core: introduce NVMEM layouts") Reported-by: kernel test robot Reported-by: Dan Carpenter Closes: https://lore.kernel.org/r/202308030002.DnSFOrMB-lkp@intel.com/ Signed-off-by: Miquel Raynal Reviewed-by: Michael Walle Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20230823132744.350618-21-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/nvmem-consumer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nvmem-consumer.h b/include/linux/nvmem-consumer.h index fa030d93b768..27373024856d 100644 --- a/include/linux/nvmem-consumer.h +++ b/include/linux/nvmem-consumer.h @@ -256,7 +256,7 @@ static inline struct nvmem_device *of_nvmem_device_get(struct device_node *np, static inline struct device_node * of_nvmem_layout_get_container(struct nvmem_device *nvmem) { - return ERR_PTR(-EOPNOTSUPP); + return NULL; } #endif /* CONFIG_NVMEM && CONFIG_OF */ -- cgit v1.2.3 From eb176cb46191f20314878222d8186106e23cb711 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 23 Aug 2023 14:27:44 +0100 Subject: nvmem: core: Notify when a new layout is registered Tell listeners a new layout was introduced and is now available. Signed-off-by: Miquel Raynal Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20230823132744.350618-23-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/nvmem-consumer.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvmem-consumer.h b/include/linux/nvmem-consumer.h index 27373024856d..4523e4e83319 100644 --- a/include/linux/nvmem-consumer.h +++ b/include/linux/nvmem-consumer.h @@ -43,6 +43,8 @@ enum { NVMEM_REMOVE, NVMEM_CELL_ADD, NVMEM_CELL_REMOVE, + NVMEM_LAYOUT_ADD, + NVMEM_LAYOUT_REMOVE, }; #if IS_ENABLED(CONFIG_NVMEM) -- cgit v1.2.3 From 61182c796d74f54ba66d17bac6f516183ec09af2 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Fri, 23 Jun 2023 11:43:14 -0400 Subject: SUNRPC: kmap() the xdr pages during decode If the pages are in HIGHMEM then we need to make sure they're mapped before trying to read data off of them, otherwise we could end up with a NULL pointer dereference. The downside to this is that we need an extra cleanup step at the end of decode to kunmap() the last page. I introduced an xdr_finish_decode() function to do this. Right now this function only calls the unmap_current_page() function, but other generic cleanup steps could be added in the future if we come across anything else. Reported-by: Krzysztof Kozlowski Signed-off-by: Anna Schumaker --- include/linux/sunrpc/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index f89ec4b5ea16..adc844db1ea5 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -224,6 +224,7 @@ struct xdr_stream { struct kvec *iov; /* pointer to the current kvec */ struct kvec scratch; /* Scratch buffer */ struct page **page_ptr; /* pointer to the current page */ + void *page_kaddr; /* kmapped address of the current page */ unsigned int nwords; /* Remaining decode buffer length */ struct rpc_rqst *rqst; /* For debugging */ @@ -255,6 +256,7 @@ extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); extern void xdr_init_decode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, struct page **pages, unsigned int len); +extern void xdr_finish_decode(struct xdr_stream *xdr); extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes); extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len); extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); -- cgit v1.2.3 From b922bf04d2c1355633bdefbc2ed5fba1f0d4df07 Mon Sep 17 00:00:00 2001 From: Greg Ungerer Date: Tue, 11 Jul 2023 23:07:53 +1000 Subject: binfmt_elf_fdpic: support 64-bit systems The binfmt_flat_fdpic code has a number of 32-bit specific data structures associated with it. Extend it to be able to support and be used on 64-bit systems as well. The new code defines a number of key 64-bit variants of the core elf-fdpic data structures - along side the existing 32-bit sized ones. A common set of generic named structures are defined to be either the 32-bit or 64-bit ones as required at compile time. This is a similar technique to that used in the ELF binfmt loader. For example: elf_fdpic_loadseg is either elf32_fdpic_loadseg or elf64_fdpic_loadseg elf_fdpic_loadmap is either elf32_fdpic_loadmap or elf64_fdpic_loadmap the choice based on ELFCLASS32 or ELFCLASS64. Signed-off-by: Greg Ungerer Acked-by: Kees Cook Link: https://lore.kernel.org/r/20230711130754.481209-2-gerg@kernel.org Signed-off-by: Palmer Dabbelt --- include/linux/elf-fdpic.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/elf-fdpic.h b/include/linux/elf-fdpic.h index 3bea95a1af53..e533f4513194 100644 --- a/include/linux/elf-fdpic.h +++ b/include/linux/elf-fdpic.h @@ -10,13 +10,25 @@ #include +#if ELF_CLASS == ELFCLASS32 +#define Elf_Sword Elf32_Sword +#define elf_fdpic_loadseg elf32_fdpic_loadseg +#define elf_fdpic_loadmap elf32_fdpic_loadmap +#define ELF_FDPIC_LOADMAP_VERSION ELF32_FDPIC_LOADMAP_VERSION +#else +#define Elf_Sword Elf64_Sxword +#define elf_fdpic_loadmap elf64_fdpic_loadmap +#define elf_fdpic_loadseg elf64_fdpic_loadseg +#define ELF_FDPIC_LOADMAP_VERSION ELF64_FDPIC_LOADMAP_VERSION +#endif + /* * binfmt binary parameters structure */ struct elf_fdpic_params { struct elfhdr hdr; /* ref copy of ELF header */ struct elf_phdr *phdrs; /* ref copy of PT_PHDR table */ - struct elf32_fdpic_loadmap *loadmap; /* loadmap to be passed to userspace */ + struct elf_fdpic_loadmap *loadmap; /* loadmap to be passed to userspace */ unsigned long elfhdr_addr; /* mapped ELF header user address */ unsigned long ph_addr; /* mapped PT_PHDR user address */ unsigned long map_addr; /* mapped loadmap user address */ -- cgit v1.2.3 From 0215845348fd09a0125da3d9d1a1e2476b49cd70 Mon Sep 17 00:00:00 2001 From: Sui Jingfeng Date: Wed, 9 Aug 2023 06:34:12 +0800 Subject: PCI/VGA: Replace full MIT license text with SPDX identifier Per Documentation/process/license-rules.rst, the SPDX MIT identifier is equivalent to including the entire MIT license text from LICENSES/preferred/MIT. Replace the MIT license text with the equivalent SPDX identifier. Link: https://lore.kernel.org/r/20230808223412.1743176-12-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng Signed-off-by: Bjorn Helgaas Reviewed-by: Andi Shyti --- include/linux/vgaarb.h | 23 ++--------------------- 1 file changed, 2 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index b4b9137f9792..ed5e5462f1d2 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -1,3 +1,5 @@ +/* SPDX-License-Identifier: MIT */ + /* * The VGA aribiter manages VGA space routing and VGA resource decode to * allow multiple VGA devices to be used in a system in a safe way. @@ -5,27 +7,6 @@ * (C) Copyright 2005 Benjamin Herrenschmidt * (C) Copyright 2007 Paulo R. Zanoni * (C) Copyright 2007, 2009 Tiago Vignatti - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice (including the next - * paragraph) shall be included in all copies or substantial portions of the - * Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER - * DEALINGS - * IN THE SOFTWARE. - * */ #ifndef LINUX_VGA_H -- cgit v1.2.3 From 69dd3b3930f96b624228000921f417fb0919a6ab Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 25 Aug 2022 09:31:05 -0400 Subject: libceph: add CEPH_OSD_OP_ASSERT_VER support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ...and record the user_version in the reply in a new field in ceph_osd_request, so we can populate the assert_ver appropriately. Shuffle the fields a bit too so that the new field fits in an existing hole on x86_64. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-and-tested-by: Luís Henriques Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/osd_client.h | 6 +++++- include/linux/ceph/rados.h | 4 ++++ 2 files changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 8f5d2b5bbba2..bf9823956758 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -198,6 +198,9 @@ struct ceph_osd_req_op { u32 src_fadvise_flags; struct ceph_osd_data osd_data; } copy_from; + struct { + u64 ver; + } assert_ver; }; }; @@ -252,6 +255,7 @@ struct ceph_osd_request { struct ceph_osd_client *r_osdc; struct kref r_kref; bool r_mempool; + bool r_linger; /* don't resend on failure */ struct completion r_completion; /* private to osd_client.c */ ceph_osdc_callback_t r_callback; @@ -264,9 +268,9 @@ struct ceph_osd_request { struct ceph_snap_context *r_snapc; /* for writes */ struct timespec64 r_mtime; /* ditto */ u64 r_data_offset; /* ditto */ - bool r_linger; /* don't resend on failure */ /* internal */ + u64 r_version; /* data version sent in reply */ unsigned long r_stamp; /* jiffies, send or check time */ unsigned long r_start_stamp; /* jiffies */ ktime_t r_start_latency; /* ktime_t */ diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h index 43a7a1573b51..73c3efbec36c 100644 --- a/include/linux/ceph/rados.h +++ b/include/linux/ceph/rados.h @@ -523,6 +523,10 @@ struct ceph_osd_op { struct { __le64 cookie; } __attribute__ ((packed)) notify; + struct { + __le64 unused; + __le64 ver; + } __attribute__ ((packed)) assert_ver; struct { __le64 offset, length; __le64 src_offset; -- cgit v1.2.3 From e87cf8a28e7592bd19064e8181324ae26bc02932 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 30 Jun 2023 12:46:53 +0300 Subject: SUNRPC: clean up integer overflow check This integer overflow check works as intended but Clang and GCC and warn about it when compiling with W=1. include/linux/sunrpc/xdr.h:539:17: error: comparison is always false due to limited range of data type [-Werror=type-limits] Use size_mul() to prevent the integer overflow. It silences the warning and it's cleaner as well. Reported-by: Dmitry Antipov Closes: https://lore.kernel.org/all/20230601143332.255312-1-dmantipov@yandex.ru/ Signed-off-by: Dan Carpenter Acked-by: Jeff Layton Signed-off-by: Anna Schumaker --- include/linux/sunrpc/xdr.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index adc844db1ea5..68915180a29c 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -777,9 +777,7 @@ xdr_stream_decode_uint32_array(struct xdr_stream *xdr, if (unlikely(xdr_stream_decode_u32(xdr, &len) < 0)) return -EBADMSG; - if (len > SIZE_MAX / sizeof(*p)) - return -EBADMSG; - p = xdr_inline_decode(xdr, len * sizeof(*p)); + p = xdr_inline_decode(xdr, size_mul(len, sizeof(*p))); if (unlikely(!p)) return -EBADMSG; if (array == NULL) -- cgit v1.2.3 From d2ee413884cdbdfcfc1560526615519311a47d33 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 19 Aug 2023 17:32:23 -0400 Subject: SUNRPC: Allow specification of TCP client connect timeout at setup When we create a TCP transport, the connect timeout parameters are currently fixed to be 90s. This is problematic in the pNFS flexfiles case, where we may have multiple mirrors, and we would like to fail over quickly to the next mirror if a data server is down. This patch adds the ability to specify the connection parameters at RPC client creation time. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/sunrpc/clnt.h | 2 ++ include/linux/sunrpc/xprt.h | 2 ++ 2 files changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 4f41d839face..af7358277f1c 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -148,6 +148,8 @@ struct rpc_create_args { const struct cred *cred; unsigned int max_connect; struct xprtsec_parms xprtsec; + unsigned long connect_timeout; + unsigned long reconnect_timeout; }; struct rpc_add_xprt_test { diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index b52411bcfe4e..4ecc89301eb7 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -351,6 +351,8 @@ struct xprt_create { struct rpc_xprt_switch *bc_xps; unsigned int flags; struct xprtsec_parms xprtsec; + unsigned long connect_timeout; + unsigned long reconnect_timeout; }; struct xprt_class { -- cgit v1.2.3 From cc64ca4b62f5035c87e955f256b338c32dfdbb19 Mon Sep 17 00:00:00 2001 From: Sui Jingfeng Date: Wed, 9 Aug 2023 06:34:07 +0800 Subject: PCI/VGA: Fix typos Fix typos, rewrap to fill 78 columns, convert to conventional multi-line style. [bhelgaas: squash and add more fixes] Link: https://lore.kernel.org/r/20230808223412.1743176-7-sui.jingfeng@linux.dev Link: https://lore.kernel.org/r/20230808223412.1743176-9-sui.jingfeng@linux.dev Link: https://lore.kernel.org/r/20230808223412.1743176-10-sui.jingfeng@linux.dev Link: https://lore.kernel.org/r/20230808223412.1743176-11-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng Signed-off-by: Bjorn Helgaas --- include/linux/vgaarb.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index ed5e5462f1d2..97129a1bbb7d 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -77,7 +77,7 @@ static inline int vga_client_register(struct pci_dev *pdev, static inline int vga_get_interruptible(struct pci_dev *pdev, unsigned int rsrc) { - return vga_get(pdev, rsrc, 1); + return vga_get(pdev, rsrc, 1); } /** @@ -92,7 +92,7 @@ static inline int vga_get_interruptible(struct pci_dev *pdev, static inline int vga_get_uninterruptible(struct pci_dev *pdev, unsigned int rsrc) { - return vga_get(pdev, rsrc, 0); + return vga_get(pdev, rsrc, 0); } static inline void vga_client_unregister(struct pci_dev *pdev) -- cgit v1.2.3 From 59da9885767a75df697c84c06aaf2296e10d85a4 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 23 Aug 2023 10:56:32 +0200 Subject: net: dsa: use capital "OR" for multiple licenses in SPDX Documentation/process/license-rules.rst and checkpatch expect the SPDX identifier syntax for multiple licenses to use capital "OR". Correct it to keep consistent format and avoid copy-paste issues. Signed-off-by: Krzysztof Kozlowski Reviewed-by: Kurt Kanzenbach Reviewed-by: FLorian Fainelli Link: https://lore.kernel.org/r/20230823085632.116725-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski --- include/linux/platform_data/hirschmann-hellcreek.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/hirschmann-hellcreek.h b/include/linux/platform_data/hirschmann-hellcreek.h index 6a000df5541f..8748680e9e3c 100644 --- a/include/linux/platform_data/hirschmann-hellcreek.h +++ b/include/linux/platform_data/hirschmann-hellcreek.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: (GPL-2.0 or MIT) */ +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */ /* * Hirschmann Hellcreek TSN switch platform data. * -- cgit v1.2.3 From ea91512ded9963d8ca98eab67d7d182a989813d9 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 11 Aug 2023 17:59:33 +0800 Subject: PCI: Remove unused function declarations The following declarations have never been implemented since the beginning of git history, so remove them: u8 acpiphp_get_attention_status(struct acpiphp_slot *slot); u8 cpci_get_latch_status(struct slot *slot); u8 cpci_get_adapter_status(struct slot *slot); int ibmphp_get_total_hp_slots(void); void ibmphp_free_ibm_slot(struct slot *); void pdev_enable_device(struct pci_dev *); Link: https://lore.kernel.org/r/20230811095933.28652-1-yuehaibing@huawei.com Signed-off-by: Yue Haibing Signed-off-by: Bjorn Helgaas --- include/linux/pci.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index b4eae9f2e1fc..7d81aff09153 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1403,7 +1403,6 @@ void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge); void pci_assign_unassigned_bus_resources(struct pci_bus *bus); void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus); int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type); -void pdev_enable_device(struct pci_dev *); int pci_enable_resources(struct pci_dev *, int mask); void pci_assign_irq(struct pci_dev *dev); struct resource *pci_find_resource(struct pci_dev *dev, struct resource *res); -- cgit v1.2.3 From b24c5d752962fa0970cd7e3d74b1cd0e843358de Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:25 +0100 Subject: io_uring: simplify big_cqe handling Don't keep big_cqe bits of req in a union with hash_node, find a separate space for it. It's bit safer, but also if we keep it always initialised, we can get rid of ugly REQ_F_CQE32_INIT handling. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/447aa1b2968978c99e655ba88db536e903df0fe9.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index f04ce513fadb..9795eda529f7 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -409,7 +409,6 @@ enum { REQ_F_SINGLE_POLL_BIT, REQ_F_DOUBLE_POLL_BIT, REQ_F_PARTIAL_IO_BIT, - REQ_F_CQE32_INIT_BIT, REQ_F_APOLL_MULTISHOT_BIT, REQ_F_CLEAR_POLLIN_BIT, REQ_F_HASH_LOCKED_BIT, @@ -479,8 +478,6 @@ enum { REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT), /* fast poll multishot mode */ REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT), - /* ->extra1 and ->extra2 are initialised */ - REQ_F_CQE32_INIT = BIT(REQ_F_CQE32_INIT_BIT), /* recvmsg special flag, clear EPOLLIN */ REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ @@ -579,13 +576,7 @@ struct io_kiocb { struct io_task_work io_task_work; unsigned nr_tw; /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ - union { - struct hlist_node hash_node; - struct { - u64 extra1; - u64 extra2; - }; - }; + struct hlist_node hash_node; /* internal polling, see IORING_FEAT_FAST_POLL */ struct async_poll *apoll; /* opcode allocated if it needs to store data for async defer */ @@ -595,6 +586,11 @@ struct io_kiocb { /* custom credentials, valid IFF REQ_F_CREDS is set */ const struct cred *creds; struct io_wq_work work; + + struct { + u64 extra1; + u64 extra2; + } big_cqe; }; struct io_overflow_cqe { -- cgit v1.2.3 From ec26c225f06f5993f8891fa6c79fab3c92981181 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:29 +0100 Subject: io_uring: merge iopoll and normal completion paths io_do_iopoll() and io_submit_flush_completions() are pretty similar, both filling CQEs and then free a list of requests. Don't duplicate it and make iopoll use __io_submit_flush_completions(), which also helps with inlining and other optimisations. For that, we need to first find all completed iopoll requests and splice them from the iopoll list and then pass it down. This adds one extra list traversal, which should be fine as requests will stay hot in cache. CQ locking is already conditional, introduce ->lockless_cq and skip locking for IOPOLL as it's protected by ->uring_lock. We also add a wakeup optimisation for IOPOLL to __io_cq_unlock_post(), so it works just like io_cqring_ev_posted_iopoll(). Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/3840473f5e8a960de35b77292026691880f6bdbc.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 9795eda529f7..c0c03d8059df 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -205,6 +205,7 @@ struct io_ring_ctx { unsigned int has_evfd: 1; /* all CQEs should be posted only by the submitter task */ unsigned int task_complete: 1; + unsigned int lockless_cq: 1; unsigned int syscall_iopoll: 1; unsigned int poll_activated: 1; unsigned int drain_disabled: 1; -- cgit v1.2.3 From e5598d6ae62626d261b046a2f19347c38681ff51 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:31 +0100 Subject: io_uring: compact SQ/CQ heads/tails Queues heads and tails cache line aligned. That makes sq, cq taking 4 lines or 5 lines if we include the rest of struct io_rings (e.g. sq_flags is frequently accessed). Since modern io_uring is mostly single threaded, it doesn't make much send to spread them as such, it wastes space and puts additional pressure on caches. Put them all into a single line. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/9c8deddf9a7ed32069235a530d1e117fb460bc4c.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index c0c03d8059df..608a8e80e881 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -69,8 +69,8 @@ struct io_uring_task { }; struct io_uring { - u32 head ____cacheline_aligned_in_smp; - u32 tail ____cacheline_aligned_in_smp; + u32 head; + u32 tail; }; /* -- cgit v1.2.3 From d7f06fea5d6be78403d42c9637f67bc883870094 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:33 +0100 Subject: io_uring: move non aligned field to the end Move not cache aligned fields down in io_ring_ctx, should change anything, but makes further refactoring easier. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/518e95d7888e9d481b2c5968dcf3f23db9ea47a5.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 608a8e80e881..ad87d6074fb2 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -270,24 +270,6 @@ struct io_ring_ctx { struct io_alloc_cache netmsg_cache; } ____cacheline_aligned_in_smp; - /* IRQ completion list, under ->completion_lock */ - struct io_wq_work_list locked_free_list; - unsigned int locked_free_nr; - - const struct cred *sq_creds; /* cred used for __io_sq_thread() */ - struct io_sq_data *sq_data; /* if using sq thread polling */ - - struct wait_queue_head sqo_sq_wait; - struct list_head sqd_list; - - unsigned long check_cq; - - unsigned int file_alloc_start; - unsigned int file_alloc_end; - - struct xarray personalities; - u32 pers_next; - struct { /* * We cache a range of free CQEs we can use, once exhausted it @@ -332,6 +314,24 @@ struct io_ring_ctx { unsigned cq_last_tm_flush; } ____cacheline_aligned_in_smp; + /* IRQ completion list, under ->completion_lock */ + struct io_wq_work_list locked_free_list; + unsigned int locked_free_nr; + + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ + struct io_sq_data *sq_data; /* if using sq thread polling */ + + struct wait_queue_head sqo_sq_wait; + struct list_head sqd_list; + + unsigned long check_cq; + + unsigned int file_alloc_start; + unsigned int file_alloc_end; + + struct xarray personalities; + u32 pers_next; + /* Keep this last, we don't need it for the fast path */ struct wait_queue_head poll_wq; struct io_restriction restrictions; -- cgit v1.2.3 From 18df385f42f0b3310ed2e4a3e39264bf5e784692 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:34 +0100 Subject: io_uring: banish non-hot data to end of io_ring_ctx Let's move all slow path, setup/init and so on fields to the end of io_ring_ctx, that makes ctx reorganisation later easier. That includes, page arrays used only on tear down, CQ overflow list, old provided buffer caches and used by io-wq poll hashes. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/fc471b63925a0bf90a34943c4d36163c523cfb43.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 37 +++++++++++++++++++------------------ 1 file changed, 19 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index ad87d6074fb2..72e609752323 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -211,20 +211,11 @@ struct io_ring_ctx { unsigned int drain_disabled: 1; unsigned int compat: 1; - enum task_work_notify_mode notify_method; + struct task_struct *submitter_task; + struct io_rings *rings; + struct percpu_ref refs; - /* - * If IORING_SETUP_NO_MMAP is used, then the below holds - * the gup'ed pages for the two rings, and the sqes. - */ - unsigned short n_ring_pages; - unsigned short n_sqe_pages; - struct page **ring_pages; - struct page **sqe_pages; - - struct io_rings *rings; - struct task_struct *submitter_task; - struct percpu_ref refs; + enum task_work_notify_mode notify_method; } ____cacheline_aligned_in_smp; /* submission data */ @@ -262,10 +253,8 @@ struct io_ring_ctx { struct io_buffer_list *io_bl; struct xarray io_bl_xa; - struct list_head io_buffers_cache; struct io_hash_table cancel_table_locked; - struct list_head cq_overflow_list; struct io_alloc_cache apoll_cache; struct io_alloc_cache netmsg_cache; } ____cacheline_aligned_in_smp; @@ -298,11 +287,8 @@ struct io_ring_ctx { * manipulate the list, hence no extra locking is needed there. */ struct io_wq_work_list iopoll_list; - struct io_hash_table cancel_table; struct llist_head work_llist; - - struct list_head io_buffers_comp; } ____cacheline_aligned_in_smp; /* timeouts */ @@ -318,6 +304,10 @@ struct io_ring_ctx { struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; + struct list_head io_buffers_comp; + struct list_head cq_overflow_list; + struct io_hash_table cancel_table; + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ @@ -332,6 +322,8 @@ struct io_ring_ctx { struct xarray personalities; u32 pers_next; + struct list_head io_buffers_cache; + /* Keep this last, we don't need it for the fast path */ struct wait_queue_head poll_wq; struct io_restriction restrictions; @@ -375,6 +367,15 @@ struct io_ring_ctx { unsigned sq_thread_idle; /* protected by ->completion_lock */ unsigned evfd_last_cq_tail; + + /* + * If IORING_SETUP_NO_MMAP is used, then the below holds + * the gup'ed pages for the two rings, and the sqes. + */ + unsigned short n_ring_pages; + unsigned short n_sqe_pages; + struct page **ring_pages; + struct page **sqe_pages; }; struct io_tw_state { -- cgit v1.2.3 From c9def23dde5238184777340ad811e4903f216a2d Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:35 +0100 Subject: io_uring: separate task_work/waiting cache line task_work's are typically queued up from IRQ/softirq potentially by a random CPU like in case of networking. Batch ctx fields bouncing as this into a separate cache line. We also move ->cq_timeouts there because waiters have to read and check it. We can also conditionally hide ->cq_timeouts in the future from the CQ wait path as a not really useful rudiment. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/b7f3fcb5b6b9cca0238778262c1fdb7ada6286b7.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 72e609752323..5de5dffe29df 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -270,15 +270,25 @@ struct io_ring_ctx { unsigned cached_cq_tail; unsigned cq_entries; struct io_ev_fd __rcu *io_ev_fd; - struct wait_queue_head cq_wait; unsigned cq_extra; } ____cacheline_aligned_in_smp; + /* + * task_work and async notification delivery cacheline. Expected to + * regularly bounce b/w CPUs. + */ + struct { + struct llist_head work_llist; + unsigned long check_cq; + atomic_t cq_wait_nr; + atomic_t cq_timeouts; + struct wait_queue_head cq_wait; + } ____cacheline_aligned_in_smp; + struct { spinlock_t completion_lock; bool poll_multi_queue; - atomic_t cq_wait_nr; /* * ->iopoll_list is protected by the ctx->uring_lock for @@ -287,14 +297,11 @@ struct io_ring_ctx { * manipulate the list, hence no extra locking is needed there. */ struct io_wq_work_list iopoll_list; - - struct llist_head work_llist; } ____cacheline_aligned_in_smp; /* timeouts */ struct { spinlock_t timeout_lock; - atomic_t cq_timeouts; struct list_head timeout_list; struct list_head ltimeout_list; unsigned cq_last_tm_flush; @@ -314,8 +321,6 @@ struct io_ring_ctx { struct wait_queue_head sqo_sq_wait; struct list_head sqd_list; - unsigned long check_cq; - unsigned int file_alloc_start; unsigned int file_alloc_end; -- cgit v1.2.3 From 0aa7aa5f766933d4f91b22d9658cd688e1f15dab Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:36 +0100 Subject: io_uring: move multishot cqe cache in ctx We cache multishot CQEs before flushing them to the CQ in submit_state.cqe. It's a 16 entry cache totalling 256 bytes in the middle of the io_submit_state structure. Move it out of there, it should help with CPU caches for the submission state, and shouldn't affect cached CQEs. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/dbe1f39c043ee23da918836be44fcec252ce6711.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 5de5dffe29df..01bdbc223edd 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -176,7 +176,6 @@ struct io_submit_state { unsigned short submit_nr; unsigned int cqes_count; struct blk_plug plug; - struct io_uring_cqe cqes[16]; }; struct io_ev_fd { @@ -307,6 +306,8 @@ struct io_ring_ctx { unsigned cq_last_tm_flush; } ____cacheline_aligned_in_smp; + struct io_uring_cqe completion_cqes[16]; + /* IRQ completion list, under ->completion_lock */ struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; -- cgit v1.2.3 From 644c4a7a721fb90356cdd42219c9928a3c386230 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 24 Aug 2023 23:53:37 +0100 Subject: io_uring: move iopoll ctx fields around Move poll_multi_queue and iopoll_list to the submission cache line, it doesn't make much sense to keep them separately, and is better place for it in general. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/5b03cf7e6652e350e6e70a917eec72ba9f33b97b.1692916914.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 01bdbc223edd..13d19b9be9f4 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -256,6 +256,15 @@ struct io_ring_ctx { struct io_hash_table cancel_table_locked; struct io_alloc_cache apoll_cache; struct io_alloc_cache netmsg_cache; + + /* + * ->iopoll_list is protected by the ctx->uring_lock for + * io_uring instances that don't use IORING_SETUP_SQPOLL. + * For SQPOLL, only the single threaded io_sq_thread() will + * manipulate the list, hence no extra locking is needed there. + */ + struct io_wq_work_list iopoll_list; + bool poll_multi_queue; } ____cacheline_aligned_in_smp; struct { @@ -284,20 +293,6 @@ struct io_ring_ctx { struct wait_queue_head cq_wait; } ____cacheline_aligned_in_smp; - struct { - spinlock_t completion_lock; - - bool poll_multi_queue; - - /* - * ->iopoll_list is protected by the ctx->uring_lock for - * io_uring instances that don't use IORING_SETUP_SQPOLL. - * For SQPOLL, only the single threaded io_sq_thread() will - * manipulate the list, hence no extra locking is needed there. - */ - struct io_wq_work_list iopoll_list; - } ____cacheline_aligned_in_smp; - /* timeouts */ struct { spinlock_t timeout_lock; @@ -308,6 +303,8 @@ struct io_ring_ctx { struct io_uring_cqe completion_cqes[16]; + spinlock_t completion_lock; + /* IRQ completion list, under ->completion_lock */ struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; -- cgit v1.2.3 From 7a32b58be9bab8f0440a4af526bfb1269e5affdb Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 30 Jun 2023 14:19:53 -0700 Subject: mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED VM_FAULT_RESULT_TRACE should contain an element for every vm_fault_reason to be used as flag_array inside trace_print_flags_seq(). The element for VM_FAULT_COMPLETED is missing, add it. Link: https://lkml.kernel.org/r/20230630211957.1341547-3-surenb@google.com Signed-off-by: Suren Baghdasaryan Reviewed-by: Peter Xu Cc: Alistair Popple Cc: Al Viro Cc: Christian Brauner Cc: Christoph Hellwig Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: Hillf Danton Cc: "Huang, Ying" Cc: Hugh Dickins Cc: Jan Kara Cc: Johannes Weiner Cc: Josef Bacik Cc: Laurent Dufour Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox Cc: Michal Hocko Cc: Michel Lespinasse Cc: Minchan Kim Cc: Pavel Tatashin Cc: Punit Agrawal Cc: Vlastimil Babka Cc: Yu Zhao Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 06e9315bfe7c..2b9d8be28361 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1184,7 +1184,8 @@ enum vm_fault_reason { { VM_FAULT_RETRY, "RETRY" }, \ { VM_FAULT_FALLBACK, "FALLBACK" }, \ { VM_FAULT_DONE_COW, "DONE_COW" }, \ - { VM_FAULT_NEEDDSYNC, "NEEDDSYNC" } + { VM_FAULT_NEEDDSYNC, "NEEDDSYNC" }, \ + { VM_FAULT_COMPLETED, "COMPLETED" } struct vm_special_mapping { const char *name; /* The name, e.g. "[vdso]". */ -- cgit v1.2.3 From fdc724d6aa44efd75cc9b6a3c3900baac44bc50a Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 30 Jun 2023 14:19:55 -0700 Subject: mm: change folio_lock_or_retry to use vm_fault directly Change folio_lock_or_retry to accept vm_fault struct and return the vm_fault_t directly. Link: https://lkml.kernel.org/r/20230630211957.1341547-5-surenb@google.com Signed-off-by: Suren Baghdasaryan Suggested-by: Matthew Wilcox Acked-by: Peter Xu Cc: Alistair Popple Cc: Al Viro Cc: Christian Brauner Cc: Christoph Hellwig Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: Hillf Danton Cc: "Huang, Ying" Cc: Hugh Dickins Cc: Jan Kara Cc: Johannes Weiner Cc: Josef Bacik Cc: Laurent Dufour Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Michel Lespinasse Cc: Minchan Kim Cc: Pavel Tatashin Cc: Punit Agrawal Cc: Vlastimil Babka Cc: Yu Zhao Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index f4f24b594cd7..437e4526028c 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -916,8 +916,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page, void __folio_lock(struct folio *folio); int __folio_lock_killable(struct folio *folio); -bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm, - unsigned int flags); +vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf); void unlock_page(struct page *page); void folio_unlock(struct folio *folio); @@ -1021,11 +1020,13 @@ static inline int folio_lock_killable(struct folio *folio) * Return value and mmap_lock implications depend on flags; see * __folio_lock_or_retry(). */ -static inline bool folio_lock_or_retry(struct folio *folio, - struct mm_struct *mm, unsigned int flags) +static inline vm_fault_t folio_lock_or_retry(struct folio *folio, + struct vm_fault *vmf) { might_sleep(); - return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags); + if (!folio_trylock(folio)) + return __folio_lock_or_retry(folio, vmf); + return 0; } /* -- cgit v1.2.3 From 1235ccd05b6dd6970ff50baea99aa994023fbc4a Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 30 Jun 2023 14:19:56 -0700 Subject: mm: handle swap page faults under per-VMA lock When page fault is handled under per-VMA lock protection, all swap page faults are retried with mmap_lock because folio_lock_or_retry has to drop and reacquire mmap_lock if folio could not be immediately locked. Follow the same pattern as mmap_lock to drop per-VMA lock when waiting for folio and retrying once folio is available. With this obstacle removed, enable do_swap_page to operate under per-VMA lock protection. Drivers implementing ops->migrate_to_ram might still rely on mmap_lock, therefore we have to fall back to mmap_lock in that particular case. Note that the only time do_swap_page calls synchronous swap_readpage is when SWP_SYNCHRONOUS_IO is set, which is only set for QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and pmem). Therefore we don't sleep in this path, and there's no need to drop the mmap or per-VMA lock. Link: https://lkml.kernel.org/r/20230630211957.1341547-6-surenb@google.com Signed-off-by: Suren Baghdasaryan Tested-by: Alistair Popple Reviewed-by: Alistair Popple Acked-by: Peter Xu Cc: Al Viro Cc: Christian Brauner Cc: Christoph Hellwig Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: Hillf Danton Cc: "Huang, Ying" Cc: Hugh Dickins Cc: Jan Kara Cc: Johannes Weiner Cc: Josef Bacik Cc: Laurent Dufour Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox Cc: Michal Hocko Cc: Michel Lespinasse Cc: Minchan Kim Cc: Pavel Tatashin Cc: Punit Agrawal Cc: Vlastimil Babka Cc: Yu Zhao Signed-off-by: Andrew Morton --- include/linux/mm.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 939386e0aeda..0d16208178c7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -729,6 +729,14 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) vma->detached = detached; } +static inline void release_fault_lock(struct vm_fault *vmf) +{ + if (vmf->flags & FAULT_FLAG_VMA_LOCK) + vma_end_read(vmf->vma); + else + mmap_read_unlock(vmf->vma->vm_mm); +} + struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, unsigned long address); @@ -749,6 +757,11 @@ static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, return NULL; } +static inline void release_fault_lock(struct vm_fault *vmf) +{ + mmap_read_unlock(vmf->vma->vm_mm); +} + #endif /* CONFIG_PER_VMA_LOCK */ extern const struct vm_operations_struct vma_dummy_vm_ops; -- cgit v1.2.3 From 29a22b9e08d70d6c9b075c12c47b6e895cb65cf0 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 30 Jun 2023 14:19:57 -0700 Subject: mm: handle userfaults under VMA lock Enable handle_userfault to operate under VMA lock by releasing VMA lock instead of mmap_lock and retrying. Note that FAULT_FLAG_RETRY_NOWAIT should never be used when handling faults under per-VMA lock protection because that would break the assumption that lock is dropped on retry. [surenb@google.com: fix a lockdep issue in vma_assert_write_locked] Link: https://lkml.kernel.org/r/20230712195652.969194-1-surenb@google.com Link: https://lkml.kernel.org/r/20230630211957.1341547-7-surenb@google.com Signed-off-by: Suren Baghdasaryan Acked-by: Peter Xu Cc: Alistair Popple Cc: Al Viro Cc: Christian Brauner Cc: Christoph Hellwig Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: Hillf Danton Cc: "Huang, Ying" Cc: Hugh Dickins Cc: Jan Kara Cc: Johannes Weiner Cc: Josef Bacik Cc: Laurent Dufour Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox Cc: Michal Hocko Cc: Michel Lespinasse Cc: Minchan Kim Cc: Pavel Tatashin Cc: Punit Agrawal Cc: Vlastimil Babka Cc: Yu Zhao Signed-off-by: Andrew Morton --- include/linux/mm.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0d16208178c7..c1db400e83cb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -679,6 +679,7 @@ static inline void vma_end_read(struct vm_area_struct *vma) rcu_read_unlock(); } +/* WARNING! Can only be used if mmap_lock is expected to be write-locked */ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq) { mmap_assert_write_locked(vma->vm_mm); @@ -721,6 +722,12 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma); } +static inline void vma_assert_locked(struct vm_area_struct *vma) +{ + if (!rwsem_is_locked(&vma->vm_lock->lock)) + vma_assert_write_locked(vma); +} + static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached) { /* When detaching vma should be write-locked */ @@ -737,6 +744,14 @@ static inline void release_fault_lock(struct vm_fault *vmf) mmap_read_unlock(vmf->vma->vm_mm); } +static inline void assert_fault_locked(struct vm_fault *vmf) +{ + if (vmf->flags & FAULT_FLAG_VMA_LOCK) + vma_assert_locked(vmf->vma); + else + mmap_assert_locked(vmf->vma->vm_mm); +} + struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, unsigned long address); @@ -762,6 +777,11 @@ static inline void release_fault_lock(struct vm_fault *vmf) mmap_read_unlock(vmf->vma->vm_mm); } +static inline void assert_fault_locked(struct vm_fault *vmf) +{ + mmap_assert_locked(vmf->vma->vm_mm); +} + #endif /* CONFIG_PER_VMA_LOCK */ extern const struct vm_operations_struct vma_dummy_vm_ops; -- cgit v1.2.3 From f82e6bf9bb9b6e1dc001320a88eee67d7ac31e96 Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Wed, 26 Jul 2023 15:32:23 +0000 Subject: mm: memcg: use rstat for non-hierarchical stats Currently, memcg uses rstat to maintain aggregated hierarchical stats. Counters are maintained for hierarchical stats at each memcg. Rstat tracks which cgroups have updates on which cpus to keep those counters fresh on the read-side. Non-hierarchical stats are currently not covered by rstat. Their per-cpu counters are summed up on every read, which is expensive. The original implementation did the same. At some point before rstat, non-hierarchical aggregated counters were introduced by commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting"). However, those counters were updated on the performance critical write-side, which caused regressions, so they were later removed by commit 815744d75152 ("mm: memcontrol: don't batch updates of local VM stats and events"). See [1] for more detailed history. Kernel versions in between a983b5ebee57 & 815744d75152 (a year and a half) enjoyed cheap reads of non-hierarchical stats, specifically on cgroup v1. When moving to more recent kernels, a performance regression for reading non-hierarchical stats is observed. Now that we have rstat, we know exactly which percpu counters have updates for each stat. We can maintain non-hierarchical counters again, making reads much more efficient, without affecting the performance critical write-side. Hence, add non-hierarchical (i.e local) counters for the stats, and extend rstat flushing to keep those up-to-date. A caveat is that we now need a stats flush before reading local/non-hierarchical stats through {memcg/lruvec}_page_state_local() or memcg_events_local(), where we previously only needed a flush to read hierarchical stats. Most contexts reading non-hierarchical stats are already doing a flush, add a flush to the only missing context in count_shadow_nodes(). With this patch, reading memory.stat from 1000 memcgs is 3x faster on a machine with 256 cpus on cgroup v1: # for i in $(seq 1000); do mkdir /sys/fs/cgroup/memory/cg$i; done # time cat /sys/fs/cgroup/memory/cg*/memory.stat > /dev/null real 0m0.125s user 0m0.005s sys 0m0.120s After: real 0m0.032s user 0m0.005s sys 0m0.027s To make sure there are no regressions on cgroup v2, I ran an artificial reclaim/refault stress test [2] that creates (NR_CPUS * 2) cgroups, assigns them limits, runs a worker process in each cgroup that allocates tmpfs memory equal to quadruple the limit (to invoke reclaim continuously), and then reads back the entire file (to invoke refaults). All workers are run in parallel, and zram is used as a swapping backend. Both reclaim and refault have conditional stats flushing. I ran this on a machine with 112 cpus, once on mm-unstable, and once on mm-unstable with this patch reverted. (1) A few runs without this patch: # time ./stress_reclaim_refault.sh real 0m9.949s user 0m0.496s sys 14m44.974s # time ./stress_reclaim_refault.sh real 0m10.049s user 0m0.486s sys 14m55.791s # time ./stress_reclaim_refault.sh real 0m9.984s user 0m0.481s sys 14m53.841s (2) A few runs with this patch: # time ./stress_reclaim_refault.sh real 0m9.885s user 0m0.486s sys 14m48.753s # time ./stress_reclaim_refault.sh real 0m9.903s user 0m0.495s sys 14m48.339s # time ./stress_reclaim_refault.sh real 0m9.861s user 0m0.507s sys 14m49.317s No regressions are observed with this patch. There is actually a very slight improvement. If I have to guess, maybe it's because we avoid the percpu loop in count_shadow_nodes() when calling lruvec_page_state_local(), but I could not prove this using perf, it's probably in the noise. [1] https://lore.kernel.org/lkml/20230725201811.GA1231514@cmpxchg.org/ [2] https://lore.kernel.org/lkml/CAJD7tkb17x=qwoO37uxyYXLEUVp15BQKR+Xfh7Sg9Hx-wTQ_=w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20230803185046.1385770-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230726153223.821757-2-yosryahmed@google.com Signed-off-by: Yosry Ahmed Acked-by: Johannes Weiner Acked-by: Roman Gushchin Acked-by: Michal Hocko Cc: Muchun Song Cc: Shakeel Butt Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 163004ae3349..11810a2cfd2d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -111,6 +111,9 @@ struct lruvec_stats { /* Aggregated (CPU and subtree) state */ long state[NR_VM_NODE_STAT_ITEMS]; + /* Non-hierarchical (CPU aggregated) state */ + long state_local[NR_VM_NODE_STAT_ITEMS]; + /* Pending child counts during tree propagation */ long state_pending[NR_VM_NODE_STAT_ITEMS]; }; @@ -1018,14 +1021,12 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, { struct mem_cgroup_per_node *pn; long x = 0; - int cpu; if (mem_cgroup_disabled()) return node_page_state(lruvec_pgdat(lruvec), idx); pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); - for_each_possible_cpu(cpu) - x += per_cpu(pn->lruvec_stats_percpu->state[idx], cpu); + x = READ_ONCE(pn->lruvec_stats.state_local[idx]); #ifdef CONFIG_SMP if (x < 0) x = 0; -- cgit v1.2.3 From f9bff0e31881d03badf191d3b0005839391f5f2b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:29 +0100 Subject: minmax: add in_range() macro Patch series "New page table range API", v6. This patchset changes the API used by the MM to set up page table entries. The four APIs are: set_ptes(mm, addr, ptep, pte, nr) update_mmu_cache_range(vma, addr, ptep, nr) flush_dcache_folio(folio) flush_icache_pages(vma, page, nr) flush_dcache_folio() isn't technically new, but no architecture implemented it, so I've done that for them. The old APIs remain around but are mostly implemented by calling the new interfaces. The new APIs are based around setting up N page table entries at once. The N entries belong to the same PMD, the same folio and the same VMA, so ptep++ is a legitimate operation, and locking is taken care of for you. Some architectures can do a better job of it than just a loop, but I have hesitated to make too deep a change to architectures I don't understand well. One thing I have changed in every architecture is that PG_arch_1 is now a per-folio bit instead of a per-page bit when used for dcache clean/dirty tracking. This was something that would have to happen eventually, and it makes sense to do it now rather than iterate over every page involved in a cache flush and figure out if it needs to happen. The point of all this is better performance, and Fengwei Yin has measured improvement on x86. I suspect you'll see improvement on your architecture too. Try the new will-it-scale test mentioned here: https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/ You'll need to run it on an XFS filesystem and have CONFIG_TRANSPARENT_HUGEPAGE set. This patchset is the basis for much of the anonymous large folio work being done by Ryan, so it's received quite a lot of testing over the last few months. This patch (of 38): Determine if a value lies within a range more efficiently (subtraction + comparison vs two comparisons and an AND). It also has useful (under some circumstances) behaviour if the range exceeds the maximum value of the type. Convert all the conflicting definitions of in_range() within the kernel; some can use the generic definition while others need their own definition. Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/minmax.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) (limited to 'include/linux') diff --git a/include/linux/minmax.h b/include/linux/minmax.h index 396df1121bff..4f011eb6533d 100644 --- a/include/linux/minmax.h +++ b/include/linux/minmax.h @@ -3,6 +3,7 @@ #define _LINUX_MINMAX_H #include +#include /* * min()/max()/clamp() macros must accomplish three things: @@ -158,6 +159,32 @@ */ #define clamp_val(val, lo, hi) clamp_t(typeof(val), val, lo, hi) +static inline bool in_range64(u64 val, u64 start, u64 len) +{ + return (val - start) < len; +} + +static inline bool in_range32(u32 val, u32 start, u32 len) +{ + return (val - start) < len; +} + +/** + * in_range - Determine if a value lies within a range. + * @val: Value to test. + * @start: First value in range. + * @len: Number of values in range. + * + * This is more efficient than "if (start <= val && val < (start + len))". + * It also gives a different answer if @start + @len overflows the size of + * the type by a sufficient amount to encompass @val. Decide for yourself + * which behaviour you want, or prove that start + len never overflow. + * Do not blindly replace one form with the other. + */ +#define in_range(val, start, len) \ + ((sizeof(start) | sizeof(len) | sizeof(val)) <= sizeof(u32) ? \ + in_range32(val, start, len) : in_range64(val, start, len)) + /** * swap - swap values of @a and @b * @a: first value -- cgit v1.2.3 From a379322022c0961fe0b638cdd842d3c38eeff92c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:30 +0100 Subject: mm: convert page_table_check_pte_set() to page_table_check_ptes_set() Tell the page table check how many PTEs & PFNs we want it to check. Link: https://lkml.kernel.org/r/20230802151406.3735276-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Mike Rapoport (IBM) Acked-by: Pasha Tatashin Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/page_table_check.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 7f6b9bf926c5..6722941c7cb8 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -17,7 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int order); void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte); void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud); -void __page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, pte_t pte); +void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte, + unsigned int nr); void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd); void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud); void __page_table_check_pte_clear_range(struct mm_struct *mm, @@ -64,13 +65,13 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud) __page_table_check_pud_clear(mm, pud); } -static inline void page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, - pte_t pte) +static inline void page_table_check_ptes_set(struct mm_struct *mm, + pte_t *ptep, pte_t pte, unsigned int nr) { if (static_branch_likely(&page_table_check_disabled)) return; - __page_table_check_pte_set(mm, ptep, pte); + __page_table_check_ptes_set(mm, ptep, pte, nr); } static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, @@ -123,8 +124,8 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud) { } -static inline void page_table_check_pte_set(struct mm_struct *mm, pte_t *ptep, - pte_t pte) +static inline void page_table_check_ptes_set(struct mm_struct *mm, + pte_t *ptep, pte_t pte, unsigned int nr) { } -- cgit v1.2.3 From bc60abfbe687e886f1c38d49ef2a00e90b4b49cf Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:32 +0100 Subject: mm: add folio_flush_mapping() This is the folio equivalent of page_mapping_file(), but rename it to make it clear that it's very different from page_file_mapping(). Theoretically, there's nothing flush-only about it, but there are no other users today, and I doubt there will be; it's almost always more useful to know the swapfile's mapping or the swapcache's mapping. Link: https://lkml.kernel.org/r/20230802151406.3735276-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Mike Rapoport (IBM) Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 437e4526028c..88d161887cf2 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -389,6 +389,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio) return folio->mapping; } +/** + * folio_flush_mapping - Find the file mapping this folio belongs to. + * @folio: The folio. + * + * For folios which are in the page cache, return the mapping that this + * page belongs to. Anonymous folios return NULL, even if they're in + * the swap cache. Other kinds of folio also return NULL. + * + * This is ONLY used by architecture cache flushing code. If you aren't + * writing cache flushing code, you want either folio_mapping() or + * folio_file_mapping(). + */ +static inline struct address_space *folio_flush_mapping(struct folio *folio) +{ + if (unlikely(folio_test_swapcache(folio))) + return NULL; + + return folio_mapping(folio); +} + static inline struct address_space *page_file_mapping(struct page *page) { return folio_file_mapping(page_folio(page)); @@ -399,11 +419,7 @@ static inline struct address_space *page_file_mapping(struct page *page) */ static inline struct address_space *page_mapping_file(struct page *page) { - struct folio *folio = page_folio(page); - - if (unlikely(folio_test_swapcache(folio))) - return NULL; - return folio_mapping(folio); + return folio_flush_mapping(page_folio(page)); } /** -- cgit v1.2.3 From 29d26f1215de14721188988a59b1426abb85b7be Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:33 +0100 Subject: mm: remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Current best practice is to reuse the name of the function as a define to indicate that the function is implemented by the architecture. Link: https://lkml.kernel.org/r/20230802151406.3735276-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Mike Rapoport (IBM) Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/cacheflush.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h index a6189d21f2ba..82136f3fcf54 100644 --- a/include/linux/cacheflush.h +++ b/include/linux/cacheflush.h @@ -7,14 +7,14 @@ struct folio; #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE -#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO +#ifndef flush_dcache_folio void flush_dcache_folio(struct folio *folio); #endif #else static inline void flush_dcache_folio(struct folio *folio) { } -#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO 0 +#define flush_dcache_folio flush_dcache_folio #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */ #endif /* _LINUX_CACHEFLUSH_H */ -- cgit v1.2.3 From bcc6cc832573a99d1f935c89a28e2c71fd1aaf0c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:34 +0100 Subject: mm: add default definition of set_ptes() Most architectures can just define set_pte() and PFN_PTE_SHIFT to use this definition. It's also a handy spot to document the guarantees provided by the MM. Link: https://lkml.kernel.org/r/20230802151406.3735276-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Suggested-by: Mike Rapoport (IBM) Reviewed-by: Mike Rapoport (IBM) Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 81 ++++++++++++++++++++++++++++++++++++------------- 1 file changed, 60 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 6064f454c8e3..81c3f7decb1c 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -182,6 +182,66 @@ static inline int pmd_young(pmd_t pmd) } #endif +/* + * A facility to provide lazy MMU batching. This allows PTE updates and + * page invalidations to be delayed until a call to leave lazy MMU mode + * is issued. Some architectures may benefit from doing this, and it is + * beneficial for both shadow and direct mode hypervisors, which may batch + * the PTE updates which happen during this window. Note that using this + * interface requires that read hazards be removed from the code. A read + * hazard could result in the direct mode hypervisor case, since the actual + * write to the page tables may not yet have taken place, so reads though + * a raw PTE pointer after it has been modified are not guaranteed to be + * up to date. This mode can only be entered and left under the protection of + * the page table locks for all page tables which may be modified. In the UP + * case, this is required so that preemption is disabled, and in the SMP case, + * it must synchronize the delayed page table writes properly on other CPUs. + */ +#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE +#define arch_enter_lazy_mmu_mode() do {} while (0) +#define arch_leave_lazy_mmu_mode() do {} while (0) +#define arch_flush_lazy_mmu_mode() do {} while (0) +#endif + +#ifndef set_ptes +#ifdef PFN_PTE_SHIFT +/** + * set_ptes - Map consecutive pages to a contiguous range of addresses. + * @mm: Address space to map the pages into. + * @addr: Address to map the first page at. + * @ptep: Page table pointer for the first entry. + * @pte: Page table entry for the first page. + * @nr: Number of pages to map. + * + * May be overridden by the architecture, or the architecture can define + * set_pte() and PFN_PTE_SHIFT. + * + * Context: The caller holds the page table lock. The pages all belong + * to the same folio. The PTEs are all in the same PMD. + */ +static inline void set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr) +{ + page_table_check_ptes_set(mm, ptep, pte, nr); + + arch_enter_lazy_mmu_mode(); + for (;;) { + set_pte(ptep, pte); + if (--nr == 0) + break; + ptep++; + pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT)); + } + arch_leave_lazy_mmu_mode(); +} +#ifndef set_pte_at +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1) +#endif +#endif +#else +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1) +#endif + #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, @@ -1051,27 +1111,6 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot) #define pgprot_decrypted(prot) (prot) #endif -/* - * A facility to provide lazy MMU batching. This allows PTE updates and - * page invalidations to be delayed until a call to leave lazy MMU mode - * is issued. Some architectures may benefit from doing this, and it is - * beneficial for both shadow and direct mode hypervisors, which may batch - * the PTE updates which happen during this window. Note that using this - * interface requires that read hazards be removed from the code. A read - * hazard could result in the direct mode hypervisor case, since the actual - * write to the page tables may not yet have taken place, so reads though - * a raw PTE pointer after it has been modified are not guaranteed to be - * up to date. This mode can only be entered and left under the protection of - * the page table locks for all page tables which may be modified. In the UP - * case, this is required so that preemption is disabled, and in the SMP case, - * it must synchronize the delayed page table writes properly on other CPUs. - */ -#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE -#define arch_enter_lazy_mmu_mode() do {} while (0) -#define arch_leave_lazy_mmu_mode() do {} while (0) -#define arch_flush_lazy_mmu_mode() do {} while (0) -#endif - /* * A facility to provide batching of the reload of page tables and * other process state with the actual context switch code for -- cgit v1.2.3 From 29269ad90bed55a9ece007a7ece53f4505df3781 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:58 +0100 Subject: mm: remove page_mapping_file() This function has no more users. Link: https://lkml.kernel.org/r/20230802151406.3735276-31-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 88d161887cf2..b5c4c8beefe2 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -414,14 +414,6 @@ static inline struct address_space *page_file_mapping(struct page *page) return folio_file_mapping(page_folio(page)); } -/* - * For file cache pages, return the address_space, otherwise return NULL - */ -static inline struct address_space *page_mapping_file(struct page *page) -{ - return folio_flush_mapping(page_folio(page)); -} - /** * folio_inode - Get the host inode for this folio. * @folio: The folio. -- cgit v1.2.3 From 203b7b6aad6769a43987deb81c35456de8bb16c7 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:13:59 +0100 Subject: mm: rationalise flush_icache_pages() and flush_icache_page() Move the default (no-op) implementation of flush_icache_pages() to from . Remove the flush_icache_page() wrapper from each architecture into . Link: https://lkml.kernel.org/r/20230802151406.3735276-32-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/cacheflush.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h index 82136f3fcf54..55f297b2c23f 100644 --- a/include/linux/cacheflush.h +++ b/include/linux/cacheflush.h @@ -17,4 +17,13 @@ static inline void flush_dcache_folio(struct folio *folio) #define flush_dcache_folio flush_dcache_folio #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */ +#ifndef flush_icache_pages +static inline void flush_icache_pages(struct vm_area_struct *vma, + struct page *page, unsigned int nr) +{ +} +#endif + +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1) + #endif /* _LINUX_CACHEFLUSH_H */ -- cgit v1.2.3 From af4fcb0729329cc4b3f6977d7f75562a00174bd1 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Aug 2023 16:14:00 +0100 Subject: mm: tidy up set_ptes definition Now that all architectures are converted, we can remove the PFN_PTE_SHIFT ifdef and we can define set_pte_at() unconditionally. Link: https://lkml.kernel.org/r/20230802151406.3735276-33-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 81c3f7decb1c..fc811c9b421a 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -204,7 +204,6 @@ static inline int pmd_young(pmd_t pmd) #endif #ifndef set_ptes -#ifdef PFN_PTE_SHIFT /** * set_ptes - Map consecutive pages to a contiguous range of addresses. * @mm: Address space to map the pages into. @@ -234,13 +233,8 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr, } arch_leave_lazy_mmu_mode(); } -#ifndef set_pte_at -#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1) -#endif #endif -#else #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1) -#endif #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS extern int ptep_set_access_flags(struct vm_area_struct *vma, -- cgit v1.2.3 From 86f35f69db8e7d169c36472a349507ab0a461f49 Mon Sep 17 00:00:00 2001 From: Yin Fengwei Date: Wed, 2 Aug 2023 16:14:03 +0100 Subject: rmap: add folio_add_file_rmap_range() folio_add_file_rmap_range() allows to add pte mapping to a specific range of file folio. Comparing to page_add_file_rmap(), it batched updates __lruvec_stat for large folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-36-willy@infradead.org Signed-off-by: Yin Fengwei Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/rmap.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index b87d01660412..a3825ce81102 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address); void page_add_file_rmap(struct page *, struct vm_area_struct *, bool compound); +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr, + struct vm_area_struct *, bool compound); void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound); -- cgit v1.2.3 From 3bd786f76de2e01745f462844fd1a206052ee8b8 Mon Sep 17 00:00:00 2001 From: Yin Fengwei Date: Wed, 2 Aug 2023 16:14:04 +0100 Subject: mm: convert do_set_pte() to set_pte_range() set_pte_range() allows to setup page table entries for a specific range. It takes advantage of batched rmap update for large folio. It now takes care of calling update_mmu_cache_range(). Link: https://lkml.kernel.org/r/20230802151406.3735276-37-willy@infradead.org Signed-off-by: Yin Fengwei Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/mm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index c1db400e83cb..ddb95967ba64 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1322,7 +1322,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) } vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page); -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr); +void set_pte_range(struct vm_fault *vmf, struct folio *folio, + struct page *page, unsigned int nr, unsigned long addr); vm_fault_t finish_fault(struct vm_fault *vmf); vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf); -- cgit v1.2.3 From cfeed8ffe55b37fa10286aaaa1369da00cb88440 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 21 Aug 2023 18:08:46 +0200 Subject: mm/swap: stop using page->private on tail pages for THP_SWAP Patch series "mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups". This series stops using page->private on tail pages for THP_SWAP, replaces folio->private by folio->swap for swapcache folios, and starts using "new_folio" for tail pages that we are splitting to remove the usage of page->private for swapcache handling completely. This patch (of 4): Let's stop using page->private on tail pages, making it possible to just unconditionally reuse that field in the tail pages of large folios. The remaining usage of the private field for THP_SWAP is in the THP splitting code (mm/huge_memory.c), that we'll handle separately later. Update the THP_SWAP documentation and sanity checks in mm_types.h and __split_huge_page_tail(). [david@redhat.com: stop using page->private on tail pages for THP_SWAP] Link: https://lkml.kernel.org/r/6f0a82a3-6948-20d9-580b-be1dbf415701@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-1-david@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-2-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Catalin Marinas [arm64] Reviewed-by: Yosry Ahmed Cc: Dan Streetman Cc: Hugh Dickins Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Cc: Seth Jennings Cc: Vitaly Wool Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 12 +----------- include/linux/swap.h | 9 +++++++++ 2 files changed, 10 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 2b9d8be28361..55cd4bc57b8d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -322,11 +322,8 @@ struct folio { atomic_t _pincount; #ifdef CONFIG_64BIT unsigned int _folio_nr_pages; - /* 4 byte gap here */ - /* private: the union with struct page is transitional */ - /* Fix THP_SWAP to not use tail->private */ - unsigned long _private_1; #endif + /* private: the union with struct page is transitional */ }; struct page __page_1; }; @@ -347,9 +344,6 @@ struct folio { /* public: */ struct list_head _deferred_list; /* private: the union with struct page is transitional */ - unsigned long _avail_2a; - /* Fix THP_SWAP to not use tail->private */ - unsigned long _private_2a; }; struct page __page_2; }; @@ -374,9 +368,6 @@ FOLIO_MATCH(memcg_data, memcg_data); offsetof(struct page, pg) + sizeof(struct page)) FOLIO_MATCH(flags, _flags_1); FOLIO_MATCH(compound_head, _head_1); -#ifdef CONFIG_64BIT -FOLIO_MATCH(private, _private_1); -#endif #undef FOLIO_MATCH #define FOLIO_MATCH(pg, fl) \ static_assert(offsetof(struct folio, fl) == \ @@ -385,7 +376,6 @@ FOLIO_MATCH(flags, _flags_2); FOLIO_MATCH(compound_head, _head_2); FOLIO_MATCH(flags, _flags_2a); FOLIO_MATCH(compound_head, _head_2a); -FOLIO_MATCH(private, _private_2a); #undef FOLIO_MATCH /** diff --git a/include/linux/swap.h b/include/linux/swap.h index bb5adc604144..e5cf58a1cf9e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -339,6 +339,15 @@ static inline swp_entry_t folio_swap_entry(struct folio *folio) return entry; } +static inline swp_entry_t page_swap_entry(struct page *page) +{ + struct folio *folio = page_folio(page); + swp_entry_t entry = folio_swap_entry(folio); + + entry.val += folio_page_idx(folio, page); + return entry; +} + static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry) { folio->private = (void *)entry.val; -- cgit v1.2.3 From 85a1333417a7561c1d10a77d6c873a37e6ea63a0 Mon Sep 17 00:00:00 2001 From: Matthew Wilcox Date: Mon, 21 Aug 2023 18:08:47 +0200 Subject: mm/swap: use dedicated entry for swap in folio Let's stop working on the private field and use an explicit swap field. We have to move the swp_entry_t typedef. Link: https://lkml.kernel.org/r/20230821160849.531668-3-david@redhat.com Signed-off-by: Matthew Wilcox Signed-off-by: David Hildenbrand Reviewed-by: Chris Li Cc: Catalin Marinas Cc: Dan Streetman Cc: Hugh Dickins Cc: Peter Xu Cc: Seth Jennings Cc: Vitaly Wool Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 23 +++++++++++++---------- include/linux/swap.h | 5 ++--- 2 files changed, 15 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 55cd4bc57b8d..36c5b43999e6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -248,6 +248,14 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page) return (struct page *)(~ENCODE_PAGE_BITS & (unsigned long)page); } +/* + * A swap entry has to fit into a "unsigned long", as the entry is hidden + * in the "index" field of the swapper address space. + */ +typedef struct { + unsigned long val; +} swp_entry_t; + /** * struct folio - Represents a contiguous set of bytes. * @flags: Identical to the page flags. @@ -258,7 +266,7 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page) * @index: Offset within the file, in units of pages. For anonymous memory, * this is the index from the beginning of the mmap. * @private: Filesystem per-folio data (see folio_attach_private()). - * Used for swp_entry_t if folio_test_swapcache(). + * @swap: Used for swp_entry_t if folio_test_swapcache(). * @_mapcount: Do not access this member directly. Use folio_mapcount() to * find out how many times this folio is mapped by userspace. * @_refcount: Do not access this member directly. Use folio_ref_count() @@ -301,7 +309,10 @@ struct folio { }; struct address_space *mapping; pgoff_t index; - void *private; + union { + void *private; + swp_entry_t swap; + }; atomic_t _mapcount; atomic_t _refcount; #ifdef CONFIG_MEMCG @@ -1209,14 +1220,6 @@ enum tlb_flush_reason { NR_TLB_FLUSH_REASONS, }; - /* - * A swap entry has to fit into a "unsigned long", as the entry is hidden - * in the "index" field of the swapper address space. - */ -typedef struct { - unsigned long val; -} swp_entry_t; - /** * enum fault_flag - Fault flag definitions. * @FAULT_FLAG_WRITE: Fault was a write fault. diff --git a/include/linux/swap.h b/include/linux/swap.h index e5cf58a1cf9e..352eca0a75bc 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -335,8 +335,7 @@ struct swap_info_struct { static inline swp_entry_t folio_swap_entry(struct folio *folio) { - swp_entry_t entry = { .val = page_private(&folio->page) }; - return entry; + return folio->swap; } static inline swp_entry_t page_swap_entry(struct page *page) @@ -350,7 +349,7 @@ static inline swp_entry_t page_swap_entry(struct page *page) static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry) { - folio->private = (void *)entry.val; + folio->swap = entry; } /* linux/mm/workingset.c */ -- cgit v1.2.3 From 3d2c908768877714a354ee6d7bf93e801400d5e2 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 21 Aug 2023 18:08:48 +0200 Subject: mm/swap: inline folio_set_swap_entry() and folio_swap_entry() Let's simply work on the folio directly and remove the helpers. Link: https://lkml.kernel.org/r/20230821160849.531668-4-david@redhat.com Signed-off-by: David Hildenbrand Suggested-by: Matthew Wilcox Reviewed-by: Chris Li Cc: Catalin Marinas Cc: Dan Streetman Cc: Hugh Dickins Cc: Peter Xu Cc: Seth Jennings Cc: Vitaly Wool Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/swap.h | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 352eca0a75bc..493487ed7c38 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -333,25 +333,15 @@ struct swap_info_struct { */ }; -static inline swp_entry_t folio_swap_entry(struct folio *folio) -{ - return folio->swap; -} - static inline swp_entry_t page_swap_entry(struct page *page) { struct folio *folio = page_folio(page); - swp_entry_t entry = folio_swap_entry(folio); + swp_entry_t entry = folio->swap; entry.val += folio_page_idx(folio, page); return entry; } -static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry) -{ - folio->swap = entry; -} - /* linux/mm/workingset.c */ bool workingset_test_recent(void *shadow, bool file, bool *workingset); void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages); -- cgit v1.2.3 From bb7dbaafff3f582d18028a5b99a8faa789842678 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 19 Aug 2023 04:18:37 +0100 Subject: mm: remove checks for pte_index Since pte_index is always defined, we don't need to check whether it's defined or not. Delete the slow version that doesn't depend on it and remove the #define since nobody needs to test for it. Link: https://lkml.kernel.org/r/20230819031837.3160096-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Mike Rapoport (IBM) Cc: Christian Dietrich Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index fc811c9b421a..95ad544ad395 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -63,7 +63,6 @@ static inline unsigned long pte_index(unsigned long address) { return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); } -#define pte_index pte_index #ifndef pmd_index static inline unsigned long pmd_index(unsigned long address) -- cgit v1.2.3 From 051ddcfeb1bdbae45e660c0db2468d29ca15c6c2 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 18 Aug 2023 21:23:33 +0100 Subject: mm: move PMD_ORDER to pgtable.h Patch series "Change calling convention for ->huge_fault", v2. There are two unrelated changes to the calling convention for ->huge_fault. I've bundled them together to help people notice the change. The first is to improve scalability of DAX page faults by allowing them to be handled under the VMA lock. The second is to remove enum page_entry_size since it's really unnecessary. The changelogs and documentation updates hopefully work to that end. This patch (of 3): Allow this to be used in generic code. Also add PUD_ORDER. Link: https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230818202335.2739663-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 95ad544ad395..f49abcfe5eda 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -5,6 +5,9 @@ #include #include +#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) +#define PUD_ORDER (PUD_SHIFT - PAGE_SHIFT) + #ifndef __ASSEMBLY__ #ifdef CONFIG_MMU -- cgit v1.2.3 From 1d024e7a8dabcc3c84d77532a88c774c32cf8245 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 18 Aug 2023 21:23:35 +0100 Subject: mm: remove enum page_entry_size Remove the unnecessary encoding of page order into an enum and pass the page order directly. That lets us get rid of pe_order(). The switch constructs have to be changed to if/else constructs to prevent GCC from warning on builds with 3-level page tables where PMD_ORDER and PUD_ORDER have the same value. If you are looking at this commit because your driver stopped compiling, look at the previous commit as well and audit your driver to be sure it doesn't depend on mmap_lock being held in its ->huge_fault method. [willy@infradead.org: use "order %u" to match the (non dev_t) style] Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/dax.h | 4 ++-- include/linux/mm.h | 10 +--------- 2 files changed, 3 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dax.h b/include/linux/dax.h index 261944ec0887..22cd9902345d 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -241,10 +241,10 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); -vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, +vm_fault_t dax_iomap_fault(struct vm_fault *vmf, unsigned int order, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, - enum page_entry_size pe_size, pfn_t pfn); + unsigned int order, pfn_t pfn); int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); diff --git a/include/linux/mm.h b/include/linux/mm.h index ddb95967ba64..53efddc4d178 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -532,13 +532,6 @@ struct vm_fault { */ }; -/* page entry size for vm->huge_fault() */ -enum page_entry_size { - PE_SIZE_PTE = 0, - PE_SIZE_PMD, - PE_SIZE_PUD, -}; - /* * These are the virtual MM functions - opening of an area, closing and * unmapping it (needed to keep files on disk up-to-date etc), pointer @@ -562,8 +555,7 @@ struct vm_operations_struct { int (*mprotect)(struct vm_area_struct *vma, unsigned long start, unsigned long end, unsigned long newflags); vm_fault_t (*fault)(struct vm_fault *vmf); - vm_fault_t (*huge_fault)(struct vm_fault *vmf, - enum page_entry_size pe_size); + vm_fault_t (*huge_fault)(struct vm_fault *vmf, unsigned int order); vm_fault_t (*map_pages)(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff); unsigned long (*pagesize)(struct vm_area_struct * area); -- cgit v1.2.3 From 8f9ff2deb8b91ad1cba666c7adbe4ca79e2d3225 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Aug 2023 21:23:35 +0100 Subject: secretmem: convert page_is_secretmem() to folio_is_secretmem() The only caller already has a folio, so use it to save calling compound_head() in PageLRU() and remove a use of page->mapping. Link: https://lkml.kernel.org/r/20230822202335.179081-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Mike Rapoport (IBM) Reviewed-by: David Hildenbrand Signed-off-by: Andrew Morton --- include/linux/secretmem.h | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h index 988528b5da43..35f3a4a8ceb1 100644 --- a/include/linux/secretmem.h +++ b/include/linux/secretmem.h @@ -6,24 +6,23 @@ extern const struct address_space_operations secretmem_aops; -static inline bool page_is_secretmem(struct page *page) +static inline bool folio_is_secretmem(struct folio *folio) { struct address_space *mapping; /* - * Using page_mapping() is quite slow because of the actual call - * instruction and repeated compound_head(page) inside the - * page_mapping() function. + * Using folio_mapping() is quite slow because of the actual call + * instruction. * We know that secretmem pages are not compound and LRU so we can * save a couple of cycles here. */ - if (PageCompound(page) || !PageLRU(page)) + if (folio_test_large(folio) || !folio_test_lru(folio)) return false; mapping = (struct address_space *) - ((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS); + ((unsigned long)folio->mapping & ~PAGE_MAPPING_FLAGS); - if (!mapping || mapping != page->mapping) + if (!mapping || mapping != folio->mapping) return false; return mapping->a_ops == &secretmem_aops; @@ -39,7 +38,7 @@ static inline bool vma_is_secretmem(struct vm_area_struct *vma) return false; } -static inline bool page_is_secretmem(struct page *page) +static inline bool folio_is_secretmem(struct folio *folio) { return false; } -- cgit v1.2.3 From 52ae298e3e5c9be5bb95e1c6d9199e5210f2a156 Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Tue, 22 Aug 2023 00:51:45 +0200 Subject: maple_tree: shrink struct maple_tree Pack the members of struct maple_tree to avoid holes on 64-bit. The size shrinks from 24 to 16 bytes which will save eight bytes in every structure which embeds it. [willy@infradead.org: changelog alterations] Link: https://lkml.kernel.org/r/20230821225145.2169848-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik Reviewed-by: Liam R. Howlett Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index c962af188681..e41c70ac7744 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -220,8 +220,8 @@ struct maple_tree { spinlock_t ma_lock; lockdep_map_p ma_external_lock; }; - void __rcu *ma_root; unsigned int ma_flags; + void __rcu *ma_root; }; /** -- cgit v1.2.3 From 6f991cc363a3269866476b8ff10a112768d3d45c Mon Sep 17 00:00:00 2001 From: Eric DeVolder Date: Mon, 14 Aug 2023 17:44:39 -0400 Subject: crash: move a few code bits to setup support of crash hotplug MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28. Once the kdump service is loaded, if changes to CPUs or memory occur, either by hot un/plug or off/onlining, the crash elfcorehdr must also be updated. The elfcorehdr describes to kdump the CPUs and memory in the system, and any inaccuracies can result in a vmcore with missing CPU context or memory regions. The current solution utilizes udev to initiate an unload-then-reload of the kdump image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. In the original post I outlined the significant performance problems related to offloading this activity to userspace. This patchset introduces a generic crash handler that registers with the CPU and memory notifiers. Upon CPU or memory changes, from either hot un/plug or off/onlining, this generic handler is invoked and performs important housekeeping, for example obtaining the appropriate lock, and then invokes an architecture specific handler to do the appropriate elfcorehdr update. Note the description in patch 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that enables further optimizations related to CPU plug/unplug/online/offline performance of elfcorehdr updates. In the case of x86_64, the arch specific handler generates a new elfcorehdr, and overwrites the old one in memory; thus no involvement with userspace needed. To realize the benefits/test this patchset, one must make a couple of minor changes to userspace: - Prevent udev from updating kdump crash kernel on hot un/plug changes. Add the following as the first lines to the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" With this changeset applied, the two rules evaluate to false for CPU and memory change events and thus skip the userspace unload-then-reload of kdump. - Change to the kexec_file_load for loading the kdump kernel: Eg. on RHEL: in /usr/bin/kdumpctl, change to: standard_kexec_args="-p -d -s" which adds the -s to select kexec_file_load() syscall. This kernel patchset also supports kexec_load() with a modified kexec userspace utility. A working changeset to the kexec userspace utility is posted to the kexec-tools mailing list here: http://lists.infradead.org/pipermail/kexec/2023-May/027049.html To use the kexec-tools patch, apply, build and install kexec-tools, then change the kdumpctl's standard_kexec_args to replace the -s with --hotplug. The removal of -s reverts to the kexec_load syscall and the addition of --hotplug invokes the changes put forth in the kexec-tools patch. This patch (of 8): The crash hotplug support leans on the work for the kexec_file_load() syscall. To also support the kexec_load() syscall, a few bits of code need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are moved out of kexec_file.c and into a common location crash_core.c. In addition, struct crash_mem and crash_notes were moved to new locales so that PROC_KCORE, which sets CRASH_CORE alone, builds correctly. No functionality change intended. Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He Cc: Akhil Raj Cc: Bjorn Helgaas Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric W. Biederman Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Konrad Rzeszutek Wilk Cc: Mimi Zohar Cc: Naveen N. Rao Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Sean Christopherson Cc: Takashi Iwai Cc: Thomas Gleixner Cc: Thomas Weißschuh Cc: Valentin Schneider Cc: Vivek Goyal Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/crash_core.h | 20 ++++++++++++++++++++ include/linux/kexec.h | 15 --------------- 2 files changed, 20 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index de62a722431e..1e48b1d96404 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -28,6 +28,8 @@ VMCOREINFO_BYTES) typedef u32 note_buf_t[CRASH_CORE_NOTE_BYTES/4]; +/* Per cpu memory for storing cpu states in case of system crash. */ +extern note_buf_t __percpu *crash_notes; void crash_update_vmcoreinfo_safecopy(void *ptr); void crash_save_vmcoreinfo(void); @@ -84,4 +86,22 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base); +/* Alignment required for elf header segment */ +#define ELF_CORE_HEADER_ALIGN 4096 + +struct crash_mem { + unsigned int max_nr_ranges; + unsigned int nr_ranges; + struct range ranges[]; +}; + +extern int crash_exclude_mem_range(struct crash_mem *mem, + unsigned long long mstart, + unsigned long long mend); +extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, + void **addr, unsigned long *sz); + +struct kimage; +struct kexec_segment; + #endif /* LINUX_CRASH_CORE_H */ diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 22b5cd24f581..fb4350db33ff 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -230,21 +230,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) } #endif -/* Alignment required for elf header segment */ -#define ELF_CORE_HEADER_ALIGN 4096 - -struct crash_mem { - unsigned int max_nr_ranges; - unsigned int nr_ranges; - struct range ranges[]; -}; - -extern int crash_exclude_mem_range(struct crash_mem *mem, - unsigned long long mstart, - unsigned long long mend); -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, - void **addr, unsigned long *sz); - #ifndef arch_kexec_apply_relocations_add /* * arch_kexec_apply_relocations_add - apply relocations of type RELA -- cgit v1.2.3 From 24726275612140af6b1c0afc7c6611ad66233207 Mon Sep 17 00:00:00 2001 From: Eric DeVolder Date: Mon, 14 Aug 2023 17:44:40 -0400 Subject: crash: add generic infrastructure for crash hotplug support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To support crash hotplug, a mechanism is needed to update the crash elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/ onlining). The crash elfcorehdr describes the CPUs and memory to be written into the vmcore. To track CPU changes, callbacks are registered with the cpuhp mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The crash hotplug elfcorehdr update has no explicit ordering requirement (relative to other cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a new state for crash hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE group, just prior to the STARTING group, which is very close to the CPU starting up in a plug/online situation, or stopping in a unplug/ offline situation. This minimizes the window of time during an actual plug/online or unplug/offline situation in which the elfcorehdr would be inaccurate. Note that for a CPU being unplugged or offlined, the CPU will still be present in the list of CPUs generated by crash_prepare_elf64_headers(). However, there is no need to explicitly omit the CPU, see justification in 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'. To track memory changes, a notifier is registered to capture the memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier(). The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event() which performs needed tasks and then dispatches the event to the architecture specific arch_crash_handle_hotplug_event() to update the elfcorehdr with the current state of CPUs and memory. During the process, the kexec_lock is held. Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He Cc: Akhil Raj Cc: Bjorn Helgaas Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric W. Biederman Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Konrad Rzeszutek Wilk Cc: Mimi Zohar Cc: Naveen N. Rao Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Sean Christopherson Cc: Takashi Iwai Cc: Thomas Gleixner Cc: Thomas Weißschuh Cc: Valentin Schneider Cc: Vivek Goyal Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/crash_core.h | 7 +++++++ include/linux/kexec.h | 11 +++++++++++ 2 files changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 1e48b1d96404..0c06561bf5ff 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -104,4 +104,11 @@ extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_ma struct kimage; struct kexec_segment; +#define KEXEC_CRASH_HP_NONE 0 +#define KEXEC_CRASH_HP_ADD_CPU 1 +#define KEXEC_CRASH_HP_REMOVE_CPU 2 +#define KEXEC_CRASH_HP_ADD_MEMORY 3 +#define KEXEC_CRASH_HP_REMOVE_MEMORY 4 +#define KEXEC_CRASH_HP_INVALID_CPU -1U + #endif /* LINUX_CRASH_CORE_H */ diff --git a/include/linux/kexec.h b/include/linux/kexec.h index fb4350db33ff..df395f888915 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes; #include #include #include +#include #include /* Verify architecture specific macros are defined */ @@ -345,6 +346,12 @@ struct kimage { struct purgatory_info purgatory_info; #endif +#ifdef CONFIG_CRASH_HOTPLUG + int hp_action; + int elfcorehdr_index; + bool elfcorehdr_updated; +#endif + #ifdef CONFIG_IMA_KEXEC /* Virtual address of IMA measurement buffer for kexec syscall */ void *ima_buffer; @@ -475,6 +482,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { } #endif +#ifndef arch_crash_handle_hotplug_event +static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +#endif + #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; struct task_struct; -- cgit v1.2.3 From 88a6f89944216b028d3872b0cec0f51a2f955460 Mon Sep 17 00:00:00 2001 From: Eric DeVolder Date: Mon, 14 Aug 2023 17:44:42 -0400 Subject: crash: memory and CPU hotplug sysfs attributes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="00000051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="8000000" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He Cc: Akhil Raj Cc: Bjorn Helgaas Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric W. Biederman Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Konrad Rzeszutek Wilk Cc: Mimi Zohar Cc: Naveen N. Rao Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Sean Christopherson Cc: Takashi Iwai Cc: Thomas Gleixner Cc: Thomas Weißschuh Cc: Valentin Schneider Cc: Vivek Goyal Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kexec.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index df395f888915..172e9a544928 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -486,6 +486,14 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } #endif +#ifndef crash_hotplug_cpu_support +static inline int crash_hotplug_cpu_support(void) { return 0; } +#endif + +#ifndef crash_hotplug_memory_support +static inline int crash_hotplug_memory_support(void) { return 0; } +#endif + #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; struct task_struct; -- cgit v1.2.3 From a72bbec70da285a7e09e53fb13c2da7da2032da9 Mon Sep 17 00:00:00 2001 From: Eric DeVolder Date: Mon, 14 Aug 2023 17:44:44 -0400 Subject: crash: hotplug support for kexec_load() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The hotplug support for kexec_load() requires changes to the userspace kexec-tools and a little extra help from the kernel. Given a kdump capture kernel loaded via kexec_load(), and a subsequent hotplug event, the crash hotplug handler finds the elfcorehdr and rewrites it to reflect the hotplug change. That is the desired outcome, however, at kernel panic time, the purgatory integrity check fails (because the elfcorehdr changed), and the capture kernel does not boot and no vmcore is generated. Therefore, the userspace kexec-tools/kexec must indicate to the kernel that the elfcorehdr can be modified (because the kexec excluded the elfcorehdr from the digest, and sized the elfcorehdr memory buffer appropriately). To facilitate hotplug support with kexec_load(): - a new kexec flag KEXEC_UPATE_ELFCOREHDR indicates that it is safe for the kernel to modify the kexec_load()'d elfcorehdr - the /sys/kernel/crash_elfcorehdr_size node communicates the preferred size of the elfcorehdr memory buffer - The sysfs crash_hotplug nodes (ie. /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically take into account kexec_file_load() vs kexec_load() and KEXEC_UPDATE_ELFCOREHDR. This is critical so that the udev rule processing of crash_hotplug is all that is needed to determine if the userspace unload-then-load of the kdump image is to be skipped, or not. The proposed udev rule change looks like: # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" The table below indicates the behavior of kexec_load()'d kdump image updates (with the new udev crash_hotplug rule in place): Kernel |Kexec -------+-----+---- Old |Old |New | a | a -------+-----+---- New | a | b -------+-----+---- where kexec 'old' and 'new' delineate kexec-tools has the needed modifications for the crash hotplug feature, and kernel 'old' and 'new' delineate the kernel supports this crash hotplug feature. Behavior 'a' indicates the unload-then-reload of the entire kdump image. For the kexec 'old' column, the unload-then-reload occurs due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel (with 'new' kexec) does not present the crash_hotplug sysfs node, which leads to the unload-then-reload of the kdump image. Behavior 'b' indicates the desired optimized behavior of the kernel directly modifying the elfcorehdr and avoiding the unload-then-reload of the kdump image. If the udev rule is not updated with crash_hotplug node check, then no matter any combination of kernel or kexec is new or old, the kdump image continues to be unload-then-reload on hotplug changes. To fully support crash hotplug feature, there needs to be a rollout of kernel, kexec-tools and udev rule changes. However, the order of the rollout of these pieces does not matter; kexec_load()'d kdump images still function for hotplug as-is. Link: https://lkml.kernel.org/r/20230814214446.6659-7-eric.devolder@oracle.com Signed-off-by: Eric DeVolder Suggested-by: Hari Bathini Acked-by: Hari Bathini Acked-by: Baoquan He Cc: Akhil Raj Cc: Bjorn Helgaas Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric W. Biederman Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Konrad Rzeszutek Wilk Cc: Mimi Zohar Cc: Naveen N. Rao Cc: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Sean Christopherson Cc: Sourabh Jain Cc: Takashi Iwai Cc: Thomas Gleixner Cc: Thomas Weißschuh Cc: Valentin Schneider Cc: Vivek Goyal Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kexec.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 172e9a544928..32c78078552c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -320,6 +320,10 @@ struct kimage { unsigned int preserve_context : 1; /* If set, we are using file mode kexec syscall */ unsigned int file_mode:1; +#ifdef CONFIG_CRASH_HOTPLUG + /* If set, allow changes to elfcorehdr of kexec_load'd image */ + unsigned int update_elfcorehdr:1; +#endif #ifdef ARCH_HAS_KIMAGE_ARCH struct kimage_arch arch; @@ -396,9 +400,9 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec flags */ #ifndef CONFIG_KEXEC_JUMP -#define KEXEC_FLAGS KEXEC_ON_CRASH +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR) #else -#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT) +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR) #endif /* List of defined/legal kexec file flags */ @@ -486,6 +490,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } #endif +int crash_check_update_elfcorehdr(void); + #ifndef crash_hotplug_cpu_support static inline int crash_hotplug_cpu_support(void) { return 0; } #endif @@ -494,6 +500,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; } static inline int crash_hotplug_memory_support(void) { return 0; } #endif +#ifndef crash_get_elfcorehdr_size +static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; } +#endif + #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; struct task_struct; -- cgit v1.2.3 From dce8f8ed1de1d9d6d27c5ccd202ce4ec163b100c Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Wed, 23 Aug 2023 19:08:06 +0200 Subject: document while_each_thread(), change first_tid() to use for_each_thread() Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a group leader and thus can't exit before t, t or another sub-thread can exec and remove g from the thread_group list. The only lockless user of while_each_thread() is first_tid() and it is fine in that it can't loop forever, yet for_each_thread() looks better and I am going to change while_each_thread/next_thread. Link: https://lkml.kernel.org/r/20230823170806.GA11724@redhat.com Signed-off-by: Oleg Nesterov Cc: Eric W. Biederman Signed-off-by: Andrew Morton --- include/linux/sched/signal.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 0deebe2ab07d..0014d3adaf84 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -648,6 +648,10 @@ extern void flush_itimer_signals(void); extern bool current_is_single_threaded(void); +/* + * Without tasklist/siglock it is only rcu-safe if g can't exit/exec, + * otherwise next_thread(t) will never reach g after list_del_rcu(g). + */ #define while_each_thread(g, t) \ while ((t = next_thread(t)) != g) -- cgit v1.2.3 From 60c5fd2e8f3c42a5abc565ba9876ead1da5ad2b7 Mon Sep 17 00:00:00 2001 From: Zhu Wang Date: Tue, 22 Aug 2023 01:52:54 +0000 Subject: scsi: core: raid_class: Remove raid_component_add() The raid_component_add() function was added to the kernel tree via patch "[SCSI] embryonic RAID class" (2005). Remove this function since it never has had any callers in the Linux kernel. And also raid_component_release() is only used in raid_component_add(), so it is also removed. Signed-off-by: Zhu Wang Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com Reviewed-by: Bart Van Assche Fixes: 04b5b5cb0136 ("scsi: core: Fix possible memory leak if device_add() fails") Signed-off-by: Martin K. Petersen --- include/linux/raid_class.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/raid_class.h b/include/linux/raid_class.h index 6a9b177d5c41..e50416ba9cd9 100644 --- a/include/linux/raid_class.h +++ b/include/linux/raid_class.h @@ -77,7 +77,3 @@ DEFINE_RAID_ATTRIBUTE(enum raid_state, state) struct raid_template *raid_class_attach(struct raid_function_template *); void raid_class_release(struct raid_template *); - -int __must_check raid_component_add(struct raid_template *, struct device *, - struct device *); - -- cgit v1.2.3 From d55595f04dcc8bd6f6ff33f451dda8de3f1232da Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Wed, 23 Aug 2023 14:19:28 +0800 Subject: net: pcs: xpcs: add specific vendor supoprt for Wangxun 10Gb NICs Since Wangxun 10Gb NICs require some special configuration on the IP of Synopsys Designware XPCS, introduce dev_flag for different vendors. Read OUI from device identifier registers, to detect Wangxun devices. And xpcs_soft_reset() is skipped to avoid the reset of device identifier registers. Signed-off-by: Jiawen Wu Signed-off-by: David S. Miller --- include/linux/pcs/pcs-xpcs.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index ff99cf7a5d0d..f37e8acfd351 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -20,12 +20,19 @@ #define DW_AN_C37_1000BASEX 4 #define DW_10GBASER 5 +/* device vendor OUI */ +#define DW_OUI_WX 0x0018fc80 + +/* dev_flag */ +#define DW_DEV_TXGBE BIT(0) + struct xpcs_id; struct dw_xpcs { struct mdio_device *mdiodev; const struct xpcs_id *id; struct phylink_pcs pcs; + int dev_flag; }; int xpcs_get_an_mode(struct dw_xpcs *xpcs, phy_interface_t interface); -- cgit v1.2.3 From f629acc6f21043fdc80ae93cdafa6713888db0fd Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Wed, 23 Aug 2023 14:19:29 +0800 Subject: net: pcs: xpcs: support to switch mode for Wangxun NICs According to chapter 6 of DesignWare Cores Ethernet PCS (version 3.20a) and custom design manual, add a configuration flow for switching interface mode. If the interface changes, the following setting is required: 1. wait VR_XS_PCS_DIG_STS bit(4, 2) [PSEQ_STATE] = 100b (Power-Good) 2. write SR_XS_PCS_CTRL2 to select various PCS type 3. write SR_PMA_CTRL1 and/or SR_XS_PCS_CTRL1 for link speed 4. program PMA registers 5. write VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] = 1b (Vendor-Specific Soft Reset) 6. wait for VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] to get cleared Only 10GBASE-R/SGMII/1000BASE-X modes are planned for the current Wangxun devices. And there is a quirk for Wangxun devices to switch mode although the interface in phylink state has not changed, since PCS will change to default 10GBASE-R when the ethernet driver(txgbe) do LAN reset. Signed-off-by: Jiawen Wu Signed-off-by: David S. Miller --- include/linux/pcs/pcs-xpcs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index f37e8acfd351..da3a6c30f6d2 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -32,6 +32,7 @@ struct dw_xpcs { struct mdio_device *mdiodev; const struct xpcs_id *id; struct phylink_pcs pcs; + phy_interface_t interface; int dev_flag; }; -- cgit v1.2.3 From a4f39c9f14a634e4cd35fcd338c239d11fcc73fc Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Wed, 23 Aug 2023 15:41:02 +0200 Subject: net: handle ARPHRD_PPP in dev_is_mac_header_xmit() The goal is to support a bpf_redirect() from an ethernet device (ingress) to a ppp device (egress). The l2 header is added automatically by the ppp driver, thus the ethernet header should be removed. CC: stable@vger.kernel.org Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Nicolas Dichtel Tested-by: Siwar Zitouni Reviewed-by: Guillaume Nault Signed-off-by: David S. Miller --- include/linux/if_arp.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h index 1ed52441972f..10a1e81434cb 100644 --- a/include/linux/if_arp.h +++ b/include/linux/if_arp.h @@ -53,6 +53,10 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev) case ARPHRD_NONE: case ARPHRD_RAWIP: case ARPHRD_PIMREG: + /* PPP adds its l2 header automatically in ppp_start_xmit(). + * This makes it look like an l3 device to __bpf_redirect() and tcf_mirred_init(). + */ + case ARPHRD_PPP: return false; default: return true; -- cgit v1.2.3 From bac806830fde94ccf41482535fc442c02656febc Mon Sep 17 00:00:00 2001 From: Wenchao Chen Date: Fri, 25 Aug 2023 17:17:42 +0800 Subject: mmc: core: Add host specific tuning support for SD HS mode To support the need for host specific tuning for SD high-speed mode, let's add two new optional callbacks, ->prepare|execute_sd_hs_tuning() and let's call them when switching into the SD high-speed mode. Note that, during the tuning process it's also needed for host drivers to send commands to the SD card to verify that the tuning process succeeds. Therefore, let's also share the corresponding functions from the core to allow this. Signed-off-by: Wenchao Chen Link: https://lore.kernel.org/r/20230825091743.15613-2-wenchao.chen@unisoc.com [Ulf: Dropped unnecessary function declarations and updated the commit msg] Signed-off-by: Ulf Hansson --- include/linux/mmc/host.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 461d1543893b..62a6847a3b6f 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -184,6 +184,12 @@ struct mmc_host_ops { /* Execute HS400 tuning depending host driver */ int (*execute_hs400_tuning)(struct mmc_host *host, struct mmc_card *card); + /* Optional callback to prepare for SD high-speed tuning */ + int (*prepare_sd_hs_tuning)(struct mmc_host *host, struct mmc_card *card); + + /* Optional callback to execute SD high-speed tuning */ + int (*execute_sd_hs_tuning)(struct mmc_host *host, struct mmc_card *card); + /* Prepare switch to DDR during the HS400 init sequence */ int (*hs400_prepare_ddr)(struct mmc_host *host); @@ -665,6 +671,8 @@ static inline void mmc_debugfs_err_stats_inc(struct mmc_host *host, host->err_stats[stat] += 1; } +int mmc_sd_switch(struct mmc_card *card, int mode, int group, u8 value, u8 *resp); +int mmc_send_status(struct mmc_card *card, u32 *status); int mmc_send_tuning(struct mmc_host *host, u32 opcode, int *cmd_error); int mmc_send_abort_tuning(struct mmc_host *host, u32 opcode); int mmc_get_ext_csd(struct mmc_card *card, u8 **new_ext_csd); -- cgit v1.2.3 From ce6e94722523f85e265455eb3296ebdbe996b3b5 Mon Sep 17 00:00:00 2001 From: Balamanikandan Gunasundar Date: Fri, 25 Aug 2023 15:21:55 +0530 Subject: mmc: atmel-mci: Convert to gpio descriptors Replace the legacy GPIO APIs with gpio descriptor consumer interface. To maintain backward compatibility, we rely on the "cd-inverted" property to manage the invertion flag instead of GPIO property. Signed-off-by: Balamanikandan Gunasundar Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20230825095157.76073-2-balamanikandan.gunasundar@microchip.com Signed-off-by: Ulf Hansson --- include/linux/atmel-mci.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/atmel-mci.h b/include/linux/atmel-mci.h index 1491af38cc6e..017e7d8f6126 100644 --- a/include/linux/atmel-mci.h +++ b/include/linux/atmel-mci.h @@ -26,8 +26,8 @@ */ struct mci_slot_pdata { unsigned int bus_width; - int detect_pin; - int wp_pin; + struct gpio_desc *detect_pin; + struct gpio_desc *wp_pin; bool detect_is_active_high; bool non_removable; }; -- cgit v1.2.3 From d2c6d518c21d73d96616e08a19eccd4642f4bafa Mon Sep 17 00:00:00 2001 From: Balamanikandan Gunasundar Date: Fri, 25 Aug 2023 15:21:56 +0530 Subject: mmc: atmel-mci: move atmel MCI header file Move the contents of linux/atmel-mci.h into drivers/mmc/host/atmel-mci.c as it is only used in one file Signed-off-by: Balamanikandan Gunasundar Reviewed-by: Ludovic Desroches Link: https://lore.kernel.org/r/20230825095157.76073-3-balamanikandan.gunasundar@microchip.com Signed-off-by: Ulf Hansson --- include/linux/atmel-mci.h | 46 ---------------------------------------------- 1 file changed, 46 deletions(-) delete mode 100644 include/linux/atmel-mci.h (limited to 'include/linux') diff --git a/include/linux/atmel-mci.h b/include/linux/atmel-mci.h deleted file mode 100644 index 017e7d8f6126..000000000000 --- a/include/linux/atmel-mci.h +++ /dev/null @@ -1,46 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __LINUX_ATMEL_MCI_H -#define __LINUX_ATMEL_MCI_H - -#include -#include - -#define ATMCI_MAX_NR_SLOTS 2 - -/** - * struct mci_slot_pdata - board-specific per-slot configuration - * @bus_width: Number of data lines wired up the slot - * @detect_pin: GPIO pin wired to the card detect switch - * @wp_pin: GPIO pin wired to the write protect sensor - * @detect_is_active_high: The state of the detect pin when it is active - * @non_removable: The slot is not removable, only detect once - * - * If a given slot is not present on the board, @bus_width should be - * set to 0. The other fields are ignored in this case. - * - * Any pins that aren't available should be set to a negative value. - * - * Note that support for multiple slots is experimental -- some cards - * might get upset if we don't get the clock management exactly right. - * But in most cases, it should work just fine. - */ -struct mci_slot_pdata { - unsigned int bus_width; - struct gpio_desc *detect_pin; - struct gpio_desc *wp_pin; - bool detect_is_active_high; - bool non_removable; -}; - -/** - * struct mci_platform_data - board-specific MMC/SDcard configuration - * @dma_slave: DMA slave interface to use in data transfers. - * @slot: Per-slot configuration data. - */ -struct mci_platform_data { - void *dma_slave; - dma_filter_fn dma_filter; - struct mci_slot_pdata slot[ATMCI_MAX_NR_SLOTS]; -}; - -#endif /* __LINUX_ATMEL_MCI_H */ -- cgit v1.2.3 From c439d5e8a0deb7310b5bb4e5f2fe47c40ff5297f Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Wed, 23 Aug 2023 07:06:08 +0200 Subject: pcpcntr: add group allocation/free Allocations and frees are globally serialized on the pcpu lock (and the CPU hotplug lock if enabled, which is the case on Debian). At least one frequent consumer allocates 4 back-to-back counters (and frees them in the same manner), exacerbating the problem. While this does not fully remedy scalability issues, it is a step towards that goal and provides immediate relief. Signed-off-by: Mateusz Guzik Reviewed-by: Dennis Zhou Reviewed-by: Vegard Nossum Link: https://lore.kernel.org/r/20230823050609.2228718-2-mjguzik@gmail.com [Dennis: reflowed a few lines] Signed-off-by: Dennis Zhou --- include/linux/percpu_counter.h | 41 ++++++++++++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h index 75b73c83bc9d..d01351b1526f 100644 --- a/include/linux/percpu_counter.h +++ b/include/linux/percpu_counter.h @@ -30,17 +30,28 @@ struct percpu_counter { extern int percpu_counter_batch; -int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp, - struct lock_class_key *key); +int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount, + gfp_t gfp, u32 nr_counters, + struct lock_class_key *key); -#define percpu_counter_init(fbc, value, gfp) \ +#define percpu_counter_init_many(fbc, value, gfp, nr_counters) \ ({ \ static struct lock_class_key __key; \ \ - __percpu_counter_init(fbc, value, gfp, &__key); \ + __percpu_counter_init_many(fbc, value, gfp, nr_counters,\ + &__key); \ }) -void percpu_counter_destroy(struct percpu_counter *fbc); + +#define percpu_counter_init(fbc, value, gfp) \ + percpu_counter_init_many(fbc, value, gfp, 1) + +void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters); +static inline void percpu_counter_destroy(struct percpu_counter *fbc) +{ + percpu_counter_destroy_many(fbc, 1); +} + void percpu_counter_set(struct percpu_counter *fbc, s64 amount); void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch); @@ -116,11 +127,27 @@ struct percpu_counter { s64 count; }; +static inline int percpu_counter_init_many(struct percpu_counter *fbc, + s64 amount, gfp_t gfp, + u32 nr_counters) +{ + u32 i; + + for (i = 0; i < nr_counters; i++) + fbc[i].count = amount; + + return 0; +} + static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp) { - fbc->count = amount; - return 0; + return percpu_counter_init_many(fbc, amount, gfp, 1); +} + +static inline void percpu_counter_destroy_many(struct percpu_counter *fbc, + u32 nr_counters) +{ } static inline void percpu_counter_destroy(struct percpu_counter *fbc) -- cgit v1.2.3 From 2a6d50b50d6d589d43a90d6ca990b8b811e67701 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Mon, 21 Aug 2023 12:33:06 -0700 Subject: bpf: Consider non-owning refs trusted Recent discussions around default kptr "trustedness" led to changes such as commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier."). One of the conclusions of those discussions, as expressed in code and comments in that patch, is that we'd like to move away from 'raw' PTR_TO_BTF_ID without some type flag or other register state indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark this state explicitly, the verifier currently considers trustedness implied by other register state. For example, owning refs to graph collection nodes must have a nonzero ref_obj_id, so they pass the is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag. This patch makes trustedness of non-owning refs to graph collection nodes explicit as well. By definition, non-owning refs are currently trusted. Although the ref has no control over pointee lifetime, due to non-owning ref clobbering rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is safe in the critical section controlled by bpf_spin_lock associated with its owning collection. Note that the previous statement does not hold true for nodes with shared ownership due to the use-after-free issue that this series is addressing. True shared ownership was disabled by commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"), though, so the statement holds for now. Further patches in the series will change the trustedness state of non-owning refs before re-enabling bpf_refcount_acquire. Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a non-owning ref reg state would pass is_trusted_reg check. Somewhat surprisingly, this doesn't result in any change to user-visible functionality elsewhere in the verifier: graph collection nodes are all marked MEM_ALLOC, which tends to be handled in separate codepaths from "raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the current state of things before changing it elsewhere in the series. Signed-off-by: Dave Marchevsky Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230821193311.3290257-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index f70f9ac884d2..b6e58dab8e27 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -745,7 +745,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog) } } -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED) +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED | NON_OWN_REF) static inline bool bpf_type_has_unsafe_modifiers(u32 type) { -- cgit v1.2.3 From 0816b8c6bf7fc87cec4273dc199e8f0764b9e7b1 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Mon, 21 Aug 2023 12:33:09 -0700 Subject: bpf: Consider non-owning refs to refcounted nodes RCU protected An earlier patch in the series ensures that the underlying memory of nodes with bpf_refcount - which can have multiple owners - is not reused until RCU grace period has elapsed. This prevents use-after-free with non-owning references that may point to recently-freed memory. While RCU read lock is held, it's safe to dereference such a non-owning ref, as by definition RCU GP couldn't have elapsed and therefore underlying memory couldn't have been reused. From the perspective of verifier "trustedness" non-owning refs to refcounted nodes are now trusted only in RCU CS and therefore should no longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them MEM_RCU in order to reflect this new state. Signed-off-by: Dave Marchevsky Link: https://lore.kernel.org/r/20230821193311.3290257-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index eced6400f778..12596af59c00 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -653,7 +653,8 @@ enum bpf_type_flag { MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS), /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning. - * Currently only valid for linked-list and rbtree nodes. + * Currently only valid for linked-list and rbtree nodes. If the nodes + * have a bpf_refcount_field, they must be tagged MEM_RCU as well. */ NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS), -- cgit v1.2.3 From 5f536ac6a5a7b67351e4e5ae4f9e1e57d31268e6 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 16:59:56 -0700 Subject: LoadPin: Annotate struct dm_verity_loadpin_trusted_root_digest with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct dm_verity_loadpin_trusted_root_digest. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Alasdair Kergon Cc: Mike Snitzer Cc: dm-devel@redhat.com Cc: Paul Moore Cc: James Morris Cc: "Serge E. Hallyn" Cc: linux-security-module@vger.kernel.org Link: https://lore.kernel.org/r/20230817235955.never.762-kees@kernel.org Signed-off-by: Kees Cook --- include/linux/dm-verity-loadpin.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dm-verity-loadpin.h b/include/linux/dm-verity-loadpin.h index 552b817ab102..3ac6dbaeaa37 100644 --- a/include/linux/dm-verity-loadpin.h +++ b/include/linux/dm-verity-loadpin.h @@ -12,7 +12,7 @@ extern struct list_head dm_verity_loadpin_trusted_root_digests; struct dm_verity_loadpin_trusted_root_digest { struct list_head node; unsigned int len; - u8 data[]; + u8 data[] __counted_by(len); }; #if IS_ENABLED(CONFIG_SECURITY_LOADPIN_VERITY) -- cgit v1.2.3 From 70934c7c99ad01778eef83e898df4c624e52492f Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 24 Aug 2023 14:37:53 +0100 Subject: net: phylink: add phylink_limit_mac_speed() Add a function which can be used to limit the phylink MAC capabilities to an upper speed limit. Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1qZAX3-005pTi-K1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 789c516c6b4a..7d07f8736431 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -223,6 +223,8 @@ struct phylink_config { unsigned long mac_capabilities; }; +void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed); + /** * struct phylink_mac_ops - MAC operations structure. * @validate: Validate and update the link configuration. -- cgit v1.2.3 From e80af2acdef73d90ce56d6849d96f2a5862cec2c Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 24 Aug 2023 14:37:58 +0100 Subject: net: stmmac: convert plat->phylink_node to fwnode All users of plat->phylink_node first convert it to a fwnode. Rather than repeatedly convert to a fwnode, store it as a fwnode. To reflect this change, call it plat->port_node instead - it is used for more than just phylink. Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1qZAX8-005pTo-OT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 784277d666eb..b2ccd827bb80 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -227,7 +227,7 @@ struct plat_stmmacenet_data { phy_interface_t phy_interface; struct stmmac_mdio_bus_data *mdio_bus_data; struct device_node *phy_node; - struct device_node *phylink_node; + struct fwnode_handle *port_node; struct device_node *mdio_node; struct stmmac_dma_cfg *dma_cfg; struct stmmac_est *est; -- cgit v1.2.3 From ebf05c7dc92c11b0355aaa0e94064beadaa4b05c Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 25 Aug 2023 17:28:20 +0200 Subject: tty: shrink the size of struct tty_struct by 40 bytes It's been a long time since anyone has looked at what struct tty_struct looks like in memory, turns out there was a ton of holes. So move things around a bit, change one variable (closing) from being an int to a bool (it is only being tested for 0/1), and we end up saving 40 bytes per structure overall on x86-64 systems. Before this patch: /* size: 696, cachelines: 11, members: 37 */ /* sum members: 665, holes: 8, sum holes: 31 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 4 */ /* last cacheline: 56 bytes */ After this change: /* size: 656, cachelines: 11, members: 37 */ /* sum members: 654, holes: 1, sum holes: 2 */ /* forced alignments: 2 */ /* last cacheline: 16 bytes */ Cc: Jiri Slaby Link: https://lore.kernel.org/r/2023082519-cobbler-unholy-8d1f@gregkh Signed-off-by: Greg Kroah-Hartman --- include/linux/tty.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty.h b/include/linux/tty.h index e8d5d9997aca..f002d0f25db7 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -192,13 +192,14 @@ struct tty_operations; */ struct tty_struct { struct kref kref; + int index; struct device *dev; struct tty_driver *driver; + struct tty_port *port; const struct tty_operations *ops; - int index; - struct ld_semaphore ldisc_sem; struct tty_ldisc *ldisc; + struct ld_semaphore ldisc_sem; struct mutex atomic_write_lock; struct mutex legacy_mutex; @@ -209,6 +210,7 @@ struct tty_struct { char name[64]; unsigned long flags; int count; + unsigned int receive_room; struct winsize winsize; struct { @@ -219,16 +221,16 @@ struct tty_struct { } __aligned(sizeof(unsigned long)) flow; struct { - spinlock_t lock; struct pid *pgrp; struct pid *session; + spinlock_t lock; unsigned char pktstatus; bool packet; unsigned long unused[0]; } __aligned(sizeof(unsigned long)) ctrl; bool hw_stopped; - unsigned int receive_room; + bool closing; int flow_change; struct tty_struct *link; @@ -239,15 +241,13 @@ struct tty_struct { void *disc_data; void *driver_data; spinlock_t files_lock; + int write_cnt; + unsigned char *write_buf; + struct list_head tty_files; #define N_TTY_BUF_SIZE 4096 - - int closing; - unsigned char *write_buf; - int write_cnt; struct work_struct SAK_work; - struct tty_port *port; } __randomize_layout; /* Each of a tty's open files has private_data pointing to tty_file_private */ -- cgit v1.2.3 From 781589e40ac5f929f58824c15448e1ba49c3ac32 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Thu, 17 Aug 2023 15:55:31 -0700 Subject: rtc: Add support for limited alarm timer offsets Some alarm timers are based on time offsets, not on absolute times. In some situations, the amount of time that can be scheduled in the future is limited. This may result in a refusal to suspend the system, causing substantial battery drain. Some RTC alarm drivers remedy the situation by setting the alarm time to the maximum supported time if a request for an out-of-range timeout is made. This is not really desirable since it may result in unexpected early wakeups. To reduce the impact of this problem, let RTC drivers report the maximum supported alarm timer offset. The code setting alarm timers can then decide if it wants to reject setting alarm timers to a larger value, if it wants to implement recurring alarms until the actually requested alarm time is met, or if it wants to accept the limited alarm time. Only introduce the necessary variable into struct rtc_device. Code to set and use the variable will follow with subsequent patches. Cc: Brian Norris Signed-off-by: Guenter Roeck Link: https://lore.kernel.org/r/20230817225537.4053865-2-linux@roeck-us.net Signed-off-by: Alexandre Belloni --- include/linux/rtc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 1fd9c6a21ebe..4c0bcbeb1f00 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -146,6 +146,7 @@ struct rtc_device { time64_t range_min; timeu64_t range_max; + timeu64_t alarm_offset_max; time64_t start_secs; time64_t offset_secs; bool set_start_time; -- cgit v1.2.3 From 17c8da5a3423c9f3800a256dbaa685bae18e1264 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Thu, 24 Aug 2023 23:28:33 -0700 Subject: net/mlx5: Add IFC bits to support IPsec enable/disable Add hardware definitions to allow to control IPSec capabilities. Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 08dcb1f43be7..fc3db401f8a2 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -65,9 +65,11 @@ enum { enum { MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0, + MLX5_SET_HCA_CAP_OP_MOD_ETHERNET_OFFLOADS = 0x1, MLX5_SET_HCA_CAP_OP_MOD_ODP = 0x2, MLX5_SET_HCA_CAP_OP_MOD_ATOMIC = 0x3, MLX5_SET_HCA_CAP_OP_MOD_ROCE = 0x4, + MLX5_SET_HCA_CAP_OP_MOD_IPSEC = 0x15, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE2 = 0x20, MLX5_SET_HCA_CAP_OP_MOD_PORT_SELECTION = 0x25, }; @@ -3451,6 +3453,7 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap; struct mlx5_ifc_macsec_cap_bits macsec_cap; struct mlx5_ifc_crypto_cap_bits crypto_cap; + struct mlx5_ifc_ipsec_cap_bits ipsec_cap; u8 reserved_at_0[0x8000]; }; -- cgit v1.2.3 From 8efd7b17a3b03242c97281d88463ad56e8f551f8 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Thu, 24 Aug 2023 23:28:34 -0700 Subject: net/mlx5: Provide an interface to block change of IPsec capabilities mlx5 HW can't perform IPsec offload operation simultaneously both on PF and VFs at the same time. While the previous patches added devlink knobs to change IPsec capabilities dynamically, there is a need to add a logic to block such IPsec capabilities for the cases when IPsec is already configured. Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20230825062836.103744-7-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e95f10066eac..3033bbaeac81 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -813,6 +813,7 @@ struct mlx5_core_dev { /* MACsec notifier chain to sync MACsec core and IB database */ struct blocking_notifier_head macsec_nh; #endif + u64 num_ipsec_offloads; }; struct mlx5_db { -- cgit v1.2.3 From 72f93a3136ee18fd59fa6579f84c07e93424681e Mon Sep 17 00:00:00 2001 From: Antonio Napolitano Date: Sat, 26 Aug 2023 01:05:50 +0200 Subject: r8152: add vendor/device ID pair for D-Link DUB-E250 The D-Link DUB-E250 is an RTL8156 based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Antonio Napolitano Link: https://lore.kernel.org/r/CV200KJEEUPC.WPKAHXCQJ05I@mercurius Signed-off-by: Jakub Kicinski --- include/linux/usb/r8152.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/r8152.h b/include/linux/usb/r8152.h index 20d88b1defc3..287e9d83fb8b 100644 --- a/include/linux/usb/r8152.h +++ b/include/linux/usb/r8152.h @@ -29,6 +29,7 @@ #define VENDOR_ID_LINKSYS 0x13b1 #define VENDOR_ID_NVIDIA 0x0955 #define VENDOR_ID_TPLINK 0x2357 +#define VENDOR_ID_DLINK 0x2001 #if IS_REACHABLE(CONFIG_USB_RTL8152) extern u8 rtl8152_get_version(struct usb_interface *intf); -- cgit v1.2.3 From a014c35556b9045ece8426df2b38eb3c5e1c1aa0 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 26 Aug 2023 11:02:51 +0100 Subject: net: stmmac: clarify difference between "interface" and "phy_interface" Clarify the difference between "interface" and "phy_interface" in struct plat_stmmacenet_data, both by adding a comment, and also renaming "interface" to be "mac_interface". The difference between these are: MAC ----- optional PCS ----- SerDes ----- optional PHY ----- Media ^ ^ mac_interface phy_interface Note that phylink currently only deals with phy_interface. Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1qZq83-005tts-6K@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index b2ccd827bb80..ce89cc3e4913 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -223,7 +223,20 @@ struct dwmac4_addrs { struct plat_stmmacenet_data { int bus_id; int phy_addr; - int interface; + /* MAC ----- optional PCS ----- SerDes ----- optional PHY ----- Media + * ^ ^ + * mac_interface phy_interface + * + * mac_interface is the MAC-side interface, which may be the same + * as phy_interface if there is no intervening PCS. If there is a + * PCS, then mac_interface describes the interface mode between the + * MAC and PCS, and phy_interface describes the interface mode + * between the PCS and PHY. + */ + phy_interface_t mac_interface; + /* phy_interface is the PHY-side interface - the interface used by + * an attached PHY. + */ phy_interface_t phy_interface; struct stmmac_mdio_bus_data *mdio_bus_data; struct device_node *phy_node; -- cgit v1.2.3 From 35d8dbbb25add265a880ab0dc48a229f06b08325 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 29 Aug 2023 20:46:31 +0200 Subject: thermal: core: Drop unused .get_trip_*() callbacks After recent changes in the ACPI thermal driver and in the Intel DTS IOSF thermal driver, all thermal zone drivers are expected to use trip tables for initialization and none of them should implement .get_trip_type(), .get_trip_temp() or .get_trip_hyst() callbacks, so drop these callbacks entirely from the core. Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index b449a46766f5..e6bd3b7c9eda 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -62,11 +62,7 @@ struct thermal_zone_device_ops { int (*set_trips) (struct thermal_zone_device *, int, int); int (*change_mode) (struct thermal_zone_device *, enum thermal_device_mode); - int (*get_trip_type) (struct thermal_zone_device *, int, - enum thermal_trip_type *); - int (*get_trip_temp) (struct thermal_zone_device *, int, int *); int (*set_trip_temp) (struct thermal_zone_device *, int, int); - int (*get_trip_hyst) (struct thermal_zone_device *, int, int *); int (*set_trip_hyst) (struct thermal_zone_device *, int, int); int (*get_crit_temp) (struct thermal_zone_device *, int *); int (*set_emul_temp) (struct thermal_zone_device *, int); -- cgit v1.2.3 From 8289d810ea85755a9d22f75785806cb34eecd5e5 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 22 Aug 2023 13:40:06 +0200 Subject: thermal: core: Rework .get_trend() thermal zone callback Passing a struct thermal_trip pointer instead of a trip index to the .get_trend() thermal zone callback allows one of its 2 implementations, the thermal_get_trend() function in the ACPI thermal driver, to be simplified quite a bit, and the other implementation of it in the ti-soc-thermal driver does not even use the relevant callback argument. For this reason, change the .get_trend() thermal zone callback definition and adjust the related code accordingly. Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index e6bd3b7c9eda..eb17495c8acc 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -53,6 +53,20 @@ enum thermal_notify_event { THERMAL_EVENT_KEEP_ALIVE, /* Request for user space handler to respond */ }; +/** + * struct thermal_trip - representation of a point in temperature domain + * @temperature: temperature value in miliCelsius + * @hysteresis: relative hysteresis in miliCelsius + * @type: trip point type + * @priv: pointer to driver data associated with this trip + */ +struct thermal_trip { + int temperature; + int hysteresis; + enum thermal_trip_type type; + void *priv; +}; + struct thermal_zone_device_ops { int (*bind) (struct thermal_zone_device *, struct thermal_cooling_device *); @@ -66,26 +80,12 @@ struct thermal_zone_device_ops { int (*set_trip_hyst) (struct thermal_zone_device *, int, int); int (*get_crit_temp) (struct thermal_zone_device *, int *); int (*set_emul_temp) (struct thermal_zone_device *, int); - int (*get_trend) (struct thermal_zone_device *, int, + int (*get_trend) (struct thermal_zone_device *, struct thermal_trip *, enum thermal_trend *); void (*hot)(struct thermal_zone_device *); void (*critical)(struct thermal_zone_device *); }; -/** - * struct thermal_trip - representation of a point in temperature domain - * @temperature: temperature value in miliCelsius - * @hysteresis: relative hysteresis in miliCelsius - * @type: trip point type - * @priv: pointer to driver data associated with this trip - */ -struct thermal_trip { - int temperature; - int hysteresis; - enum thermal_trip_type type; - void *priv; -}; - struct thermal_cooling_device_ops { int (*get_max_state) (struct thermal_cooling_device *, unsigned long *); int (*get_cur_state) (struct thermal_cooling_device *, unsigned long *); -- cgit v1.2.3 From 218a06a79d9a98a96ef46bb003d4d8adb0962056 Mon Sep 17 00:00:00 2001 From: Jie Zhan Date: Tue, 22 Aug 2023 20:48:37 +0800 Subject: cpufreq: Support per-policy performance boost The boost control currently applies to the whole system. However, users may prefer to boost a subset of cores in order to provide prioritized performance to workloads running on the boosted cores. Enable per-policy boost by adding a 'boost' sysfs interface under each policy path. This can be found at: /sys/devices/system/cpu/cpufreq/policy<*>/boost Same to the global boost switch, writing 1/0 to the per-policy 'boost' enables/disables boost on a cpufreq policy respectively. The user view of global and per-policy boost controls should be: 1. Enabling global boost initially enables boost on all policies, and per-policy boost can then be enabled or disabled individually, given that the platform does support so. 2. Disabling global boost makes the per-policy boost interface illegal. Signed-off-by: Jie Zhan Reviewed-by: Wei Xu Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- include/linux/cpufreq.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 43b363a99215..71d186d6933a 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -141,6 +141,9 @@ struct cpufreq_policy { */ bool dvfs_possible_from_any_cpu; + /* Per policy boost enabled flag. */ + bool boost_enabled; + /* Cached frequency lookup from cpufreq_driver_resolve_freq. */ unsigned int cached_target_freq; unsigned int cached_resolved_idx; -- cgit v1.2.3 From ae89408341f59e39d3b7d9e0d58905722136a176 Mon Sep 17 00:00:00 2001 From: Costa Shulyupin Date: Tue, 29 Aug 2023 11:25:51 +0300 Subject: sched/core: Add kernel-doc for set_cpus_allowed_ptr() This is an exported symbol, so it should have kernel-doc. Add a note to very similar function do_set_cpus_allowed() to avoid confusion and misuse. Signed-off-by: Costa Shulyupin Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20230829082551.2661290-1-costa.shul@redhat.com --- include/linux/sched.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 177b3f3676ef..ae6fd76afaf5 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1858,7 +1858,17 @@ extern int task_can_attach(struct task_struct *p); extern int dl_bw_alloc(int cpu, u64 dl_bw); extern void dl_bw_free(int cpu, u64 dl_bw); #ifdef CONFIG_SMP + +/* do_set_cpus_allowed() - consider using set_cpus_allowed_ptr() instead */ extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); + +/** + * set_cpus_allowed_ptr - set CPU affinity mask of a task + * @p: the task + * @new_mask: CPU affinity mask + * + * Return: zero if successful, or a negative error code + */ extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask); extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node); extern void release_user_cpus_ptr(struct task_struct *p); -- cgit v1.2.3 From cb18eca4b86768ec79e847795d1043356c9ee3b0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 9 Jul 2023 11:45:41 -0400 Subject: NFSD: Remove svc_rqst::rq_cacherep Over time I'd like to see NFS-specific fields moved out of struct svc_rqst, which is an RPC layer object. These fields are layering violations. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f8751118c122..fe1394cc1371 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -265,7 +265,6 @@ struct svc_rqst { /* Catering to nfsd */ struct auth_domain * rq_client; /* RPC peer info */ struct auth_domain * rq_gssclient; /* "gss/"-style peer info */ - struct svc_cacherep * rq_cacherep; /* cache info */ struct task_struct *rq_task; /* service thread */ struct net *rq_bc_net; /* pointer to backchannel's * net namespace -- cgit v1.2.3 From f80774787aa2b719d9c5f2d67a5901b59f219ce7 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Sat, 22 Jul 2023 11:31:16 +0800 Subject: sunrpc: Remove unused extern declarations Since commit 49b28684fdba ("nfsd: Remove deprecated nfsctl system call and related code.") these declarations are unused, so can remove it. Signed-off-by: YueHaibing Signed-off-by: Chuck Lever --- include/linux/sunrpc/svcauth.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 6d9cc9080aca..27582d3b538f 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -157,11 +157,8 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor); extern struct auth_domain *unix_domain_find(char *name); extern void auth_domain_put(struct auth_domain *item); -extern int auth_unix_add_addr(struct net *net, struct in6_addr *addr, struct auth_domain *dom); extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new); extern struct auth_domain *auth_domain_find(char *name); -extern struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr); -extern int auth_unix_forget_old(struct auth_domain *dom); extern void svcauth_unix_purge(struct net *net); extern void svcauth_unix_info_release(struct svc_xprt *xpt); extern int svcauth_unix_set_client(struct svc_rqst *rqstp); -- cgit v1.2.3 From 2eb2b93581813b74c7174961126f6ec38eadb5a7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 19 Jul 2023 14:31:03 -0400 Subject: SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly Add a helper to convert a whole xdr_buf directly into an array of bio_vecs, then send this array instead of iterating piecemeal over the xdr_buf containing the outbound RPC message. Reviewed-by: David Howells Signed-off-by: Chuck Lever --- include/linux/sunrpc/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index f89ec4b5ea16..42f9d7eb9a1a 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -139,6 +139,8 @@ void xdr_terminate_string(const struct xdr_buf *, const u32); size_t xdr_buf_pagecount(const struct xdr_buf *buf); int xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp); void xdr_free_bvec(struct xdr_buf *buf); +unsigned int xdr_buf_to_bvec(struct bio_vec *bvec, unsigned int bvec_size, + const struct xdr_buf *xdr); static inline __be32 *xdr_encode_array(__be32 *p, const void *s, unsigned int len) { -- cgit v1.2.3 From e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 19 Jul 2023 14:31:09 -0400 Subject: SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call There is now enough infrastructure in place to combine the stream record marker into the biovec array used to send each outgoing RPC message on TCP. The whole message can be more efficiently sent with a single call to sock_sendmsg() using a bio_vec iterator. Note that this also helps with RPC-with-TLS: the TLS implementation can now clearly see where the upper layer message boundaries are. Before, it would send each component of the xdr_buf (record marker, head, page payload, tail) in separate TLS records. Suggested-by: David Howells Reviewed-by: David Howells Signed-off-by: Chuck Lever --- include/linux/sunrpc/svcsock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index a7116048a4d4..caf3308f1f07 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -38,6 +38,8 @@ struct svc_sock { /* Number of queued send requests */ atomic_t sk_sendqlen; + struct page_frag_cache sk_frag_cache; + struct completion sk_handshake_done; struct page * sk_pages[RPCSVC_MAXPAGES]; /* received data */ -- cgit v1.2.3 From 89d2d9fbeadcbdbd6302d3d0cd6bfbe219d85b68 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 19 Jul 2023 14:31:22 -0400 Subject: SUNRPC: Revert e0a912e8ddba Flamegraph analysis showed that the cork/uncork calls consume nearly a third of the CPU time spent in svc_tcp_sendto(). The other two consumers are mutex lock/unlock and svc_tcp_sendmsg(). Now that svc_tcp_sendto() coalesces RPC messages properly, there is no need to introduce artificial delays to prevent sending partial messages. After applying this change, I measured a 1.2K read IOPS increase for 8KB random I/O (several percent) on 56Gb IP over IB. Reviewed-by: David Howells Signed-off-by: Chuck Lever --- include/linux/sunrpc/svcsock.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index caf3308f1f07..a7ea54460b1a 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -35,8 +35,6 @@ struct svc_sock { /* Total length of the data (not including fragment headers) * received so far in the fragments making up this rpc: */ u32 sk_datalen; - /* Number of queued send requests */ - atomic_t sk_sendqlen; struct page_frag_cache sk_frag_cache; -- cgit v1.2.3 From 18e4cf915543257eae2925671934937163f5639b Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 31 Jul 2023 16:48:31 +1000 Subject: nfsd: Simplify code around svc_exit_thread() call in nfsd() Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed. Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index fe1394cc1371..2230148d9d68 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -120,19 +120,6 @@ static inline void svc_put(struct svc_serv *serv) kref_put(&serv->sv_refcnt, svc_destroy); } -/** - * svc_put_not_last - decrement non-final reference count on SUNRPC serv - * @serv: the svc_serv to have count decremented - * - * Returns: %true is refcount was decremented. - * - * If the refcount is 1, it is not decremented and instead failure is reported. - */ -static inline bool svc_put_not_last(struct svc_serv *serv) -{ - return refcount_dec_not_one(&serv->sv_refcnt.refcount); -} - /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is -- cgit v1.2.3 From 7b719e2bf342a59e88b2b6215b98ca4cf824bc58 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 18 Jul 2023 16:38:08 +1000 Subject: SUNRPC: change svc_recv() to return void. svc_recv() currently returns a 0 on success or one of two errors: - -EAGAIN means no message was successfully received - -EINTR means the thread has been told to stop Previously nfsd would stop as the result of a signal as well as following kthread_stop(). In that case the difference was useful: EINTR means stop unconditionally. EAGAIN means stop if kthread_should_stop(), continue otherwise. Now threads only exit when kthread_should_stop() so we don't need the distinction. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svcsock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index a7ea54460b1a..c0ddd331d82f 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -57,7 +57,7 @@ static inline u32 svc_sock_final_rec(struct svc_sock *svsk) * Function prototypes. */ void svc_close_net(struct svc_serv *, struct net *); -int svc_recv(struct svc_rqst *, long); +void svc_recv(struct svc_rqst *, long); void svc_send(struct svc_rqst *rqstp); void svc_drop(struct svc_rqst *); void svc_sock_update_bufs(struct svc_serv *serv); -- cgit v1.2.3 From c743b4259c3af2c0637c307f08a062d25fa3c99f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 18 Jul 2023 16:38:08 +1000 Subject: SUNRPC: remove timeout arg from svc_recv() Most svc threads have no interest in a timeout. nfsd sets it to 1 hour, but this is a wart of no significance. lockd uses the timeout so that it can call nlmsvc_retry_blocked(). It also sometimes calls svc_wake_up() to ensure this is called. So change lockd to be consistent and always use svc_wake_up() to trigger nlmsvc_retry_blocked() - using a timer instead of a timeout to svc_recv(). And change svc_recv() to not take a timeout arg. This makes the sp_threads_timedout counter always zero. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/lockd/lockd.h | 4 +++- include/linux/sunrpc/svc.h | 1 - include/linux/sunrpc/svcsock.h | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index f42594a9efe0..0f016d69c996 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -204,6 +204,8 @@ extern unsigned long nlmsvc_timeout; extern bool nsm_use_hostnames; extern u32 nsm_local_state; +extern struct timer_list nlmsvc_retry; + /* * Lockd client functions */ @@ -280,7 +282,7 @@ __be32 nlmsvc_testlock(struct svc_rqst *, struct nlm_file *, struct nlm_host *, struct nlm_lock *, struct nlm_lock *, struct nlm_cookie *); __be32 nlmsvc_cancel_blocked(struct net *net, struct nlm_file *, struct nlm_lock *); -unsigned long nlmsvc_retry_blocked(void); +void nlmsvc_retry_blocked(void); void nlmsvc_traverse_blocks(struct nlm_host *, struct nlm_file *, nlm_host_match_fn_t match); void nlmsvc_grant_reply(struct nlm_cookie *, __be32); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 2230148d9d68..b206fdde8e97 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -41,7 +41,6 @@ struct svc_pool { /* statistics on pool operation */ struct percpu_counter sp_sockets_queued; struct percpu_counter sp_threads_woken; - struct percpu_counter sp_threads_timedout; #define SP_TASK_PENDING (0) /* still work to do even if no * xprt is queued. */ diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index c0ddd331d82f..d4a173c5b3be 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -57,7 +57,7 @@ static inline u32 svc_sock_final_rec(struct svc_sock *svsk) * Function prototypes. */ void svc_close_net(struct svc_serv *, struct net *); -void svc_recv(struct svc_rqst *, long); +void svc_recv(struct svc_rqst *rqstp); void svc_send(struct svc_rqst *rqstp); void svc_drop(struct svc_rqst *); void svc_sock_update_bufs(struct svc_serv *serv); -- cgit v1.2.3 From ba4bba6c97d40fa9f2aa25a34f6e9717a468c8f3 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 29 Jul 2023 14:31:55 -0400 Subject: SUNRPC: change cache_head.flags bits to enum When a sequence of numbers are needed for internal-use only, an enum is typically best. The sequence will inevitably need to be changed one day, and having an enum means the developer doesn't need to think about renumbering after insertion or deletion. Such patches will be easier to review. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/cache.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h index 518bd28f5ab8..35766963dd14 100644 --- a/include/linux/sunrpc/cache.h +++ b/include/linux/sunrpc/cache.h @@ -56,10 +56,14 @@ struct cache_head { struct kref ref; unsigned long flags; }; -#define CACHE_VALID 0 /* Entry contains valid data */ -#define CACHE_NEGATIVE 1 /* Negative entry - there is no match for the key */ -#define CACHE_PENDING 2 /* An upcall has been sent but no reply received yet*/ -#define CACHE_CLEANED 3 /* Entry has been cleaned from cache */ + +/* cache_head.flags */ +enum { + CACHE_VALID, /* Entry contains valid data */ + CACHE_NEGATIVE, /* Negative entry - there is no match for the key */ + CACHE_PENDING, /* An upcall has been sent but no reply received yet*/ + CACHE_CLEANED, /* Entry has been cleaned from cache */ +}; #define CACHE_NEW_EXPIRY 120 /* keep new things pending confirmation for 120 seconds */ -- cgit v1.2.3 From 3275694adf0f89e1cdcacfee16103be6643c2a0c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 29 Jul 2023 14:33:05 -0400 Subject: SUNRPC: change svc_pool::sp_flags bits to enum When a sequence of numbers are needed for internal-use only, an enum is typically best. The sequence will inevitably need to be changed one day, and having an enum means the developer doesn't need to think about renumbering after insertion or deletion. Such patches will be easier to review. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index b206fdde8e97..dc526240de5e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -42,12 +42,16 @@ struct svc_pool { struct percpu_counter sp_sockets_queued; struct percpu_counter sp_threads_woken; -#define SP_TASK_PENDING (0) /* still work to do even if no - * xprt is queued. */ -#define SP_CONGESTED (1) unsigned long sp_flags; } ____cacheline_aligned_in_smp; +/* bits for sp_flags */ +enum { + SP_TASK_PENDING, /* still work to do even if no xprt is queued */ + SP_CONGESTED, /* all threads are busy, none idle */ +}; + + /* * RPC service. * -- cgit v1.2.3 From a6b4ec39036fd78b95205eef3be121454b7d2973 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 29 Jul 2023 14:34:12 -0400 Subject: SUNRPC: change svc_rqst::rq_flags bits to enum When a sequence of numbers are needed for internal-use only, an enum is typically best. The sequence will inevitably need to be changed one day, and having an enum means the developer doesn't need to think about renumbering after insertion or deletion. Such patches will be easier to review. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index dc526240de5e..9b429253dbd9 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -222,16 +222,6 @@ struct svc_rqst { u32 rq_proc; /* procedure number */ u32 rq_prot; /* IP protocol */ int rq_cachetype; /* catering to nfsd */ -#define RQ_SECURE (0) /* secure port */ -#define RQ_LOCAL (1) /* local request */ -#define RQ_USEDEFERRAL (2) /* use deferral */ -#define RQ_DROPME (3) /* drop current reply */ -#define RQ_SPLICE_OK (4) /* turned off in gss privacy - * to prevent encrypting page - * cache pages */ -#define RQ_VICTIM (5) /* about to be shut down */ -#define RQ_BUSY (6) /* request is busy */ -#define RQ_DATA (7) /* request has data */ unsigned long rq_flags; /* flags field */ ktime_t rq_qtime; /* enqueue time */ @@ -262,6 +252,19 @@ struct svc_rqst { void ** rq_lease_breaker; /* The v4 client breaking a lease */ }; +/* bits for rq_flags */ +enum { + RQ_SECURE, /* secure port */ + RQ_LOCAL, /* local request */ + RQ_USEDEFERRAL, /* use deferral */ + RQ_DROPME, /* drop current reply */ + RQ_SPLICE_OK, /* turned off in gss privacy to prevent + * encrypting page cache pages */ + RQ_VICTIM, /* about to be shut down */ + RQ_BUSY, /* request is busy */ + RQ_DATA, /* request has data */ +}; + #define SVC_NET(rqst) (rqst->rq_xprt ? rqst->rq_xprt->xpt_net : rqst->rq_bc_net) /* -- cgit v1.2.3 From d75e490f35601aae12c7284d3c22684c65fb8354 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 29 Jul 2023 14:36:10 -0400 Subject: SUNRPC: change svc_xprt::xpt_flags bits to enum When a sequence of numbers are needed for internal-use only, an enum is typically best. The sequence will inevitably need to be changed one day, and having an enum means the developer doesn't need to think about renumbering after insertion or deletion. Such patches will be easier to review. Suggested-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_xprt.h | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index a6b12631db21..fa55d12dc765 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -56,23 +56,6 @@ struct svc_xprt { struct list_head xpt_list; struct list_head xpt_ready; unsigned long xpt_flags; -#define XPT_BUSY 0 /* enqueued/receiving */ -#define XPT_CONN 1 /* conn pending */ -#define XPT_CLOSE 2 /* dead or dying */ -#define XPT_DATA 3 /* data pending */ -#define XPT_TEMP 4 /* connected transport */ -#define XPT_DEAD 6 /* transport closed */ -#define XPT_CHNGBUF 7 /* need to change snd/rcv buf sizes */ -#define XPT_DEFERRED 8 /* deferred request pending */ -#define XPT_OLD 9 /* used for xprt aging mark+sweep */ -#define XPT_LISTENER 10 /* listening endpoint */ -#define XPT_CACHE_AUTH 11 /* cache auth info */ -#define XPT_LOCAL 12 /* connection from loopback interface */ -#define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ -#define XPT_CONG_CTRL 14 /* has congestion control */ -#define XPT_HANDSHAKE 15 /* xprt requests a handshake */ -#define XPT_TLS_SESSION 16 /* transport-layer security established */ -#define XPT_PEER_AUTH 17 /* peer has been authenticated */ struct svc_serv *xpt_server; /* service for transport */ atomic_t xpt_reserved; /* space on outq that is rsvd */ @@ -97,6 +80,27 @@ struct svc_xprt { struct rpc_xprt_switch *xpt_bc_xps; /* NFSv4.1 backchannel */ }; +/* flag bits for xpt_flags */ +enum { + XPT_BUSY, /* enqueued/receiving */ + XPT_CONN, /* conn pending */ + XPT_CLOSE, /* dead or dying */ + XPT_DATA, /* data pending */ + XPT_TEMP, /* connected transport */ + XPT_DEAD, /* transport closed */ + XPT_CHNGBUF, /* need to change snd/rcv buf sizes */ + XPT_DEFERRED, /* deferred request pending */ + XPT_OLD, /* used for xprt aging mark+sweep */ + XPT_LISTENER, /* listening endpoint */ + XPT_CACHE_AUTH, /* cache auth info */ + XPT_LOCAL, /* connection from loopback interface */ + XPT_KILL_TEMP, /* call xpo_kill_temp_xprt before closing */ + XPT_CONG_CTRL, /* has congestion control */ + XPT_HANDSHAKE, /* xprt requests a handshake */ + XPT_TLS_SESSION, /* transport-layer security established */ + XPT_PEER_AUTH, /* peer has been authenticated */ +}; + static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u) { spin_lock(&xpt->xpt_lock); -- cgit v1.2.3 From 78c542f916bccafffef4f3bec9bc60d7cda548f5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 29 Jul 2023 20:58:54 -0400 Subject: SUNRPC: Add enum svc_auth_status In addition to the benefits of using an enum rather than a set of macros, we now have a named type that can improve static type checking of function return values. As part of this change, I removed a stale comment from svcauth.h; the return values from current implementations of the auth_ops::release method are all zero/negative errno, not the SVC_OK enum values as the old comment suggested. Suggested-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 2 +- include/linux/sunrpc/svcauth.h | 50 ++++++++++++++++++++---------------------- 2 files changed, 25 insertions(+), 27 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 9b429253dbd9..1c491f02efc8 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -336,7 +336,7 @@ struct svc_program { char * pg_name; /* service name */ char * pg_class; /* class name: services sharing authentication */ struct svc_stat * pg_stats; /* rpc statistics */ - int (*pg_authenticate)(struct svc_rqst *); + enum svc_auth_status (*pg_authenticate)(struct svc_rqst *rqstp); __be32 (*pg_init_request)(struct svc_rqst *, const struct svc_program *, struct svc_process_info *); diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 27582d3b538f..6f90203edbf8 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -83,6 +83,19 @@ struct auth_domain { struct rcu_head rcu_head; }; +enum svc_auth_status { + SVC_GARBAGE = 1, + SVC_SYSERR, + SVC_VALID, + SVC_NEGATIVE, + SVC_OK, + SVC_DROP, + SVC_CLOSE, + SVC_DENIED, + SVC_PENDING, + SVC_COMPLETE, +}; + /* * Each authentication flavour registers an auth_ops * structure. @@ -98,6 +111,8 @@ struct auth_domain { * is (probably) already in place. Certainly space is * reserved for it. * DROP - simply drop the request. It may have been deferred + * CLOSE - like SVC_DROP, but request is definitely lost. + * If there is a tcp connection, it should be closed. * GARBAGE - rpc garbage_args error * SYSERR - rpc system_err error * DENIED - authp holds reason for denial. @@ -111,14 +126,10 @@ struct auth_domain { * * release() is given a request after the procedure has been run. * It should sign/encrypt the results if needed - * It should return: - * OK - the resbuf is ready to be sent - * DROP - the reply should be quitely dropped - * DENIED - authp holds a reason for MSG_DENIED - * SYSERR - rpc system_err * * domain_release() * This call releases a domain. + * * set_client() * Givens a pending request (struct svc_rqst), finds and assigns * an appropriate 'auth_domain' as the client. @@ -127,31 +138,18 @@ struct auth_ops { char * name; struct module *owner; int flavour; - int (*accept)(struct svc_rqst *rq); - int (*release)(struct svc_rqst *rq); - void (*domain_release)(struct auth_domain *); - int (*set_client)(struct svc_rqst *rq); -}; -#define SVC_GARBAGE 1 -#define SVC_SYSERR 2 -#define SVC_VALID 3 -#define SVC_NEGATIVE 4 -#define SVC_OK 5 -#define SVC_DROP 6 -#define SVC_CLOSE 7 /* Like SVC_DROP, but request is definitely - * lost so if there is a tcp connection, it - * should be closed - */ -#define SVC_DENIED 8 -#define SVC_PENDING 9 -#define SVC_COMPLETE 10 + enum svc_auth_status (*accept)(struct svc_rqst *rqstp); + int (*release)(struct svc_rqst *rqstp); + void (*domain_release)(struct auth_domain *dom); + enum svc_auth_status (*set_client)(struct svc_rqst *rqstp); +}; struct svc_xprt; -extern int svc_authenticate(struct svc_rqst *rqstp); +extern enum svc_auth_status svc_authenticate(struct svc_rqst *rqstp); extern int svc_authorise(struct svc_rqst *rqstp); -extern int svc_set_client(struct svc_rqst *rqstp); +extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp); extern int svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops); extern void svc_auth_unregister(rpc_authflavor_t flavor); @@ -161,7 +159,7 @@ extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *ne extern struct auth_domain *auth_domain_find(char *name); extern void svcauth_unix_purge(struct net *net); extern void svcauth_unix_info_release(struct svc_xprt *xpt); -extern int svcauth_unix_set_client(struct svc_rqst *rqstp); +extern enum svc_auth_status svcauth_unix_set_client(struct svc_rqst *rqstp); extern int unix_gid_cache_create(struct net *net); extern void unix_gid_cache_destroy(struct net *net); -- cgit v1.2.3 From 850bac3ae4a636e9e6bb8de62fe697ac171cb221 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 10 Jul 2023 12:42:00 -0400 Subject: SUNRPC: Deduplicate thread wake-up code Refactor: Extract the loop that finds an idle service thread from svc_xprt_enqueue() and svc_wake_up(). Both functions do just about the same thing. Note that svc_wake_up() currently does not hold the RCU read lock while waking the target thread. It indeed should hold the lock, just as svc_xprt_enqueue() does, to ensure the rqstp does not vanish during the wake-up. This patch adds the RCU lock for svc_wake_up(). Note that shrinking the pool thread count is rare, and calls to svc_wake_up() are also quite infrequent. In practice, this race is very unlikely to be hit, so we are not marking the lock fix for stable backport at this time. Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 1c491f02efc8..2b9c6df23078 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -419,6 +419,7 @@ int svc_register(const struct svc_serv *, struct net *, const int, void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); +bool svc_pool_wake_idle_thread(struct svc_pool *pool); struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv); char * svc_print_addr(struct svc_rqst *, char *, size_t); const char * svc_proc_name(const struct svc_rqst *rqstp); -- cgit v1.2.3 From f208e9508ace01864f2b37a45d07cda0641ff3ea Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 10 Jul 2023 12:42:20 -0400 Subject: SUNRPC: Count ingress RPC messages per svc_pool svc_xprt_enqueue() can be costly, since it involves selecting and waking up a process. More than one enqueue is done per incoming RPC. For example, svc_data_ready() enqueues, and so does svc_xprt_receive(). Also, if an RPC message requires more than one call to ->recvfrom() to receive it fully, each one of those calls does an enqueue. To get a sense of the average number of transport enqueue operations needed to process an incoming RPC message, re-use the "packets" pool stat. Track the number of complete RPC messages processed by each thread pool. Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 2b9c6df23078..7838b37bcfa8 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -39,6 +39,7 @@ struct svc_pool { struct list_head sp_all_threads; /* all server threads */ /* statistics on pool operation */ + struct percpu_counter sp_messages_arrived; struct percpu_counter sp_sockets_queued; struct percpu_counter sp_threads_woken; -- cgit v1.2.3 From 2a4557452aacf9e7168cb83bc102467094ff9391 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 31 Jul 2023 16:48:29 +1000 Subject: SUNRPC: Remove return value of svc_pool_wake_idle_thread() The returned value is not used (any more), so don't return it. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 7838b37bcfa8..dbf5b21feafe 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -420,7 +420,7 @@ int svc_register(const struct svc_serv *, struct net *, const int, void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); -bool svc_pool_wake_idle_thread(struct svc_pool *pool); +void svc_pool_wake_idle_thread(struct svc_pool *pool); struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv); char * svc_print_addr(struct svc_rqst *, char *, size_t); const char * svc_proc_name(const struct svc_rqst *rqstp); -- cgit v1.2.3 From 07dc19dbd1d194397d7ae1c4781e203f8419b3c6 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 21 Aug 2023 20:33:46 +0800 Subject: SUNRPC: Remove unused declarations Commit c7d7ec8f043e ("SUNRPC: Remove svc_shutdown_net()") removed svc_close_net() implementation but left declaration in place. Remove it. Commit 1f11a034cdc4 ("SUNRPC new transport for the NFSv4.1 shared back channel") removed svc_sock_create()/svc_sock_destroy() but not the declarations. Signed-off-by: Yue Haibing Signed-off-by: Chuck Lever --- include/linux/sunrpc/svcsock.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index d4a173c5b3be..7c78ec6356b9 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -56,7 +56,6 @@ static inline u32 svc_sock_final_rec(struct svc_sock *svsk) /* * Function prototypes. */ -void svc_close_net(struct svc_serv *, struct net *); void svc_recv(struct svc_rqst *rqstp); void svc_send(struct svc_rqst *rqstp); void svc_drop(struct svc_rqst *); @@ -66,8 +65,6 @@ int svc_addsock(struct svc_serv *serv, struct net *net, const struct cred *cred); void svc_init_xprt_sock(void); void svc_cleanup_xprt_sock(void); -struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot); -void svc_sock_destroy(struct svc_xprt *); /* * svc_makesock socket characteristics -- cgit v1.2.3 From 899525e892dd165d2bb2e41f9f9d9d82574b31b5 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 9 Aug 2023 22:14:26 +0800 Subject: SUNRPC: Remove unused declaration rpc_modcount() These declarations are never implemented since the beginning of git history. Remove these, then merge the two #ifdef block for simplification. Signed-off-by: Yue Haibing Reviewed-by: NeilBrown Signed-off-by: Chuck Lever --- include/linux/sunrpc/stats.h | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/stats.h b/include/linux/sunrpc/stats.h index d94d4f410507..3ce1550d1beb 100644 --- a/include/linux/sunrpc/stats.h +++ b/include/linux/sunrpc/stats.h @@ -43,22 +43,6 @@ struct net; #ifdef CONFIG_PROC_FS int rpc_proc_init(struct net *); void rpc_proc_exit(struct net *); -#else -static inline int rpc_proc_init(struct net *net) -{ - return 0; -} - -static inline void rpc_proc_exit(struct net *net) -{ -} -#endif - -#ifdef MODULE -void rpc_modcount(struct inode *, int); -#endif - -#ifdef CONFIG_PROC_FS struct proc_dir_entry * rpc_proc_register(struct net *,struct rpc_stat *); void rpc_proc_unregister(struct net *,const char *); void rpc_proc_zero(const struct rpc_program *); @@ -69,7 +53,14 @@ void svc_proc_unregister(struct net *, const char *); void svc_seq_show(struct seq_file *, const struct svc_stat *); #else +static inline int rpc_proc_init(struct net *net) +{ + return 0; +} +static inline void rpc_proc_exit(struct net *net) +{ +} static inline struct proc_dir_entry *rpc_proc_register(struct net *net, struct rpc_stat *s) { return NULL; } static inline void rpc_proc_unregister(struct net *net, const char *p) {} static inline void rpc_proc_zero(const struct rpc_program *p) {} -- cgit v1.2.3 From 0d6b35283bcf1a379cf20066544af8e6a6b16b46 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 30 Aug 2023 09:04:19 +1000 Subject: sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE task_state_index() ignores uninteresting state flags (such as TASK_FREEZABLE) for most states, but for TASK_IDLE and TASK_RTLOCK_WAIT it does not. So if a task is waiting TASK_IDLE|TASK_FREEZABLE it gets incorrectly reported as TASK_UNINTERRUPTIBLE or "D". (it is planned for nfsd to change to use this state). Fix this by only testing the interesting bits and not the irrelevant bits in __task_state_index() Signed-off-by: NeilBrown Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/169335025927.5133.4781141800413736103@noble.neil.brown.name --- include/linux/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index ae6fd76afaf5..77f01ac385f7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1671,7 +1671,7 @@ static inline unsigned int __task_state_index(unsigned int tsk_state, BUILD_BUG_ON_NOT_POWER_OF_2(TASK_REPORT_MAX); - if (tsk_state == TASK_IDLE) + if ((tsk_state & TASK_IDLE) == TASK_IDLE) state = TASK_REPORT_IDLE; /* @@ -1679,7 +1679,7 @@ static inline unsigned int __task_state_index(unsigned int tsk_state, * to userspace, we can make this appear as if the task has gone through * a regular rt_mutex_lock() call. */ - if (tsk_state == TASK_RTLOCK_WAIT) + if (tsk_state & TASK_RTLOCK_WAIT) state = TASK_UNINTERRUPTIBLE; return fls(state); -- cgit v1.2.3 From 7e9be1124dbe7888907e82cab20164578e3f9ab7 Mon Sep 17 00:00:00 2001 From: Phil Sutter Date: Tue, 29 Aug 2023 19:51:57 +0200 Subject: netfilter: nf_tables: Audit log setelem reset Since set element reset is not integrated into nf_tables' transaction logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET handling. For the sake of simplicity, catchall element reset will always generate a dedicated log entry. This relieves nf_tables_dump_set() from having to adjust the logged element count depending on whether a catchall element was found or not. Fixes: 079cd633219d7 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET") Signed-off-by: Phil Sutter Reviewed-by: Richard Guy Briggs Signed-off-by: Pablo Neira Ayuso --- include/linux/audit.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/audit.h b/include/linux/audit.h index 6a3a9e122bb5..192bf03aacc5 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -117,6 +117,7 @@ enum audit_nfcfgop { AUDIT_NFT_OP_OBJ_RESET, AUDIT_NFT_OP_FLOWTABLE_REGISTER, AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, + AUDIT_NFT_OP_SETELEM_RESET, AUDIT_NFT_OP_INVALID, }; -- cgit v1.2.3 From ea078ae9108e25fc881c84369f7c03931d22e555 Mon Sep 17 00:00:00 2001 From: Phil Sutter Date: Tue, 29 Aug 2023 19:51:58 +0200 Subject: netfilter: nf_tables: Audit log rule reset Resetting rules' stateful data happens outside of the transaction logic, so 'get' and 'dump' handlers have to emit audit log entries themselves. Fixes: 8daa8fde3fc3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET") Signed-off-by: Phil Sutter Reviewed-by: Richard Guy Briggs Signed-off-by: Pablo Neira Ayuso --- include/linux/audit.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/audit.h b/include/linux/audit.h index 192bf03aacc5..51b1b7054a23 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -118,6 +118,7 @@ enum audit_nfcfgop { AUDIT_NFT_OP_FLOWTABLE_REGISTER, AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, AUDIT_NFT_OP_SETELEM_RESET, + AUDIT_NFT_OP_RULE_RESET, AUDIT_NFT_OP_INVALID, }; -- cgit v1.2.3 From 69881be3d9a00cca770886af40913cfc5274b2d0 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 29 Aug 2023 17:23:56 +0200 Subject: fs: export sget_dev() They will be used for mtd devices as well. Acked-by: Richard Weinberger Reviewed-by: Jan Kara Message-Id: <20230829-vfs-super-mtd-v1-1-fecb572e5df3@kernel.org> Signed-off-by: Christian Brauner Reviewed-by: Christoph Hellwig Signed-off-by: Christian Brauner --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index c8ff4156a0a1..4aeb3fa11927 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2397,6 +2397,7 @@ struct super_block *sget(struct file_system_type *type, int (*test)(struct super_block *,void *), int (*set)(struct super_block *,void *), int flags, void *data); +struct super_block *sget_dev(struct fs_context *fc, dev_t dev); /* Alas, no aliases. Too much hassle with bringing module.h everywhere */ #define fops_get(fops) \ -- cgit v1.2.3 From 3af5ae22030cb59fab4fba35f5a2b62f47e14df9 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Tue, 25 Jul 2023 09:44:40 +0800 Subject: ceph: make members in struct ceph_mds_request_args_ext a union In ceph mainline it will allow to set the btime in the setattr request and just add a 'btime' member in the union 'ceph_mds_request_args' and then bump up the header version to 4. That means the total size of union 'ceph_mds_request_args' will increase sizeof(struct ceph_timespec) bytes, but in kclient it will increase the sizeof(setattr_ext) bytes for each request. Since the MDS will always depend on the header's vesion and front_len members to decode the 'ceph_mds_request_head' struct, at the same time kclient hasn't supported the 'btime' feature yet in setattr request, so it's safe to do this change here. This will save 48 bytes memories for each request. Fixes: 4f1ddb1ea874 ("ceph: implement updated ceph_mds_request_head structure") Signed-off-by: Xiubo Li Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 45f8ce61e103..ae44812eb6d4 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -467,17 +467,19 @@ union ceph_mds_request_args { } __attribute__ ((packed)); union ceph_mds_request_args_ext { - union ceph_mds_request_args old; - struct { - __le32 mode; - __le32 uid; - __le32 gid; - struct ceph_timespec mtime; - struct ceph_timespec atime; - __le64 size, old_size; /* old_size needed by truncate */ - __le32 mask; /* CEPH_SETATTR_* */ - struct ceph_timespec btime; - } __attribute__ ((packed)) setattr_ext; + union { + union ceph_mds_request_args old; + struct { + __le32 mode; + __le32 uid; + __le32 gid; + struct ceph_timespec mtime; + struct ceph_timespec atime; + __le64 size, old_size; /* old_size needed by truncate */ + __le32 mask; /* CEPH_SETATTR_* */ + struct ceph_timespec btime; + } __attribute__ ((packed)) setattr_ext; + }; }; #define CEPH_MDS_FLAG_REPLAY 1 /* this is a replayed op */ -- cgit v1.2.3 From ce0d5bd3a6c176f9a3bf867624a07119dd4d0878 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Tue, 25 Jul 2023 17:51:59 +0800 Subject: ceph: make num_fwd and num_retry to __u32 The num_fwd in MClientRequestForward is int32_t, while the num_fwd in ceph_mds_request_head is __u8. This is buggy when the num_fwd is larger than 256 it will always be truncate to 0 again. But the client couldn't recoginize this. This will make them to __u32 instead. Because the old cephs will directly copy the raw memories when decoding the reqeust's head, so we need to make sure this kclient will be compatible with old cephs. For newer cephs they will decode the requests depending the version, which will be much simpler and easier to extend new members. Link: https://tracker.ceph.com/issues/62145 Signed-off-by: Xiubo Li Reviewed-by: Alexander Mikhalitsyn Reviewed-by: Milind Changire Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index ae44812eb6d4..5f2301ee88bc 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -486,7 +486,7 @@ union ceph_mds_request_args_ext { #define CEPH_MDS_FLAG_WANT_DENTRY 2 /* want dentry in reply */ #define CEPH_MDS_FLAG_ASYNC 4 /* request is asynchronous */ -struct ceph_mds_request_head_old { +struct ceph_mds_request_head_legacy { __le64 oldest_client_tid; __le32 mdsmap_epoch; /* on client */ __le32 flags; /* CEPH_MDS_FLAG_* */ @@ -499,9 +499,9 @@ struct ceph_mds_request_head_old { union ceph_mds_request_args args; } __attribute__ ((packed)); -#define CEPH_MDS_REQUEST_HEAD_VERSION 1 +#define CEPH_MDS_REQUEST_HEAD_VERSION 2 -struct ceph_mds_request_head { +struct ceph_mds_request_head_old { __le16 version; /* struct version */ __le64 oldest_client_tid; __le32 mdsmap_epoch; /* on client */ @@ -515,6 +515,23 @@ struct ceph_mds_request_head { union ceph_mds_request_args_ext args; } __attribute__ ((packed)); +struct ceph_mds_request_head { + __le16 version; /* struct version */ + __le64 oldest_client_tid; + __le32 mdsmap_epoch; /* on client */ + __le32 flags; /* CEPH_MDS_FLAG_* */ + __u8 num_retry, num_fwd; /* legacy count retry and fwd attempts */ + __le16 num_releases; /* # include cap/lease release records */ + __le32 op; /* mds op code */ + __le32 caller_uid, caller_gid; + __le64 ino; /* use this ino for openc, mkdir, mknod, + etc. (if replaying) */ + union ceph_mds_request_args_ext args; + + __le32 ext_num_retry; /* new count retry attempts */ + __le32 ext_num_fwd; /* new count fwd attempts */ +} __attribute__ ((packed)); + /* cap/lease release record */ struct ceph_mds_request_release { __le64 ino, cap_id; /* ino and unique cap id */ -- cgit v1.2.3 From 52e322eda3d475614210efbc0f2793a1da9d367a Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 28 Jul 2023 17:47:22 -0700 Subject: KVM: x86/mmu: BUG() in rmap helpers iff CONFIG_BUG_ON_DATA_CORRUPTION=y Introduce KVM_BUG_ON_DATA_CORRUPTION() and use it in the low-level rmap helpers to convert the existing BUG()s to WARN_ON_ONCE() when the kernel is built with CONFIG_BUG_ON_DATA_CORRUPTION=n, i.e. does NOT want to BUG() on corruption of host kernel data structures. Environments that don't have infrastructure to automatically capture crash dumps, i.e. aren't likely to enable CONFIG_BUG_ON_DATA_CORRUPTION=y, are typically better served overall by WARN-and-continue behavior (for the kernel, the VM is dead regardless), as a BUG() while holding mmu_lock all but guarantees the _best_ case scenario is a panic(). Make the BUG()s conditional instead of removing/replacing them entirely as there's a non-zero chance (though by no means a guarantee) that the damage isn't contained to the target VM, e.g. if no rmap is found for a SPTE then KVM may be double-zapping the SPTE, i.e. has already freed the memory the SPTE pointed at and thus KVM is reading/writing memory that KVM no longer owns. Link: https://lore.kernel.org/all/20221129191237.31447-1-mizhang@google.com Suggested-by: Mingwei Zhang Cc: David Matlack Cc: Jim Mattson Reviewed-by: Mingwei Zhang Link: https://lore.kernel.org/r/20230729004722.1056172-13-seanjc@google.com Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1b583f35547e..fb6c6109fdca 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -867,6 +867,25 @@ static inline void kvm_vm_bugged(struct kvm *kvm) unlikely(__ret); \ }) +/* + * Note, "data corruption" refers to corruption of host kernel data structures, + * not guest data. Guest data corruption, suspected or confirmed, that is tied + * and contained to a single VM should *never* BUG() and potentially panic the + * host, i.e. use this variant of KVM_BUG() if and only if a KVM data structure + * is corrupted and that corruption can have a cascading effect to other parts + * of the hosts and/or to other VMs. + */ +#define KVM_BUG_ON_DATA_CORRUPTION(cond, kvm) \ +({ \ + bool __ret = !!(cond); \ + \ + if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION)) \ + BUG_ON(__ret); \ + else if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged)) \ + kvm_vm_bugged(kvm); \ + unlikely(__ret); \ +}) + static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu) { #ifdef CONFIG_PROVE_RCU -- cgit v1.2.3 From 6a86b5b5cd76d2734304a0173f5f01aa8aa2025e Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 29 Aug 2023 22:53:52 +0200 Subject: bpf: Annotate bpf_long_memcpy with data_race syzbot reported a data race splat between two processes trying to update the same BPF map value via syscall on different CPUs: BUG: KCSAN: data-race in bpf_percpu_array_update / bpf_percpu_array_update write to 0xffffe8fffe7425d8 of 8 bytes by task 8257 on cpu 1: bpf_long_memcpy include/linux/bpf.h:428 [inline] bpf_obj_memcpy include/linux/bpf.h:441 [inline] copy_map_value_long include/linux/bpf.h:464 [inline] bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380 bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175 generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749 bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648 __sys_bpf+0x28a/0x780 __do_sys_bpf kernel/bpf/syscall.c:5241 [inline] __se_sys_bpf kernel/bpf/syscall.c:5239 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffffe8fffe7425d8 of 8 bytes by task 8268 on cpu 0: bpf_long_memcpy include/linux/bpf.h:428 [inline] bpf_obj_memcpy include/linux/bpf.h:441 [inline] copy_map_value_long include/linux/bpf.h:464 [inline] bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380 bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175 generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749 bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648 __sys_bpf+0x28a/0x780 __do_sys_bpf kernel/bpf/syscall.c:5241 [inline] __se_sys_bpf kernel/bpf/syscall.c:5239 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0xfffffff000002788 The bpf_long_memcpy is used with 8-byte aligned pointers, power-of-8 size and forced to use long read/writes to try to atomically copy long counters. It is best-effort only and no barriers are here since it _will_ race with concurrent updates from BPF programs. The bpf_long_memcpy() is called from bpf(2) syscall. Marco suggested that the best way to make this known to KCSAN would be to use data_race() annotation. Reported-by: syzbot+97522333291430dd277f@syzkaller.appspotmail.com Suggested-by: Marco Elver Signed-off-by: Daniel Borkmann Acked-by: Marco Elver Link: https://lore.kernel.org/bpf/000000000000d87a7f06040c970c@google.com Link: https://lore.kernel.org/bpf/57628f7a15e20d502247c3b55fceb1cb2b31f266.1693342186.git.daniel@iogearbox.net --- include/linux/bpf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 12596af59c00..024e8b28c34b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -438,7 +438,7 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) size /= sizeof(long); while (size--) - *ldst++ = *lsrc++; + data_race(*ldst++ = *lsrc++); } /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */ -- cgit v1.2.3 From bfac19e239a7d3e98946fa7df362fcbfd40da4cf Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 31 Aug 2023 22:57:17 +0200 Subject: fbdev: mx3fb: Remove the driver The mx3fb driver does not support devicetree and i.MX has been converted to a DT-only platform since kernel 5.10. As there is no user for this driver anymore, just remove it. Signed-off-by: Fabio Estevam Reviewed-by: Arnd Bergmann Signed-off-by: Helge Deller --- include/linux/platform_data/video-mx3fb.h | 50 ------------------------------- 1 file changed, 50 deletions(-) delete mode 100644 include/linux/platform_data/video-mx3fb.h (limited to 'include/linux') diff --git a/include/linux/platform_data/video-mx3fb.h b/include/linux/platform_data/video-mx3fb.h deleted file mode 100644 index d03dc322a616..000000000000 --- a/include/linux/platform_data/video-mx3fb.h +++ /dev/null @@ -1,50 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (C) 2008 - * Guennadi Liakhovetski, DENX Software Engineering, - */ - -#ifndef __ASM_ARCH_MX3FB_H__ -#define __ASM_ARCH_MX3FB_H__ - -#include -#include - -/* Proprietary FB_SYNC_ flags */ -#define FB_SYNC_OE_ACT_HIGH 0x80000000 -#define FB_SYNC_CLK_INVERT 0x40000000 -#define FB_SYNC_DATA_INVERT 0x20000000 -#define FB_SYNC_CLK_IDLE_EN 0x10000000 -#define FB_SYNC_SHARP_MODE 0x08000000 -#define FB_SYNC_SWAP_RGB 0x04000000 -#define FB_SYNC_CLK_SEL_EN 0x02000000 - -/* - * Specify the way your display is connected. The IPU can arbitrarily - * map the internal colors to the external data lines. We only support - * the following mappings at the moment. - */ -enum disp_data_mapping { - /* blue -> d[0..5], green -> d[6..11], red -> d[12..17] */ - IPU_DISP_DATA_MAPPING_RGB666, - /* blue -> d[0..4], green -> d[5..10], red -> d[11..15] */ - IPU_DISP_DATA_MAPPING_RGB565, - /* blue -> d[0..7], green -> d[8..15], red -> d[16..23] */ - IPU_DISP_DATA_MAPPING_RGB888, -}; - -/** - * struct mx3fb_platform_data - mx3fb platform data - * - * @dma_dev: pointer to the dma-device, used for dma-slave connection - * @mode: pointer to a platform-provided per mxc_register_fb() videomode - */ -struct mx3fb_platform_data { - struct device *dma_dev; - const char *name; - const struct fb_videomode *mode; - int num_modes; - enum disp_data_mapping disp_data_fmt; -}; - -#endif -- cgit v1.2.3 From 4a9762aa358ee0e3deb6e759959f092a3cea86be Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonathan=20Neusch=C3=A4fer?= Date: Thu, 31 Aug 2023 23:44:41 +0200 Subject: fbdev: Update fbdev source file paths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The files fbmem.c, fb_defio.c, fbsysfs.c, fbmon.c, modedb.c, and fbcmap.c were moved. Drop the path and just keep the file name. Reported by kalekale in #kernel (Libera IRC). Fixes: f7018c213502 ("video: move fbdev to drivers/video/fbdev") Fixes: 19757fc8432a ("fbdev: move fbdev core files to separate directory") Signed-off-by: Jonathan Neuschäfer Signed-off-by: Helge Deller --- include/linux/fb.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 16c3e6d6c55d..c14576458228 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -588,7 +588,7 @@ extern ssize_t fb_sys_write(struct fb_info *info, const char __user *buf, .fb_copyarea = sys_copyarea, \ .fb_imageblit = sys_imageblit -/* drivers/video/fbmem.c */ +/* fbmem.c */ extern int register_framebuffer(struct fb_info *fb_info); extern void unregister_framebuffer(struct fb_info *fb_info); extern int fb_prepare_logo(struct fb_info *fb_info, int rotate); @@ -631,7 +631,7 @@ static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, } } -/* drivers/video/fb_defio.c */ +/* fb_defio.c */ int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma); extern int fb_deferred_io_init(struct fb_info *info); extern void fb_deferred_io_open(struct fb_info *info, @@ -734,7 +734,7 @@ extern struct fb_info *framebuffer_alloc(size_t size, struct device *dev); extern void framebuffer_release(struct fb_info *info); extern void fb_bl_default_curve(struct fb_info *fb_info, u8 off, u8 min, u8 max); -/* drivers/video/fbmon.c */ +/* fbmon.c */ #define FB_MAXTIMINGS 0 #define FB_VSYNCTIMINGS 1 #define FB_HSYNCTIMINGS 2 @@ -768,7 +768,7 @@ extern int of_get_fb_videomode(struct device_node *np, extern int fb_videomode_from_videomode(const struct videomode *vm, struct fb_videomode *fbmode); -/* drivers/video/modedb.c */ +/* modedb.c */ #define VESA_MODEDB_SIZE 43 #define DMT_SIZE 0x50 @@ -794,7 +794,7 @@ extern void fb_videomode_to_modelist(const struct fb_videomode *modedb, int num, extern const struct fb_videomode *fb_find_best_display(const struct fb_monspecs *specs, struct list_head *head); -/* drivers/video/fbcmap.c */ +/* fbcmap.c */ extern int fb_alloc_cmap(struct fb_cmap *cmap, int len, int transp); extern int fb_alloc_cmap_gfp(struct fb_cmap *cmap, int len, int transp, gfp_t flags); extern void fb_dealloc_cmap(struct fb_cmap *cmap); -- cgit v1.2.3 From 8423be8926aa82cd2e28bba5cc96ccb72c7ce6be Mon Sep 17 00:00:00 2001 From: Sriram Yagnaraman Date: Thu, 31 Aug 2023 10:03:31 +0200 Subject: ipv6: ignore dst hint for multipath routes Route hints when the nexthop is part of a multipath group causes packets in the same receive batch to be sent to the same nexthop irrespective of the multipath hash of the packet. So, do not extract route hint for packets whose destination is part of a multipath group. A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the flag when route is looked up in fib6_select_path() and use it in ip6_can_use_hint() to check for the existence of the flag. Fixes: 197dbf24e360 ("ipv6: introduce and uses route look hints for list input.") Signed-off-by: Sriram Yagnaraman Reviewed-by: Ido Schimmel Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/ipv6.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 5883551b1ee8..af8a771a053c 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -147,6 +147,7 @@ struct inet6_skb_parm { #define IP6SKB_JUMBOGRAM 128 #define IP6SKB_SEG6 256 #define IP6SKB_FAKEJUMBO 512 +#define IP6SKB_MULTIPATH 1024 }; #if defined(CONFIG_NET_L3_MASTER_DEV) -- cgit v1.2.3 From 6ad11bc6ed37c6371e9e13619862709b6529b43a Mon Sep 17 00:00:00 2001 From: Baruch Siach Date: Thu, 24 Aug 2023 14:38:08 +0300 Subject: rmap: remove anon_vma_link() nommu stub anon_vma_link() is unused since commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue"). Link: https://lkml.kernel.org/r/cdce9b00c9ab15f6d02eddf40dcad537d3e9676f.1692877089.git.baruch@tkos.co.il Signed-off-by: Baruch Siach Reviewed-by: David Hildenbrand Signed-off-by: Andrew Morton --- include/linux/rmap.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index a3825ce81102..51cc21ebb568 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -479,7 +479,6 @@ struct anon_vma *folio_lock_anon_vma_read(struct folio *folio, #define anon_vma_init() do {} while (0) #define anon_vma_prepare(vma) (0) -#define anon_vma_link(vma) do {} while (0) static inline int folio_referenced(struct folio *folio, int is_locked, struct mem_cgroup *memcg, -- cgit v1.2.3 From b63e5c70c39349ea5b9e7dbb604551902fc753fc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= Date: Fri, 9 Jun 2023 11:21:26 +0200 Subject: vdpa: add get_backend_features vdpa operation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This operation allow vdpa parent to expose its own backend feature bits. Next patches introduce a feature not compatible with all parent drivers: the ability to enable vq after driver_ok. Each parent must declare if it allows it or not. Signed-off-by: Eugenio Pérez Acked-by: Shannon Nelson Message-Id: <20230609092127.170673-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index db1b0eaef4eb..0e652026b776 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -208,6 +208,9 @@ struct vdpa_map_file { * @vdev: vdpa device * Returns the virtio features support by the * device + * @get_backend_features: Get parent-specific backend features (optional) + * Returns the vdpa features supported by the + * device. * @set_driver_features: Set virtio features supported by the driver * @vdev: vdpa device * @features: feature support by the driver @@ -358,6 +361,7 @@ struct vdpa_config_ops { u32 (*get_vq_align)(struct vdpa_device *vdev); u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx); u64 (*get_device_features)(struct vdpa_device *vdev); + u64 (*get_backend_features)(const struct vdpa_device *vdev); int (*set_driver_features)(struct vdpa_device *vdev, u64 features); u64 (*get_driver_features)(struct vdpa_device *vdev); void (*set_config_cb)(struct vdpa_device *vdev, -- cgit v1.2.3 From 8daafe9ebbd21a54bf91f9ff81decf215c203edd Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Thu, 10 Aug 2023 20:30:48 +0800 Subject: virtio_ring: introduce virtqueue_set_dma_premapped() This helper allows the driver change the dma mode to premapped mode. Under the premapped mode, the virtio core do not do dma mapping internally. This just work when the use_dma_api is true. If the use_dma_api is false, the dma options is not through the DMA APIs, that is not the standard way of the linux kernel. Signed-off-by: Xuan Zhuo Message-Id: <20230810123057.43407-4-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index de6041deee37..8add38038877 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq); unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq); +int virtqueue_set_dma_premapped(struct virtqueue *_vq); + bool virtqueue_poll(struct virtqueue *vq, unsigned); bool virtqueue_enable_cb_delayed(struct virtqueue *vq); -- cgit v1.2.3 From 2df64759071bdca2a12edb9af4f071e4419ad745 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Thu, 10 Aug 2023 20:30:50 +0800 Subject: virtio_ring: introduce virtqueue_dma_dev() Added virtqueue_dma_dev() to get DMA device for virtio. Then the caller can do dma operation in advance. The purpose is to keep memory mapped across multiple add/get buf operations. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 8add38038877..bd55a05eec04 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq, void *data, gfp_t gfp); +struct device *virtqueue_dma_dev(struct virtqueue *vq); + bool virtqueue_kick(struct virtqueue *vq); bool virtqueue_kick_prepare(struct virtqueue *vq); -- cgit v1.2.3 From ba3e0c47c070c4cf010be9fb1e4eb669c744af11 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Thu, 10 Aug 2023 20:30:54 +0800 Subject: virtio_ring: introduce virtqueue_reset() Introduce virtqueue_reset() to release all buffer inside vq. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index bd55a05eec04..49a640e0a6f4 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -99,6 +99,8 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq); int virtqueue_resize(struct virtqueue *vq, u32 num, void (*recycle)(struct virtqueue *vq, void *buf)); +int virtqueue_reset(struct virtqueue *vq, + void (*recycle)(struct virtqueue *vq, void *buf)); /** * struct virtio_device - representation of a device using virtio -- cgit v1.2.3 From b6253b4e21939f1bb54e8fdb84c23af9c3fb834a Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Thu, 10 Aug 2023 20:30:55 +0800 Subject: virtio_ring: introduce dma map api for virtqueue Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in advance. The purpose is to keep memory mapped across multiple add/get buf operations. Signed-off-by: Xuan Zhuo Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 49a640e0a6f4..79e3c74391e0 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -9,6 +9,7 @@ #include #include #include +#include /** * struct virtqueue - a queue to register buffers for sending or receiving. @@ -212,4 +213,11 @@ void unregister_virtio_driver(struct virtio_driver *drv); #define module_virtio_driver(__virtio_driver) \ module_driver(__virtio_driver, register_virtio_driver, \ unregister_virtio_driver) + +dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size_t size, + enum dma_data_direction dir, unsigned long attrs); +void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr, + size_t size, enum dma_data_direction dir, + unsigned long attrs); +int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr); #endif /* _LINUX_VIRTIO_H */ -- cgit v1.2.3 From 8bd2f71054bd0bc997833e9825143672eb7e2801 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Thu, 10 Aug 2023 20:30:56 +0800 Subject: virtio_ring: introduce dma sync api for virtqueue These API has been introduced: * virtqueue_dma_need_sync * virtqueue_dma_sync_single_range_for_cpu * virtqueue_dma_sync_single_range_for_device These APIs can be used together with the premapped mechanism to sync the DMA address. Signed-off-by: Xuan Zhuo Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 79e3c74391e0..1311a7fbe675 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -220,4 +220,12 @@ void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr, size_t size, enum dma_data_direction dir, unsigned long attrs); int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr); + +bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr); +void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq, dma_addr_t addr, + unsigned long offset, size_t size, + enum dma_data_direction dir); +void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq, dma_addr_t addr, + unsigned long offset, size_t size, + enum dma_data_direction dir); #endif /* _LINUX_VIRTIO_H */ -- cgit v1.2.3 From 719c5e37e99d2fd588d1c994284d17650a66354c Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 1 Sep 2023 06:53:23 +0200 Subject: net: phy: micrel: Correct bit assignments for phy_device flags Previously, the defines for phy_device flags in the Micrel driver were ambiguous in their representation. They were intended to be bit masks but were mistakenly defined as bit positions. This led to the following issues: - MICREL_KSZ8_P1_ERRATA, designated for KSZ88xx switches, overlapped with MICREL_PHY_FXEN and MICREL_PHY_50MHZ_CLK. - Due to this overlap, the code path for MICREL_PHY_FXEN, tailored for the KSZ8041 PHY, was not executed for KSZ88xx PHYs. - Similarly, the code associated with MICREL_PHY_50MHZ_CLK wasn't triggered for KSZ88xx. To rectify this, all three flags have now been explicitly converted to use the `BIT()` macro, ensuring they are defined as bit masks and preventing potential overlaps in the future. Fixes: 49011e0c1555 ("net: phy: micrel: ksz886x/ksz8081: add cabletest support") Signed-off-by: Oleksij Rempel Reviewed-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/micrel_phy.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h index 8bef1ab62bba..322d87255984 100644 --- a/include/linux/micrel_phy.h +++ b/include/linux/micrel_phy.h @@ -41,9 +41,9 @@ #define PHY_ID_KSZ9477 0x00221631 /* struct phy_device dev_flags definitions */ -#define MICREL_PHY_50MHZ_CLK 0x00000001 -#define MICREL_PHY_FXEN 0x00000002 -#define MICREL_KSZ8_P1_ERRATA 0x00000003 +#define MICREL_PHY_50MHZ_CLK BIT(0) +#define MICREL_PHY_FXEN BIT(1) +#define MICREL_KSZ8_P1_ERRATA BIT(2) #define MICREL_KSZ9021_EXTREG_CTRL 0xB #define MICREL_KSZ9021_EXTREG_DATA_WRITE 0xC -- cgit v1.2.3 From 0be7592885d7b4c20595c388adc13930b653b847 Mon Sep 17 00:00:00 2001 From: Nilesh Javali Date: Thu, 31 Aug 2023 16:51:45 +0530 Subject: scsi: qla2xxx: Correct endianness for rqstlen and rsplen rqstlen and rsplen were changed to __le32 to fix sparse warnings: drivers/scsi/qla2xxx/qla_nvme.c:402:30: warning: incorrect type in assignment (different base types) drivers/scsi/qla2xxx/qla_nvme.c:402:30: expected restricted __le32 [usertype] cmd_len drivers/scsi/qla2xxx/qla_nvme.c:402:30: got unsigned short [usertype] rsplen drivers/scsi/qla2xxx/qla_nvme.c:507:30: warning: incorrect type in assignment (different base types) drivers/scsi/qla2xxx/qla_nvme.c:507:30: expected restricted __le32 [usertype] cmd_len drivers/scsi/qla2xxx/qla_nvme.c:507:30: got unsigned int [usertype] rqstlen drivers/scsi/qla2xxx/qla_nvme.c:508:30: warning: incorrect type in assignment (different base types) drivers/scsi/qla2xxx/qla_nvme.c:508:30: expected restricted __le32 [usertype] rsp_len drivers/scsi/qla2xxx/qla_nvme.c:508:30: got unsigned int [usertype] rsplen Correct the endianness in qla2xxx driver thus avoiding changes in nvme-fc-driver.h. Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe") Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20230831112146.32595-1-njavali@marvell.com Signed-off-by: Martin K. Petersen --- include/linux/nvme-fc-driver.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h index f6ef8cf5d774..4109f1bd6128 100644 --- a/include/linux/nvme-fc-driver.h +++ b/include/linux/nvme-fc-driver.h @@ -53,10 +53,10 @@ struct nvmefc_ls_req { void *rqstaddr; dma_addr_t rqstdma; - __le32 rqstlen; + u32 rqstlen; void *rspaddr; dma_addr_t rspdma; - __le32 rsplen; + u32 rsplen; u32 timeout; void *private; @@ -120,7 +120,7 @@ struct nvmefc_ls_req { struct nvmefc_ls_rsp { void *rspbuf; dma_addr_t rspdma; - __le32 rsplen; + u16 rsplen; void (*done)(struct nvmefc_ls_rsp *rsp); void *nvme_fc_private; /* LLDD is not to access !! */ -- cgit v1.2.3 From cf60ce92358da29f3b1e24005f7b748584b82753 Mon Sep 17 00:00:00 2001 From: Pavel Pisa Date: Mon, 4 Sep 2023 12:00:02 +0200 Subject: of: overlay: Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY The of_overlay_fdt_apply has been changed but when CONFIG_OF_OVERLAY support is not configured then old stub prototype is declared by of.h header. Signed-off-by: Pavel Pisa Fixes: 47284862bfc7 ("of: overlay: Extend of_overlay_fdt_apply() to specify the target node") Acked-by: Marc Kleine-Budde Link: https://lore.kernel.org/r/20230904100002.7913-1-pisa@cmp.felk.cvut.cz Signed-off-by: Rob Herring --- include/linux/of.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index ed679819c279..6a9ddf20e79a 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1676,8 +1676,8 @@ int of_overlay_notifier_unregister(struct notifier_block *nb); #else -static inline int of_overlay_fdt_apply(void *overlay_fdt, u32 overlay_fdt_size, - int *ovcs_id) +static inline int of_overlay_fdt_apply(const void *overlay_fdt, u32 overlay_fdt_size, + int *ovcs_id, struct device_node *target_base) { return -ENOTSUPP; } -- cgit v1.2.3 From 9ffa7b92bc768609243d79fb0c0125b9704c7061 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 30 Aug 2023 18:11:38 +0200 Subject: thermal: core: Clean up headers of thermal zone registration functions For consistency, add a missing thermal_zone_device_register_with_trips() stub for the CONFIG_THERMAL unset case, specify argument names in all of the thermal zone registration and unregistration function headers and make all of them use white space consistently. No intentional functional impact. Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 53 ++++++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index eb17495c8acc..a30643104f3f 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -300,16 +300,24 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp); #endif #ifdef CONFIG_THERMAL -struct thermal_zone_device *thermal_zone_device_register(const char *, int, int, - void *, struct thermal_zone_device_ops *, - const struct thermal_zone_params *, int, int); - -void thermal_zone_device_unregister(struct thermal_zone_device *); - -struct thermal_zone_device * -thermal_zone_device_register_with_trips(const char *, struct thermal_trip *, int, int, - void *, struct thermal_zone_device_ops *, - const struct thermal_zone_params *, int, int); +struct thermal_zone_device *thermal_zone_device_register( + const char *type, + int num_trips, int mask, + void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp, + int passive_delay, int polling_delay); + +struct thermal_zone_device *thermal_zone_device_register_with_trips( + const char *type, + struct thermal_trip *trips, + int num_trips, int mask, + void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp, + int passive_delay, int polling_delay); + +void thermal_zone_device_unregister(struct thermal_zone_device *tz); void *thermal_zone_device_priv(struct thermal_zone_device *tzd); const char *thermal_zone_device_type(struct thermal_zone_device *tzd); @@ -351,14 +359,27 @@ int thermal_zone_device_disable(struct thermal_zone_device *tz); void thermal_zone_device_critical(struct thermal_zone_device *tz); #else static inline struct thermal_zone_device *thermal_zone_device_register( - const char *type, int trips, int mask, void *devdata, - struct thermal_zone_device_ops *ops, - const struct thermal_zone_params *tzp, - int passive_delay, int polling_delay) + const char *type, + int num_trips, int mask, + void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp, + int passive_delay, int polling_delay) +{ return ERR_PTR(-ENODEV); } + +static inline struct thermal_zone_device *thermal_zone_device_register_with_trips( + const char *type, + struct thermal_trip *trips, + int num_trips, int mask, + void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp, + int passive_delay, int polling_delay) { return ERR_PTR(-ENODEV); } -static inline void thermal_zone_device_unregister( - struct thermal_zone_device *tz) + +static inline void thermal_zone_device_unregister(struct thermal_zone_device *tz) { } + static inline struct thermal_cooling_device * thermal_cooling_device_register(const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) -- cgit v1.2.3 From d332db8fc1a2dfb4738281b1d6d4ed20115dd9d3 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 30 Aug 2023 18:13:35 +0200 Subject: thermal: core: Add function for registering tripless thermal zones Multiple callers of thermal_zone_device_register() don't pass any trips to it and they might use a shortened argument list for that, so add a special function with fewer arguments for this purpose. Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index a30643104f3f..1a1433eb9b59 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -317,6 +317,12 @@ struct thermal_zone_device *thermal_zone_device_register_with_trips( const struct thermal_zone_params *tzp, int passive_delay, int polling_delay); +struct thermal_zone_device *thermal_tripless_zone_device_register( + const char *type, + void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp); + void thermal_zone_device_unregister(struct thermal_zone_device *tz); void *thermal_zone_device_priv(struct thermal_zone_device *tzd); @@ -377,6 +383,13 @@ static inline struct thermal_zone_device *thermal_zone_device_register_with_trip int passive_delay, int polling_delay) { return ERR_PTR(-ENODEV); } +static inline struct thermal_zone_device *thermal_tripless_zone_device_register( + const char *type, + void *devdata, + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp) +{ return ERR_PTR(-ENODEV); } + static inline void thermal_zone_device_unregister(struct thermal_zone_device *tz) { } -- cgit v1.2.3 From edd220b33f479cf9dcda0bfefb2cb8c5902e9885 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 30 Aug 2023 18:16:29 +0200 Subject: thermal: core: Drop thermal_zone_device_register() There are no more users of thermal_zone_device_register(), so drop it from the core. Note that thermal_zone_device_register_with_trips() may be renamed to thermal_zone_device_register() in the future, but only after a grace period allowing all of the possible work in progress that may be using the latter to adjust. Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 1a1433eb9b59..c99440aac1a1 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -300,14 +300,6 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp); #endif #ifdef CONFIG_THERMAL -struct thermal_zone_device *thermal_zone_device_register( - const char *type, - int num_trips, int mask, - void *devdata, - struct thermal_zone_device_ops *ops, - const struct thermal_zone_params *tzp, - int passive_delay, int polling_delay); - struct thermal_zone_device *thermal_zone_device_register_with_trips( const char *type, struct thermal_trip *trips, @@ -364,15 +356,6 @@ int thermal_zone_device_enable(struct thermal_zone_device *tz); int thermal_zone_device_disable(struct thermal_zone_device *tz); void thermal_zone_device_critical(struct thermal_zone_device *tz); #else -static inline struct thermal_zone_device *thermal_zone_device_register( - const char *type, - int num_trips, int mask, - void *devdata, - struct thermal_zone_device_ops *ops, - const struct thermal_zone_params *tzp, - int passive_delay, int polling_delay) -{ return ERR_PTR(-ENODEV); } - static inline struct thermal_zone_device *thermal_zone_device_register_with_trips( const char *type, struct thermal_trip *trips, -- cgit v1.2.3 From e7716c74e3882405f9eca16faa6cb1bf19995399 Mon Sep 17 00:00:00 2001 From: Philipp Stanner Date: Mon, 21 Aug 2023 10:21:29 +0200 Subject: xarray: Document necessary flag in alloc functions Adds a new line to the docstrings of functions wrapping __xa_alloc() and __xa_alloc_cyclic(), informing about the necessity of flag XA_FLAGS_ALLOC being set previously. The documentation so far says that functions wrapping __xa_alloc() and __xa_alloc_cyclic() are supposed to return either -ENOMEM or -EBUSY in case of an error. If the xarray has been initialized without the flag XA_FLAGS_ALLOC, however, they fail with a different, undocumented error code. As hinted at in Documentation/core-api/xarray.rst, wrappers around these functions should only be invoked when the flag has been set. The functions' documentation should reflect that as well. Signed-off-by: Philipp Stanner Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/xarray.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 741703b45f61..cb571dfcf4b1 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -856,6 +856,9 @@ static inline int __must_check xa_insert_irq(struct xarray *xa, * stores the index into the @id pointer, then stores the entry at * that index. A concurrent lookup will not see an uninitialised @id. * + * Must only be operated on an xarray initialized with flag XA_FLAGS_ALLOC set + * in xa_init_flags(). + * * Context: Any context. Takes and releases the xa_lock. May sleep if * the @gfp flags permit. * Return: 0 on success, -ENOMEM if memory could not be allocated or @@ -886,6 +889,9 @@ static inline __must_check int xa_alloc(struct xarray *xa, u32 *id, * stores the index into the @id pointer, then stores the entry at * that index. A concurrent lookup will not see an uninitialised @id. * + * Must only be operated on an xarray initialized with flag XA_FLAGS_ALLOC set + * in xa_init_flags(). + * * Context: Any context. Takes and releases the xa_lock while * disabling softirqs. May sleep if the @gfp flags permit. * Return: 0 on success, -ENOMEM if memory could not be allocated or @@ -916,6 +922,9 @@ static inline int __must_check xa_alloc_bh(struct xarray *xa, u32 *id, * stores the index into the @id pointer, then stores the entry at * that index. A concurrent lookup will not see an uninitialised @id. * + * Must only be operated on an xarray initialized with flag XA_FLAGS_ALLOC set + * in xa_init_flags(). + * * Context: Process context. Takes and releases the xa_lock while * disabling interrupts. May sleep if the @gfp flags permit. * Return: 0 on success, -ENOMEM if memory could not be allocated or @@ -949,6 +958,9 @@ static inline int __must_check xa_alloc_irq(struct xarray *xa, u32 *id, * The search for an empty entry will start at @next and will wrap * around if necessary. * + * Must only be operated on an xarray initialized with flag XA_FLAGS_ALLOC set + * in xa_init_flags(). + * * Context: Any context. Takes and releases the xa_lock. May sleep if * the @gfp flags permit. * Return: 0 if the allocation succeeded without wrapping. 1 if the @@ -983,6 +995,9 @@ static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry, * The search for an empty entry will start at @next and will wrap * around if necessary. * + * Must only be operated on an xarray initialized with flag XA_FLAGS_ALLOC set + * in xa_init_flags(). + * * Context: Any context. Takes and releases the xa_lock while * disabling softirqs. May sleep if the @gfp flags permit. * Return: 0 if the allocation succeeded without wrapping. 1 if the @@ -1017,6 +1032,9 @@ static inline int xa_alloc_cyclic_bh(struct xarray *xa, u32 *id, void *entry, * The search for an empty entry will start at @next and will wrap * around if necessary. * + * Must only be operated on an xarray initialized with flag XA_FLAGS_ALLOC set + * in xa_init_flags(). + * * Context: Process context. Takes and releases the xa_lock while * disabling interrupts. May sleep if the @gfp flags permit. * Return: 0 if the allocation succeeded without wrapping. 1 if the -- cgit v1.2.3 From 39285e124edbc752331e98ace37cc141a6a3747a Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Tue, 5 Sep 2023 08:46:10 +0000 Subject: net: team: do not use dynamic lockdep key team interface has used a dynamic lockdep key to avoid false-positive lockdep deadlock detection. Virtual interfaces such as team usually have their own lock for protecting private data. These interfaces can be nested. team0 | team1 Each interface's lock is actually different(team0->lock and team1->lock). So, mutex_lock(&team0->lock); mutex_lock(&team1->lock); mutex_unlock(&team1->lock); mutex_unlock(&team0->lock); The above case is absolutely safe. But lockdep warns about deadlock. Because the lockdep understands these two locks are same. This is a false-positive lockdep warning. So, in order to avoid this problem, the team interfaces started to use dynamic lockdep key. The false-positive problem was fixed, but it introduced a new problem. When the new team virtual interface is created, it registers a dynamic lockdep key(creates dynamic lockdep key) and uses it. But there is the limitation of the number of lockdep keys. So, If so many team interfaces are created, it consumes all lockdep keys. Then, the lockdep stops to work and warns about it. In order to fix this problem, team interfaces use the subclass instead of the dynamic key. So, when a new team interface is created, it doesn't register(create) a new lockdep, but uses existed subclass key instead. It is already used by the bonding interface for a similar case. As the bonding interface does, the subclass variable is the same as the 'dev->nested_level'. This variable indicates the depth in the stacked interface graph. The 'dev->nested_level' is protected by RTNL and RCU. So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU. In the current code, 'team->lock' is usually acquired under RTNL, there is no problem with using 'dev->nested_level'. The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire 'team->lock' without RTNL. But these don't iterate their own ports nested so they don't need nested lock. Reproducer: for i in {0..1000} do ip link add team$i type team ip link add dummy$i master team$i type dummy ip link set dummy$i up ip link set team$i up done Splat looks like: BUG: MAX_LOCKDEP_ENTRIES too low! turning off the locking correctness validator. Please attach the output of /proc/lock_stat to the bug report CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45 Call Trace: dump_stack_lvl+0x64/0xb0 add_lock_to_list+0x30d/0x5e0 check_prev_add+0x73a/0x23a0 ... sock_def_readable+0xfe/0x4f0 netlink_broadcast+0x76b/0xac0 nlmsg_notify+0x69/0x1d0 dev_open+0xed/0x130 ... Reported-by: syzbot+9bbbacfbf1e04d5221f7@syzkaller.appspotmail.com Fixes: 369f61bee0f5 ("team: fix nested locking lockdep warning") Signed-off-by: Taehee Yoo Signed-off-by: David S. Miller --- include/linux/if_team.h | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/if_team.h b/include/linux/if_team.h index 1b9b15a492fa..12d4447fc8ab 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -221,10 +221,38 @@ struct team { atomic_t count_pending; struct delayed_work dw; } mcast_rejoin; - struct lock_class_key team_lock_key; long mode_priv[TEAM_MODE_PRIV_LONGS]; }; +static inline void __team_lock(struct team *team) +{ + mutex_lock(&team->lock); +} + +static inline int team_trylock(struct team *team) +{ + return mutex_trylock(&team->lock); +} + +#ifdef CONFIG_LOCKDEP +static inline void team_lock(struct team *team) +{ + ASSERT_RTNL(); + mutex_lock_nested(&team->lock, team->dev->nested_level); +} + +#else +static inline void team_lock(struct team *team) +{ + __team_lock(team); +} +#endif + +static inline void team_unlock(struct team *team) +{ + mutex_unlock(&team->lock); +} + static inline int team_dev_queue_xmit(struct team *team, struct team_port *port, struct sk_buff *skb) { -- cgit v1.2.3 From 1a961e74d5abbea049588a3d74b759955b4ed9d5 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 5 Sep 2023 16:42:02 -0700 Subject: net: phylink: fix sphinx complaint about invalid literal sphinx complains about the use of "%PHYLINK_PCS_NEG_*": Documentation/networking/kapi:144: ./include/linux/phylink.h:601: WARNING: Inline literal start-string without end-string. Documentation/networking/kapi:144: ./include/linux/phylink.h:633: WARNING: Inline literal start-string without end-string. These are not valid symbols so drop the '%' prefix. Alternatively we could use %PHYLINK_PCS_NEG_\* (escape the *) or use normal literal ``PHYLINK_PCS_NEG_*`` but there is already a handful of un-adorned DEFINE_* in this file. Fixes: f99d471afa03 ("net: phylink: add PCS negotiation mode") Reported-by: Stephen Rothwell Link: https://lore.kernel.org/all/20230626162908.2f149f98@canb.auug.org.au/ Signed-off-by: Jakub Kicinski Reviewed-by: Bagas Sanjaya Signed-off-by: David S. Miller --- include/linux/phylink.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 7d07f8736431..2b886ea654bb 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -600,7 +600,7 @@ void pcs_get_state(struct phylink_pcs *pcs, * * The %neg_mode argument should be tested via the phylink_mode_*() family of * functions, or for PCS that set pcs->neg_mode true, should be tested - * against the %PHYLINK_PCS_NEG_* definitions. + * against the PHYLINK_PCS_NEG_* definitions. */ int pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, const unsigned long *advertising, @@ -630,7 +630,7 @@ void pcs_an_restart(struct phylink_pcs *pcs); * * The %mode argument should be tested via the phylink_mode_*() family of * functions, or for PCS that set pcs->neg_mode true, should be tested - * against the %PHYLINK_PCS_NEG_* definitions. + * against the PHYLINK_PCS_NEG_* definitions. */ void pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, int speed, int duplex); -- cgit v1.2.3 From 8f3f06dfd6873135068ccf1a0b386308e8c4da38 Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Wed, 6 Sep 2023 22:53:55 +0800 Subject: raid6: Add LoongArch SIMD syndrome calculation The algorithms work on 64 bytes at a time, which is the L1 cache line size of all current and future LoongArch cores (that we care about), as confirmed by Huacai. The code is based on the generic int.uc algorithm, unrolled 4 times for LSX and 2 times for LASX. Further unrolling does not meaningfully improve the performance according to experiments. Performance numbers measured during system boot on a 3A5000 @ 2.5GHz: > raid6: lasx gen() 12726 MB/s > raid6: lsx gen() 10001 MB/s > raid6: int64x8 gen() 2876 MB/s > raid6: int64x4 gen() 3867 MB/s > raid6: int64x2 gen() 2531 MB/s > raid6: int64x1 gen() 1945 MB/s Comparison of xor() speeds (from different boots but meaningful anyway): > lasx: 11226 MB/s > lsx: 6395 MB/s > int64x4: 2147 MB/s Performance as measured by raid6test: > raid6: lasx gen() 25109 MB/s > raid6: lsx gen() 13233 MB/s > raid6: int64x8 gen() 4164 MB/s > raid6: int64x4 gen() 6005 MB/s > raid6: int64x2 gen() 5781 MB/s > raid6: int64x1 gen() 4119 MB/s > raid6: using algorithm lasx gen() 25109 MB/s > raid6: .... xor() 14439 MB/s, rmw enabled Acked-by: Song Liu Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- include/linux/raid/pq.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h index f29aaaf2eb21..874447485848 100644 --- a/include/linux/raid/pq.h +++ b/include/linux/raid/pq.h @@ -108,6 +108,8 @@ extern const struct raid6_calls raid6_vpermxor1; extern const struct raid6_calls raid6_vpermxor2; extern const struct raid6_calls raid6_vpermxor4; extern const struct raid6_calls raid6_vpermxor8; +extern const struct raid6_calls raid6_lsx; +extern const struct raid6_calls raid6_lasx; struct raid6_recov_calls { void (*data2)(int, size_t, int, int, void **); -- cgit v1.2.3 From f2091321044d9fbcadb93dfc1c9cf23e563ea40c Mon Sep 17 00:00:00 2001 From: WANG Xuerui Date: Wed, 6 Sep 2023 22:53:55 +0800 Subject: raid6: Add LoongArch SIMD recovery implementation Similar to the syndrome calculation, the recovery algorithms also work on 64 bytes at a time to align with the L1 cache line size of current and future LoongArch cores (that we care about). Which means unrolled-by-4 LSX and unrolled-by-2 LASX code. The assembly is originally based on the x86 SSSE3/AVX2 ports, but register allocation has been redone to take advantage of LSX/LASX's 32 vector registers, and instruction sequence has been optimized to suit (e.g. LoongArch can perform per-byte srl and andi on vectors, but x86 cannot). Performance numbers measured by instrumenting the raid6test code, on a 3A5000 system clocked at 2.5GHz: > lasx 2data: 354.987 MiB/s > lasx datap: 350.430 MiB/s > lsx 2data: 340.026 MiB/s > lsx datap: 337.318 MiB/s > intx1 2data: 164.280 MiB/s > intx1 datap: 187.966 MiB/s Because recovery algorithms are chosen solely based on priority and availability, lasx is marked as priority 2 and lsx priority 1. At least for the current generation of LoongArch micro-architectures, LASX should always be faster than LSX whenever supported, and have similar power consumption characteristics (because the only known LASX-capable uarch, the LA464, always compute the full 256-bit result for vector ops). Acked-by: Song Liu Signed-off-by: WANG Xuerui Signed-off-by: Huacai Chen --- include/linux/raid/pq.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h index 874447485848..006e18decfad 100644 --- a/include/linux/raid/pq.h +++ b/include/linux/raid/pq.h @@ -125,6 +125,8 @@ extern const struct raid6_recov_calls raid6_recov_avx2; extern const struct raid6_recov_calls raid6_recov_avx512; extern const struct raid6_recov_calls raid6_recov_s390xc; extern const struct raid6_recov_calls raid6_recov_neon; +extern const struct raid6_recov_calls raid6_recov_lsx; +extern const struct raid6_recov_calls raid6_recov_lasx; extern const struct raid6_calls raid6_neonx1; extern const struct raid6_calls raid6_neonx2; -- cgit v1.2.3 From 9b04c764af18a1dab6d48ca0671f70cdcccf90a2 Mon Sep 17 00:00:00 2001 From: Qing Zhang Date: Wed, 6 Sep 2023 22:54:16 +0800 Subject: kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping MIPS, LoongArch and some other architectures have many holes between different segments and the valid address space (256T available) is insufficient to map all these segments to kasan shadow memory with the common formula provided by kasan core. So we need architecture specific mapping formulas to ensure different segments are mapped individually, and only limited space lengths of those specific segments are mapped to shadow. Therefore, when the incoming address is converted to a shadow, we need to add a condition to determine whether it is valid. Reviewed-by: Andrey Konovalov Signed-off-by: Qing Zhang Signed-off-by: Huacai Chen --- include/linux/kasan.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 819b6bc8ac08..3df5499f7936 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -54,11 +54,13 @@ extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; int kasan_populate_early_shadow(const void *shadow_start, const void *shadow_end); +#ifndef __HAVE_ARCH_SHADOW_MAP static inline void *kasan_mem_to_shadow(const void *addr) { return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET; } +#endif int kasan_add_zero_shadow(void *start, unsigned long size); void kasan_remove_zero_shadow(void *start, unsigned long size); -- cgit v1.2.3 From 08c6d8bae48c2c28f7017d7b61b5d5a1518ceb39 Mon Sep 17 00:00:00 2001 From: Lukasz Majewski Date: Tue, 5 Sep 2023 11:33:15 +0200 Subject: net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) The KSZ9477 errata points out (in 'Module 4') the link up/down problems when EEE (Energy Efficient Ethernet) is enabled in the device to which the KSZ9477 tries to auto negotiate. The suggested workaround is to clear advertisement of EEE for PHYs in this chip driver. To avoid regressions with other switch ICs the new MICREL_NO_EEE flag has been introduced. Moreover, the in-register disablement of MMD_DEVICE_ID_EEE_ADV.MMD_EEE_ADV MMD register is removed, as this code is both; now executed too late (after previous rework of the PHY and DSA for KSZ switches) and not required as setting all members of eee_broken_modes bit field prevents the KSZ9477 from advertising EEE. Fixes: 69d3b36ca045 ("net: dsa: microchip: enable EEE support") # for KSZ9477 Signed-off-by: Lukasz Majewski Tested-by: Oleksij Rempel # Confirmed disabled EEE with oscilloscope. Reviewed-by: Oleksij Rempel Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20230905093315.784052-1-lukma@denx.de Signed-off-by: Jakub Kicinski --- include/linux/micrel_phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h index 322d87255984..4e27ca7c49de 100644 --- a/include/linux/micrel_phy.h +++ b/include/linux/micrel_phy.h @@ -44,6 +44,7 @@ #define MICREL_PHY_50MHZ_CLK BIT(0) #define MICREL_PHY_FXEN BIT(1) #define MICREL_KSZ8_P1_ERRATA BIT(2) +#define MICREL_NO_EEE BIT(3) #define MICREL_KSZ9021_EXTREG_CTRL 0xB #define MICREL_KSZ9021_EXTREG_DATA_WRITE 0xC -- cgit v1.2.3 From 6afcf0fb92701487421aa73c692855aa70fbc796 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 7 Sep 2023 11:01:04 -0700 Subject: Revert "net: team: do not use dynamic lockdep key" This reverts commit 39285e124edbc752331e98ace37cc141a6a3747a. Looks like the change has unintended consequences in exposing objects before they are initialized. Let's drop this patch and try again in net-next. Reported-by: syzbot+44ae022028805f4600fc@syzkaller.appspotmail.com Fixes: 39285e124edb ("net: team: do not use dynamic lockdep key") Link: https://lore.kernel.org/all/20230907103124.6adb7256@kernel.org/ Signed-off-by: Jakub Kicinski --- include/linux/if_team.h | 30 +----------------------------- 1 file changed, 1 insertion(+), 29 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_team.h b/include/linux/if_team.h index 12d4447fc8ab..1b9b15a492fa 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -221,38 +221,10 @@ struct team { atomic_t count_pending; struct delayed_work dw; } mcast_rejoin; + struct lock_class_key team_lock_key; long mode_priv[TEAM_MODE_PRIV_LONGS]; }; -static inline void __team_lock(struct team *team) -{ - mutex_lock(&team->lock); -} - -static inline int team_trylock(struct team *team) -{ - return mutex_trylock(&team->lock); -} - -#ifdef CONFIG_LOCKDEP -static inline void team_lock(struct team *team) -{ - ASSERT_RTNL(); - mutex_lock_nested(&team->lock, team->dev->nested_level); -} - -#else -static inline void team_lock(struct team *team) -{ - __team_lock(team); -} -#endif - -static inline void team_unlock(struct team *team) -{ - mutex_unlock(&team->lock); -} - static inline int team_dev_queue_xmit(struct team *team, struct team_port *port, struct sk_buff *skb) { -- cgit v1.2.3 From f94cf2206b066bd6d761d3347fd35f77b828c376 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 7 Sep 2023 09:40:07 -0400 Subject: buffer: Make bh_offset() work for compound pages If the buffer pointed to by the buffer_head is part of a compound page, bh_offset() assumes that b_page is the precise page that contains the data. A recent change to jbd2 inadvertently violated that assumption. By using page_size(), we support both b_page being set to the head page (as page_size() will return the size of the entire folio) and the precise page (as it will return PAGE_SIZE for a tail page). Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()") Reported-by: Zorro Lang Tested-by: Ritesh Harjani (IBM) Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/buffer_head.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 6cb3e9af78c9..4ba242073adc 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -173,7 +173,10 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh) return test_bit_acquire(BH_Uptodate, &bh->b_state); } -#define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK) +static inline unsigned long bh_offset(const struct buffer_head *bh) +{ + return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1); +} /* If we *know* page->private refers to buffer_heads */ #define page_buffers(page) \ -- cgit v1.2.3 From 6fdac58c560e4d164eb8161987bee045147cabe4 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Thu, 7 Sep 2023 22:19:12 -0400 Subject: tracing: Remove unused trace_event_file dir field Now that eventfs structure is used to create the events directory via the eventfs dynamically allocate code, the "dir" field of the trace_event_file structure is no longer used. Remove it. Link: https://lkml.kernel.org/r/20230908022001.580400115@goodmis.org Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Andrew Morton Cc: Ajay Kaher Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index eb5c3add939b..12f875e9e69a 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -650,7 +650,6 @@ struct trace_event_file { struct trace_event_call *event_call; struct event_filter __rcu *filter; struct eventfs_file *ef; - struct dentry *dir; struct trace_array *tr; struct trace_subsystem_dir *system; struct list_head triggers; -- cgit v1.2.3 From 5d153cd128251aaedc8e9657f0a949ec94952055 Mon Sep 17 00:00:00 2001 From: Steve French Date: Fri, 8 Sep 2023 16:34:59 -0500 Subject: spnego: add missing OID to oid registry Add missing OID to the registry. Some servers and clients (including Windows) now request "NEGOEX - SPNEGEO Extended Negotiation Security") See https://datatracker.ietf.org/doc/html/draft-zhu-negoex-02 Reviewed-by: Namjae Jeon Signed-off-by: Steve French --- include/linux/oid_registry.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/oid_registry.h b/include/linux/oid_registry.h index 0f4a8903922a..f86a08ba0207 100644 --- a/include/linux/oid_registry.h +++ b/include/linux/oid_registry.h @@ -67,6 +67,7 @@ enum OID { OID_msOutlookExpress, /* 1.3.6.1.4.1.311.16.4 */ OID_ntlmssp, /* 1.3.6.1.4.1.311.2.2.10 */ + OID_negoex, /* 1.3.6.1.4.1.311.2.2.30 */ OID_spnego, /* 1.3.6.1.5.5.2 */ -- cgit v1.2.3 From 24e0e61db3cb86a66824531989f1df80e0939f26 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Mon, 4 Sep 2023 22:42:56 +0200 Subject: ata: libata: disallow dev-initiated LPM transitions to unsupported states MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In AHCI 1.3.1, the register description for CAP.SSC: "When cleared to ‘0’, software must not allow the HBA to initiate transitions to the Slumber state via agressive link power management nor the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port must be programmed to disallow device initiated Slumber requests." In AHCI 1.3.1, the register description for CAP.PSC: "When cleared to ‘0’, software must not allow the HBA to initiate transitions to the Partial state via agressive link power management nor the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port must be programmed to disallow device initiated Partial requests." Ensure that we always set the corresponding bits in PxSCTL.IPM, such that a device is not allowed to initiate transitions to power states which are unsupported by the HBA. DevSleep is always initiated by the HBA, however, for completeness, set the corresponding bit in PxSCTL.IPM such that agressive link power management cannot transition to DevSleep if DevSleep is not supported. sata_link_scr_lpm() is used by libahci, ata_piix and libata-pmp. However, only libahci has the ability to read the CAP/CAP2 register to see if these features are supported. Therefore, in order to not introduce any regressions on ata_piix or libata-pmp, create flags that indicate that the respective feature is NOT supported. This way, the behavior for ata_piix and libata-pmp should remain unchanged. This change is based on a patch originally submitted by Runa Guo-oc. Signed-off-by: Niklas Cassel Fixes: 1152b2617a6e ("libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal --- include/linux/libata.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 52d58b13e5ee..bf4913f4d7ac 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -222,6 +222,10 @@ enum { ATA_HOST_PARALLEL_SCAN = (1 << 2), /* Ports on this host can be scanned in parallel */ ATA_HOST_IGNORE_ATA = (1 << 3), /* Ignore ATA devices on this host. */ + ATA_HOST_NO_PART = (1 << 4), /* Host does not support partial */ + ATA_HOST_NO_SSC = (1 << 5), /* Host does not support slumber */ + ATA_HOST_NO_DEVSLP = (1 << 6), /* Host does not support devslp */ + /* bits 24:31 of host->flags are reserved for LLD specific flags */ /* various lengths of time */ -- cgit v1.2.3 From ebc7abb35b258152d4a424f89d7c03db1d7ce61c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 7 Sep 2023 20:18:56 +0200 Subject: thermal: Constify the trip argument of the .get_trend() zone callback Add 'const' to the definition of the 'trip' argument of the .get_trend() thermal zone callback to indicate that the trip point passed to it should not be modified by it and adjust the callback functions implementing it, thermal_get_trend() in the ACPI thermal driver and __ti_thermal_get_trend(), accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki Reviewed-by: Michal Wilczynski --- include/linux/thermal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index c99440aac1a1..a5ae4af955ff 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -80,8 +80,8 @@ struct thermal_zone_device_ops { int (*set_trip_hyst) (struct thermal_zone_device *, int, int); int (*get_crit_temp) (struct thermal_zone_device *, int *); int (*set_emul_temp) (struct thermal_zone_device *, int); - int (*get_trend) (struct thermal_zone_device *, struct thermal_trip *, - enum thermal_trend *); + int (*get_trend) (struct thermal_zone_device *, + const struct thermal_trip *, enum thermal_trend *); void (*hot)(struct thermal_zone_device *); void (*critical)(struct thermal_zone_device *); }; -- cgit v1.2.3 From fc52a64416b010c8324e2cb50070faae868521c1 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Fri, 8 Sep 2023 16:39:29 -0400 Subject: tracing/synthetic: Fix order of struct trace_dynamic_info To make handling BIG and LITTLE endian better the offset/len of dynamic fields of the synthetic events was changed into a structure of: struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN u16 offset; u16 len; #else u16 len; u16 offset; #endif }; to replace the manual changes of: data_offset = offset & 0xffff; data_offest = len << 16; But if you look closely, the above is: << 16 | offset Which in little endian would be in memory: offset_lo offset_hi len_lo len_hi and in big endian: len_hi len_lo offset_hi offset_lo Which if broken into a structure would be: struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN u16 len; u16 offset; #else u16 offset; u16 len; #endif }; Which is the opposite of what was defined. Fix this and just to be safe also add "__packed". Link: https://lore.kernel.org/all/20230908154417.5172e343@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230908163929.2c25f3dc@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland Tested-by: Sven Schnelle Acked-by: Masami Hiramatsu (Google) Fixes: ddeea494a16f3 ("tracing/synthetic: Use union instead of casts") Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 12f875e9e69a..21ae37e49319 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -62,13 +62,13 @@ void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...); /* Used to find the offset and length of dynamic fields in trace events */ struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN - u16 offset; u16 len; + u16 offset; #else - u16 len; u16 offset; + u16 len; #endif -}; +} __packed; /* * The trace entry - the most basic unit of tracing. This is what -- cgit v1.2.3 From 08700ec705043eb0cee01b35cf5b9d63f0230d12 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 6 Sep 2023 03:46:57 +0900 Subject: linux/export: fix reference to exported functions for parisc64 John David Anglin reported parisc has been broken since commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost"). Like ia64, parisc64 uses a function descriptor. The function references must be prefixed with P%. Also, symbols prefixed $$ from the library have the symbol type STT_LOPROC instead of STT_FUNC. They should be handled as functions too. Fixes: ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") Reported-by: John David Anglin Tested-by: John David Anglin Tested-by: Helge Deller Closes: https://lore.kernel.org/linux-parisc/1901598a-e11d-f7dd-a5d9-9a69d06e6b6e@bell.net/T/#u Signed-off-by: Masahiro Yamada Signed-off-by: Helge Deller --- include/linux/export-internal.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/export-internal.h b/include/linux/export-internal.h index 1c849db953a5..45fca09b2319 100644 --- a/include/linux/export-internal.h +++ b/include/linux/export-internal.h @@ -52,6 +52,8 @@ #ifdef CONFIG_IA64 #define KSYM_FUNC(name) @fptr(name) +#elif defined(CONFIG_PARISC) && defined(CONFIG_64BIT) +#define KSYM_FUNC(name) P%name #else #define KSYM_FUNC(name) name #endif -- cgit v1.2.3 From 25e73b7e3f72a25aa30cbb2eecb49036e0acf066 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 2 Aug 2023 12:55:46 +0200 Subject: x86/ibt: Suppress spurious ENDBR It was reported that under certain circumstances GCC emits ENDBR instructions for _THIS_IP_ usage. Specifically, when it appears at the start of a basic block -- but not elsewhere. Since _THIS_IP_ is never used for control flow, these ENDBR instructions are completely superfluous. Override the _THIS_IP_ definition for x86_64 to avoid this. Less ENDBR instructions is better. Fixes: 156ff4a544ae ("x86/ibt: Base IBT bits") Reported-by: David Kaplan Reviewed-by: Andrew Cooper Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20230802110323.016197440@infradead.org --- include/linux/instruction_pointer.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/instruction_pointer.h b/include/linux/instruction_pointer.h index cda1f706eaeb..aa0b3ffea935 100644 --- a/include/linux/instruction_pointer.h +++ b/include/linux/instruction_pointer.h @@ -2,7 +2,12 @@ #ifndef _LINUX_INSTRUCTION_POINTER_H #define _LINUX_INSTRUCTION_POINTER_H +#include + #define _RET_IP_ (unsigned long)__builtin_return_address(0) + +#ifndef _THIS_IP_ #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) +#endif #endif /* _LINUX_INSTRUCTION_POINTER_H */ -- cgit v1.2.3 From 5eb1e6e459cfa025f79c43014f66ff62a55542f1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Tue, 5 Sep 2023 21:42:53 +0200 Subject: i2c: Drop legacy callback .probe_new() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that all drivers are converted to the (new) .probe() callback, the temporary .probe_new() can go away. \o/ Link: https://lore.kernel.org/linux-i2c/20230626094548.559542-1-u.kleine-koenig@pengutronix.de Reviewed-by: Javier Martinez Canillas Reviewed-by: Jean Delvare Signed-off-by: Uwe Kleine-König Signed-off-by: Wolfram Sang --- include/linux/i2c.h | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/i2c.h b/include/linux/i2c.h index 3430cc2b05a6..0dae9db27538 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -237,7 +237,6 @@ enum i2c_driver_flags { * struct i2c_driver - represent an I2C device driver * @class: What kind of i2c device we instantiate (for detect) * @probe: Callback for device binding - * @probe_new: Transitional callback for device binding - do not use * @remove: Callback for device unbinding * @shutdown: Callback for device shutdown * @alert: Alert callback, for example for the SMBus alert protocol @@ -272,16 +271,8 @@ enum i2c_driver_flags { struct i2c_driver { unsigned int class; - union { /* Standard driver model interfaces */ - int (*probe)(struct i2c_client *client); - /* - * Legacy callback that was part of a conversion of .probe(). - * Today it has the same semantic as .probe(). Don't use for new - * code. - */ - int (*probe_new)(struct i2c_client *client); - }; + int (*probe)(struct i2c_client *client); void (*remove)(struct i2c_client *client); -- cgit v1.2.3