From 440d52b370b03b366fd26ace36bab20552116145 Mon Sep 17 00:00:00 2001 From: Rob Clark Date: Fri, 13 Sep 2024 13:23:01 -0700 Subject: drm/sched: Fix dynamic job-flow control race Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609 The whole premise of lockless access to a single-producer-single- consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq). This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition. Suggested-by: Asahi Lina Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control") Closes: https://github.com/AsahiLinux/linux/issues/309 Cc: stable@vger.kernel.org Signed-off-by: Rob Clark Reviewed-by: Danilo Krummrich Tested-by: Janne Grunau Signed-off-by: Danilo Krummrich Link: https://patchwork.freedesktop.org/patch/msgid/20240913202301.16772-1-robdclark@gmail.com --- include/drm/gpu_scheduler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 5acc64954a88..e28bc649b5c9 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -574,7 +574,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched); void drm_sched_job_cleanup(struct drm_sched_job *job); -void drm_sched_wakeup(struct drm_gpu_scheduler *sched, struct drm_sched_entity *entity); +void drm_sched_wakeup(struct drm_gpu_scheduler *sched); bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched); void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched); void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched); -- cgit v1.2.3 From f0fa69b5011a45394554fb8061d74fee4d7cd72c Mon Sep 17 00:00:00 2001 From: Derek Foreman Date: Tue, 27 Aug 2024 11:39:04 -0500 Subject: drm/connector: hdmi: Fix writing Dynamic Range Mastering infoframes The largest infoframe we create is the DRM (Dynamic Range Mastering) infoframe which is 26 bytes + a 4 byte header, for a total of 30 bytes. With HDMI_MAX_INFOFRAME_SIZE set to 29 bytes, as it is now, we allocate too little space to pack a DRM infoframe in write_device_infoframe(), leading to an ENOSPC return from hdmi_infoframe_pack(), and never calling the connector's write_infoframe() vfunc. Instead of having HDMI_MAX_INFOFRAME_SIZE defined in two places, replace HDMI_MAX_INFOFRAME_SIZE with HDMI_INFOFRAME_SIZE(MAX) and make MAX 27 bytes - which is defined by the HDMI specification to be the largest infoframe payload. Fixes: f378b77227bc ("drm/connector: hdmi: Add Infoframes generation") Fixes: c602e4959a0c ("drm/connector: hdmi: Create Infoframe DebugFS entries") Signed-off-by: Derek Foreman Acked-by: Maxime Ripard Reviewed-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20240827163918.48160-1-derek.foreman@collabora.com Signed-off-by: Maxime Ripard --- include/linux/hdmi.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/hdmi.h b/include/linux/hdmi.h index 3bb87bf6bc65..455f855bc084 100644 --- a/include/linux/hdmi.h +++ b/include/linux/hdmi.h @@ -59,6 +59,15 @@ enum hdmi_infoframe_type { #define HDMI_DRM_INFOFRAME_SIZE 26 #define HDMI_VENDOR_INFOFRAME_SIZE 4 +/* + * HDMI 1.3a table 5-14 states that the largest InfoFrame_length is 27, + * not including the packet header or checksum byte. We include the + * checksum byte in HDMI_INFOFRAME_HEADER_SIZE, so this should allow + * HDMI_INFOFRAME_SIZE(MAX) to be the largest buffer we could ever need + * for any HDMI infoframe. + */ +#define HDMI_MAX_INFOFRAME_SIZE 27 + #define HDMI_INFOFRAME_SIZE(type) \ (HDMI_INFOFRAME_HEADER_SIZE + HDMI_ ## type ## _INFOFRAME_SIZE) -- cgit v1.2.3 From 19da17010a55924f2b5540b0f61652cc5781af85 Mon Sep 17 00:00:00 2001 From: Yevgeny Kliteynik Date: Mon, 23 Sep 2024 11:44:30 +0300 Subject: net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc Fixing the wrong size of a field in hca_cap_2. The bug was introduced by adding new fields for HWS and not fixing the reserved field size. Fixes: 34c626c3004a ("net/mlx5: Added missing mlx5_ifc definition for HW Steering") Signed-off-by: Yevgeny Kliteynik Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 620a5c305123..04df1610736e 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -2115,7 +2115,7 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 ts_cqe_metadata_size2wqe_counter[0x5]; u8 reserved_at_250[0x10]; - u8 reserved_at_260[0x120]; + u8 reserved_at_260[0x20]; u8 format_select_dw_gtpu_dw_0[0x8]; u8 format_select_dw_gtpu_dw_1[0x8]; -- cgit v1.2.3 From 76f1ed087b562a469f2153076f179854b749c09a Mon Sep 17 00:00:00 2001 From: Phil Sutter Date: Wed, 25 Sep 2024 20:01:20 +0200 Subject: netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED Fix the comment which incorrectly defines it as NLA_U32. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Phil Sutter Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index d6476ca5d7a6..9e9079321380 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -1694,7 +1694,7 @@ enum nft_flowtable_flags { * * @NFTA_FLOWTABLE_TABLE: name of the table containing the expression (NLA_STRING) * @NFTA_FLOWTABLE_NAME: name of this flow table (NLA_STRING) - * @NFTA_FLOWTABLE_HOOK: netfilter hook configuration(NLA_U32) + * @NFTA_FLOWTABLE_HOOK: netfilter hook configuration (NLA_NESTED) * @NFTA_FLOWTABLE_USE: number of references to this flow table (NLA_U32) * @NFTA_FLOWTABLE_HANDLE: object handle (NLA_U64) * @NFTA_FLOWTABLE_FLAGS: flags (NLA_U32) -- cgit v1.2.3 From 678379e1d4f7443b170939525d3312cfc37bf86b Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 16 Aug 2024 15:17:00 -0400 Subject: close_range(): fix the logics in descriptor table trimming Cloning a descriptor table picks the size that would cover all currently opened files. That's fine for clone() and unshare(), but for close_range() there's an additional twist - we clone before we close, and it would be a shame to have close_range(3, ~0U, CLOSE_RANGE_UNSHARE) leave us with a huge descriptor table when we are not going to keep anything past stderr, just because some large file descriptor used to be open before our call has taken it out. Unfortunately, it had been dealt with in an inherently racy way - sane_fdtable_size() gets a "don't copy anything past that" argument (passed via unshare_fd() and dup_fd()), close_range() decides how much should be trimmed and passes that to unshare_fd(). The problem is, a range that used to extend to the end of descriptor table back when close_range() had looked at it might very well have stuff grown after it by the time dup_fd() has allocated a new files_struct and started to figure out the capacity of fdtable to be attached to that. That leads to interesting pathological cases; at the very least it's a QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes a snapshot of descriptor table one might have observed at some point. Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination of unshare(CLONE_FILES) with plain close_range(), ending up with a weird state that would never occur with unshare(2) is confusing, to put it mildly. It's not hard to get rid of - all it takes is passing both ends of the range down to sane_fdtable_size(). There we are under ->files_lock, so the race is trivially avoided. So we do the following: * switch close_files() from calling unshare_fd() to calling dup_fd(). * undo the calling convention change done to unshare_fd() in 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" * introduce struct fd_range, pass a pointer to that to dup_fd() and sane_fdtable_size() instead of "trim everything past that point" they are currently getting. NULL means "we are not going to be punching any holes"; NR_OPEN_MAX is gone. * make sane_fdtable_size() use find_last_bit() instead of open-coding it; it's easier to follow that way. * while we are at it, have dup_fd() report errors by returning ERR_PTR(), no need to use a separate int *errorp argument. Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" Cc: stable@vger.kernel.org Signed-off-by: Al Viro --- include/linux/fdtable.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 2944d4aa413b..b1c5722f2b3c 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -22,7 +22,6 @@ * as this is the granularity returned by copy_fdset(). */ #define NR_OPEN_DEFAULT BITS_PER_LONG -#define NR_OPEN_MAX ~0U struct fdtable { unsigned int max_fds; @@ -106,7 +105,10 @@ struct task_struct; void put_files_struct(struct files_struct *fs); int unshare_files(void); -struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy; +struct fd_range { + unsigned int from, to; +}; +struct files_struct *dup_fd(struct files_struct *, struct fd_range *) __latent_entropy; void do_close_on_exec(struct files_struct *); int iterate_fd(struct files_struct *, unsigned, int (*)(const void *, struct file *, unsigned), @@ -115,8 +117,6 @@ int iterate_fd(struct files_struct *, unsigned, extern int close_fd(unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); extern struct file *file_close_fd(unsigned int fd); -extern int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, - struct files_struct **new_fdp); extern struct kmem_cache *files_cachep; -- cgit v1.2.3 From e8d4d34df715133c319fabcf63fdec684be75ff8 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 23 Sep 2024 23:22:41 +0200 Subject: net: Add netif_get_gro_max_size helper for GRO Add a small netif_get_gro_max_size() helper which returns the maximum IPv4 or IPv6 GRO size of the netdevice. We later add a netif_get_gso_max_size() equivalent as well for GSO, so that these helpers can be used consistently instead of open-coded checks. Signed-off-by: Daniel Borkmann Cc: Eric Dumazet Cc: Paolo Abeni Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e87b5e488325..d571451638de 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -5029,6 +5029,15 @@ void netif_set_tso_max_segs(struct net_device *dev, unsigned int segs); void netif_inherit_tso_max(struct net_device *to, const struct net_device *from); +static inline unsigned int +netif_get_gro_max_size(const struct net_device *dev, const struct sk_buff *skb) +{ + /* pairs with WRITE_ONCE() in netif_set_gro(_ipv4)_max_size() */ + return skb->protocol == htons(ETH_P_IPV6) ? + READ_ONCE(dev->gro_max_size) : + READ_ONCE(dev->gro_ipv4_max_size); +} + static inline bool netif_is_macsec(const struct net_device *dev) { return dev->priv_flags & IFF_MACSEC; -- cgit v1.2.3 From e609c959a939660c7519895f853dfa5624c6827a Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 23 Sep 2024 23:22:42 +0200 Subject: net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_size Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") added a dev->gso_max_size test to gso_features_check() in order to fall back to GSO when needed. This was added as it was noticed that some drivers could misbehave if TSO packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size limit. For instance, a device could be configured with BIG TCP for IPv4, but not IPv6. Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size() and use the helper to respect both limits before falling back to GSO engine. Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") Signed-off-by: Daniel Borkmann Cc: Eric Dumazet Cc: Paolo Abeni Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d571451638de..4d20c776a4ff 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -5038,6 +5038,15 @@ netif_get_gro_max_size(const struct net_device *dev, const struct sk_buff *skb) READ_ONCE(dev->gro_ipv4_max_size); } +static inline unsigned int +netif_get_gso_max_size(const struct net_device *dev, const struct sk_buff *skb) +{ + /* pairs with WRITE_ONCE() in netif_set_gso(_ipv4)_max_size() */ + return skb->protocol == htons(ETH_P_IPV6) ? + READ_ONCE(dev->gso_max_size) : + READ_ONCE(dev->gso_ipv4_max_size); +} + static inline bool netif_is_macsec(const struct net_device *dev) { return dev->priv_flags & IFF_MACSEC; -- cgit v1.2.3 From f5c82730bedbc4a424cb94d2653bcb8be9dbd2ec Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 1 Oct 2024 17:01:40 +0200 Subject: folio_queue: fix documentation s/folioq_count/folioq_full/ Reported-by: Stephen Rothwell Link: https://lore.kernel.org/r/20241001134729.3f65ae78@canb.auug.org.au Signed-off-by: Christian Brauner --- include/linux/folio_queue.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/folio_queue.h b/include/linux/folio_queue.h index af871405ae55..3abe614ef5f0 100644 --- a/include/linux/folio_queue.h +++ b/include/linux/folio_queue.h @@ -81,7 +81,7 @@ static inline unsigned int folioq_count(struct folio_queue *folioq) } /** - * folioq_count: Query if a folio queue segment is full + * folioq_full: Query if a folio queue segment is full * @folioq: The segment to query * * Query if a folio queue segment is fully occupied. Note that this does not -- cgit v1.2.3 From 50c6f6e6806c65e41a039f0edef0816974403253 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Wed, 25 Sep 2024 11:38:55 +0100 Subject: btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class While running checkpatch.pl against a patch that modifies the btrfs_qgroup_extent event class, it complained about using a comma instead of a semicolon: $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch WARNING: Possible comma where semicolon could be used #215: FILE: include/trace/events/btrfs.h:1720: + __entry->bytenr = bytenr, __entry->num_bytes = rec->num_bytes; total: 0 errors, 1 warnings, 184 lines checked So replace the comma with a semicolon to silence checkpatch and possibly other tools. It also makes the code consistent with the rest. Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/trace/events/btrfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index bf60ad50011e..af6b3827fb1d 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -1716,7 +1716,7 @@ DECLARE_EVENT_CLASS(btrfs_qgroup_extent, ), TP_fast_assign_btrfs(fs_info, - __entry->bytenr = rec->bytenr, + __entry->bytenr = rec->bytenr; __entry->num_bytes = rec->num_bytes; ), -- cgit v1.2.3 From c0f02536fffbbec71aced36d52a765f8c4493dc2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miquel=20Sabat=C3=A9=20Sol=C3=A0?= Date: Tue, 17 Sep 2024 15:42:46 +0200 Subject: cpufreq: Avoid a bad reference count on CPU node MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In the parse_perf_domain function, if the call to of_parse_phandle_with_args returns an error, then the reference to the CPU device node that was acquired at the start of the function would not be properly decremented. Address this by declaring the variable with the __free(device_node) cleanup attribute. Signed-off-by: Miquel Sabaté Solà Acked-by: Viresh Kumar Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com Cc: All applicable Signed-off-by: Rafael J. Wysocki --- include/linux/cpufreq.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index e0e19d9c1323..7fe0981a7e46 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -1107,10 +1107,9 @@ static inline int parse_perf_domain(int cpu, const char *list_name, const char *cell_name, struct of_phandle_args *args) { - struct device_node *cpu_np; int ret; - cpu_np = of_cpu_device_node_get(cpu); + struct device_node *cpu_np __free(device_node) = of_cpu_device_node_get(cpu); if (!cpu_np) return -ENODEV; @@ -1118,9 +1117,6 @@ static inline int parse_perf_domain(int cpu, const char *list_name, args); if (ret < 0) return ret; - - of_node_put(cpu_np); - return 0; } -- cgit v1.2.3 From df5215618fbe425875336d3a2d31bd599ae8c401 Mon Sep 17 00:00:00 2001 From: Jaroslav Kysela Date: Wed, 2 Oct 2024 10:13:06 +0200 Subject: ALSA: hda: fix trigger_tstamp_latched When the trigger_tstamp_latched flag is set, the PCM core code assumes that the low-level driver handles the trigger timestamping itself. Ensure that runtime->trigger_tstamp is always updated. Buglink: https://github.com/alsa-project/alsa-lib/issues/387 Reported-by: Zeno Endemann Signed-off-by: Jaroslav Kysela Link: https://patch.msgid.link/20241002081306.1788405-1-perex@perex.cz Signed-off-by: Takashi Iwai --- include/sound/hdaudio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/sound/hdaudio.h b/include/sound/hdaudio.h index 7e39d486374a..b098ceadbe74 100644 --- a/include/sound/hdaudio.h +++ b/include/sound/hdaudio.h @@ -590,7 +590,7 @@ void snd_hdac_stream_sync_trigger(struct hdac_stream *azx_dev, bool set, void snd_hdac_stream_sync(struct hdac_stream *azx_dev, bool start, unsigned int streams); void snd_hdac_stream_timecounter_init(struct hdac_stream *azx_dev, - unsigned int streams); + unsigned int streams, bool start); int snd_hdac_get_stream_stripe_ctl(struct hdac_bus *bus, struct snd_pcm_substream *substream); -- cgit v1.2.3 From cad3f4a22cfa4081cc2d465d1118cf31708fd82b Mon Sep 17 00:00:00 2001 From: Lizhi Xu Date: Fri, 27 Sep 2024 22:36:42 +0800 Subject: inotify: Fix possible deadlock in fsnotify_destroy_mark [Syzbot reported] WARNING: possible circular locking dependency detected 6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted ------------------------------------------------------ kswapd0/78 is trying to acquire lock: ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 but task is already holding lock: ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline] ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: ... kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044 inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline] inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline] __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline] __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&group->mark_mutex){+.+.}-{3:3}: ... __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934 fsnotify_inoderemove include/linux/fsnotify.h:264 [inline] dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403 __dentry_kill+0x20d/0x630 fs/dcache.c:610 shrink_kill+0xa9/0x2c0 fs/dcache.c:1055 shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082 prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163 super_cache_scan+0x34f/0x4b0 fs/super.c:221 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4815 shrink_many mm/vmscan.c:4876 [inline] lru_gen_shrink_node mm/vmscan.c:4954 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5934 kswapd_shrink_node mm/vmscan.c:6762 [inline] balance_pgdat mm/vmscan.c:6954 [inline] kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223 [Analysis] The problem is that inotify_new_watch() is using GFP_KERNEL to allocate new watches under group->mark_mutex, however if dentry reclaim races with unlinking of an inode, it can end up dropping the last dentry reference for an unlinked inode resulting in removal of fsnotify mark from reclaim context which wants to acquire group->mark_mutex as well. This scenario shows that all notification groups are in principle prone to this kind of a deadlock (previously, we considered only fanotify and dnotify to be problematic for other reasons) so make sure all allocations under group->mark_mutex happen with GFP_NOFS. Reported-and-tested-by: syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53 Signed-off-by: Lizhi Xu Reviewed-by: Amir Goldstein Signed-off-by: Jan Kara Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com --- include/linux/fsnotify_backend.h | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 8be029bc50b1..3ecf7768e577 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -217,7 +217,6 @@ struct fsnotify_group { #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ #define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ -#define FSNOTIFY_GROUP_NOFS 0x04 /* group lock is not direct reclaim safe */ int flags; unsigned int owner_flags; /* stored flags of mark_mutex owner */ @@ -268,22 +267,19 @@ struct fsnotify_group { static inline void fsnotify_group_lock(struct fsnotify_group *group) { mutex_lock(&group->mark_mutex); - if (group->flags & FSNOTIFY_GROUP_NOFS) - group->owner_flags = memalloc_nofs_save(); + group->owner_flags = memalloc_nofs_save(); } static inline void fsnotify_group_unlock(struct fsnotify_group *group) { - if (group->flags & FSNOTIFY_GROUP_NOFS) - memalloc_nofs_restore(group->owner_flags); + memalloc_nofs_restore(group->owner_flags); mutex_unlock(&group->mark_mutex); } static inline void fsnotify_group_assert_locked(struct fsnotify_group *group) { WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); - if (group->flags & FSNOTIFY_GROUP_NOFS) - WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); + WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); } /* When calling fsnotify tell it if the data is a path or inode */ -- cgit v1.2.3 From 5f60d5f6bbc12e782fac78110b0ee62698f3b576 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 1 Oct 2024 15:35:57 -0400 Subject: move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h --- include/asm-generic/Kbuild | 1 - include/asm-generic/uaccess.h | 2 +- include/asm-generic/unaligned.h | 146 ----------------------------------- include/crypto/chacha.h | 2 +- include/crypto/internal/ecc.h | 2 +- include/crypto/internal/poly1305.h | 2 +- include/crypto/sha1_base.h | 2 +- include/crypto/sha256_base.h | 2 +- include/crypto/sha512_base.h | 2 +- include/crypto/sm3_base.h | 2 +- include/crypto/utils.h | 2 +- include/linux/ceph/decode.h | 2 +- include/linux/ceph/libceph.h | 2 +- include/linux/etherdevice.h | 2 +- include/linux/ieee80211.h | 2 +- include/linux/mtd/map.h | 2 +- include/linux/ptp_classify.h | 2 +- include/linux/sunrpc/xdr.h | 2 +- include/linux/tpm.h | 2 +- include/linux/unaligned.h | 146 +++++++++++++++++++++++++++++++++++ include/net/bluetooth/l2cap.h | 2 +- include/net/calipso.h | 2 +- include/net/cipso_ipv4.h | 2 +- include/net/ieee80211_radiotap.h | 2 +- include/net/mac80211.h | 2 +- include/net/mac802154.h | 2 +- include/net/netfilter/nf_tables.h | 2 +- include/rdma/ib_hdrs.h | 2 +- include/rdma/iba.h | 2 +- include/scsi/scsi_transport_fc.h | 2 +- include/target/target_core_backend.h | 2 +- 31 files changed, 174 insertions(+), 175 deletions(-) delete mode 100644 include/asm-generic/unaligned.h create mode 100644 include/linux/unaligned.h (limited to 'include') diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 620b6da429d4..1b43c3a77012 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -58,7 +58,6 @@ mandatory-y += tlbflush.h mandatory-y += topology.h mandatory-y += trace_clock.h mandatory-y += uaccess.h -mandatory-y += unaligned.h mandatory-y += vermagic.h mandatory-y += vga.h mandatory-y += video.h diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h index a5be9e61a2a2..b276f783494c 100644 --- a/include/asm-generic/uaccess.h +++ b/include/asm-generic/uaccess.h @@ -11,7 +11,7 @@ #include #ifdef CONFIG_UACCESS_MEMCPY -#include +#include static __always_inline int __get_user_fn(size_t size, const void __user *from, void *to) diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h deleted file mode 100644 index 95acdd70b3b2..000000000000 --- a/include/asm-generic/unaligned.h +++ /dev/null @@ -1,146 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __ASM_GENERIC_UNALIGNED_H -#define __ASM_GENERIC_UNALIGNED_H - -/* - * This is the most generic implementation of unaligned accesses - * and should work almost anywhere. - */ -#include -#include -#include - -#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr)) -#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr)) - -static inline u16 get_unaligned_le16(const void *p) -{ - return le16_to_cpu(__get_unaligned_t(__le16, p)); -} - -static inline u32 get_unaligned_le32(const void *p) -{ - return le32_to_cpu(__get_unaligned_t(__le32, p)); -} - -static inline u64 get_unaligned_le64(const void *p) -{ - return le64_to_cpu(__get_unaligned_t(__le64, p)); -} - -static inline void put_unaligned_le16(u16 val, void *p) -{ - __put_unaligned_t(__le16, cpu_to_le16(val), p); -} - -static inline void put_unaligned_le32(u32 val, void *p) -{ - __put_unaligned_t(__le32, cpu_to_le32(val), p); -} - -static inline void put_unaligned_le64(u64 val, void *p) -{ - __put_unaligned_t(__le64, cpu_to_le64(val), p); -} - -static inline u16 get_unaligned_be16(const void *p) -{ - return be16_to_cpu(__get_unaligned_t(__be16, p)); -} - -static inline u32 get_unaligned_be32(const void *p) -{ - return be32_to_cpu(__get_unaligned_t(__be32, p)); -} - -static inline u64 get_unaligned_be64(const void *p) -{ - return be64_to_cpu(__get_unaligned_t(__be64, p)); -} - -static inline void put_unaligned_be16(u16 val, void *p) -{ - __put_unaligned_t(__be16, cpu_to_be16(val), p); -} - -static inline void put_unaligned_be32(u32 val, void *p) -{ - __put_unaligned_t(__be32, cpu_to_be32(val), p); -} - -static inline void put_unaligned_be64(u64 val, void *p) -{ - __put_unaligned_t(__be64, cpu_to_be64(val), p); -} - -static inline u32 __get_unaligned_be24(const u8 *p) -{ - return p[0] << 16 | p[1] << 8 | p[2]; -} - -static inline u32 get_unaligned_be24(const void *p) -{ - return __get_unaligned_be24(p); -} - -static inline u32 __get_unaligned_le24(const u8 *p) -{ - return p[0] | p[1] << 8 | p[2] << 16; -} - -static inline u32 get_unaligned_le24(const void *p) -{ - return __get_unaligned_le24(p); -} - -static inline void __put_unaligned_be24(const u32 val, u8 *p) -{ - *p++ = (val >> 16) & 0xff; - *p++ = (val >> 8) & 0xff; - *p++ = val & 0xff; -} - -static inline void put_unaligned_be24(const u32 val, void *p) -{ - __put_unaligned_be24(val, p); -} - -static inline void __put_unaligned_le24(const u32 val, u8 *p) -{ - *p++ = val & 0xff; - *p++ = (val >> 8) & 0xff; - *p++ = (val >> 16) & 0xff; -} - -static inline void put_unaligned_le24(const u32 val, void *p) -{ - __put_unaligned_le24(val, p); -} - -static inline void __put_unaligned_be48(const u64 val, u8 *p) -{ - *p++ = (val >> 40) & 0xff; - *p++ = (val >> 32) & 0xff; - *p++ = (val >> 24) & 0xff; - *p++ = (val >> 16) & 0xff; - *p++ = (val >> 8) & 0xff; - *p++ = val & 0xff; -} - -static inline void put_unaligned_be48(const u64 val, void *p) -{ - __put_unaligned_be48(val, p); -} - -static inline u64 __get_unaligned_be48(const u8 *p) -{ - return (u64)p[0] << 40 | (u64)p[1] << 32 | (u64)p[2] << 24 | - p[3] << 16 | p[4] << 8 | p[5]; -} - -static inline u64 get_unaligned_be48(const void *p) -{ - return __get_unaligned_be48(p); -} - -#endif /* __ASM_GENERIC_UNALIGNED_H */ diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h index b3ea73b81944..5bae6a55b333 100644 --- a/include/crypto/chacha.h +++ b/include/crypto/chacha.h @@ -15,7 +15,7 @@ #ifndef _CRYPTO_CHACHA_H #define _CRYPTO_CHACHA_H -#include +#include #include /* 32-bit stream position, then 96-bit nonce (RFC7539 convention) */ diff --git a/include/crypto/internal/ecc.h b/include/crypto/internal/ecc.h index 0717a53ae732..065f00e4bf40 100644 --- a/include/crypto/internal/ecc.h +++ b/include/crypto/internal/ecc.h @@ -27,7 +27,7 @@ #define _CRYPTO_ECC_H #include -#include +#include /* One digit is u64 qword. */ #define ECC_CURVE_NIST_P192_DIGITS 3 diff --git a/include/crypto/internal/poly1305.h b/include/crypto/internal/poly1305.h index 196aa769f296..e614594f88c1 100644 --- a/include/crypto/internal/poly1305.h +++ b/include/crypto/internal/poly1305.h @@ -6,7 +6,7 @@ #ifndef _CRYPTO_INTERNAL_POLY1305_H #define _CRYPTO_INTERNAL_POLY1305_H -#include +#include #include #include diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h index 2e0e7c3827d1..0c342ed0d038 100644 --- a/include/crypto/sha1_base.h +++ b/include/crypto/sha1_base.h @@ -14,7 +14,7 @@ #include #include -#include +#include typedef void (sha1_block_fn)(struct sha1_state *sst, u8 const *src, int blocks); diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h index ab904d82236f..e0418818d63c 100644 --- a/include/crypto/sha256_base.h +++ b/include/crypto/sha256_base.h @@ -9,7 +9,7 @@ #define _CRYPTO_SHA256_BASE_H #include -#include +#include #include #include #include diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h index b370b3340b16..679916a84cb2 100644 --- a/include/crypto/sha512_base.h +++ b/include/crypto/sha512_base.h @@ -14,7 +14,7 @@ #include #include -#include +#include typedef void (sha512_block_fn)(struct sha512_state *sst, u8 const *src, int blocks); diff --git a/include/crypto/sm3_base.h b/include/crypto/sm3_base.h index 2f3a32ab97bb..b33ed39c2bce 100644 --- a/include/crypto/sm3_base.h +++ b/include/crypto/sm3_base.h @@ -14,7 +14,7 @@ #include #include #include -#include +#include typedef void (sm3_block_fn)(struct sm3_state *sst, u8 const *src, int blocks); diff --git a/include/crypto/utils.h b/include/crypto/utils.h index acbb917a00c6..2594f45777b5 100644 --- a/include/crypto/utils.h +++ b/include/crypto/utils.h @@ -7,7 +7,7 @@ #ifndef _CRYPTO_UTILS_H #define _CRYPTO_UTILS_H -#include +#include #include #include diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index 04f3ace5787b..8fc1aed64113 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -6,7 +6,7 @@ #include #include #include -#include +#include #include diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h index 4497d0a6772c..15fb566d3f46 100644 --- a/include/linux/ceph/libceph.h +++ b/include/linux/ceph/libceph.h @@ -4,7 +4,7 @@ #include -#include +#include #include #include #include diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index 30114c25ad12..ecf203f01034 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -21,7 +21,7 @@ #include #include #include -#include +#include #include #ifdef __KERNEL__ diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 30cef3b940eb..456bca45ff05 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -20,7 +20,7 @@ #include #include #include -#include +#include /* * DS bit usage diff --git a/include/linux/mtd/map.h b/include/linux/mtd/map.h index b4fa92a6e44b..1b56796f6cb3 100644 --- a/include/linux/mtd/map.h +++ b/include/linux/mtd/map.h @@ -15,7 +15,7 @@ #include #include -#include +#include #include #ifdef CONFIG_MTD_MAP_BANK_WIDTH_1 diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h index 1b5a953c6bbc..3a74f69e0b59 100644 --- a/include/linux/ptp_classify.h +++ b/include/linux/ptp_classify.h @@ -10,7 +10,7 @@ #ifndef _PTP_CLASSIFY_H_ #define _PTP_CLASSIFY_H_ -#include +#include #include #include #include diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 2f8dc47f1eb0..5f775e104f9a 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -13,7 +13,7 @@ #include #include -#include +#include #include struct bio_vec; diff --git a/include/linux/tpm.h b/include/linux/tpm.h index e93ee8d936a9..587b96b4418e 100644 --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -537,7 +537,7 @@ int tpm_buf_check_hmac_response(struct tpm_chip *chip, struct tpm_buf *buf, int rc); void tpm2_end_auth_session(struct tpm_chip *chip); #else -#include +#include static inline int tpm2_start_auth_session(struct tpm_chip *chip) { diff --git a/include/linux/unaligned.h b/include/linux/unaligned.h new file mode 100644 index 000000000000..4a9651017e3c --- /dev/null +++ b/include/linux/unaligned.h @@ -0,0 +1,146 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_UNALIGNED_H +#define __LINUX_UNALIGNED_H + +/* + * This is the most generic implementation of unaligned accesses + * and should work almost anywhere. + */ +#include +#include +#include + +#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr)) +#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr)) + +static inline u16 get_unaligned_le16(const void *p) +{ + return le16_to_cpu(__get_unaligned_t(__le16, p)); +} + +static inline u32 get_unaligned_le32(const void *p) +{ + return le32_to_cpu(__get_unaligned_t(__le32, p)); +} + +static inline u64 get_unaligned_le64(const void *p) +{ + return le64_to_cpu(__get_unaligned_t(__le64, p)); +} + +static inline void put_unaligned_le16(u16 val, void *p) +{ + __put_unaligned_t(__le16, cpu_to_le16(val), p); +} + +static inline void put_unaligned_le32(u32 val, void *p) +{ + __put_unaligned_t(__le32, cpu_to_le32(val), p); +} + +static inline void put_unaligned_le64(u64 val, void *p) +{ + __put_unaligned_t(__le64, cpu_to_le64(val), p); +} + +static inline u16 get_unaligned_be16(const void *p) +{ + return be16_to_cpu(__get_unaligned_t(__be16, p)); +} + +static inline u32 get_unaligned_be32(const void *p) +{ + return be32_to_cpu(__get_unaligned_t(__be32, p)); +} + +static inline u64 get_unaligned_be64(const void *p) +{ + return be64_to_cpu(__get_unaligned_t(__be64, p)); +} + +static inline void put_unaligned_be16(u16 val, void *p) +{ + __put_unaligned_t(__be16, cpu_to_be16(val), p); +} + +static inline void put_unaligned_be32(u32 val, void *p) +{ + __put_unaligned_t(__be32, cpu_to_be32(val), p); +} + +static inline void put_unaligned_be64(u64 val, void *p) +{ + __put_unaligned_t(__be64, cpu_to_be64(val), p); +} + +static inline u32 __get_unaligned_be24(const u8 *p) +{ + return p[0] << 16 | p[1] << 8 | p[2]; +} + +static inline u32 get_unaligned_be24(const void *p) +{ + return __get_unaligned_be24(p); +} + +static inline u32 __get_unaligned_le24(const u8 *p) +{ + return p[0] | p[1] << 8 | p[2] << 16; +} + +static inline u32 get_unaligned_le24(const void *p) +{ + return __get_unaligned_le24(p); +} + +static inline void __put_unaligned_be24(const u32 val, u8 *p) +{ + *p++ = (val >> 16) & 0xff; + *p++ = (val >> 8) & 0xff; + *p++ = val & 0xff; +} + +static inline void put_unaligned_be24(const u32 val, void *p) +{ + __put_unaligned_be24(val, p); +} + +static inline void __put_unaligned_le24(const u32 val, u8 *p) +{ + *p++ = val & 0xff; + *p++ = (val >> 8) & 0xff; + *p++ = (val >> 16) & 0xff; +} + +static inline void put_unaligned_le24(const u32 val, void *p) +{ + __put_unaligned_le24(val, p); +} + +static inline void __put_unaligned_be48(const u64 val, u8 *p) +{ + *p++ = (val >> 40) & 0xff; + *p++ = (val >> 32) & 0xff; + *p++ = (val >> 24) & 0xff; + *p++ = (val >> 16) & 0xff; + *p++ = (val >> 8) & 0xff; + *p++ = val & 0xff; +} + +static inline void put_unaligned_be48(const u64 val, void *p) +{ + __put_unaligned_be48(val, p); +} + +static inline u64 __get_unaligned_be48(const u8 *p) +{ + return (u64)p[0] << 40 | (u64)p[1] << 32 | (u64)p[2] << 24 | + p[3] << 16 | p[4] << 8 | p[5]; +} + +static inline u64 get_unaligned_be48(const void *p) +{ + return __get_unaligned_be48(p); +} + +#endif /* __LINUX_UNALIGNED_H */ diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h index 313d0b972e06..d9c767cf773d 100644 --- a/include/net/bluetooth/l2cap.h +++ b/include/net/bluetooth/l2cap.h @@ -27,7 +27,7 @@ #ifndef __L2CAP_H #define __L2CAP_H -#include +#include #include /* L2CAP defaults */ diff --git a/include/net/calipso.h b/include/net/calipso.h index f8667a3fda9e..76b9e08c10c2 100644 --- a/include/net/calipso.h +++ b/include/net/calipso.h @@ -25,7 +25,7 @@ #include #include #include -#include +#include /* known doi values */ #define CALIPSO_DOI_UNKNOWN 0x00000000 diff --git a/include/net/cipso_ipv4.h b/include/net/cipso_ipv4.h index c9111bb2f59b..d6780d7903f4 100644 --- a/include/net/cipso_ipv4.h +++ b/include/net/cipso_ipv4.h @@ -28,7 +28,7 @@ #include #include #include -#include +#include /* known doi values */ #define CIPSO_V4_DOI_UNKNOWN 0x00000000 diff --git a/include/net/ieee80211_radiotap.h b/include/net/ieee80211_radiotap.h index 91762faecc13..02fbc036f34e 100644 --- a/include/net/ieee80211_radiotap.h +++ b/include/net/ieee80211_radiotap.h @@ -18,7 +18,7 @@ #define __RADIOTAP_H #include -#include +#include /** * struct ieee80211_radiotap_header - base radiotap header diff --git a/include/net/mac80211.h b/include/net/mac80211.h index 954dff901b69..333e0fae6796 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -22,7 +22,7 @@ #include #include #include -#include +#include /** * DOC: Introduction diff --git a/include/net/mac802154.h b/include/net/mac802154.h index 1b5488fa2ff0..d72006a85f02 100644 --- a/include/net/mac802154.h +++ b/include/net/mac802154.h @@ -7,7 +7,7 @@ #ifndef NET_MAC802154_H #define NET_MAC802154_H -#include +#include #include #include #include diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 49708e7e1339..91ae20cb7648 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -2,7 +2,7 @@ #ifndef _NET_NF_TABLES_H #define _NET_NF_TABLES_H -#include +#include #include #include #include diff --git a/include/rdma/ib_hdrs.h b/include/rdma/ib_hdrs.h index 8ae07c0ecdf7..1c4c1a69937a 100644 --- a/include/rdma/ib_hdrs.h +++ b/include/rdma/ib_hdrs.h @@ -7,7 +7,7 @@ #define IB_HDRS_H #include -#include +#include #include #define IB_SEQ_NAK (3 << 29) diff --git a/include/rdma/iba.h b/include/rdma/iba.h index 6a1115b02a0d..dcae154edc26 100644 --- a/include/rdma/iba.h +++ b/include/rdma/iba.h @@ -7,7 +7,7 @@ #include #include -#include +#include static inline u32 _iba_get8(const u8 *ptr) { diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h index 8e6c60090c62..d02b55261307 100644 --- a/include/scsi/scsi_transport_fc.h +++ b/include/scsi/scsi_transport_fc.h @@ -12,7 +12,7 @@ #include #include -#include +#include #include #include #include diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h index 739df993aa5e..4063a701081b 100644 --- a/include/target/target_core_backend.h +++ b/include/target/target_core_backend.h @@ -3,7 +3,7 @@ #define TARGET_CORE_BACKEND_H #include -#include +#include #include #define TRANSPORT_FLAG_PASSTHROUGH 0x1 -- cgit v1.2.3 From 49d14b54a527289d09a9480f214b8c586322310a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 26 Sep 2024 16:58:36 +0000 Subject: net: test for not too small csum_start in virtio_net_hdr_to_skb() syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 9181d6f8a2bb ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/virtio_net.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index 276ca543ef44..02a9f4dc594d 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -103,8 +103,10 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, if (!skb_partial_csum_set(skb, start, off)) return -EINVAL; + if (skb_transport_offset(skb) < nh_min_len) + return -EINVAL; - nh_min_len = max_t(u32, nh_min_len, skb_transport_offset(skb)); + nh_min_len = skb_transport_offset(skb); p_off = nh_min_len + thlen; if (!pskb_may_pull(skb, p_off)) return -EINVAL; -- cgit v1.2.3 From a848c29e3486189aaabd5663bc11aea50c5bd144 Mon Sep 17 00:00:00 2001 From: Yanjun Zhang Date: Tue, 1 Oct 2024 16:39:30 +0800 Subject: NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies() On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32c8a56 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang Reviewed-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/nfs_fs_sb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index 853df3fcd4c2..b804346a9741 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -249,6 +249,7 @@ struct nfs_server { struct list_head layouts; struct list_head delegations; struct list_head ss_copies; + struct list_head ss_src_copies; unsigned long delegation_gen; unsigned long mig_gen; -- cgit v1.2.3 From 65f2a5c366353da6fa724c68347e1de954928143 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 3 Oct 2024 15:34:58 -0400 Subject: nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put() Add nfs_to_nfsd_file_put_local() interface to fix race with nfsd module unload. Similarly, use RCU around nfs_open_local_fh()'s error path call to nfs_to->nfsd_serv_put(). Holding RCU ensures that NFS will safely _call and return_ from its nfs_to calls into the NFSD functions nfsd_file_put_local() and nfsd_serv_put(). Otherwise, if RCU isn't used then there is a narrow window when NFS's reference for the nfsd_file and nfsd_serv are dropped and the NFSD module could be unloaded, which could result in a crash from the return instruction for either nfs_to->nfsd_file_put_local() or nfs_to->nfsd_serv_put(). Reported-by: NeilBrown Signed-off-by: Mike Snitzer Signed-off-by: Anna Schumaker --- include/linux/nfslocalio.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include') diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h index b353abe00357..b0dd9b1eef4f 100644 --- a/include/linux/nfslocalio.h +++ b/include/linux/nfslocalio.h @@ -65,10 +65,25 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *, struct rpc_clnt *, const struct cred *, const struct nfs_fh *, const fmode_t); +static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio) +{ + /* + * Once reference to nfsd_serv is dropped, NFSD could be + * unloaded, so ensure safe return from nfsd_file_put_local() + * by always taking RCU. + */ + rcu_read_lock(); + nfs_to->nfsd_file_put_local(localio); + rcu_read_unlock(); +} + #else /* CONFIG_NFS_LOCALIO */ static inline void nfsd_localio_ops_init(void) { } +static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio) +{ +} #endif /* CONFIG_NFS_LOCALIO */ #endif /* __LINUX_NFSLOCALIO_H */ -- cgit v1.2.3 From 1dae9f1187189bc09ff6d25ca97ead711f7e26f9 Mon Sep 17 00:00:00 2001 From: Anastasia Kovaleva Date: Thu, 3 Oct 2024 13:44:31 +0300 Subject: net: Fix an unsafe loop on the list The kernel may crash when deleting a genetlink family if there are still listeners for that family: Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0 LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0 Call Trace: __netlink_clear_multicast_users+0x74/0xc0 genl_unregister_family+0xd4/0x2d0 Change the unsafe loop on the list to a safe one, because inside the loop there is an element removal from this list. Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)") Cc: stable@vger.kernel.org Signed-off-by: Anastasia Kovaleva Reviewed-by: Dmitry Bogdanov Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com Signed-off-by: Jakub Kicinski --- include/net/sock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/sock.h b/include/net/sock.h index c58ca8dd561b..db29c39e19a7 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -894,6 +894,8 @@ static inline void sk_add_bind_node(struct sock *sk, hlist_for_each_entry_safe(__sk, tmp, list, sk_node) #define sk_for_each_bound(__sk, list) \ hlist_for_each_entry(__sk, list, sk_bind_node) +#define sk_for_each_bound_safe(__sk, tmp, list) \ + hlist_for_each_entry_safe(__sk, tmp, list, sk_bind_node) /** * sk_for_each_entry_offset_rcu - iterate over a list at a given struct offset -- cgit v1.2.3 From 3cb7cf1540ddff5473d6baeb530228d19bc97b8a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 7 Oct 2024 18:41:30 +0000 Subject: net/sched: accept TCA_STAB only for root qdisc Most qdiscs maintain their backlog using qdisc_pkt_len(skb) on the assumption it is invariant between the enqueue() and dequeue() handlers. Unfortunately syzbot can crash a host rather easily using a TBF + SFQ combination, with an STAB on SFQ [1] We can't support TCA_STAB on arbitrary level, this would require to maintain per-qdisc storage. [1] [ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 88.798611] #PF: supervisor read access in kernel mode [ 88.799014] #PF: error_code(0x0000) - not-present page [ 88.799506] PGD 0 P4D 0 [ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI [ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117 [ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00 All code ======== 0: 0f b7 50 12 movzwl 0x12(%rax),%edx 4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax b: 00 c: 48 89 d6 mov %rdx,%rsi f: 48 29 d0 sub %rdx,%rax 12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx 19: 48 c1 e0 03 shl $0x3,%rax 1d: 48 01 c2 add %rax,%rdx 20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx) 25: 7e c0 jle 0xffffffffffffffe7 27: 48 8b 3a mov (%rdx),%rdi 2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction 2d: 4c 89 02 mov %r8,(%rdx) 30: 49 89 50 08 mov %rdx,0x8(%r8) 34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 3b: 00 3c: 48 rex.W 3d: c7 .byte 0xc7 3e: 07 (bad) ... Code starting with the faulting instruction =========================================== 0: 4c 8b 07 mov (%rdi),%r8 3: 4c 89 02 mov %r8,(%rdx) 6: 49 89 50 08 mov %rdx,0x8(%r8) a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 11: 00 12: 48 rex.W 13: c7 .byte 0xc7 14: 07 (bad) ... [ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206 [ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800 [ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000 [ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f [ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140 [ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac [ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000 [ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0 [ 88.808165] Call Trace: [ 88.808459] [ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715) [ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) [ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) [ 88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq [ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf [ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958) [ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673) [ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517) [ 88.812505] __fput (fs/file_table.c:432 (discriminator 1)) [ 88.812735] task_work_run (kernel/task_work.c:230) [ 88.813016] do_exit (kernel/exit.c:940) [ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4)) [ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088) [ 88.813867] do_group_exit (kernel/exit.c:1070) [ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099) [ 88.814490] x64_sys_call (??:?) [ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 88.815495] RIP: 0033:0x7f44560f1975 Fixes: 175f9c1bba9b ("net_sched: Add size table for qdiscs") Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Daniel Borkmann Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/sch_generic.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 79edd5b5e3c9..5d74fa7e694c 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -848,7 +848,6 @@ static inline void qdisc_calculate_pkt_len(struct sk_buff *skb, static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { - qdisc_calculate_pkt_len(skb, sch); return sch->enqueue(skb, sch, to_free); } -- cgit v1.2.3 From 9897713fe1077c90b4a86c9af0a878d56c8888a2 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 26 Sep 2024 19:11:50 +0200 Subject: bcachefs: do not use PF_MEMALLOC_NORECLAIM Patch series "remove PF_MEMALLOC_NORECLAIM" v3. This patch (of 2): bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new inode to achieve GFP_NOWAIT semantic while holding locks. If this allocation fails it will drop locks and use GFP_NOFS allocation context. We would like to drop PF_MEMALLOC_NORECLAIM because it is really dangerous to use if the caller doesn't control the full call chain with this flag set. E.g. if any of the function down the chain needed GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and cause unexpected failure. While this is not the case in this particular case using the scoped gfp semantic is not really needed bacause we can easily pus the allocation context down the chain without too much clutter. [akpm@linux-foundation.org: fix kerneldoc warnings] Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org Signed-off-by: Michal Hocko Reviewed-by: Christoph Hellwig Reviewed-by: Dave Chinner Reviewed-by: Jan Kara # For vfs changes Cc: Al Viro Cc: Christian Brauner Cc: James Morris Cc: Kent Overstreet Cc: Paul Moore Cc: Serge E. Hallyn Cc: Yafang Shao Cc: Matthew Wilcox (Oracle) Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/fs.h | 7 ++++++- include/linux/security.h | 4 ++-- 2 files changed, 8 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/fs.h b/include/linux/fs.h index e3c603d01337..3559446279c1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3082,7 +3082,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence); extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence); -extern int inode_init_always(struct super_block *, struct inode *); +extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t); +static inline int inode_init_always(struct super_block *sb, struct inode *inode) +{ + return inode_init_always_gfp(sb, inode, GFP_NOFS); +} + extern void inode_init_once(struct inode *); extern void address_space_init_once(struct address_space *mapping); extern struct inode * igrab(struct inode *); diff --git a/include/linux/security.h b/include/linux/security.h index b86ec2afc691..2ec8f3014757 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -348,7 +348,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode, struct cred *new); int security_path_notify(const struct path *path, u64 mask, unsigned int obj_type); -int security_inode_alloc(struct inode *inode); +int security_inode_alloc(struct inode *inode, gfp_t gfp); void security_inode_free(struct inode *inode); int security_inode_init_security(struct inode *inode, struct inode *dir, const struct qstr *qstr, @@ -789,7 +789,7 @@ static inline int security_path_notify(const struct path *path, u64 mask, return 0; } -static inline int security_inode_alloc(struct inode *inode) +static inline int security_inode_alloc(struct inode *inode, gfp_t gfp) { return 0; } -- cgit v1.2.3 From 9a8da05d7ad619beb84d0c6904c3fa7022c6fb9b Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 26 Sep 2024 19:11:51 +0200 Subject: Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN" This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4. There is no existing user of those flags. PF_MEMALLOC_NOWARN is dangerous because a nested allocation context can use GFP_NOFAIL which could cause unexpected failure. Such a code would be hard to maintain because it could be deeper in the call chain. PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1] that such a allocation contex is inherently unsafe if the context doesn't fully control all allocations called from this context. While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM is it doesn't have any user and as Matthew has pointed out we are running out of those flags so better reclaim it without any real users. [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/ Link: https://lkml.kernel.org/r/20240926172940.167084-3-mhocko@kernel.org Signed-off-by: Michal Hocko Reviewed-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Dave Chinner Reviewed-by: Vlastimil Babka Cc: Al Viro Cc: Christian Brauner Cc: James Morris Cc: Jan Kara Cc: Kent Overstreet Cc: Paul Moore Cc: Serge E. Hallyn Cc: Yafang Shao Signed-off-by: Andrew Morton --- include/linux/sched.h | 4 ++-- include/linux/sched/mm.h | 17 ++++------------- 2 files changed, 6 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/include/linux/sched.h b/include/linux/sched.h index e6ee4258169a..449dd64ed9ac 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1681,8 +1681,8 @@ extern struct pid *cad_pid; * I am cleaning dirty pages from some other bdi. */ #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ -#define PF_MEMALLOC_NORECLAIM 0x00800000 /* All allocation requests will clear __GFP_DIRECT_RECLAIM */ -#define PF_MEMALLOC_NOWARN 0x01000000 /* All allocation requests will inherit __GFP_NOWARN */ +#define PF__HOLE__00800000 0x00800000 +#define PF__HOLE__01000000 0x01000000 #define PF__HOLE__02000000 0x02000000 #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 07bb8d4181d7..928a626725e6 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -251,25 +251,16 @@ static inline gfp_t current_gfp_context(gfp_t flags) { unsigned int pflags = READ_ONCE(current->flags); - if (unlikely(pflags & (PF_MEMALLOC_NOIO | - PF_MEMALLOC_NOFS | - PF_MEMALLOC_NORECLAIM | - PF_MEMALLOC_NOWARN | - PF_MEMALLOC_PIN))) { + if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) { /* - * Stronger flags before weaker flags: - * NORECLAIM implies NOIO, which in turn implies NOFS + * NOIO implies both NOIO and NOFS and it is a weaker context + * so always make sure it makes precedence */ - if (pflags & PF_MEMALLOC_NORECLAIM) - flags &= ~__GFP_DIRECT_RECLAIM; - else if (pflags & PF_MEMALLOC_NOIO) + if (pflags & PF_MEMALLOC_NOIO) flags &= ~(__GFP_IO | __GFP_FS); else if (pflags & PF_MEMALLOC_NOFS) flags &= ~__GFP_FS; - if (pflags & PF_MEMALLOC_NOWARN) - flags |= __GFP_NOWARN; - if (pflags & PF_MEMALLOC_PIN) flags &= ~__GFP_MOVABLE; } -- cgit v1.2.3 From 04b670de2859a8a8b0830779f9c9bda5d39662ab Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 7 Oct 2024 16:54:11 -0400 Subject: closures: Add closure_wait_event_timeout() Add a closure version of wait_event_timeout(), with the same semantics. The closure version is useful because unlike wait_event(), it allows blocking code to run in the conditional expression. Cc: Coly Li Signed-off-by: Kent Overstreet --- include/linux/closure.h | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) (limited to 'include') diff --git a/include/linux/closure.h b/include/linux/closure.h index 2af44427107d..880fe85e35e9 100644 --- a/include/linux/closure.h +++ b/include/linux/closure.h @@ -454,4 +454,39 @@ do { \ __closure_wait_event(waitlist, _cond); \ } while (0) +#define __closure_wait_event_timeout(waitlist, _cond, _until) \ +({ \ + struct closure cl; \ + long _t; \ + \ + closure_init_stack(&cl); \ + \ + while (1) { \ + closure_wait(waitlist, &cl); \ + if (_cond) { \ + _t = max_t(long, 1L, _until - jiffies); \ + break; \ + } \ + _t = max_t(long, 0L, _until - jiffies); \ + if (!_t) \ + break; \ + closure_sync_timeout(&cl, _t); \ + } \ + closure_wake_up(waitlist); \ + closure_sync(&cl); \ + _t; \ +}) + +/* + * Returns 0 if timeout expired, remaining time in jiffies (at least 1) if + * condition became true + */ +#define closure_wait_event_timeout(waitlist, _cond, _timeout) \ +({ \ + unsigned long _until = jiffies + _timeout; \ + (_cond) \ + ? max_t(long, 1L, _until - jiffies) \ + : __closure_wait_event_timeout(waitlist, _cond, _until);\ +}) + #endif /* _LINUX_CLOSURE_H */ -- cgit v1.2.3 From 07cc7b0b942bf55ef1a471470ecda8d2a6a6541f Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 8 Oct 2024 11:47:32 -0700 Subject: rtnetlink: Add bulk registration helpers for rtnetlink message handlers. Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers"), once rtnl_msg_handlers[protocol] was allocated, the following rtnl_register_module() for the same protocol never failed. However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to be allocated in each rtnl_register_module(), so each call could fail. Many callers of rtnl_register_module() do not handle the returned error, and we need to add many error handlings. To handle that easily, let's add wrapper functions for bulk registration of rtnetlink message handlers. Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- include/net/rtnetlink.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include') diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index b45d57b5968a..2d3eb7cb4dff 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -29,6 +29,15 @@ static inline enum rtnl_kinds rtnl_msgtype_kind(int msgtype) return msgtype & RTNL_KIND_MASK; } +struct rtnl_msg_handler { + struct module *owner; + int protocol; + int msgtype; + rtnl_doit_func doit; + rtnl_dumpit_func dumpit; + int flags; +}; + void rtnl_register(int protocol, int msgtype, rtnl_doit_func, rtnl_dumpit_func, unsigned int flags); int rtnl_register_module(struct module *owner, int protocol, int msgtype, @@ -36,6 +45,14 @@ int rtnl_register_module(struct module *owner, int protocol, int msgtype, int rtnl_unregister(int protocol, int msgtype); void rtnl_unregister_all(int protocol); +int __rtnl_register_many(const struct rtnl_msg_handler *handlers, int n); +void __rtnl_unregister_many(const struct rtnl_msg_handler *handlers, int n); + +#define rtnl_register_many(handlers) \ + __rtnl_register_many(handlers, ARRAY_SIZE(handlers)) +#define rtnl_unregister_many(handlers) \ + __rtnl_unregister_many(handlers, ARRAY_SIZE(handlers)) + static inline int rtnl_msg_family(const struct nlmsghdr *nlh) { if (nlmsg_len(nlh) >= sizeof(struct rtgenmsg)) -- cgit v1.2.3 From d51705614f668254cc5def7490df76f9680b4659 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 8 Oct 2024 11:47:35 -0700 Subject: mctp: Handle error of rtnl_register_module(). Since introduced, mctp has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 583be982d934 ("mctp: Add device handling and netlink interface") Fixes: 831119f88781 ("mctp: Add neighbour netlink interface") Fixes: 06d2f4c583a7 ("mctp: Add netlink route management") Signed-off-by: Kuniyuki Iwashima Reviewed-by: Jeremy Kerr Signed-off-by: Paolo Abeni --- include/net/mctp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/mctp.h b/include/net/mctp.h index 7b17c52e8ce2..28d59ae94ca3 100644 --- a/include/net/mctp.h +++ b/include/net/mctp.h @@ -295,7 +295,7 @@ void mctp_neigh_remove_dev(struct mctp_dev *mdev); int mctp_routes_init(void); void mctp_routes_exit(void); -void mctp_device_init(void); +int mctp_device_init(void); void mctp_device_exit(void); #endif /* __NET_MCTP_H */ -- cgit v1.2.3