From b18cba09e374637a0a3759d856a6bca94c133952 Mon Sep 17 00:00:00 2001 From: minoura makoto Date: Tue, 13 Dec 2022 13:14:31 +0900 Subject: SUNRPC: ensure the matching upcall is in-flight upon downcall Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid but different gss service") introduced `auth` argument to __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL since it (and auth->service) was not (yet) determined. When multiple upcalls with the same uid and different service are ongoing, it could happen that __gss_find_upcall(), which returns the first match found in the pipe->in_downcall list, could not find the correct gss_msg corresponding to the downcall we are looking for. Moreover, it might return a msg which is not sent to rpc.gssd yet. We could see mount.nfs process hung in D state with multiple mount.nfs are executed in parallel. The call trace below is of CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/ elrepo kernel-ml-6.0.7-1.el7. PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs" #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss] #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc] #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss] #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc] #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc] #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc] #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc] #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc] The scenario is like this. Let's say there are two upcalls for services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe. When rpc.gssd reads pipe to get the upcall msg corresponding to service B from pipe->pipe and then writes the response, in gss_pipe_downcall the msg corresponding to service A will be picked because only uid is used to find the msg and it is before the one for B in pipe->in_downcall. And the process waiting for the msg corresponding to service A will be woken up. Actual scheduing of that process might be after rpc.gssd processes the next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A). The process is scheduled to see gss_msg->ctx == NULL and gss_msg->msg.errno == 0, therefore it cannot break the loop in gss_create_upcall and is never woken up after that. This patch adds a simple check to ensure that a msg which is not sent to rpc.gssd yet is not chosen as the matching upcall upon receiving a downcall. Signed-off-by: minoura makoto Signed-off-by: Hiroshi Shimamoto Tested-by: Hiroshi Shimamoto Cc: Trond Myklebust Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service") Signed-off-by: Trond Myklebust --- include/linux/sunrpc/rpc_pipe_fs.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/rpc_pipe_fs.h b/include/linux/sunrpc/rpc_pipe_fs.h index cd188a527d16..3b35b6f6533a 100644 --- a/include/linux/sunrpc/rpc_pipe_fs.h +++ b/include/linux/sunrpc/rpc_pipe_fs.h @@ -92,6 +92,11 @@ extern ssize_t rpc_pipe_generic_upcall(struct file *, struct rpc_pipe_msg *, char __user *, size_t); extern int rpc_queue_upcall(struct rpc_pipe *, struct rpc_pipe_msg *); +/* returns true if the msg is in-flight, i.e., already eaten by the peer */ +static inline bool rpc_msg_is_inflight(const struct rpc_pipe_msg *msg) { + return (msg->copied != 0 && list_empty(&msg->list)); +} + struct rpc_clnt; extern struct dentry *rpc_create_client_dir(struct dentry *, const char *, struct rpc_clnt *); extern int rpc_remove_client_dir(struct rpc_clnt *); -- cgit v1.2.3 From d19ab1f785d0b6b9f709799f0938658903821ba1 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 20 Dec 2022 15:13:34 +0100 Subject: mtd: cfi: allow building spi-intel standalone When MTD or MTD_CFI_GEOMETRY is disabled, the spi-intel driver fails to build, as it includes the shared CFI header: include/linux/mtd/cfi.h:62:2: error: #warning No CONFIG_MTD_CFI_Ix selected. No NOR chip support can work. [-Werror=cpp] 62 | #warning No CONFIG_MTD_CFI_Ix selected. No NOR chip support can work. linux/mtd/spi-nor.h does not actually need to include cfi.h, so remove the inclusion here to fix the warning. This uncovers a missing #include in spi-nor/core.c so add that there to prevent a different build issue. Fixes: e23e5a05d1fd ("mtd: spi-nor: intel-spi: Convert to SPI MEM") Signed-off-by: Arnd Bergmann Reviewed-by: Mika Westerberg Reviewed-by: Tokunori Ikegami Acked-by: Pratyush Yadav Reviewed-by: Tudor Ambarus Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20221220141352.1486360-1-arnd@kernel.org --- include/linux/mtd/spi-nor.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index 25765556223a..a3f8cdca90c8 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -7,7 +7,6 @@ #define __LINUX_MTD_SPI_NOR_H #include -#include #include #include -- cgit v1.2.3 From 8e1858710d9a71d88acd922f2e95d1eddb90eea0 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Thu, 17 Nov 2022 10:57:53 +0800 Subject: ceph: avoid use-after-free in ceph_fl_release_lock() When ceph releasing the file_lock it will try to get the inode pointer from the fl->fl_file, which the memory could already be released by another thread in filp_close(). Because in VFS layer the fl->fl_file doesn't increase the file's reference counter. Will switch to use ceph dedicate lock info to track the inode. And in ceph_fl_release_lock() we should skip all the operations if the fl->fl_u.ceph.inode is not set, which should come from the request file_lock. And we will set fl->fl_u.ceph.inode when inserting it to the inode lock list, which is when copying the lock. Link: https://tracker.ceph.com/issues/57986 Signed-off-by: Xiubo Li Reviewed-by: Jeff Layton Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov --- include/linux/fs.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 066555ad1bf8..c1769a2c5d70 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1119,6 +1119,9 @@ struct file_lock { int state; /* state of grant or error if -ve */ unsigned int debug_id; } afs; + struct { + struct inode *inode; + } ceph; } fl_u; } __randomize_layout; -- cgit v1.2.3 From 59b745bb4e0bd445366c45b8df6b51b69134f4f5 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 4 Jan 2023 13:49:54 -0700 Subject: io_uring: move 'poll_multi_queue' bool in io_ring_ctx The cacheline section holding this variable has two gaps, where one is caused by this bool not packing well with structs. This causes it to blow into the next cacheline. Move the variable, shrinking io_ring_ctx by a full cacheline in size. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index dcd8a563ab52..128a67a40065 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -292,6 +292,8 @@ struct io_ring_ctx { struct { spinlock_t completion_lock; + bool poll_multi_queue; + /* * ->iopoll_list is protected by the ctx->uring_lock for * io_uring instances that don't use IORING_SETUP_SQPOLL. @@ -300,7 +302,6 @@ struct io_ring_ctx { */ struct io_wq_work_list iopoll_list; struct io_hash_table cancel_table; - bool poll_multi_queue; struct llist_head work_llist; -- cgit v1.2.3 From ee4b4e2248565babfba807d82c0f3e00c392a4c0 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 4 Jan 2023 14:43:27 -0700 Subject: Revert "block: bio_copy_data_iter" This reverts commit db1c7d77976775483a8ef240b4c705f113e13ea1. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 22078a28d7cb..c1da63f6c808 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -475,6 +475,8 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty); extern void bio_set_pages_dirty(struct bio *bio); extern void bio_check_pages_dirty(struct bio *bio); +extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter, + struct bio *src, struct bvec_iter *src_iter); extern void bio_copy_data(struct bio *dst, struct bio *src); extern void bio_free_pages(struct bio *bio); void guard_bio_eod(struct bio *bio); -- cgit v1.2.3 From 050a4f341f35bf51db321c7f68700f9e0b1a7552 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 4 Jan 2023 14:44:02 -0700 Subject: Revert "block: remove devnode callback from struct block_device_operations" This reverts commit 85d6ce58e493ac8b7122e2fbe3f41b94d6ebdc11. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 301cf1cf4f2f..43d4e073b111 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1395,6 +1395,7 @@ struct block_device_operations { void (*swap_slot_free_notify) (struct block_device *, unsigned long); int (*report_zones)(struct gendisk *, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); + char *(*devnode)(struct gendisk *disk, umode_t *mode); /* returns the length of the identifier or a negative errno: */ int (*get_unique_id)(struct gendisk *disk, u8 id[16], enum blk_unique_id id_type); -- cgit v1.2.3 From 4b83e99ee7092df37a5cf292fde976ebc475ea63 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 4 Jan 2023 14:44:13 -0700 Subject: Revert "pktcdvd: remove driver." MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit f40eb99897af665f11858dd7b56edcb62c3f3c67. There are apparently still users out there of this driver. While we'd love to remove it to ease the maintenance burden, let's reinstate it for now until better (userspace) solutions can be developed. Link: https://lore.kernel.org/lkml/20230104190115.ceglfefco475ev6c@pali/ Reported-by: Pali Rohár Signed-off-by: Jens Axboe --- include/linux/pktcdvd.h | 197 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 197 insertions(+) create mode 100644 include/linux/pktcdvd.h (limited to 'include/linux') diff --git a/include/linux/pktcdvd.h b/include/linux/pktcdvd.h new file mode 100644 index 000000000000..f9c5ac80d59b --- /dev/null +++ b/include/linux/pktcdvd.h @@ -0,0 +1,197 @@ +/* + * Copyright (C) 2000 Jens Axboe + * Copyright (C) 2001-2004 Peter Osterlund + * + * May be copied or modified under the terms of the GNU General Public + * License. See linux/COPYING for more information. + * + * Packet writing layer for ATAPI and SCSI CD-R, CD-RW, DVD-R, and + * DVD-RW devices. + * + */ +#ifndef __PKTCDVD_H +#define __PKTCDVD_H + +#include +#include +#include +#include +#include +#include +#include + +/* default bio write queue congestion marks */ +#define PKT_WRITE_CONGESTION_ON 10000 +#define PKT_WRITE_CONGESTION_OFF 9000 + + +struct packet_settings +{ + __u32 size; /* packet size in (512 byte) sectors */ + __u8 fp; /* fixed packets */ + __u8 link_loss; /* the rest is specified + * as per Mt Fuji */ + __u8 write_type; + __u8 track_mode; + __u8 block_mode; +}; + +/* + * Very crude stats for now + */ +struct packet_stats +{ + unsigned long pkt_started; + unsigned long pkt_ended; + unsigned long secs_w; + unsigned long secs_rg; + unsigned long secs_r; +}; + +struct packet_cdrw +{ + struct list_head pkt_free_list; + struct list_head pkt_active_list; + spinlock_t active_list_lock; /* Serialize access to pkt_active_list */ + struct task_struct *thread; + atomic_t pending_bios; +}; + +/* + * Switch to high speed reading after reading this many kilobytes + * with no interspersed writes. + */ +#define HI_SPEED_SWITCH 512 + +struct packet_iosched +{ + atomic_t attention; /* Set to non-zero when queue processing is needed */ + int writing; /* Non-zero when writing, zero when reading */ + spinlock_t lock; /* Protecting read/write queue manipulations */ + struct bio_list read_queue; + struct bio_list write_queue; + sector_t last_write; /* The sector where the last write ended */ + int successive_reads; +}; + +/* + * 32 buffers of 2048 bytes + */ +#if (PAGE_SIZE % CD_FRAMESIZE) != 0 +#error "PAGE_SIZE must be a multiple of CD_FRAMESIZE" +#endif +#define PACKET_MAX_SIZE 128 +#define FRAMES_PER_PAGE (PAGE_SIZE / CD_FRAMESIZE) +#define PACKET_MAX_SECTORS (PACKET_MAX_SIZE * CD_FRAMESIZE >> 9) + +enum packet_data_state { + PACKET_IDLE_STATE, /* Not used at the moment */ + PACKET_WAITING_STATE, /* Waiting for more bios to arrive, so */ + /* we don't have to do as much */ + /* data gathering */ + PACKET_READ_WAIT_STATE, /* Waiting for reads to fill in holes */ + PACKET_WRITE_WAIT_STATE, /* Waiting for the write to complete */ + PACKET_RECOVERY_STATE, /* Recover after read/write errors */ + PACKET_FINISHED_STATE, /* After write has finished */ + + PACKET_NUM_STATES /* Number of possible states */ +}; + +/* + * Information needed for writing a single packet + */ +struct pktcdvd_device; + +struct packet_data +{ + struct list_head list; + + spinlock_t lock; /* Lock protecting state transitions and */ + /* orig_bios list */ + + struct bio_list orig_bios; /* Original bios passed to pkt_make_request */ + /* that will be handled by this packet */ + int write_size; /* Total size of all bios in the orig_bios */ + /* list, measured in number of frames */ + + struct bio *w_bio; /* The bio we will send to the real CD */ + /* device once we have all data for the */ + /* packet we are going to write */ + sector_t sector; /* First sector in this packet */ + int frames; /* Number of frames in this packet */ + + enum packet_data_state state; /* Current state */ + atomic_t run_sm; /* Incremented whenever the state */ + /* machine needs to be run */ + long sleep_time; /* Set this to non-zero to make the state */ + /* machine run after this many jiffies. */ + + atomic_t io_wait; /* Number of pending IO operations */ + atomic_t io_errors; /* Number of read/write errors during IO */ + + struct bio *r_bios[PACKET_MAX_SIZE]; /* bios to use during data gathering */ + struct page *pages[PACKET_MAX_SIZE / FRAMES_PER_PAGE]; + + int cache_valid; /* If non-zero, the data for the zone defined */ + /* by the sector variable is completely cached */ + /* in the pages[] vector. */ + + int id; /* ID number for debugging */ + struct pktcdvd_device *pd; +}; + +struct pkt_rb_node { + struct rb_node rb_node; + struct bio *bio; +}; + +struct packet_stacked_data +{ + struct bio *bio; /* Original read request bio */ + struct pktcdvd_device *pd; +}; +#define PSD_POOL_SIZE 64 + +struct pktcdvd_device +{ + struct block_device *bdev; /* dev attached */ + dev_t pkt_dev; /* our dev */ + char name[20]; + struct packet_settings settings; + struct packet_stats stats; + int refcnt; /* Open count */ + int write_speed; /* current write speed, kB/s */ + int read_speed; /* current read speed, kB/s */ + unsigned long offset; /* start offset */ + __u8 mode_offset; /* 0 / 8 */ + __u8 type; + unsigned long flags; + __u16 mmc3_profile; + __u32 nwa; /* next writable address */ + __u32 lra; /* last recorded address */ + struct packet_cdrw cdrw; + wait_queue_head_t wqueue; + + spinlock_t lock; /* Serialize access to bio_queue */ + struct rb_root bio_queue; /* Work queue of bios we need to handle */ + int bio_queue_size; /* Number of nodes in bio_queue */ + bool congested; /* Someone is waiting for bio_queue_size + * to drop. */ + sector_t current_sector; /* Keep track of where the elevator is */ + atomic_t scan_queue; /* Set to non-zero when pkt_handle_queue */ + /* needs to be run. */ + mempool_t rb_pool; /* mempool for pkt_rb_node allocations */ + + struct packet_iosched iosched; + struct gendisk *disk; + + int write_congestion_off; + int write_congestion_on; + + struct device *dev; /* sysfs pktcdvd[0-7] dev */ + + struct dentry *dfs_d_root; /* debugfs: devname directory */ + struct dentry *dfs_f_info; /* debugfs: info file */ +}; + +#endif /* __PKTCDVD_H */ -- cgit v1.2.3 From da2e552b469a0cd130ff70a88ccc4139da428a65 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Mon, 28 Nov 2022 19:05:47 +0200 Subject: net/mlx5: Fix command stats access after free Command may fail while driver is reloading and can't accept FW commands till command interface is reinitialized. Such command failure is being logged to command stats. This results in NULL pointer access as command stats structure is being freed and reallocated during mlx5 devlink reload (see kernel log below). Fix it by making command stats statically allocated on driver probe. Kernel log: [ 2394.808802] BUG: unable to handle kernel paging request at 000000000002a9c0 [ 2394.810610] PGD 0 P4D 0 [ 2394.811811] Oops: 0002 [#1] SMP NOPTI ... [ 2394.815482] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0 ... [ 2394.829505] Call Trace: [ 2394.830667] _raw_spin_lock_irq+0x23/0x26 [ 2394.831858] cmd_status_err+0x55/0x110 [mlx5_core] [ 2394.833020] mlx5_access_reg+0xe7/0x150 [mlx5_core] [ 2394.834175] mlx5_query_port_ptys+0x78/0xa0 [mlx5_core] [ 2394.835337] mlx5e_ethtool_get_link_ksettings+0x74/0x590 [mlx5_core] [ 2394.836454] ? kmem_cache_alloc_trace+0x140/0x1c0 [ 2394.837562] __rh_call_get_link_ksettings+0x33/0x100 [ 2394.838663] ? __rtnl_unlock+0x25/0x50 [ 2394.839755] __ethtool_get_link_ksettings+0x72/0x150 [ 2394.840862] duplex_show+0x6e/0xc0 [ 2394.841963] dev_attr_show+0x1c/0x40 [ 2394.843048] sysfs_kf_seq_show+0x9b/0x100 [ 2394.844123] seq_read+0x153/0x410 [ 2394.845187] vfs_read+0x91/0x140 [ 2394.846226] ksys_read+0x4f/0xb0 [ 2394.847234] do_syscall_64+0x5b/0x1a0 [ 2394.848228] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 34f46ae0d4b3 ("net/mlx5: Add command failures data to debugfs") Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index d476255c9a3f..76ef2e4fde38 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -315,7 +315,7 @@ struct mlx5_cmd { struct mlx5_cmd_debug dbg; struct cmd_msg_cache cache[MLX5_NUM_COMMAND_CACHES]; int checksum_disabled; - struct mlx5_cmd_stats *stats; + struct mlx5_cmd_stats stats[MLX5_CMD_OP_MAX]; }; struct mlx5_cmd_mailbox { -- cgit v1.2.3