summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2022-05-09floppy: disable FDRAWCMD by defaultWilly Tarreau
commit 233087ca063686964a53c829d547c7571e3f67bf upstream. Minh Yuan reported a concurrency use-after-free issue in the floppy code between raw_cmd_ioctl and seek_interrupt. [ It turns out this has been around, and that others have reported the KASAN splats over the years, but Minh Yuan had a reproducer for it and so gets primary credit for reporting it for this fix - Linus ] The problem is, this driver tends to break very easily and nowadays, nobody is expected to use FDRAWCMD anyway since it was used to manipulate non-standard formats. The risk of breaking the driver is higher than the risk presented by this race, and accessing the device requires privileges anyway. Let's just add a config option to completely disable this ioctl and leave it disabled by default. Distros shouldn't use it, and only those running on antique hardware might need to enable it. Link: https://lore.kernel.org/all/000000000000b71cdd05d703f6bf@google.com/ Link: https://lore.kernel.org/lkml/CAKcFiNC=MfYVW-Jt9A3=FPJpTwCD2PL_ULNCpsCVE5s8ZeBQgQ@mail.gmail.com Link: https://lore.kernel.org/all/CAEAjamu1FRhz6StCe_55XY5s389ZP_xmCF69k987En+1z53=eg@mail.gmail.com Reported-by: Minh Yuan <yuanmingbuaa@gmail.com> Reported-by: syzbot+8e8958586909d62b6840@syzkaller.appspotmail.com Reported-by: cruise k <cruise4k@gmail.com> Reported-by: Kyungtae Kim <kt0755@gmail.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Tested-by: Denis Efremov <efremov@linux.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15drbd: Fix five use after free bugs in get_initial_stateLv Yunlong
[ Upstream commit aadb22ba2f656581b2f733deb3a467c48cc618f6 ] In get_initial_state, it calls notify_initial_state_done(skb,..) if cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(), the skb will be freed by nlmsg_free(skb). Then get_initial_state will goto out and the freed skb will be used by return value skb->len, which is a uaf bug. What's worse, the same problem goes even further: skb can also be freed in the notify_*_state_change -> notify_*_state calls below. Thus 4 additional uaf bugs happened. My patch lets the problem callee functions: notify_initial_state_done and notify_*_state_change return an error code if errors happen. So that the error codes could be propagated and the uaf bugs can be avoid. v2 reports a compilation warning. This v3 fixed this warning and built successfully in my local environment with no additional warnings. v2: https://lore.kernel.org/patchwork/patch/1435218/ Fixes: a29728463b254 ("drbd: Backport the "events2" command") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15loop: use sysfs_emit() in the sysfs xxx show()Chaitanya Kulkarni
[ Upstream commit b27824d31f09ea7b4a6ba2c1b18bd328df3e8bed ] sprintf does not know the PAGE_SIZE maximum of the temporary buffer used for outputting sysfs content and it's possible to overrun the PAGE_SIZE buffer length. Use a generic sysfs_emit function that knows the size of the temporary buffer and ensures that no overrun is done for offset attribute in loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20220215213310.7264-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15drbd: fix potential silent data corruptionLars Ellenberg
commit f4329d1f848ac35757d9cc5487669d19dfc5979c upstream. Scenario: --------- bio chain generated by blk_queue_split(). Some split bio fails and propagates its error status to the "parent" bio. But then the (last part of the) parent bio itself completes without error. We would clobber the already recorded error status with BLK_STS_OK, causing silent data corruption. Reproducer: ----------- How to trigger this in the real world within seconds: DRBD on top of degraded parity raid, small stripe_cache_size, large read_ahead setting. Drop page cache (sysctl vm.drop_caches=1, fadvise "DONTNEED", umount and mount again, "reboot"). Cause significant read ahead. Large read ahead request is split by blk_queue_split(). Parts of the read ahead that are already in the stripe cache, or find an available stripe cache to use, can be serviced. Parts of the read ahead that would need "too much work", would need to wait for a "stripe_head" to become available, are rejected immediately. For larger read ahead requests that are split in many pieces, it is very likely that some "splits" will be serviced, but then the stripe cache is exhausted/busy, and the remaining ones will be rejected. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: <stable@vger.kernel.org> # 4.13.x Link: https://lore.kernel.org/r/20220330185551.3553196-1-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15virtio-blk: Use blk_validate_block_size() to validate block sizeXie Yongji
commit 57a13a5b8157d9a8606490aaa1b805bafe6c37e1 upstream. The block layer can't support a block size larger than page size yet. And a block size that's too small or not a power of two won't work either. If a misconfigured device presents an invalid block size in configuration space, it will result in the kernel crash something like below: [ 506.154324] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 506.160416] RIP: 0010:create_empty_buffers+0x24/0x100 [ 506.174302] Call Trace: [ 506.174651] create_page_buffers+0x4d/0x60 [ 506.175207] block_read_full_page+0x50/0x380 [ 506.175798] ? __mod_lruvec_page_state+0x60/0xa0 [ 506.176412] ? __add_to_page_cache_locked+0x1b2/0x390 [ 506.177085] ? blkdev_direct_IO+0x4a0/0x4a0 [ 506.177644] ? scan_shadow_nodes+0x30/0x30 [ 506.178206] ? lru_cache_add+0x42/0x60 [ 506.178716] do_read_cache_page+0x695/0x740 [ 506.179278] ? read_part_sector+0xe0/0xe0 [ 506.179821] read_part_sector+0x36/0xe0 [ 506.180337] adfspart_check_ICS+0x32/0x320 [ 506.180890] ? snprintf+0x45/0x70 [ 506.181350] ? read_part_sector+0xe0/0xe0 [ 506.181906] bdev_disk_changed+0x229/0x5c0 [ 506.182483] blkdev_get_whole+0x6d/0x90 [ 506.183013] blkdev_get_by_dev+0x122/0x2d0 [ 506.183562] device_add_disk+0x39e/0x3c0 [ 506.184472] virtblk_probe+0x3f8/0x79b [virtio_blk] [ 506.185461] virtio_dev_probe+0x15e/0x1d0 [virtio] So let's use a block layer helper to validate the block size. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211026144015.188-5-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zeroXie Yongji
[ Upstream commit dacc73ed0b88f1a787ec20385f42ca9dd9eddcd0 ] Currently the value of max_discard_segment will be set to MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device set 0 to max_discard_seg in configuration space. It's incorrect since the device might not be able to handle such large descriptors. To fix it, let's follow max_segments restrictions in this case. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-11xen/blkfront: don't use gnttab_query_foreign_access() for mapped statusJuergen Gross
Commit abf1fd5919d6238ee3bc5eb4a9b6c3947caa6638 upstream. It isn't enough to check whether a grant is still being in use by calling gnttab_query_foreign_access(), as a mapping could be realized by the other side just after having called that function. In case the call was done in preparation of revoking a grant it is better to do so via gnttab_end_foreign_access_ref() and check the success of that operation instead. For the ring allocation use alloc_pages_exact() in order to avoid high order pages in case of a multi-page ring. If a grant wasn't unmapped by the backend without persistent grants being used, set the device state to "error". This is CVE-2022-23036 / part of XSA-396. Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27floppy: Add max size check for user space requestXiongwei Song
[ Upstream commit 545a32498c536ee152331cd2e7d2416aa0f20e01 ] We need to check the max request size that is from user space before allocating pages. If the request size exceeds the limit, return -EINVAL. This check can avoid the warning below from page allocator. WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 current_gfp_context include/linux/sched/mm.h:195 [inline] WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 __alloc_pages+0x45d/0x500 mm/page_alloc.c:5356 Modules linked in: CPU: 3 PID: 16525 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:__alloc_pages+0x45d/0x500 mm/page_alloc.c:5344 Code: be c9 00 00 00 48 c7 c7 20 4a 97 89 c6 05 62 32 a7 0b 01 e8 74 9a 42 07 e9 6a ff ff ff 0f 0b e9 a0 fd ff ff 40 80 e5 3f eb 88 <0f> 0b e9 18 ff ff ff 4c 89 ef 44 89 e6 45 31 ed e8 1e 76 ff ff e9 RSP: 0018:ffffc90023b87850 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff92004770f0b RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000033 RDI: 0000000000010cc1 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff81bb4686 R11: 0000000000000001 R12: ffffffff902c1960 R13: 0000000000000033 R14: 0000000000000000 R15: ffff88804cf64a30 FS: 0000000000000000(0000) GS:ffff88802cd00000(0063) knlGS:00000000f44b4b40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000002c921000 CR3: 000000004f507000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191 __get_free_pages+0x8/0x40 mm/page_alloc.c:5418 raw_cmd_copyin drivers/block/floppy.c:3113 [inline] raw_cmd_ioctl drivers/block/floppy.c:3160 [inline] fd_locked_ioctl+0x12e5/0x2820 drivers/block/floppy.c:3528 fd_ioctl drivers/block/floppy.c:3555 [inline] fd_compat_ioctl+0x891/0x1b60 drivers/block/floppy.c:3869 compat_blkdev_ioctl+0x3b8/0x810 block/ioctl.c:662 __do_compat_sys_ioctl+0x1c7/0x290 fs/ioctl.c:972 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Reported-by: syzbot+23a02c7df2cf2bc93fa2@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20211116131033.27685-1-sxwjean@me.com Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27floppy: Fix hang in watchdog when disk is ejectedTasos Sahanidis
[ Upstream commit fb48febce7e30baed94dd791e19521abd2c3fd83 ] When the watchdog detects a disk change, it calls cancel_activity(), which in turn tries to cancel the fd_timer delayed work. In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning it is trying to cancel its own work. This results in a hang as cancel_delayed_work_sync() is waiting for the watchdog (itself) to return, which never happens. This can be reproduced relatively consistently by attempting to read a broken floppy, and ejecting it while IO is being attempted and retried. To resolve this, this patch calls cancel_delayed_work() instead, which cancels the work without waiting for the watchdog to return and finish. Before this regression was introduced, the code in this section used del_timer(), and not del_timer_sync() to delete the watchdog timer. Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com Fixes: 070ad7e793dc ("floppy: convert to delayed work and single-thread wq") Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22xen/blkfront: harden blkfront against event channel stormsJuergen Gross
commit 0fd08a34e8e3b67ec9bd8287ac0facf8374b844a upstream. The Xen blkfront driver is still vulnerable for an attack via excessive number of events sent by the backend. Fix that by using lateeoi event channels. This is part of XSA-391 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen/blkfront: don't trust the backend response data blindlyJuergen Gross
commit b94e4b147fd1992ad450e1fea1fdaa3738753373 upstream. Today blkfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request. Introduce a new state of the ring BLKIF_STATE_ERROR which will be switched to in case an inconsistency is being detected. Recovering from this state is possible only via removing and adding the virtual device again (e.g. via a suspend/resume cycle). Make all warning messages issued due to valid error responses rate limited in order to avoid message floods being triggered by a malicious backend. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-4-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen/blkfront: don't take local copy of a request from the ring pageJuergen Gross
commit 8f5a695d99000fc3aa73934d7ced33cfc64dcdab upstream. In order to avoid a malicious backend being able to influence the local copy of a request build the request locally first and then copy it to the ring page instead of doing it the other way round as today. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen/blkfront: read response from backend only onceJuergen Gross
commit 71b66243f9898d0e54296b4e7035fb33cdcb0707 upstream. In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17zram: off by one in read_block_state()Dan Carpenter
[ Upstream commit a88e03cf3d190cf46bc4063a9b7efe87590de5f4 ] snprintf() returns the number of bytes it would have printed if there were space. But it does not count the NUL terminator. So that means that if "count == copied" then this has already overflowed by one character. This bug likely isn't super harmful in real life. Link: https://lkml.kernel.org/r/20210916130404.GA25094@kili Fixes: c0265342bff4 ("zram: introduce zram memory tracking") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17block: ataflop: fix breakage introduced at blk-mq refactoringMichael Schmitz
[ Upstream commit 86d46fdaa12ae5befc16b8d73fc85a3ca0399ea6 ] Refactoring of the Atari floppy driver when converting to blk-mq has broken the state machine in not-so-subtle ways: finish_fdc() must be called when operations on the floppy device have completed. This is crucial in order to relase the ST-DMA lock, which protects against concurrent access to the ST-DMA controller by other drivers (some DMA related, most just related to device register access - broken beyond compare, I know). When rewriting the driver's old do_request() function, the fact that finish_fdc() was called only when all queued requests had completed appears to have been overlooked. Instead, the new request function calls finish_fdc() immediately after the last request has been queued. finish_fdc() executes a dummy seek after most requests, and this overwrites the state machine's interrupt hander that was set up to wait for completion of the read/write request just prior. To make matters worse, finish_fdc() is called before device interrupts are re-enabled, making certain that the read/write interupt is missed. Shifting the finish_fdc() call into the read/write request completion handler ensures the driver waits for the request to actually complete. With a queue depth of 2, we won't see long request sequences, so calling finish_fdc() unconditionally just adds a little overhead for the dummy seeks, and keeps the code simple. While we're at it, kill ataflop_commit_rqs() which does nothing but run finish_fdc() unconditionally, again likely wiping out an in-flight request. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: 6ec3938cff95 ("ataflop: convert to blk-mq") CC: linux-block@vger.kernel.org CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-16Revert "block: nbd: add sanity check for first_minor"Greg Kroah-Hartman
This reverts commit b3fa499d72a0db612f12645265a36751955c0037 which is commit b1a811633f7321cf1ae2bb76a66805b7720e44c9 upstream. The backport of this is reported to be causing some problems, so revert this for now until they are worked out. Link: https://lore.kernel.org/r/CACPK8XfUWoOHr-0RwRoYoskia4fbAbZ7DYf5wWBnv6qUnGq18w@mail.gmail.com Reported-by: Joel Stanley <joel@jms.id.au> Cc: Christoph Hellwig <hch@lst.de> Cc: Pavel Skripkin <paskripkin@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15block: nbd: add sanity check for first_minorPavel Skripkin
[ Upstream commit b1a811633f7321cf1ae2bb76a66805b7720e44c9 ] Syzbot hit WARNING in internal_create_group(). The problem was in too big disk->first_minor. disk->first_minor is initialized by value, which comes from userspace and there wasn't any sanity checks about value correctness. It can cause duplicate creation of sysfs files/links, because disk->first_minor will be passed to MKDEV() which causes truncation to byte. Since maximum minor value is 0xff, let's check if first_minor is correct minor number. NOTE: the root case of the reported warning was in wrong error handling in register_disk(), but we can avoid passing knowingly wrong values to sysfs API, because sysfs error messages can confuse users. For example: user passed 1048576 as index, but sysfs complains about duplicate creation of /dev/block/43:0. It's not obvious how 1048576 becomes 0. Log and reproducer for above example can be found on syzkaller bug report page. Link: https://syzkaller.appspot.com/bug?id=03c2ae9146416edf811958d5fd7acfab75b143d1 Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Reported-by: syzbot+9937dc42271cd87d4b98@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-12cryptoloop: add a deprecation warningChristoph Hellwig
[ Upstream commit 222013f9ac30b9cec44301daa8dbd0aae38abffb ] Support for cryptoloop has been officially marked broken and deprecated in favor of dm-crypt (which supports the same broken algorithms if needed) in Linux 2.6.4 (released in March 2004), and support for it has been entirely removed from losetup in util-linux 2.23 (released in April 2013). Add a warning and a deprecation schedule. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-03Revert "floppy: reintroduce O_NDELAY fix"Denis Efremov
commit c7e9d0020361f4308a70cdfd6d5335e273eb8717 upstream. The patch breaks userspace implementations (e.g. fdutils) and introduces regressions in behaviour. Previously, it was possible to O_NDELAY open a floppy device with no media inserted or with write protected media without an error. Some userspace tools use this particular behavior for probing. It's not the first time when we revert this patch. Previous revert is in commit f2791e7eadf4 (Revert "floppy: refactor open() flags handling"). This reverts commit 8a0c014cd20516ade9654fc13b51345ec58e7be8. Link: https://lore.kernel.org/linux-block/de10cb47-34d1-5a88-7751-225ca380f735@compro.net/ Reported-by: Mark Hounschell <markh@compro.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Wim Osterholt <wim@djo.tudelft.nl> Cc: Kurt Garloff <kurt@garloff.de> Cc: <stable@vger.kernel.org> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18nbd: Aovid double completion of a requestXie Yongji
[ Upstream commit cddce01160582a5f52ada3da9626c052d852ec42 ] There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-28rbd: always kick acquire on "acquired" and "released" notificationsIlya Dryomov
commit 8798d070d416d18a75770fc19787e96705073f43 upstream. Skipping the "lock has been released" notification if the lock owner is not what we expect based on owner_cid can lead to I/O hangs. One example is our own notifications: because owner_cid is cleared in rbd_unlock(), when we get our own notification it is processed as unexpected/duplicate and maybe_kick_acquire() isn't called. If a peer that requested the lock then doesn't go through with acquiring it, I/O requests that came in while the lock was being quiesced would be stalled until another I/O request is submitted and kicks acquire from rbd_img_exclusive_lock(). This makes the comment in rbd_release_lock() actually true: prior to this change the canceled work was being requeued in response to the "lock has been acquired" notification from rbd_handle_acquired_lock(). Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Robin Geuze <robin.geuze@nl.team.blue> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-28rbd: don't hold lock_rwsem while running_list is being drainedIlya Dryomov
commit ed9eb71085ecb7ded9a5118cec2ab70667cc7350 upstream. Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking on releasing_wait completion. On the I/O completion side, each image request also needs to take lock_rwsem for read. Because rw_semaphore implementation doesn't allow new readers after a writer has indicated interest in the lock, this can result in a deadlock if something that needs to take lock_rwsem for write gets involved. For example: 1. watch error occurs 2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and releases lock_rwsem 3. after reestablishing the watch, rbd_reregister_watch() takes lock_rwsem for write and calls rbd_reacquire_lock() 4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on releasing_wait until running_list becomes empty 5. another watch error occurs 6. rbd_watch_errcb() blocks trying to take lock_rwsem for write 7. no in-flight image request can complete and delete itself from running_list because lock_rwsem won't be granted anymore A similar scenario can occur with "lock has been acquired" and "lock has been released" notification handers which also take lock_rwsem for write to update owner_cid. We don't actually get anything useful from sitting on lock_rwsem in rbd_quiesce_lock() -- owner_cid updates certainly don't need to be synchronized with. In fact the whole owner_cid tracking logic could probably be removed from the kernel client because we don't support proxied maintenance operations. Cc: stable@vger.kernel.org # 5.3+ URL: https://tracker.ceph.com/issues/42757 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Robin Geuze <robin.geuze@nl.team.blue> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20virtio-blk: Fix memory leak among suspend/resume procedureXie Yongji
[ Upstream commit b71ba22e7c6c6b279c66f53ee7818709774efa1f ] The vblk->vqs should be freed before we call init_vqs() in virtblk_restore(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210517084332.280-1-xieyongji@bytedance.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19nbd: Fix NULL pointer in flush_workqueueSun Ke
[ Upstream commit 79ebe9110fa458d58f1fceb078e2068d7ad37390 ] Open /dev/nbdX first, the config_refs will be 1 and the pointers in nbd_device are still null. Disconnect /dev/nbdX, then reference a null recv_workq. The protection by config_refs in nbd_genl_disconnect is useless. [ 656.366194] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 656.368943] #PF: supervisor write access in kernel mode [ 656.369844] #PF: error_code(0x0002) - not-present page [ 656.370717] PGD 10cc87067 P4D 10cc87067 PUD 1074b4067 PMD 0 [ 656.371693] Oops: 0002 [#1] SMP [ 656.372242] CPU: 5 PID: 7977 Comm: nbd-client Not tainted 5.11.0-rc5-00040-g76c057c84d28 #1 [ 656.373661] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 656.375904] RIP: 0010:mutex_lock+0x29/0x60 [ 656.376627] Code: 00 0f 1f 44 00 00 55 48 89 fd 48 83 05 6f d7 fe 08 01 e8 7a c3 ff ff 48 83 05 6a d7 fe 08 01 31 c0 65 48 8b 14 25 00 6d 01 00 <f0> 48 0f b1 55 d [ 656.378934] RSP: 0018:ffffc900005eb9b0 EFLAGS: 00010246 [ 656.379350] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 656.379915] RDX: ffff888104cf2600 RSI: ffffffffaae8f452 RDI: 0000000000000020 [ 656.380473] RBP: 0000000000000020 R08: 0000000000000000 R09: ffff88813bd6b318 [ 656.381039] R10: 00000000000000c7 R11: fefefefefefefeff R12: ffff888102710b40 [ 656.381599] R13: ffffc900005eb9e0 R14: ffffffffb2930680 R15: ffff88810770ef00 [ 656.382166] FS: 00007fdf117ebb40(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000 [ 656.382806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 656.383261] CR2: 0000000000000020 CR3: 0000000100c84000 CR4: 00000000000006e0 [ 656.383819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 656.384370] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 656.384927] Call Trace: [ 656.385111] flush_workqueue+0x92/0x6c0 [ 656.385395] nbd_disconnect_and_put+0x81/0xd0 [ 656.385716] nbd_genl_disconnect+0x125/0x2a0 [ 656.386034] genl_family_rcv_msg_doit.isra.0+0x102/0x1b0 [ 656.386422] genl_rcv_msg+0xfc/0x2b0 [ 656.386685] ? nbd_ioctl+0x490/0x490 [ 656.386954] ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0 [ 656.387354] netlink_rcv_skb+0x62/0x180 [ 656.387638] genl_rcv+0x34/0x60 [ 656.387874] netlink_unicast+0x26d/0x590 [ 656.388162] netlink_sendmsg+0x398/0x6c0 [ 656.388451] ? netlink_rcv_skb+0x180/0x180 [ 656.388750] ____sys_sendmsg+0x1da/0x320 [ 656.389038] ? ____sys_recvmsg+0x130/0x220 [ 656.389334] ___sys_sendmsg+0x8e/0xf0 [ 656.389605] ? ___sys_recvmsg+0xa2/0xf0 [ 656.389889] ? handle_mm_fault+0x1671/0x21d0 [ 656.390201] __sys_sendmsg+0x6d/0xe0 [ 656.390464] __x64_sys_sendmsg+0x23/0x30 [ 656.390751] do_syscall_64+0x45/0x70 [ 656.391017] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To fix it, just add if (nbd->recv_workq) to nbd_disconnect_and_put(). Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Signed-off-by: Sun Ke <sunke32@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210512114331.1233964-2-sunke32@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14drivers/block/null_blk/main: Fix a double free in null_init.Lv Yunlong
[ Upstream commit 72ce11ddfa4e9e1879103581a60b7e34547eaa0a ] In null_init, null_add_dev(dev) is called. In null_add_dev, it calls null_free_zoned_dev(dev) to free dev->zones via kvfree(dev->zones) in out_cleanup_zone branch and returns err. Then null_init accept the err code and then calls null_free_dev(dev). But in null_free_dev(dev), dev->zones is freed again by null_free_zoned_dev(). My patch set dev->zones to NULL in null_free_zoned_dev() after kvfree(dev->zones) is called, to avoid the double free. Fixes: 2984c8684f962 ("nullb: factor disk parameters") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Link: https://lore.kernel.org/r/20210426143229.7374-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14xen-blkback: fix compatibility bug with single page ringsPaul Durrant
[ Upstream commit d75e7f63b7c95c527cde42efb5d410d7f961498f ] Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the behaviour of xen-blkback when connecting to a frontend was: - read 'ring-page-order' - if not present then expect a single page ring specified by 'ring-ref' - else expect a ring specified by 'ring-refX' where X is between 0 and 1 << ring-page-order This was correct behaviour, but was broken by the afforementioned commit to become: - read 'ring-page-order' - if not present then expect a single page ring (i.e. ring-page-order = 0) - expect a ring specified by 'ring-refX' where X is between 0 and 1 << ring-page-order - if that didn't work then see if there's a single page ring specified by 'ring-ref' This incorrect behaviour works most of the time but fails when a frontend that sets 'ring-page-order' is unloaded and replaced by one that does not because, instead of reading 'ring-ref', xen-blkback will read the stale 'ring-ref0' left around by the previous frontend will try to map the wrong grant reference. This patch restores the original behaviour. Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210202175659.18452-1-paul@xen.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30xen-blkback: don't leak persistent grants from xen_blkbk_map()Jan Beulich
commit a846738f8c3788d846ed1f587270d2f2e3d32432 upstream. The fix for XSA-365 zapped too many of the ->persistent_gnt[] entries. Ones successfully obtained should not be overwritten, but instead left for xen_blkbk_unmap_prepare() to pick up and put. This is XSA-371. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17zram: fix return value on writeback_storeMinchan Kim
commit 57e0076e6575a7b7cef620a0bd2ee2549ef77818 upstream. writeback_store's return value is overwritten by submit_bio_wait's return value. Thus, writeback_store will return zero since there was no IO error. In the end, write syscall from userspace will see the zero as return value, which could make the process stall to keep trying the write until it will succeed. Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17block: rsxx: fix error return code of rsxx_pci_probe()Jia-Ju Bai
[ Upstream commit df66617bfe87487190a60783d26175b65d2502ce ] When create_singlethread_workqueue returns NULL to card->event_wq, no error return code of rsxx_pci_probe() is assigned. To fix this bug, st is assigned with -ENOMEM in this case. Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Link: https://lore.kernel.org/r/20210310033017.4023-1-baijiaju1990@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-09rsxx: Return -EFAULT if copy_to_user() failsDan Carpenter
[ Upstream commit 77516d25f54912a7baedeeac1b1b828b6f285152 ] The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT to the user if it can't complete the copy. The "st" variable only holds zero on success or negative error codes on failure so the type should be int. Fixes: 36f988e978f8 ("rsxx: Adding in debugfs entries.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-07zsmalloc: account the number of compacted pages correctlyRokudo Yan
commit 2395928158059b8f9858365fce7713ce7fef62e4 upstream. There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07nbd: handle device refs for DESTROY_ON_DISCONNECT properlyJosef Bacik
commit c9a2f90f4d6b9d42b9912f7aaf68e8d748acfffd upstream. There exists a race where we can be attempting to create a new nbd configuration while a previous configuration is going down, both configured with DESTROY_ON_DISCONNECT. Normally devices all have a reference of 1, as they won't be cleaned up until the module is torn down. However with DESTROY_ON_DISCONNECT we'll make sure that there is only 1 reference (generally) on the device for the config itself, and then once the config is dropped, the device is torn down. The race that exists looks like this TASK1 TASK2 nbd_genl_connect() idr_find() refcount_inc_not_zero(nbd) * count is 2 here ^^ nbd_config_put() nbd_put(nbd) (count is 1) setup new config check DESTROY_ON_DISCONNECT put_dev = true if (put_dev) nbd_put(nbd) * free'd here ^^ In nbd_genl_connect() we assume that the nbd ref count will be 2, however clearly that won't be true if the nbd device had been setup as DESTROY_ON_DISCONNECT with its prior configuration. Fix this by getting rid of the runtime flag to check if we need to mess with the nbd device refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if we need to adjust the ref counts. This was reported by syzkaller with the following kasan dump BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76 Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451 CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230 __kasan_report mm/kasan/report.c:396 [inline] kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76 refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115 nbd_put drivers/block/nbd.c:248 [inline] nbd_release+0x116/0x190 drivers/block/nbd.c:1508 __blkdev_put+0x548/0x800 fs/block_dev.c:1579 blkdev_put+0x92/0x570 fs/block_dev.c:1632 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc1e92b5270 Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270 RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007 RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008 R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000 R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e Allocated by task 1: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:401 [inline] ____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:682 [inline] nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673 nbd_init+0x250/0x271 drivers/block/nbd.c:2394 do_one_initcall+0x103/0x650 init/main.c:1223 do_initcall_level init/main.c:1296 [inline] do_initcalls init/main.c:1312 [inline] do_basic_setup init/main.c:1332 [inline] kernel_init_freeable+0x605/0x689 init/main.c:1533 kernel_init+0xd/0x1b8 init/main.c:1421 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Freed by task 8451: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356 ____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362 kasan_slab_free include/linux/kasan.h:192 [inline] slab_free_hook mm/slub.c:1547 [inline] slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580 slab_free mm/slub.c:3143 [inline] kfree+0xdb/0x3b0 mm/slub.c:4139 nbd_dev_remove drivers/block/nbd.c:243 [inline] nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251 nbd_put drivers/block/nbd.c:295 [inline] nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242 nbd_release+0x103/0x190 drivers/block/nbd.c:1507 __blkdev_put+0x548/0x800 fs/block_dev.c:1579 blkdev_put+0x92/0x570 fs/block_dev.c:1632 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff888143bf7000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 416 bytes inside of 1024-byte region [ffff888143bf7000, ffff888143bf7400) The buggy address belongs to the page: page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0 head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0 flags: 0x57ff00000010200(slab|head) raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Reported-and-tested-by: syzbot+429d3f82d757c211bff3@syzkaller.appspotmail.com Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04floppy: reintroduce O_NDELAY fixJiri Kosina
commit 8a0c014cd20516ade9654fc13b51345ec58e7be8 upstream. This issue was originally fixed in 09954bad4 ("floppy: refactor open() flags handling"). The fix as a side-effect, however, introduce issue for open(O_ACCMODE) that is being used for ioctl-only open. I wrote a fix for that, but instead of it being merged, full revert of 09954bad4 was performed, re-introducing the O_NDELAY / O_NONBLOCK issue, and it strikes again. This is a forward-port of the original fix to current codebase; the original submission had the changelog below: ==== Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2101221209060.5622@cbobk.fhfr.pm Cc: stable@vger.kernel.org Reported-by: Wim Osterholt <wim@djo.tudelft.nl> Tested-by: Wim Osterholt <wim@djo.tudelft.nl> Reported-and-tested-by: Kurt Garloff <kurt@garloff.de> Fixes: 09954bad4 ("floppy: refactor open() flags handling") Fixes: f2791e7ead ("Revert "floppy: refactor open() flags handling"") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23xen-blkback: fix error handling in xen_blkbk_map()Jan Beulich
commit 871997bc9e423f05c7da7c9178e62dde5df2a7f8 upstream. The function uses a goto-based loop, which may lead to an earlier error getting discarded by a later iteration. Exit this ad-hoc loop when an error was encountered. The out-of-memory error path additionally fails to fill a structure field looked at by xen_blkbk_unmap_prepare() before inspecting the handle which does get properly set (to BLKBACK_INVALID_HANDLE). Since the earlier exiting from the ad-hoc loop requires the same field filling (invalidation) as that on the out-of-memory path, fold both paths. While doing so, drop the pr_alert(), as extra log messages aren't going to help the situation (the kernel will log oom conditions already anyway). This is XSA-365. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23xen-blkback: don't "handle" error by BUG()Jan Beulich
commit 5a264285ed1cd32e26d9de4f3c8c6855e467fd63 upstream. In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03xen-blkfront: allow discard-* nodes to be optionalRoger Pau Monne
commit 0549cd67b01016b579047bce045b386202a8bcfc upstream. This is inline with the specification described in blkif.h: * discard-granularity: should be set to the physical block size if node is not present. * discard-alignment, discard-secure: should be set to 0 if node not present. This was detected as QEMU would only create the discard-granularity node but not discard-alignment, and thus the setup done in blkfront_setup_discard would fail. Fix blkfront_setup_discard to not fail on missing nodes, and also fix blkif_set_queue_limits to set the discard granularity to the physical block size if none is specified in xenbus. Fixes: ed30bf317c5ce ('xen-blkfront: Handle discard requests.') Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-By: Arthur Borsboom <arthurborsboom@gmail.com> Link: https://lore.kernel.org/r/20210119105727.95173-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03nbd: freeze the queue while we're adding connectionsJosef Bacik
commit b98e762e3d71e893b221f871825dc64694cfb258 upstream. When setting up a device, we can krealloc the config->socks array to add new sockets to the configuration. However if we happen to get a IO request in at this point even though we aren't setup we could hit a UAF, as we deref config->socks without any locking, assuming that the configuration was setup already and that ->socks is safe to access it as we have a reference on the configuration. But there's nothing really preventing IO from occurring at this point of the device setup, we don't want to incur the overhead of a lock to access ->socks when it will never change while the device is running. To fix this UAF scenario simply freeze the queue if we are adding sockets. This will protect us from this particular case without adding any additional overhead for the normal running case. Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17block: rsxx: select CONFIG_CRC32Arnd Bergmann
commit 36a106a4c1c100d55ba3d32a21ef748cfcd4fa99 upstream. Without crc32, the driver fails to link: arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config': config.c:(.text+0x124): undefined reference to `crc32_le' Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-06null_blk: Fix zone size initializationDamien Le Moal
commit 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 upstream. For a null_blk device with zoned mode enabled is currently initialized with a number of zones equal to the device capacity divided by the zone size, without considering if the device capacity is a multiple of the zone size. If the zone size is not a divisor of the capacity, the zones end up not covering the entire capacity, potentially resulting is out of bounds accesses to the zone array. Fix this by adding one last smaller zone with a size equal to the remainder of the disk capacity divided by the zone size if the capacity is not a multiple of the zone size. For such smaller last zone, the zone capacity is also checked so that it does not exceed the smaller zone size. Reported-by: Naohiro Aota <naohiro.aota@wdc.com> Fixes: ca4b2a011948 ("null_blk: add zone support") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()SeongJae Park
commit 2e85d32b1c865bec703ce0c962221a5e955c52c2 upstream. Some code does not directly make 'xenbus_watch' object and call 'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This commit adds support of 'will_handle' callback in the 'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen-blkback: set ring->xenblkd to NULL after kthread_stop()Pawel Wieczorkiewicz
commit 1c728719a4da6e654afb9cc047164755072ed7c9 upstream. When xen_blkif_disconnect() is called, the kernel thread behind the block interface is stopped by calling kthread_stop(ring->xenblkd). The ring->xenblkd thread pointer being non-NULL determines if the thread has been already stopped. Normally, the thread's function xen_blkif_schedule() sets the ring->xenblkd to NULL, when the thread's main loop ends. However, when the thread has not been started yet (i.e. wake_up_process() has not been called on it), the xen_blkif_schedule() function would not be called yet. In such case the kthread_stop() call returns -EINTR and the ring->xenblkd remains dangling. When this happens, any consecutive call to xen_blkif_disconnect (for example in frontend_changed() callback) leads to a kernel crash in kthread_stop() (e.g. NULL pointer dereference in exit_creds()). This is XSA-350. Cc: <stable@vger.kernel.org> # 4.12 Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread") Reported-by: Olivier Benjamin <oliben@amazon.com> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18nbd: fix a block_device refcount leak in nbd_releaseChristoph Hellwig
[ Upstream commit 2bd645b2d3f0bacadaa6037f067538e1cd4e42ef ] bdget_disk needs to be paired with bdput to not leak a reference on the block device inode. Fixes: 08ba91ee6e2c ("nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-18nbd: don't update block size after device is startedMing Lei
[ Upstream commit b40813ddcd6bf9f01d020804e4cb8febc480b9e4 ] Mounted NBD device can be resized, one use case is rbd-nbd. Fix the issue by setting up default block size, then not touch it in nbd_size_update() any more. This kind of usage is aligned with loop which has same use case too. Cc: stable@vger.kernel.org Fixes: c8a83a6b54d0 ("nbd: Use set_blocksize() to set device blocksize") Reported-by: lining <lining2020x@163.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jan Kara <jack@suse.cz> Tested-by: lining <lining2020x@163.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05nbd: make the config put is called before the notifying the waiterXiubo Li
[ Upstream commit 87aac3a80af5cbad93e63250e8a1e19095ba0d30 ] There has one race case for ceph's rbd-nbd tool. When do mapping it may fail with EBUSY from ioctl(nbd, NBD_DO_IT), but actually the nbd device has already unmaped. It dues to if just after the wake_up(), the recv_work() is scheduled out and defers calling the nbd_config_put(), though the map process has exited the "nbd->recv_task" is not cleared. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05xen/blkback: use lateeoi irq bindingJuergen Gross
commit 01263a1fabe30b4d542f34c7e2364a22587ddaf2 upstream. In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving blkfront use the lateeoi irq binding for blkback and unmask the event channel only after processing all pending requests. As the thread processing requests is used to do purging work in regular intervals an EOI may be sent only after having received an event. If there was no pending I/O request flag the EOI as spurious. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-17rbd: require global CAP_SYS_ADMIN for mapping and unmappingIlya Dryomov
commit f44d04e696feaf13d192d942c4f14ad2e117065a upstream. It turns out that currently we rely only on sysfs attribute permissions: $ ll /sys/bus/rbd/{add*,remove*} --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add_single_major --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/remove --w------- 1 root root 4096 Sep 3 20:38 /sys/bus/rbd/remove_single_major This means that images can be mapped and unmapped (i.e. block devices can be created and deleted) by a UID 0 process even after it drops all privileges or by any process with CAP_DAC_OVERRIDE in its user namespace as long as UID 0 is mapped into that user namespace. Be consistent with other virtual block devices (loop, nbd, dm, md, etc) and require CAP_SYS_ADMIN in the initial user namespace for mapping and unmapping, and also for dumping the configuration string and refreshing the image header. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09nbd: restore default timeout when setting it to zeroHou Pu
[ Upstream commit acb19e17c5134dd78668c429ecba5b481f038e6a ] If we configured io timeout of nbd0 to 100s. Later after we finished using it, we configured nbd0 again and set the io timeout to 0. We expect it would timeout after 30 seconds and keep retry. But in fact we could not change the timeout when we set it to 0. the timeout is still the original 100s. So change the timeout to default 30s when we set it to zero. It also behaves same as commit 2da22da57348 ("nbd: fix zero cmd timeout handling v2"). It becomes more important if we were reconfigure a nbd device and the io timeout it set to zero. Because it could take 30s to detect the new socket and thus io could be completed more quickly compared to 100s. Signed-off-by: Hou Pu <houpu@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03block: loop: set discard granularity and alignment for block device backed loopMing Lei
commit bcb21c8cc9947286211327d663ace69f07d37a76 upstream. In case of block device backend, if the backend supports write zeros, the loop device will set queue flag of QUEUE_FLAG_DISCARD. However, limits.discard_granularity isn't setup, and this way is wrong, see the following description in Documentation/ABI/testing/sysfs-block: A discard_granularity of 0 means that the device does not support discard functionality. Especially 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()") starts to take q->limits.discard_granularity for computing max discard sectors. And zero discard granularity may cause kernel oops, or fail discard request even though the loop queue claims discard support via QUEUE_FLAG_DISCARD. Fix the issue by setup discard granularity and alignment. Fixes: c52abf563049 ("loop: Better discard support for block devices") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Xiao Ni <xni@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Evan Green <evgreen@chromium.org> Cc: Gwendal Grignou <gwendal@chromium.org> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03null_blk: fix passing of REQ_FUA flag in null_handle_rqHou Pu
[ Upstream commit 2d62e6b038e729c3e4bfbfcfbd44800ef0883680 ] REQ_FUA should be checked using rq->cmd_flags instead of req_op(). Fixes: deb78b419dfda ("nullb: emulate cache") Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03block: virtio_blk: fix handling single range discard requestMing Lei
[ Upstream commit af822aa68fbdf0a480a17462ed70232998127453 ] 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") starts to support multi-range discard for virtio-blk. However, the virtio-blk disk may report max discard segment as 1, at least that is exactly what qemu is doing. So far, block layer switches to normal request merge if max discard segment limit is 1, and multiple bios can be merged to single segment. This way may cause memory corruption in virtblk_setup_discard_write_zeroes(). Fix the issue by handling single max discard segment in straightforward way. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Changpeng Liu <changpeng.liu@intel.com> Cc: Daniel Verkamp <dverkamp@chromium.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>