Age | Commit message (Collapse) | Author |
|
commit 1e37f2f84680fa7f8394fd444b6928e334495ccc upstream.
rbd_img_obj_exists_submit() and rbd_img_obj_parent_read_full() are on
the writeback path for cloned images -- we attempt a stat on the parent
object to see if it exists and potentially read it in to call copyup.
GFP_NOIO should be used instead of GFP_KERNEL here.
Link: http://tracker.ceph.com/issues/22014
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5fbd545cd3fd311ea1d6e8be4cedddd0ee5684c7 upstream.
Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:
(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7277cc67b3916eed47558c64f9c9c0de00a35cda upstream.
Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:
WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
dump_stack+0x63/0x84
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5a/0x80
refcount_sub_and_test+0x70/0x80
refcount_dec_and_test+0x11/0x20
kobject_put+0x1f/0x50
blk_put_queue+0x15/0x20
disk_release+0xae/0xf0
device_release+0x32/0x90
kobject_release+0x67/0x170
kobject_put+0x2b/0x50
put_disk+0x17/0x20
skd_destruct+0x5c/0x890 [skd]
skd_pci_probe+0x124d/0x13a0 [skd]
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x19e/0x470
worker_thread+0x1dc/0x4a0
kthread+0x125/0x140
ret_from_fork+0x25/0x30
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b15bd8cb37598afb2963f7eb9e2de468d2d60a2f upstream.
Since commit d05d7f40791c ("Merge branch 'for-4.8/core' of
git://git.kernel.dk/linux-block") and 3fc9d690936f ("Merge branch
'for-4.8/drivers' of git://git.kernel.dk/linux-block"), blkfront_resume()
has been using an index for iterating ring_info to check request when
iterating blk_shadow in an inner loop. This seems to have been
accidentally introduced during the massive rewrite of the block layer
macros in the commits.
This may cause crash like this:
[11798.057074] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[11798.058832] IP: [<ffffffff814411fa>] blkfront_resume+0x10a/0x610
....
[11798.061063] Call Trace:
[11798.061063] [<ffffffff8139ce93>] xenbus_dev_resume+0x53/0x140
[11798.061063] [<ffffffff8139ce40>] ? xenbus_dev_probe+0x150/0x150
[11798.061063] [<ffffffff813f359e>] dpm_run_callback+0x3e/0x110
[11798.061063] [<ffffffff813f3a08>] device_resume+0x88/0x190
[11798.061063] [<ffffffff813f4cc0>] dpm_resume+0x100/0x2d0
[11798.061063] [<ffffffff813f5221>] dpm_resume_end+0x11/0x20
[11798.061063] [<ffffffff813950a8>] do_suspend+0xe8/0x1a0
[11798.061063] [<ffffffff813954bd>] shutdown_handler+0xfd/0x130
[11798.061063] [<ffffffff8139aba0>] ? split+0x110/0x110
[11798.061063] [<ffffffff8139ac26>] xenwatch_thread+0x86/0x120
[11798.061063] [<ffffffff810b4570>] ? prepare_to_wait_event+0x110/0x110
[11798.061063] [<ffffffff8108fe57>] kthread+0xd7/0xf0
[11798.061063] [<ffffffff811da811>] ? kfree+0x121/0x170
[11798.061063] [<ffffffff8108fd80>] ? kthread_park+0x60/0x60
[11798.061063] [<ffffffff810863b0>] ? call_usermodehelper_exec_work+0xb0/0xb0
[11798.061063] [<ffffffff810864ea>] ? call_usermodehelper_exec_async+0x13a/0x140
[11798.061063] [<ffffffff81534a45>] ret_from_fork+0x25/0x30
Use the right index in the inner loop.
Fixes: d05d7f40791c ("Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block")
Fixes: 3fc9d690936f ("Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6bf6b0aa3da84a3d9126919a94c49c0fb7ee2fb3 ]
If blk_mq_init_queue() returns an error, it gets assigned to
vblk->disk->queue. Then, when we call put_disk(), we end up calling
blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by
only assigning to vblk->disk->queue on success.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 25b4acfc7de0fc4da3bfea3a316f7282c6fbde81 ]
Additionally, don't assign directly to disk->queue, otherwise
blk_put_queue (called via put_disk) will choke (panic) on the errno
stored there.
Bug found by code inspection after Omar found a similar issue in
virtio_blk. Compile-tested only.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 71df1d7ccad1c36f7321d6b3b48f2ea42681c363 upstream.
The be structure must not be freed when freeing the blkif structure
isn't done. Otherwise a use-after-free of be when unmapping the ring
used for communicating with the frontend will occur in case of a
late call of xenblk_disconnect() (e.g. due to an I/O still active
when trying to disconnect).
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a24fa22ce22ae302b3bf8f7008896d52d5d57b8d upstream.
There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous and using the blkif
reference counting in the kthread will avoid to ever let the reference
count drop to zero at the end of an I/O running concurrent to
disconnecting and multiple rings.
Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 089bc0143f489bd3a4578bdff5f4ca68fb26f341 upstream.
Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other backends do.
Build on the fact that all response structure flavors are actually
identical (the old code did make this assumption too).
This is XSA-216.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 46464411307746e6297a034a9983a22c9dfc5a0c upstream.
Today disconnecting xen-blkback is broken in case there are still
I/Os in flight: xen_blkif_disconnect() will bail out early without
releasing all resources in the hope it will be called again when
the last request has terminated. This, however, won't happen as
xen_blkif_free() won't be called on termination of the last running
request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
calls in xen_blkif_disconnect() didn't happen.
To solve this deadlock xen_blkif_disconnect() and
xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
xen_blkif_get() but use some other way to do their accounting of
resources.
This at once fixes another error in xen_blkif_disconnect(): when it
returned early with -EBUSY for another ring than 0 it would call
xen_blkif_put() again for already handled rings on a subsequent call.
This will lead to inconsistencies in the refcnt handling.
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e88f72cb9f54f6d244e55f629fe5e2f34ca6f9ed upstream.
We have this:
ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined!
ERROR: "__divdi3" [drivers/block/nbd.ko] undefined!
nbd.c:(.text+0x247c72): undefined reference to `__divdi3'
due to a recent commit, that did 64-bit division. Use the proper
divider function so that 32-bit compiles don't break.
Fixes: ef77b515243b ("nbd: use loff_t for blocksize and nbd_set_size args")
Signed-off-by: Jens Axboe <axboe@fb.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ef77b515243b3499d62cf446eda6ca7e0a0b079c upstream.
If we have large devices (say like the 40t drive I was trying to test with) we
will end up overflowing the int arguments to nbd_set_size and not get the right
size for our device. Fix this by using loff_t everywhere so I don't have to
think about this again. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d72e9a7a93e4f8e9e52491921d99e0c8aa89eb4e upstream.
The copy_page is optimized memcpy for page-alinged address. If it is
used with non-page aligned address, it can corrupt memory which means
system corruption. With zram, it can happen with
1. 64K architecture
2. partial IO
3. slub debug
Partial IO need to allocate a page and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned
address. And finally, copy_page(mem, cmem) corrupts memory.
So, this patch changes it to memcpy.
Actuaully, we don't need to change zram_bvec_write part because zsmalloc
returns page-aligned address in case of PAGE_SIZE class but it's not
good to rely on the internal of zsmalloc.
Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removes it by "same page merge" feature so
it's hard to backport this patch to -stable tree.
I will handle it when I receive the mail from stable tree maintainer to
merge this patch to backport.
Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e02898b423802b1f3a3aaa7f16e896da069ba8f7 upstream.
loop_reread_partitions() needs to do I/O, but we just froze the queue,
so we end up waiting forever. This can easily be reproduced with losetup
-P. Fix it by moving the reread to after we unfreeze the queue.
Fixes: ecdd09597a57 ("block/loop: fix race between I/O and set_status")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ecdd09597a57251323b0de50e3d45e69298c4a83 upstream.
Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:
divide error: 0000 [#1] SMP KASAN
CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
task: ffff88006ba1e840 task.stack: ffff880067338000
RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
RSP: 0018:ffff88006733f108 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88006d000000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
Call Trace:
lo_do_transfer drivers/block/loop.c:251 [inline]
lo_read_transfer drivers/block/loop.c:392 [inline]
do_req_filebacked drivers/block/loop.c:541 [inline]
loop_handle_cmd drivers/block/loop.c:1677 [inline]
loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
ffff88006733f108
---[ end trace 0166f7bd3b0c0933 ]---
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a14d749fcebe97ddf6af6db3d1f6ece85c9ddcb9 upstream.
Most users of BLOCK_PC requests allocate the sense buffer on the stack,
so to avoid DMA to the stack copy them to a field in the heap allocated
virtblk_req structure. Without that any attempt at SCSI passthrough I/O,
including the SG_IO ioctl from userspace will crash the kernel. Note that
this includes running tools like hdparm even when the host does not have
SCSI passthrough enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b09ab054b69b07077bd3292f67e777861ac796e5 upstream.
zram has used per-cpu stream feature from v4.7. It aims for increasing
cache hit ratio of scratch buffer for compressing. Downside of that
approach is that zram should ask memory space for compressed page in
per-cpu context which requires stricted gfp flag which could be failed.
If so, it retries to allocate memory space out of per-cpu context so it
could get memory this time and compress the data again, copies it to the
memory space.
In this scenario, zram assumes the data should never be changed but it is
not true without stable page support. So, If the data is changed under
us, zram can make buffer overrun so that zsmalloc free object chain is
broken so system goes crash like below
https://bugzilla.suse.com/show_bug.cgi?id=997574
This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block
device needing *stable write*".
Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7ccfc4ccb703e0f033bd4617580039898e912dd upstream.
Commit b4c5c60920e3 ("zram: avoid lockdep splat by revalidate_disk")
moved revalidate_disk call out of init_lock to avoid lockdep
false-positive splat. However, commit 08eee69fcf6b ("zram: remove
init_lock in zram_make_request") removed init_lock in IO path so there
is no worry about lockdep splat. So, let's restore it.
This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next
patch.
Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b4a567e8114327518c09f5632339a5954ab975a3 upstream.
->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not
an errno.
Fixes: f4aa4c7bbac6 ("block: loop: convert to per-device workqueue")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
zram hot_add sysfs attribute is a very 'special' attribute - reading
from it creates a new uninitialized zram device. This file, by a
mistake, can be read by a 'normal' user at the moment, while only root
must be able to create a new zram device, therefore hot_add attribute
must have S_IRUSR mode, not S_IRUGO.
[akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols]
Fixes: 6566d1a32bf72 ("zram: add dynamic device add/remove functionality")
Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Allen <steven@stebalien.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY). This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.
As described in the bug report below, the following procedure would
cause an Oops with zram:
- provision three zram devices via modprobe zram num_devices=3
- configure a size for each device
+ echo "1G" > /sys/block/$zram_name/disksize
- mkfs and mount zram0 only
- attempt to hot remove all three devices
+ echo 2 > /sys/class/zram-control/hot_remove
+ echo 1 > /sys/class/zram-control/hot_remove
+ echo 0 > /sys/class/zram-control/hot_remove
- zram0 removal fails with EBUSY, as expected
- unmount zram0
- try zram0 hot remove again
+ echo 0 > /sys/class/zram-control/hot_remove
- fails with ENODEV (unexpected)
- unload zram kernel module
+ completes successfully
- zram0 device node still exists
- attempt to mount /dev/zram0
+ mount command is killed
+ following BUG is encountered
BUG: unable to handle kernel paging request at ffffffffa0002ba0
IP: get_disk+0x16/0x50
Oops: 0000 [#1] SMP
CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
Call Trace:
exact_lock+0xc/0x20
kobj_lookup+0xdc/0x160
get_gendisk+0x2f/0x110
__blkdev_get+0x10c/0x3c0
blkdev_get+0x19d/0x2e0
blkdev_open+0x56/0x70
do_dentry_open.isra.19+0x1ff/0x310
vfs_open+0x43/0x60
path_openat+0x2c9/0xf30
do_filp_open+0x79/0xd0
do_sys_open+0x114/0x1e0
SyS_open+0x19/0x20
entry_SYSCALL_64_fastpath+0x13/0x94
This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.
Fixes: 17ec4cd98578 ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/20161121132140.12683-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reported-by: David Disseldorp <ddiss@suse.de>
Tested-by: David Disseldorp <ddiss@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
aoeblk contains some mysterious code, that wants to elevate the bio
vec page counts while it's under IO. That is not needed, it's
fragile, and it's causing kernel oopses for some.
Reported-by: Tested-by: Don Koch <kochd@us.ibm.com>
Tested-by: Tested-by: Don Koch <kochd@us.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.
DRBD as external module has been around in the kernel 2.4 days already.
We used to be compatible to 2.4 and very early 2.6 kernels,
we used to use
rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
tcp_sendmsg() used to totally ignore the size parameter.
57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.
Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time". Apparently that was unlikely enough for most, so this went
unnoticed for years.
Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/
This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.
It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).
It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
we dropped the comment block immediately preceding the kernel_sendmsg().
Fixes: b411b3637fa7 ("The DRBD driver")
Cc: <stable@vger.kernel.org> # 2.6.33.x-
Cc: viro@zeniv.linux.org.uk
Cc: christoph.lechleitner@iteg.at
Cc: wolfgang.glas@iteg.at
Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
[changed oneliner to be "obvious" without context; more verbose message]
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
'blk_mq_alloc_request()' returns an error pointer in case of error, not
NULL. So test it with IS_ERR.
Fixes: fd8383fd88a2 ("nbd: convert to blkmq")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The local variable "err" will be set to an appropriate value
by a following statement.
Thus omit the explicit initialisation at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.
Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 0eadf37afc250 ("nbd: allow block mq to deal with timeouts")
changed normal usage of nbd->sock_lock to use spin_lock/spin_unlock
rather than the *_irq variants, but it missed this unlock in an
error path.
Found by Coverity, CID 1373871.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Fixes: 0eadf37afc250 ("nbd: allow block mq to deal with timeouts")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If the header object gets deleted (perhaps along with the entire pool),
there is no point in attempting to reregister the watch. Treat this
the same as blacklisting: fail all pending and new I/Os requiring the
lock.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
-EBLACKLISTED from __rbd_register_watch() means that our ceph_client
got blacklisted - we won't be able to restore the watch and reacquire
the lock. Wake up and fail all outstanding requests waiting for the
lock and arrange for all new requests that require the lock to fail
immediately.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Mike Christie <mchristi@redhat.com>
|
|
A good practice is to prefix the names of functions by the name
of the subsystem.
The kthread worker API is a mix of classic kthreads and workqueues. Each
worker has a dedicated kthread. It runs a generic function that process
queued works. It is implemented as part of the kthread subsystem.
This patch renames the existing kthread worker API to use
the corresponding name from the workqueues API prefixed by
kthread_:
__init_kthread_worker() -> __kthread_init_worker()
init_kthread_worker() -> kthread_init_worker()
init_kthread_work() -> kthread_init_work()
insert_kthread_work() -> kthread_insert_work()
queue_kthread_work() -> kthread_queue_work()
flush_kthread_work() -> kthread_flush_work()
flush_kthread_worker() -> kthread_flush_worker()
Note that the names of DEFINE_KTHREAD_WORK*() macros stay
as they are. It is common that the "DEFINE_" prefix has
precedence over the subsystem names.
Note that INIT() macros and init() functions use different
naming scheme. There is no good solution. There are several
reasons for this solution:
+ "init" in the function names stands for the verb "initialize"
aka "initialize worker". While "INIT" in the macro names
stands for the noun "INITIALIZER" aka "worker initializer".
+ INIT() macros are used only in DEFINE() macros
+ init() functions are used close to the other kthread()
functions. It looks much better if all the functions
use the same scheme.
+ There will be also kthread_destroy_worker() that will
be used close to kthread_cancel_work(). It is related
to the init() function. Again it looks better if all
functions use the same naming scheme.
+ there are several precedents for such init() function
names, e.g. amd_iommu_init_device(), free_area_init_node(),
jump_label_init_type(), regmap_init_mmio_clk(),
+ It is not an argument but it was inconsistent even before.
[arnd@arndb.de: fix linux-next merge conflict]
Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull Ceph updates from Ilya Dryomov:
"The big ticket item here is support for rbd exclusive-lock feature,
with maintenance operations offloaded to userspace (Douglas Fuller,
Mike Christie and myself). Another block device bullet is a series
fixing up layering error paths (myself).
On the filesystem side, we've got patches that improve our handling of
buffered vs dio write races (Neil Brown) and a few assorted fixes from
Zheng. Also included a couple of random cleanups and a minor CRUSH
update"
* tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits)
crush: remove redundant local variable
crush: don't normalize input of crush_ln iteratively
libceph: ceph_build_auth() doesn't need ceph_auth_build_hello()
libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello()
ceph: fix description for rsize and rasize mount options
rbd: use kmalloc_array() in rbd_header_from_disk()
ceph: use list_move instead of list_del/list_add
ceph: handle CEPH_SESSION_REJECT message
ceph: avoid accessing / when mounting a subpath
ceph: fix mandatory flock check
ceph: remove warning when ceph_releasepage() is called on dirty page
ceph: ignore error from invalidate_inode_pages2_range() in direct write
ceph: fix error handling of start_read()
rbd: add rbd_obj_request_error() helper
rbd: img_data requests don't own their page array
rbd: don't call rbd_osd_req_format_read() for !img_data requests
rbd: rework rbd_img_obj_exists_submit() error paths
rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()
rbd: move bumping img_request refcount into rbd_obj_request_submit()
rbd: mark the original request as done if stat request fails
...
|
|
Pull blk-mq irq/cpu mapping updates from Jens Axboe:
"This is the block-irq topic branch for 4.9-rc. It's mostly from
Christoph, and it allows drivers to specify their own mappings, and
more importantly, to share the blk-mq mappings with the IRQ affinity
mappings. It's a good step towards making this work better out of the
box"
* 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
blk_mq: linux/blk-mq.h does not include all the headers it depends on
blk-mq: kill unused blk_mq_create_mq_map()
blk-mq: get rid of the cpumask in struct blk_mq_tags
nvme: remove the post_scan callout
nvme: switch to use pci_alloc_irq_vectors
blk-mq: provide a default queue mapping for PCI device
blk-mq: allow the driver to pass in a queue mapping
blk-mq: remove ->map_queue
blk-mq: only allocate a single mq_map per tag_set
blk-mq: don't redistribute hardware queues on a CPU hotplug event
|
|
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Delete the local variable "size" which became unnecessary with
this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Pull setting an error and marking a request done code into a new
helper. obj_request_img_data_test() check isn't strictly needed right
now, but makes it applicable to !img_data requests and a bit safer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Move the check into rbd_obj_request_destroy() to avoid use-after-free
on errors in rbd_img_request_fill(..., OBJ_REQUEST_PAGES, ...), where
pages, owned by the caller, gets freed in rbd_img_request_fill().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
Accessing obj_request->img_request union field is only valid for object
requests associated with an image (i.e. if obj_request_img_data_test()
returns true). rbd_osd_req_format_read() used to do more, but now it
just sets osd_req->snap_id. Standalone and stat object requests always
go to the HEAD revision and are fine with CEPH_NOSNAP set by libceph,
so get around the invalid union field use by simply not calling
rbd_osd_req_format_read() in those places.
Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
- don't put obj_request before rbd_obj_request_get() if
rbd_obj_request_create() fails
- don't leak pages if rbd_obj_request_create() fails
- don't leak stat_request if rbd_osd_req_create() fails
Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
- fix parent_length == img_request->xferred assert to not fire on
copyup read failures
- don't leak pages if copyup read fails or we can't allocate a new osd
request
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
Commit 0f2d5be792b0 ("rbd: use reference counts for image requests")
added rbd_img_request_get(), which rbd_img_request_fill() calls for
each obj_request added to img_request. It was an urgent band-aid for
the uglyness that is rbd_img_obj_callback() and none of the error paths
were updated.
Given that this img_request reference is meant to represent an
obj_request that hasn't passed through rbd_img_obj_callback() yet,
proper cleanup in appropriate destructors is a challenge. However,
noting that if we don't get a chance to call rbd_obj_request_complete(),
there is not going to be a call to rbd_img_obj_callback(), we can move
rbd_img_request_get() into rbd_obj_request_submit() and fixup the two
places that call rbd_obj_request_complete() directly and not through
rbd_obj_request_submit() to temporarily bump img_request, so that
rbd_img_obj_callback() can put as usual.
This takes care of img_request leaks on errors on the submit side.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
If stat request fails with something other than -ENOENT (which just
means that we need to copyup), the original object request is never
marked as done and therefore never completed. Fix this by moving the
mark done + complete snippet from rbd_img_obj_parent_read_full() into
rbd_img_obj_exists_callback(). The former remains covered, as the
latter is its only caller (through rbd_img_obj_request_submit()).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
Assert once in rbd_img_obj_request_submit().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
- osdc parameter is useless
- starting with commit 5aea3dcd5021 ("libceph: a major OSD client
update"), ceph_osdc_start_request() always returns success
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: David Disseldorp <ddiss@suse.de>
|
|
Add a per-device option to acquire exclusive lock on reads (in addition
to writes and discards). The use case is iSCSI, where it will be used
to prevent execution of stale writes after the implicit failover.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Mike Christie <mchristi@redhat.com>
|
|
We take a mutex when sending commands and send stuff over the network, we need
to have queue_rq called asynchronously.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: fd8383fd88a2 ("nbd: convert to blkmq")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
LightNVM compatible device drivers does not have a method to expose
LightNVM specific sysfs entries.
To enable LightNVM sysfs entries to be exposed, lightnvm device
drivers require a struct device to attach it to. To allow both the
actual device driver and lightnvm sysfs entries to coexist, the device
driver tracks the lifetime of the nvm_dev structure.
This patch refactors NVMe and null_blk to handle the lifetime of struct
nvm_dev, which eliminates the need for struct gendisk when a lightnvm
compatible device is provided.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
With LightNVM enabled devices, the gendisk structure is not exposed
to the user. This hides the device driver specific sysfs entries, and
prevents binding of LightNVM geometry information to the device.
Refactor the device registration process, so that gendisk and
non-gendisk devices are easily managed.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
All drivers use the default, so provide an inline version of it. If we
ever need other queue mapping we can add an optional method back,
although supporting will also require major changes to the queue setup
code.
This provides better code generation, and better debugability as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Instead of rolling our own timer, just utilize the blk mq req timeout and do the
disconnect if any of our commands timeout.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In preparation for some future changes, change a few of the state bools over to
normal bits to set/clear properly.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|