linux-toradex.git/drivers/block, branch v3.15.5

zram: revalidate disk after capacity change

2014-07-09T18:21:30+00:00

commit 2e32baea46ce542c561a519414c840295b229c8f upstream.

Alexander reported mkswap on /dev/zram0 is failed if other process is
opening the block device file.

Step is as follows,

0. Reset the unused zram device.
1. Use a program that opens /dev/zram0 with O_RDWR and sleeps
   until killed.
2. While that program sleeps, echo the correct value to
   /sys/block/zram0/disksize.
3. Verify (e.g. in /proc/partitions) that the disk size is applied
   correctly. It is.
4. While that program still sleeps, attempt to mkswap /dev/zram0.
   This fails: mkswap: error: swap area needs to be at least 40 KiB

When I investigated, the size get by ioctl(fd, BLKGETSIZE64, xxx) on
mkswap to get a size of blockdev was zero although zram0 has right size by
2.

The reason is zram didn't revalidate disk after changing capacity so that
size of blockdev's inode is not uptodate until all of file is close.

This patch should fix the BUG.

Signed-off-by: Minchan Kim 
Reported-by: Alexander E. Patrakov 
Tested-by: Alexander E. Patrakov 
Reviewed-by: Sergey Senozhatsky 
Cc: Nitin Gupta 
Acked-by: Jerome Marchand 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

rbd: handle parent_overlap on writes correctly

2014-07-09T18:21:28+00:00

commit 9638556a276125553549fdfe349c464481ec2f39 upstream.

The following check in rbd_img_obj_request_submit()

    rbd_dev->parent_overlap <= obj_request->img_offset

allows the fall through to the non-layered write case even if both
parent_overlap and obj_request->img_offset belong to the same RADOS
object.  This leads to data corruption, because the area to the left of
parent_overlap ends up unconditionally zero-filled instead of being
populated with parent data.  Suppose we want to write 1M to offset 6M
of image bar, which is a clone of foo@snap; object_size is 4M,
parent_overlap is 5M:

    rbd_data..0000000000000001
     ---------------------|----------------------|------------
    | should be copyup'ed | should be zeroed out | write ...
     ---------------------|----------------------|------------
   4M                    5M                     6M
                    parent_overlap    obj_request->img_offset

4..5M should be copyup'ed from foo, yet it is zero-filled, just like
5..6M is.

Given that the only striping mode kernel client currently supports is
chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
parent_overlap up to the next object boundary for the purposes of the
overlap check.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Josh Durgin 
Signed-off-by: Greg Kroah-Hartman

rbd: use reference counts for image requests

2014-07-09T18:21:28+00:00

commit 0f2d5be792b0466b06797f637cfbb0f64dbb408c upstream.

Each image request contains a reference count, but to date it has
not actually been used.  (I think this was just an oversight.) A
recent report involving rbd failing an assertion shed light on why
and where we need to use these reference counts.

Every OSD request associated with an object request uses
rbd_osd_req_callback() as its callback function.  That function will
call a helper function (dependent on the type of OSD request) that
will set the object request's "done" flag if the object request if
appropriate.  If that "done" flag is set, the object request is
passed to rbd_obj_request_complete().

In rbd_obj_request_complete(), requests are processed in sequential
order.  So if an object request completes before one of its
predecessors in the image request, the completion is deferred.
Otherwise, if it's a completing object's "turn" to be completed, it
is passed to rbd_img_obj_end_request(), which records the result of
the operation, accumulates transferred bytes, and so on.  Next, the
successor to this request is checked and if it is marked "done",
(deferred) completion processing is performed on that request, and
so on.  If the last object request in an image request is completed,
rbd_img_request_complete() is called, which (typically) destroys
the image request.

There is a race here, however.  The instant an object request is
marked "done" it can be provided (by a thread handling completion of
one of its predecessor operations) to rbd_img_obj_end_request(),
which (for the last request) can then lead to the image request
getting torn down.  And this can happen *before* that object has
itself entered rbd_img_obj_end_request().  As a result, once it
*does* enter that function, the image request (and even the object
request itself) may have been freed and become invalid.

All that's necessary to avoid this is to properly count references
to the image requests.  We tear down an image request's object
requests all at once--only when the entire image request has
completed.  So there's no need for an image request to count
references for its object requests.  However, we don't want an
image request to go away until the last of its object requests
has passed through rbd_img_obj_callback().  In other words,
we don't want rbd_img_request_complete() to necessarily
result in the image request being destroyed, because it may
get called before we've finished processing on all of its
object requests.

So the fix is to add a reference to an image request for
each of its object requests.  The reference can be viewed
as representing an object request that has not yet finished
its call to rbd_img_obj_callback().  That is emphasized by
getting the reference right after assigning that as the image
object's callback function.  The corresponding release of that
reference is done at the end of rbd_img_obj_callback(), which
every image object request passes through exactly once.

Signed-off-by: Alex Elder 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

mtip32xx: Remove dfs_parent after pci unregister

2014-07-07T01:59:09+00:00

commit af5ded8ccf21627f9614afc03b356712666ed225 upstream.

In module exit, dfs_parent and it's subtree were removed before
unregistering with pci. When debugfs entry for each device is attempted
to remove in pci_remove() context, they don't exist, as dfs_parent and
its children were already ripped apart.

Modified to first unregister with pci and then remove dfs_parent.

Signed-off-by: Asai Thambi S P 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

mtip32xx: Increase timeout for STANDBY IMMEDIATE command

2014-07-07T01:59:09+00:00

commit 670a641420a3d9586eebe7429dfeec4e7ed447aa upstream.

Increased timeout for STANDBY IMMEDIATE command to 2 minutes.

Signed-off-by: Selvan Mani 
Signed-off-by: Asai Thambi S P 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

mtip32xx: Fix ERO and NoSnoop values in PCIe upstream on AMD systems

2014-07-07T01:59:09+00:00

commit d1e714db8129a1d3670e449b87719c78e2c76f9f upstream.

A hardware quirk in P320h/P420m interfere with PCIe transactions on some
AMD chipsets, making P320h/P420m unusable. This workaround is to disable
ERO and NoSnoop bits in the parent and root complex for normal
functioning of these devices

NOTE: This workaround is specific to AMD chipset with a PCIe upstream
device with device id 0x5aXX

Signed-off-by: Asai Thambi S P 
Signed-off-by: Sam Bradshaw 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

zram: correct offset usage in zram_bio_discard

2014-07-01T03:13:55+00:00

commit 38515c73398a4c58059ecf1087e844561b58ee0f upstream.

We want to skip the physical block(PAGE_SIZE) which is partially covered
by the discard bio, so we check the remaining size and subtract it if
there is a need to goto the next physical block.

The current offset usage in zram_bio_discard is incorrect, it will cause
its upper filesystem breakdown.  Consider the following scenario:

On some architecture or config, PAGE_SIZE is 64K for example, filesystem
is set up on zram disk without PAGE_SIZE aligned, a discard bio leads to a
offset = 4K and size=72K, normally, it should not really discard any
physical block as it partially cover two physical blocks.  However, with
the current offset usage, it will discard the second physical block and
free its memory, which will cause filesystem breakdown.

This patch corrects the offset usage in zram_bio_discard.

Signed-off-by: Weijie Yang 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Acked-by: Joonsoo Kim 
Cc: Sergey Senozhatsky 
Cc: Bob Liu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

block: virtio_blk: don't hold spin lock during world switch

2014-07-01T03:13:52+00:00

commit e8edca6f7f92234202d6dd163c118ef495244d7c upstream.

Firstly, it isn't necessary to hold lock of vblk->vq_lock
when notifying hypervisor about queued I/O.

Secondly, virtqueue_notify() will cause world switch and
it may take long time on some hypervisors(such as, qemu-arm),
so it isn't good to hold the lock and block other vCPUs.

On arm64 quad core VM(qemu-kvm), the patch can increase I/O
performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
	- without the patch: 14K IOPS
	- with the patch: 34K IOPS

fio script:
	[global]
	direct=1
	bsrange=4k-4k
	timeout=10
	numjobs=4
	ioengine=libaio
	iodepth=64

	filename=/dev/vdc
	group_reporting=1

	[f1]
	rw=randread

Cc: Rusty Russell 
Cc: "Michael S. Tsirkin" 
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ming Lei 
Acked-by: Rusty Russell 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

virtio_blk: fix race between start and stop queue

2014-05-27T14:41:10+00:00

When there isn't enough vring descriptor for adding to vq,
blk-mq will be put as stopped state until some of pending
descriptors are completed & freed.

Unfortunately, the vq's interrupt may come just before
blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will
still be kept as stopped even though lots of descriptors
are completed and freed in the interrupt handler. The worst
case is that all pending descriptors are freed in the
interrupt handler, and the queue is kept as stopped forever.

This patch fixes the problem by starting/stopping blk-mq
with holding vq_lock.

Cc: Jens Axboe 
Cc: Rusty Russell 
Signed-off-by: Ming Lei 
Cc: stable@kernel.org
Signed-off-by: Jens Axboe 

Conflicts:
	drivers/block/virtio_blk.c

floppy: don't write kernel-only members to FDRAWCMD ioctl output

2014-05-05T14:46:56+00:00

Do not leak kernel-only floppy_raw_cmd structure members to userspace.
This includes the linked-list pointer and the pointer to the allocated
DMA space.

Signed-off-by: Matthew Daley 
Signed-off-by: Linus Torvalds