linux-toradex.git/drivers/block/rbd.c, branch v4.12

rbd: implement REQ_OP_WRITE_ZEROES

2017-05-29T09:18:01+00:00

Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.

rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned.  This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now.  Caveats:

- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
  current implementations - nvme and loop

- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
  is hence less helpful than blk_bio_discard_split(), but this can (and
  should) be fixed on the rbd side

In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.

Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman 
Reviewed-by: Christoph Hellwig

Merge tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-client

2017-05-10T15:42:33+00:00

Pull ceph updates from Ilya Dryomov:
 "The two main items are support for disabling automatic rbd exclusive
  lock transfers from myself and the long awaited -ENOSPC handling
  series from Jeff.

  The former will allow rbd users to take advantage of exclusive lock's
  built-in blacklist/break-lock functionality while staying in control
  of who owns the lock. With the latter in place, we will abort
  filesystem writes on -ENOSPC instead of having them block
  indefinitely.

  Beyond that we've got the usual pile of filesystem fixes from Zheng,
  some refcount_t conversion patches from Elena and a patch for an
  ancient open() flags handling bug from Alexander"

* tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-client: (31 commits)
  ceph: fix memory leak in __ceph_setxattr()
  ceph: fix file open flags on ppc64
  ceph: choose readdir frag based on previous readdir reply
  rbd: exclusive map option
  rbd: return ResponseMessage result from rbd_handle_request_lock()
  rbd: kill rbd_is_lock_supported()
  rbd: support updating the lock cookie without releasing the lock
  rbd: store lock cookie
  rbd: ignore unlock errors
  rbd: fix error handling around rbd_init_disk()
  rbd: move rbd_unregister_watch() call into rbd_dev_image_release()
  rbd: move rbd_dev_destroy() call out of rbd_dev_image_release()
  ceph: when seeing write errors on an inode, switch to sync writes
  Revert "ceph: SetPageError() for writeback pages if writepages fails"
  ceph: handle epoch barriers in cap messages
  libceph: add an epoch_barrier field to struct ceph_osd_client
  libceph: abort already submitted but abortable requests when map or pool goes full
  libceph: allow requests to return immediately on full conditions if caller wishes
  libceph: remove req->r_replay_version
  ceph: make seeky readdir more efficient
  ...

fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-05-09T00:15:15+00:00

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani 
Reviewed-by: Arnd Bergmann 
M:	Ilya Dryomov 
M:	"Yan, Zheng" 
M:	Sage Weil 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

rbd: exclusive map option

2017-05-04T07:19:24+00:00

Support disabling automatic exclusive lock transfers to allow users
to be in charge of which node should own the lock while being able to
reuse exclusive lock's built-in blacklist/break-lock functionality.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman

rbd: return ResponseMessage result from rbd_handle_request_lock()

2017-05-04T07:19:24+00:00

Right now it's just 0, but "no automatic exclusive lock transfers" mode
code will need -EROFS.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman

rbd: kill rbd_is_lock_supported()

2017-05-04T07:19:23+00:00

Currently the exclusive lock is acquired only if the mapping is
writable, i.e. an image HEAD mapped in rw mode.  This means that we
don't acquire the lock for executing a read from a snapshot or an image
HEAD mapped in ro mode, even if lock_on_read is set.  This is somewhat
weird and inconsistent with "no automatic exclusive lock transfers"
mode, where the lock is acquired unconditionally.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman

rbd: support updating the lock cookie without releasing the lock

2017-05-04T07:19:23+00:00

As we no longer release the lock before potentially raising BLACKLISTED
in rbd_reregister_watch(), the "either locked or blacklisted" assert in
rbd_queue_workfn() needs to go: we can be both locked and blacklisted
at that point now.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman

rbd: store lock cookie

2017-05-04T07:19:23+00:00

In preparation for supporting set_cookie method (or rather set_cookie
fallback for older OSDs), store the lock cookie on lock and use it on
unlock instead of recalculating from rbd_dev->watch_cookie.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman

rbd: ignore unlock errors

2017-05-04T07:19:23+00:00

Currently the lock_state is set to UNLOCKED (preventing further I/O),
but RELEASED_LOCK notification isn't sent.  Be consistent with userspace
and treat ceph_cls_unlock() errors as the image is unlocked.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman

rbd: fix error handling around rbd_init_disk()

2017-05-04T07:19:23+00:00

add_disk() takes an extra reference on disk->queue, which is put in
put_disk() -> disk_release().  Avoiding blk_cleanup_queue() (which also
puts the queue) until add_disk() sets GENHD_FL_UP works for the queue
itself, but leaks various queue internals.  Conditioning tag_set freeing
on GENHD_FL_UP is wrong too: all error paths after rbd_init_disk() leak
the tag_set.

Move the final "announce" steps out of rbd_dev_device_setup() so that
it can be unwound like any other function.  Leave "announce" steps to
do_rbd_add/remove().

Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman