rbd: eliminate a race in lock_dwork draining on unmap

Given how rbd_lock_add_request() and rbd_img_exclusive_lock() are written, lock_dwork may be (re)queued more than it's actually needed: for example in case a new I/O request comes in while we are in the middle of rbd_acquire_lock() on behalf of another I/O request. This is expected and with rbd_release_lock() preemptively canceling lock_dwork is benign under normal operation. A more problematic example is maybe_kick_acquire(): if (have_requests || delayed_work_pending(&rbd_dev->lock_dwork)) { dout("%s rbd_dev %p kicking lock_dwork\n", __func__, rbd_dev); mod_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); } It's not unrealistic for lock_dwork to get canceled right after delayed_work_pending() returns true and for mod_delayed_work() to requeue it right there anyway. This is a classic TOCTOU race. When it comes to unmapping the image, there is an implicit assumption of no self-initiated exclusive lock activity past the point of return from rbd_dev_image_unlock() which unlocks the lock if it happens to be held. This unlock is assumed to be final and lock_dwork (as well as all other exclusive lock tasks, really) isn't expected to get queued again. However, lock_dwork is canceled only in cancel_tasks_sync() (i.e. later in the unmap sequence) and on top of that the cancellation can get in effect nullified by maybe_kick_acquire(). This may result in rbd_acquire_lock() executing after rbd_dev_device_release() and rbd_dev_image_release() run and free and/or reset a bunch of things. One of the possible failure modes then is a violated rbd_assert(rbd_image_format_valid(rbd_dev->image_format)); in rbd_dev_header_info() which is called via rbd_dev_refresh() from rbd_post_acquire_action(). Redo exclusive lock task draining to provide saner semantics and try to meet the assumptions around rbd_dev_image_unlock(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
author: Ilya Dryomov <idryomov@gmail.com> 2026-05-19 23:07:26 +0200
committer: Ilya Dryomov <idryomov@gmail.com> 2026-05-20 22:09:08 +0200
commit: 9fc75b71fdd38465c76c6f6a884cdd4ae3c72d90 (patch)
tree: 9da9589f07ad49dfb20a059e1dad15fd4bef8be5
parent: 5200f5f493f79f14bbdc349e402a40dfb32f23c8 (diff)
1 files changed, 8 insertions, 12 deletions
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 4065336ebd1f..6c1e7347e6a7 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4565,24 +4565,12 @@ out:
 	return ret;
 }
 
-static void cancel_tasks_sync(struct rbd_device *rbd_dev)
-{
-	dout("%s rbd_dev %p\n", __func__, rbd_dev);
-
-	cancel_work_sync(&rbd_dev->acquired_lock_work);
-	cancel_work_sync(&rbd_dev->released_lock_work);
-	cancel_delayed_work_sync(&rbd_dev->lock_dwork);
-	cancel_work_sync(&rbd_dev->unlock_work);
-}
-
 /*
  * header_rwsem must not be held to avoid a deadlock with
  * rbd_dev_refresh() when flushing notifies.
  */
 static void rbd_unregister_watch(struct rbd_device *rbd_dev)
 {
-	cancel_tasks_sync(rbd_dev);
-
 	mutex_lock(&rbd_dev->watch_mutex);
 	if (rbd_dev->watch_state == RBD_WATCH_STATE_REGISTERED)
 		__rbd_unregister_watch(rbd_dev);
@@ -6548,10 +6536,18 @@ out_err:
 
 static void rbd_dev_image_unlock(struct rbd_device *rbd_dev)
 {
+	dout("%s rbd_dev %p\n", __func__, rbd_dev);
+
+	disable_delayed_work_sync(&rbd_dev->lock_dwork);
+	disable_work_sync(&rbd_dev->unlock_work);
+
 	down_write(&rbd_dev->lock_rwsem);
 	if (__rbd_is_lock_owner(rbd_dev))
 		__rbd_release_lock(rbd_dev);
 	up_write(&rbd_dev->lock_rwsem);
+
+	flush_work(&rbd_dev->acquired_lock_work);
+	flush_work(&rbd_dev->released_lock_work);
 }
 
 /*
author	Ilya Dryomov <idryomov@gmail.com>	2026-05-19 23:07:26 +0200
committer	Ilya Dryomov <idryomov@gmail.com>	2026-05-20 22:09:08 +0200
commit	9fc75b71fdd38465c76c6f6a884cdd4ae3c72d90 (patch)
tree	9da9589f07ad49dfb20a059e1dad15fd4bef8be5
parent	5200f5f493f79f14bbdc349e402a40dfb32f23c8 (diff)