summaryrefslogtreecommitdiff
path: root/net/ceph/osd_client.c
diff options
context:
space:
mode:
authorIlya Dryomov <idryomov@gmail.com>2015-11-02 17:13:58 +0100
committerIlya Dryomov <idryomov@gmail.com>2015-11-02 23:37:46 +0100
commit583d0fef756a7615e50f0f68ea0892a497d03971 (patch)
tree6064fa517086baae5daa12f683d9af51d4be9ed4 /net/ceph/osd_client.c
parenta51983e4dd2d4d63912aab939f657c4cd476e21a (diff)
libceph: clear msg->con in ceph_msg_release() only
The following bit in ceph_msg_revoke_incoming() is unsafe: struct ceph_connection *con = msg->con; if (!con) return; mutex_lock(&con->mutex); <more msg->con use> There is nothing preventing con from getting destroyed right after msg->con test. One easy way to reproduce this is to disable message signing only on the server side and try to map an image. The system will go into a libceph: read_partial_message ffff880073f0ab68 signature check failed libceph: osd0 192.168.255.155:6801 bad crc/signature libceph: read_partial_message ffff880073f0ab68 signature check failed libceph: osd0 192.168.255.155:6801 bad crc/signature loop which has to be interrupted with Ctrl-C. Hit Ctrl-C and you are likely to end up with a random GP fault if the reset handler executes "within" ceph_msg_revoke_incoming(): <yet another reply w/o a signature> ... <Ctrl-C> rbd_obj_request_end ceph_osdc_cancel_request __unregister_request ceph_osdc_put_request ceph_msg_revoke_incoming ... osd_reset __kick_osd_requests __reset_osd remove_osd ceph_con_close reset_connection <clear con->in_msg->con> <put con ref> put_osd <free osd/con> <msg->con use> <-- !!! If ceph_msg_revoke_incoming() executes "before" the reset handler, osd/con will be leaked because ceph_msg_revoke_incoming() clears con->in_msg but doesn't put con ref, while reset_connection() only puts con ref if con->in_msg != NULL. The current msg->con scheme was introduced by commits 38941f8031bf ("libceph: have messages point to their connection") and 92ce034b5a74 ("libceph: have messages take a connection reference"), which defined when messages get associated with a connection and when that association goes away. Part of the problem is that this association is supposed to go away in much too many places; closing this race entirely requires either a rework of the existing or an addition of a new layer of synchronization. In lieu of that, we can make it *much* less likely to hit by disassociating messages only on their destruction and resend through a different connection. This makes the code simpler and is probably a good thing to do regardless - this patch adds a msg_con_set() helper which is is called from only three places: ceph_con_send() and ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Diffstat (limited to 'net/ceph/osd_client.c')
-rw-r--r--net/ceph/osd_client.c3
1 files changed, 0 insertions, 3 deletions
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 118e4ce37ecc..f8f235930d88 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2850,9 +2850,6 @@ static struct ceph_msg *get_reply(struct ceph_connection *con,
goto out;
}
- if (req->r_reply->con)
- dout("%s revoking msg %p from old con %p\n", __func__,
- req->r_reply, req->r_reply->con);
ceph_msg_revoke_incoming(req->r_reply);
if (front_len > req->r_reply->front_alloc_len) {