linux-toradex.git/net/ceph, branch v3.10.62

libceph: do not crash on large auth tickets

2014-11-21T17:22:53+00:00

commit aaef31703a0cf6a733e651885bfb49edc3ac6774 upstream.

Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[   28.686032] IP: [] scatterwalk_pagedone+0x22/0x80
[   28.686032] PGD 0
[   28.688088] Oops: 0000 [#1] PREEMPT SMP
[   28.688088] Modules linked in:
[   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.688088] Workqueue: ceph-msgr con_work
[   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[   28.688088] RIP: 0010:[]  [] scatterwalk_pagedone+0x22/0x80
[   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
[   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[   28.688088] Stack:
[   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[   28.688088] Call Trace:
[   28.688088]  [] scatterwalk_done+0x38/0x40
[   28.688088]  [] scatterwalk_done+0x38/0x40
[   28.688088]  [] blkcipher_walk_done+0x182/0x220
[   28.688088]  [] crypto_cbc_encrypt+0x15f/0x180
[   28.688088]  [] ? crypto_aes_set_key+0x30/0x30
[   28.688088]  [] ceph_aes_encrypt2+0x29c/0x2e0
[   28.688088]  [] ceph_encrypt2+0x93/0xb0
[   28.688088]  [] ceph_x_encrypt+0x4a/0x60
[   28.688088]  [] ? ceph_buffer_new+0x5d/0xf0
[   28.688088]  [] ceph_x_build_authorizer.isra.6+0x297/0x360
[   28.688088]  [] ? kmem_cache_alloc_trace+0x11b/0x1c0
[   28.688088]  [] ? ceph_auth_create_authorizer+0x36/0x80
[   28.688088]  [] ceph_x_create_authorizer+0x63/0xd0
[   28.688088]  [] ceph_auth_create_authorizer+0x54/0x80
[   28.688088]  [] get_authorizer+0x80/0xd0
[   28.688088]  [] prepare_write_connect+0x18b/0x2b0
[   28.688088]  [] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed.  Fix it.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

libceph: ceph-msgr workqueue needs a resque worker

2014-11-14T16:48:01+00:00

commit f9865f06f7f18c6661c88d0511f05c48612319cc upstream.

Commit f363e45fd118 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq.  This is
wrong - libceph is very much a memory reclaim path, so restore it.

Signed-off-by: Ilya Dryomov 
Tested-by: Micha Krause 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

libceph: gracefully handle large reply messages from the mon

2014-09-17T16:04:02+00:00

commit 73c3d4812b4c755efeca0140f606f83772a39ce4 upstream.

We preallocate a few of the message types we get back from the mon.  If we
get a larger message than we are expecting, fall back to trying to allocate
a new one instead of blindly using the one we have.

Signed-off-by: Sage Weil 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

libceph: rename ceph_msg::front_max to front_alloc_len

2014-09-17T16:04:02+00:00

commit 3cea4c3071d4e55e9d7356efe9d0ebf92f0c2204 upstream.

Rename front_max field of struct ceph_msg to front_alloc_len to make
its purpose more clear.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

libceph: do not hard code max auth ticket len

2014-09-17T16:04:01+00:00

commit c27a3e4d667fdcad3db7b104f75659478e0c68d8 upstream.

We hard code cephx auth ticket buffer size to 256 bytes.  This isn't
enough for any moderate setups and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper).  Since the
buffer is allocated dynamically anyway, allocated it a bit later, at
the point where we know how much is going to be needed.

Fixes: http://tracker.ceph.com/issues/8979

Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

libceph: add process_one_ticket() helper

2014-09-17T16:04:01+00:00

commit 597cda357716a3cf8d994cb11927af917c8d71fa upstream.

Add a helper for processing individual cephx auth tickets.  Needed for
the next commit, which deals with allocating ticket buffers.  (Most of
the diff here is whitespace - view with git diff -b).

Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly

2014-09-17T16:04:01+00:00

commit 5f740d7e1531099b888410e6bab13f68da9b1a4d upstream.

Determining ->last_piece based on the value of ->page_offset + length
is incorrect because length here is the length of the entire message.
->last_piece set to false even if page array data item length is <=
PAGE_SIZE, which results in invalid length passed to
ceph_tcp_{send,recv}page() and causes various asserts to fire.

    # cat pages-cursor-init.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    # rbd_resize calls librbd rbd_resize(), size is in bytes
    ./rbd_resize bar $(((4 << 20) + 512))
    rbd resize --size 10 bar
    BAR_DEV=$(rbd map bar)
    # trigger a 512-byte copyup -- 512-byte page array data item
    dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5

The problem exists only in ceph_msg_data_pages_cursor_init(),
ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
unnecessary.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Reviewed-by: Alex Elder 
Signed-off-by: Greg Kroah-Hartman

libceph: fix corruption when using page_count 0 page in rbd

2014-06-07T20:25:40+00:00

commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:

https://github.com/zfsonlinux/spl/issues/241
http://tracker.ceph.com/issues/7790

The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.

This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.

Cc: Sage Weil 
Cc: Yehuda Sadeh 
Signed-off-by: Chunwei Chen 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

libceph: resend all writes after the osdmap loses the full flag

2014-03-31T16:58:13+00:00

commit 9a1ea2dbff11547a8e664f143c1ffefc586a577a upstream.

With the current full handling, there is a race between osds and
clients getting the first map marked full. If the osd wins, it will
return -ENOSPC to any writes, but the client may already have writes
in flight. This results in the client getting the error and
propagating it up the stack. For rbd, the block layer turns this into
EIO, which can cause corruption in filesystems above it.

To avoid this race, osds are being changed to drop writes that came
from clients with an osdmap older than the last osdmap marked full.
In order for this to work, clients must resend all writes after they
encounter a full -> not full transition in the osdmap. osds will wait
for an updated map instead of processing a request from a client with
a newer map, so resent writes will not be dropped by the osd unless
there is another not full -> full transition.

This approach requires both osds and clients to be fixed to avoid the
race. Old clients talking to osds with this fix may hang instead of
returning EIO and potentially corrupting an fs. New clients talking to
old osds have the same behavior as before if they encounter this race.

Fixes: http://tracker.ceph.com/issues/6938

Reviewed-by: Sage Weil 
Signed-off-by: Josh Durgin 
Signed-off-by: Greg Kroah-Hartman

libceph: block I/O when PAUSE or FULL osd map flags are set

2014-03-31T16:58:12+00:00

commit d29adb34a94715174c88ca93e8aba955850c9bde upstream.

The PAUSEWR and PAUSERD flags are meant to stop the cluster from
processing writes and reads, respectively. The FULL flag is set when
the cluster determines that it is out of space, and will no longer
process writes.  PAUSEWR and PAUSERD are purely client-side settings
already implemented in userspace clients. The osd does nothing special
with these flags.

When the FULL flag is set, however, the osd responds to all writes
with -ENOSPC. For cephfs, this makes sense, but for rbd the block
layer translates this into EIO.  If a cluster goes from full to
non-full quickly, a filesystem on top of rbd will not behave well,
since some writes succeed while others get EIO.

Fix this by blocking any writes when the FULL flag is set in the osd
client. This is the same strategy used by userspace, so apply it by
default.  A follow-on patch makes this configurable.

__map_request() is called to re-target osd requests in case the
available osds changed.  Add a paused field to a ceph_osd_request, and
set it whenever an appropriate osd map flag is set.  Avoid queueing
paused requests in __map_request(), but force them to be resent if
they become unpaused.

Also subscribe to the next osd map from the monitor if any of these
flags are set, so paused requests can be unblocked as soon as
possible.

Fixes: http://tracker.ceph.com/issues/6079

Reviewed-by: Sage Weil 
Signed-off-by: Josh Durgin 
Signed-off-by: Greg Kroah-Hartman