linux-toradex.git/net/ceph, branch v4.9.106

libceph: validate con->state at the top of try_write()

2018-05-01T22:13:09+00:00

commit 9c55ad1c214d9f8c4594ac2c3fa392c1c32431a7 upstream.

ceph_con_workfn() validates con->state before calling try_read() and
then try_write().  However, try_read() temporarily releases con->mutex,
notably in process_message() and ceph_con_in_msg_alloc(), opening the
window for ceph_con_close() to sneak in, close the connection and
release con->sock.  When try_write() is called on the assumption that
con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock
gets passed to the networking stack:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: selinux_socket_sendmsg+0x5/0x20

Make sure con->state is valid at the top of try_write() and add an
explicit BUG_ON for this, similar to try_read().

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman 
Signed-off-by: Greg Kroah-Hartman
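
A minimal userspace sketch of the check described in the commit above:
re-validate the state at the top of the write path instead of trusting
the caller's earlier check.  The enum, struct and function names are
hypothetical simplifications, not the kernel's definitions, and the
-1 return only stands in for the point where the kernel would BUG_ON.

```c
#include <stddef.h>

/* Hypothetical, simplified connection states (not the kernel's enum). */
enum con_state_sketch { SK_CLOSED, SK_STANDBY, SK_CONNECTING, SK_OPEN };

struct con_sketch {
	enum con_state_sketch state;
	const void *sock;	/* NULL after the connection has been closed */
};

/*
 * Returns 0 if there is nothing to do because the connection was closed
 * or put on standby while the mutex was dropped, 1 if a write may
 * proceed, and -1 where the kernel would hit its BUG_ON (valid state
 * but no socket).
 */
int try_write_sketch(const struct con_sketch *con)
{
	/* Re-validate the state instead of relying on the caller's check. */
	if (con->state == SK_CLOSED || con->state == SK_STANDBY)
		return 0;

	if (con->sock == NULL)
		return -1;

	return 1;
}
```

With this shape, a connection closed behind the worker's back is
skipped before its (now NULL) socket can reach the networking stack.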

libceph: reschedule a tick in finish_hunting()

2018-05-01T22:13:08+00:00

commit 7b4c443d139f1d2b5570da475f7a9cbcef86740c upstream.

If we go without an established session for a while, the backoff delay
climbs to 30 seconds.  The keepalive timeout is also 30 seconds, so it
is easily hit after prolonged hunting for a monitor: we don't get a
chance to send out a keepalive in time, which means we never get back a
keepalive ack in time, so we cut the established session and attempt to
connect to a different monitor every 30 seconds:

  [Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
  [Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
  [Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
  [Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
  [Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
  [Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
  [Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
  [Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon

The regular keepalive interval is 10 seconds.  After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.

Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman 
Signed-off-by: Greg Kroah-Hartman
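
A rough userspace sketch of the timing described in the commit above.
Only the 30 second maximum backoff and the 10 second keepalive interval
come from the commit text; the 3 second base delay, the x10 cap and all
names are assumptions picked so that the numbers line up.

```c
/* Assumed values, chosen to match the figures quoted in the commit. */
#define HUNT_INTERVAL_SK  3	/* seconds, base hunting delay (assumed) */
#define HUNT_MAX_MULT_SK 10	/* cap: 3 s * 10 = 30 s backoff (assumed) */
#define PING_INTERVAL_SK 10	/* seconds, regular keepalive interval */

struct monc_sketch {
	int hunting;	/* nonzero while looking for a monitor */
	int hunt_mult;	/* backoff multiplier, 1..HUNT_MAX_MULT_SK */
};

/* Delay until the next tick, in seconds. */
int next_tick_delay_sketch(const struct monc_sketch *monc)
{
	if (monc->hunting)
		return HUNT_INTERVAL_SK * monc->hunt_mult;
	return PING_INTERVAL_SK;
}

/* Clear ->hunting and return the delay a rescheduled tick would use. */
int finish_hunting_sketch(struct monc_sketch *monc)
{
	monc->hunting = 0;
	return next_tick_delay_sketch(monc);
}
```

Without the reschedule, a tick armed while hunting at the maximum
multiplier would still fire 30 seconds later, which is the keepalive
timeout itself; recomputing the delay once ->hunting is cleared yields
the 10 second interval.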

libceph: un-backoff on tick when we have a authenticated session

2018-05-01T22:13:08+00:00

commit facb9f6eba3df4e8027301cc0e514dc582a1b366 upstream.

This means that if we do some backoff, then authenticate, and are
healthy for an extended period of time, a subsequent failure won't
leave us starting our hunting sequence with a large backoff.

Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Ilya Dryomov 
Reviewed-by: Jason Dillaman 
Signed-off-by: Greg Kroah-Hartman
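
A small sketch of what un-backoff on tick could look like, assuming the
multiplier is halved on every healthy tick and never drops below 1.
The halving factor, the floor and the names are assumptions made for
illustration; the commit text only says that the backoff is reduced
while the session is authenticated.

```c
struct backoff_sketch {
	int hunt_mult;	/* current backoff multiplier, >= 1 */
};

/* Called on each tick while an authenticated session is in place. */
void un_backoff_sketch(struct backoff_sketch *b)
{
	b->hunt_mult /= 2;	/* assumed: shrink by 50% per tick */
	if (b->hunt_mult < 1)
		b->hunt_mult = 1;
}
```

Starting from a multiplier of 8, three healthy ticks bring it back to 1,
so a later failure starts hunting with the smallest delay again.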

libceph: NULL deref on crush_decode() error path

2018-04-13T17:48:05+00:00

[ Upstream commit 293dffaad8d500e1a5336eeb90d544cf40d4fbd8 ]

If there is not enough space then ceph_decode_32_safe() does a goto bad.
We need to return an error code in that situation.  The current code
returns ERR_PTR(0) which is NULL.  The callers are not expecting that
and it results in a NULL dereference.

Fixes: f24e9980eb86 ("ceph: OSD client")
Signed-off-by: Dan Carpenter 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman
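
To see why ERR_PTR(0) slips past the callers, here is a userspace
approximation of the kernel's error-pointer helpers.  The _sk names are
hypothetical stand-ins modelled on ERR_PTR(), IS_ERR() and PTR_ERR();
the sketch assumes the usual convention that the top 4095 addresses
encode negative errno values.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SK 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr_sk(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MAX_ERRNO_SK addresses. */
int is_err_sk(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO_SK);
}

/* Recover the errno value from an error pointer. */
long ptr_err_sk(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

An error code of 0 encodes to a plain NULL pointer, which the is-error
test does not catch, so a caller that only checks for error pointers
goes on to dereference NULL.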

libceph: don't WARN() if user tries to add invalid key

2017-11-30T08:39:03+00:00

commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream.

The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a
user tries to add a key of type "ceph" with an invalid payload as
follows (assuming CONFIG_CEPH_LIB=y):

    echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \
        | keyctl padd ceph desc @s

This can be hit by fuzzers.  As this is merely bad input and not a
kernel bug, replace the WARN_ON() with return -EINVAL.

Fixes: 7af3ea189a9a ("libceph: stop allocating a new cipher on every crypto request")
Signed-off-by: Eric Biggers 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman
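
A userspace sketch of rejecting such a payload as bad input.  It
assumes the 12-byte header is laid out as a little-endian 16-bit type,
8 bytes of creation time and a little-endian 16-bit key length, which
matches the payload in the commit above (type 1, length 0); the layout
and the function names are assumptions, not taken from the commit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_HDR_LEN_SK 12	/* assumed: 2 (type) + 8 (created) + 2 (len) */

static uint16_t get_le16_sk(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success or -EINVAL for malformed input; never warns. */
int parse_key_header_sk(const unsigned char *buf, size_t n,
			uint16_t *type, uint16_t *len)
{
	if (n < KEY_HDR_LEN_SK)
		return -EINVAL;

	*type = get_le16_sk(buf);
	*len = get_le16_sk(buf + 10);

	/* A zero-length key is invalid user input, not a kernel bug. */
	if (*len == 0)
		return -EINVAL;

	/* The key bytes must actually be present after the header. */
	if (n - KEY_HDR_LEN_SK < *len)
		return -EINVAL;

	return 0;
}
```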

libceph: force GFP_NOIO for socket allocations

2017-04-08T07:30:30+00:00

commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream.

sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    schedule+0x29/0x70
    schedule_timeout+0x1bd/0x200
    ? ttwu_do_wakeup+0x2c/0x120
    ? ttwu_do_activate.constprop.135+0x66/0x70
    wait_for_completion+0xbf/0x180
    ? try_to_wake_up+0x390/0x390
    flush_work+0x165/0x250
    ? worker_detach_from_pool+0xd0/0xd0
    xlog_cil_force_lsn+0x81/0x200 [xfs]
    ? __slab_free+0xee/0x234
    _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    ? lookup_page_cgroup_used+0xe/0x30
    ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    xfs_log_force_lsn+0x3f/0xf0 [xfs]
    ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    ? wake_atomic_t_function+0x40/0x40
    xfs_reclaim_inode+0xa3/0x330 [xfs]
    xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    super_cache_scan+0x178/0x180
    shrink_slab_node+0x14e/0x340
    ? mem_cgroup_iter+0x16b/0x450
    shrink_slab+0x100/0x140
    do_try_to_free_pages+0x335/0x490
    try_to_free_pages+0xb9/0x1f0
    ? __alloc_pages_direct_compact+0x69/0x1be
    __alloc_pages_nodemask+0x69a/0xb40
    alloc_pages_current+0x9e/0x110
    new_slab+0x2c5/0x390
    __slab_alloc+0x33b/0x459
    ? sock_alloc_inode+0x2d/0xd0
    ? inet_sendmsg+0x71/0xc0
    ? sock_alloc_inode+0x2d/0xd0
    kmem_cache_alloc+0x1a2/0x1b0
    sock_alloc_inode+0x2d/0xd0
    alloc_inode+0x26/0xa0
    new_inode_pseudo+0x1a/0x70
    sock_alloc+0x1e/0x80
    __sock_create+0x95/0x220
    sock_create_kern+0x24/0x30
    con_work+0xef9/0x2050 [libceph]
    ? rbd_img_request_submit+0x4c/0x60 [rbd]
    process_one_work+0x159/0x4f0
    worker_thread+0x11b/0x530
    ? create_worker+0x1d0/0x1d0
    kthread+0xc9/0xe0
    ? flush_kthread_worker+0x90/0x90
    ret_from_fork+0x58/0x90
    ? flush_kthread_worker+0x90/0x90

Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.

Link: http://tracker.ceph.com/issues/19309
Reported-by: Sergey Jerusalimov 
Signed-off-by: Ilya Dryomov 
Reviewed-by: Jeff Layton 
Signed-off-by: Greg Kroah-Hartman
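
The save/restore pairing named in the commit above can be modelled in
userspace as follows.  This is a sketch only: a global word stands in
for the per-task flags, the flag value is made up, and the _sk names
are hypothetical; it shows the scoping pattern, not the kernel code.

```c
#define PF_MEMALLOC_NOIO_SK 0x1u	/* hypothetical flag bit */

/* Stand-in for the per-task flags word. */
unsigned int task_flags_sk;

/* Force NOIO for the current scope; returns the previous NOIO bit. */
unsigned int noio_save_sk(void)
{
	unsigned int old = task_flags_sk & PF_MEMALLOC_NOIO_SK;

	task_flags_sk |= PF_MEMALLOC_NOIO_SK;
	return old;
}

/* Put the NOIO bit back to what it was before the matching save. */
void noio_restore_sk(unsigned int old)
{
	task_flags_sk = (task_flags_sk & ~PF_MEMALLOC_NOIO_SK) | old;
}

int noio_active_sk(void)
{
	return (task_flags_sk & PF_MEMALLOC_NOIO_SK) != 0;
}
```

Because restore writes back the saved bit rather than clearing it,
nested scopes compose: an inner restore leaves the outer scope's
restriction in place.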

libceph: don't set weight to IN when OSD is destroyed

2017-03-30T07:41:27+00:00

commit b581a5854eee4b7851dedb0f8c2ceb54fb902c06 upstream.

Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs.  Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.

Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

ceph: update readpages osd request according to size of pages

2017-03-12T05:41:53+00:00

commit d641df819db8b80198fd85d9de91137e8a823b07 upstream.

add_to_page_cache_lru() can fail, so the actual number of pages to read
can be smaller than the initial size of the osd request. We need to
update the osd request size in that case.

Signed-off-by: Yan, Zheng 
Reviewed-by: Jeff Layton 
Signed-off-by: Greg Kroah-Hartman
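
A userspace sketch of the size adjustment described in the commit above,
assuming 4096-byte pages and assuming the read is truncated at the first
page that could not be added to the page cache.  The function names and
the array-of-results interface are hypothetical.

```c
#include <stddef.h>

#define PAGE_SHIFT_SK 12	/* assumed 4096-byte pages */

/*
 * add_results[i] is 0 if page i was added to the page cache and
 * negative if adding it failed.  Returns how many leading pages were
 * added before the first failure.
 */
size_t pages_added_sk(const int *add_results, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		if (add_results[i] < 0)
			break;
	return i;
}

/* Length in bytes the request should cover for that many pages. */
unsigned long long read_len_sk(size_t nr_added)
{
	return (unsigned long long)nr_added << PAGE_SHIFT_SK;
}
```

For a request sized for four pages where the third add fails, the
request would be shrunk to two pages, i.e. 8192 bytes.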

libceph: stop allocating a new cipher on every crypto request

2017-01-26T07:24:46+00:00

commit 7af3ea189a9a13f090de51c97f676215dabc1205 upstream.

This is useless and more importantly not allowed on the writeback path,
because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
can recurse back into the filesystem:

    kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
     ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
     ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
     00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
     ? schedule+0x31/0x80
     ? schedule_preempt_disabled+0xa/0x10
     ? __mutex_lock_slowpath+0xb4/0x130
     ? mutex_lock+0x1b/0x30
     ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
     ? move_active_pages_to_lru+0x125/0x270
     ? radix_tree_gang_lookup_tag+0xc5/0x1c0
     ? __list_lru_walk_one.isra.3+0x33/0x120
     ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
     ? super_cache_scan+0x17e/0x190
     ? shrink_slab.part.38+0x1e3/0x3d0
     ? shrink_node+0x10a/0x320
     ? do_try_to_free_pages+0xf4/0x350
     ? try_to_free_pages+0xea/0x1b0
     ? __alloc_pages_nodemask+0x61d/0xe60
     ? cache_grow_begin+0x9d/0x560
     ? fallback_alloc+0x148/0x1c0
     ? __crypto_alloc_tfm+0x37/0x130
     ? __kmalloc+0x1eb/0x580
     ? crush_choose_firstn+0x3eb/0x470 [libceph]
     ? __crypto_alloc_tfm+0x37/0x130
     ? crypto_spawn_tfm+0x39/0x60
     ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
     ? __crypto_alloc_tfm+0xcc/0x130
     ? crypto_skcipher_init_tfm+0x113/0x180
     ? crypto_create_tfm+0x43/0xb0
     ? crypto_larval_lookup+0x150/0x150
     ? crypto_alloc_tfm+0x72/0x120
     ? ceph_aes_encrypt2+0x67/0x400 [libceph]
     ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
     ? release_sock+0x40/0x90
     ? tcp_recvmsg+0x4b4/0xae0
     ? ceph_encrypt2+0x54/0xc0 [libceph]
     ? ceph_x_encrypt+0x5d/0x90 [libceph]
     ? calcu_signature+0x5f/0x90 [libceph]
     ? ceph_x_sign_message+0x35/0x50 [libceph]
     ? prepare_write_message_footer+0x5c/0xa0 [libceph]
     ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
     ? queue_con_delay+0x33/0xd0 [libceph]
     ? __submit_request+0x20d/0x2f0 [libceph]
     ? ceph_osdc_start_request+0x28/0x30 [libceph]
     ? rbd_queue_workfn+0x2f3/0x350 [rbd]
     ? process_one_work+0x160/0x410
     ? worker_thread+0x4d/0x480
     ? process_one_work+0x410/0x410
     ? kthread+0xcd/0xf0
     ? ret_from_fork+0x1f/0x40
     ? kthread_create_on_node+0x190/0x190

Allocating the cipher along with the key fixes the issue - as long as
the key doesn't change, a single cipher context can be used concurrently
in multiple requests.

We still can't take that GFP_KERNEL allocation though.  Both
ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

Reported-by: Lucas Stach 
Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

libceph: uninline ceph_crypto_key_destroy()

2017-01-26T07:24:46+00:00

commit 6db2304aabb070261ad34923bfd83c43dfb000e3 upstream.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman