linux-toradex.git/drivers/md/bcache, branch v4.4.129

bcache: segregate flash only volume write streams

2018-04-13T17:50:22+00:00

[ Upstream commit 4eca1cb28d8b0574ca4f1f48e9331c5f852d43b9 ]

In such scenario that there are some flash only volumes
, and some cached devices, when many tasks request these devices in
writeback mode, the write IOs may fall to the same bucket as bellow:
| cached data | flash data | cached data | cached data| flash data|
then after writeback of these cached devices, the bucket would
be like bellow bucket:
| free | flash data | free | free | flash data |

So, there are many free space in this bucket, but since data of flash
only volumes still exists, so this bucket cannot be reclaimable,
which would cause waste of bucket space.

In this patch, we segregate flash only volume write streams from
cached devices, so data from flash only volumes and cached devices
can store in different buckets.

Compare to v1 patch, this patch do not add a additionally open bucket
list, and it is try best to segregate flash only volume write streams
from cached devices, sectors of flash only volumes may still be mixed
with dirty sectors of cached device, but the number is very small.

[mlyle: fixed commit log formatting, permissions, line endings]

Signed-off-by: Tang Junhui 
Reviewed-by: Michael Lyle 
Signed-off-by: Michael Lyle 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bcache: stop writeback thread after detaching

2018-04-13T17:50:22+00:00

[ Upstream commit 8d29c4426b9f8afaccf28de414fde8a722b35fdf ]

Currently, when a cached device detaching from cache, writeback thread is
not stopped, and writeback_rate_update work is not canceled. For example,
after the following command:
echo 1 >/sys/block/sdb/bcache/detach
you can still see the writeback thread. Then you attach the device to the
cache again, bcache will create another writeback thread, for example,
after below command:
echo  ba0fb5cd-658a-4533-9806-6ce166d883b9 > /sys/block/sdb/bcache/attach
then you will see 2 writeback threads.
This patch stops writeback thread and cancels writeback_rate_update work
when cached device detaching from cache.

Compare with patch v1, this v2 patch moves code down into the register
lock for safety in case of any future changes as Coly and Mike suggested.

[edit by mlyle: commit log spelling/formatting]

Signed-off-by: Tang Junhui 
Reviewed-by: Michael Lyle 
Signed-off-by: Michael Lyle 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bcache: don't attach backing with duplicate UUID

2018-03-18T10:17:49+00:00

commit 86755b7a96faed57f910f9e6b8061e019ac1ec08 upstream.

This can happen e.g. during disk cloning.

This is an incomplete fix: it does not catch duplicate UUIDs earlier
when things are still unattached.  It does not unregister the device.
Further changes to cope better with this are planned but conflict with
Coly's ongoing improvements to handling device errors.  In the meantime,
one can manually stop the device after this has happened.

Attempts to attach a duplicate device result in:

[  136.372404] loop: module loaded
[  136.424461] bcache: register_bdev() registered backing device loop0
[  136.424464] bcache: bch_cached_dev_attach() Tried to attach loop0 but duplicate UUID already attached

My test procedure is:

  dd if=/dev/sdb1 of=imgfile bs=1024 count=262144
  losetup -f imgfile

Signed-off-by: Michael Lyle 
Reviewed-by: Tang Junhui 
Cc: 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

bcache: check return value of register_shrinker

2018-02-03T16:04:27+00:00

[ Upstream commit 6c4ca1e36cdc1a0a7a84797804b87920ccbebf51 ]

register_shrinker is now __must_check, so check it to kill a warning.
Caller of bch_btree_cache_alloc in super.c appropriately checks return
value so this is fully plumbed through.

This V2 fixes checkpatch warnings and improves the commit description,
as I was too hasty getting the previous version out.

Signed-off-by: Michael Lyle 
Reviewed-by: Vojtech Pavlik 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bcache: fix wrong cache_misses statistics

2017-12-20T09:04:59+00:00

[ Upstream commit c157313791a999646901b3e3c6888514ebc36d62 ]

Currently, Cache missed IOs are identified by s->cache_miss, but actually,
there are many situations that missed IOs are not assigned a value for
s->cache_miss in cached_dev_cache_miss(), for example, a bypassed IO
(s->iop.bypass = 1), or the cache_bio allocate failed. In these situations,
it will go to out_put or out_submit, and s->cache_miss is null, which leads
bch_mark_cache_accounting() to treat this IO as a hit IO.

[ML: applied by 3-way merge]

Signed-off-by: tang.junhui 
Reviewed-by: Michael Lyle 
Reviewed-by: Coly Li 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bcache: explicitly destroy mutex while exiting

2017-12-20T09:04:59+00:00

[ Upstream commit 330a4db89d39a6b43f36da16824eaa7a7509d34d ]

mutex_destroy does nothing most of time, but it's better to call
it to make the code future proof and it also has some meaning
for like mutex debug.

As Coly pointed out in a previous review, bcache_exit() may not be
able to handle all the references properly if userspace registers
cache and backing devices right before bch_debug_init runs and
bch_debug_init failes later. So not exposing userspace interface
until everything is ready to avoid that issue.

Signed-off-by: Liang Chen 
Reviewed-by: Michael Lyle 
Reviewed-by: Coly Li 
Reviewed-by: Eric Wheeler 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bcache: recover data from backing when data is clean

2017-12-09T17:42:37+00:00

commit e393aa2446150536929140739f09c6ecbcbea7f0 upstream.

When we send a read request and hit the clean data in cache device, there
is a situation called cache read race in bcache(see the commit in the tail
of cache_look_up(), the following explaination just copy from there):
The bucket we're reading from might be reused while our bio is in flight,
and we could then end up reading the wrong data. We guard against this
by checking (in bch_cache_read_endio()) if the pointer is stale again;
if so, we treat it as an error (s->iop.error = -EINTR) and reread from
the backing device (but we don't pass that error up anywhere)

It should be noted that cache read race happened under normal
circumstances, not the circumstance when SSD failed, it was counted
and shown in  /sys/fs/bcache/XXX/internal/cache_read_races.

Without this patch, when we use writeback mode, we will never reread from
the backing device when cache read race happened, until the whole cache
device is clean, because the condition
(s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR)
will be passed up, at last, user will receive -EINTR when it's bio end,
this is not suitable, and wield to up-application.

In this patch, we use s->read_dirty_data to judge whether the read
request hit dirty data in cache device, it is safe to reread data from
the backing device when the read request hit clean data. This can not
only handle cache read race, but also recover data when failed read
request from cache device.

[edited by mlyle to fix up whitespace, commit log title, comment
spelling]

Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean")
Signed-off-by: Hua Rui 
Reviewed-by: Michael Lyle 
Reviewed-by: Coly Li 
Signed-off-by: Michael Lyle 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

bcache: only permit to recovery read error when cache device is clean

2017-12-09T17:42:37+00:00

commit d59b23795933678c9638fd20c942d2b4f3cd6185 upstream.

When bcache does read I/Os, for example in writeback or writethrough mode,
if a read request on cache device is failed, bcache will try to recovery
the request by reading from cached device. If the data on cached device is
not synced with cache device, then requester will get a stale data.

For critical storage system like database, providing stale data from
recovery may result an application level data corruption, which is
unacceptible.

With this patch, for a failed read request in writeback or writethrough
mode, recovery a recoverable read request only happens when cache device
is clean. That is to say, all data on cached device is up to update.

For other cache modes in bcache, read request will never hit
cached_dev_read_error(), they don't need this patch.

Please note, because cache mode can be switched arbitrarily in run time, a
writethrough mode might be switched from a writeback mode. Therefore
checking dc->has_data in writethrough mode still makes sense.

Changelog:
V4: Fix parens error pointed by Michael Lyle.
v3: By response from Kent Oversteet, he thinks recovering stale data is a
    bug to fix, and option to permit it is unnecessary. So this version
    the sysfs file is removed.
v2: rename sysfs entry from allow_stale_data_on_failure  to
    allow_stale_data_on_failure, and fix the confusing commit log.
v1: initial patch posted.

[small change to patch comment spelling by mlyle]

Signed-off-by: Coly Li 
Signed-off-by: Michael Lyle 
Reported-by: Arne Wolf 
Reviewed-by: Michael Lyle 
Cc: Kent Overstreet 
Cc: Nix 
Cc: Kai Krakow 
Cc: Eric Wheeler 
Cc: Junhui Tang 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

bcache: Fix building error on MIPS

2017-12-05T10:22:51+00:00

commit cf33c1ee5254c6a430bc1538232b49c3ea13e613 upstream.

This patch try to fix the building error on MIPS. The reason is MIPS
has already defined the PTR macro, which conflicts with the PTR macro
in include/uapi/linux/bcache.h.

[fixed by mlyle: corrected a line-length issue]

Signed-off-by: Huacai Chen 
Reviewed-by: Michael Lyle 
Signed-off-by: Michael Lyle 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

bcache: check ca->alloc_thread initialized before wake up it

2017-11-30T08:37:20+00:00

commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream.

In bcache code, sysfs entries are created before all resources get
allocated, e.g. allocation thread of a cache set.

There is posibility for NULL pointer deference if a resource is accessed
but which is not initialized yet. Indeed Jorg Bornschein catches one on
cache set allocation thread and gets a kernel oops.

The reason for this bug is, when bch_bucket_alloc() is called during
cache set registration and attaching, ca->alloc_thread is not properly
allocated and initialized yet, call wake_up_process() on ca->alloc_thread
triggers NULL pointer deference failure. A simple and fast fix is, before
waking up ca->alloc_thread, checking whether it is allocated, and only
wake up ca->alloc_thread when it is not NULL.

Signed-off-by: Coly Li 
Reported-by: Jorg Bornschein 
Cc: Kent Overstreet 
Reviewed-by: Michael Lyle 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman