linux-toradex.git/fs/btrfs/ordered-data.c, branch v5.5

Btrfs: fix block group remaining RO forever after error during device replace

2019-11-18T17:07:55+00:00

When doing a device replace, while at scrub.c:scrub_enumerate_chunks(), we
set the block group to RO mode and then wait for any ongoing writes into
extents of the block group to complete. While doing that wait we overwrite
the value of the variable 'ret' and can break out of the loop if an error
happens without turning the block group back into RW mode. So what happens
is the following:

1) btrfs_inc_block_group_ro() returns 0, meaning it set the block group
   to RO mode (its ->ro field set to 1 or incremented to some value > 1);

2) Then btrfs_wait_ordered_roots() returns a value > 0;

3) Then if either joining or committing the transaction fails, we break
   out of the loop without calling btrfs_dec_block_group_ro(), leaving
   the block group in RO mode forever.

To fix this, just remove the code that waits for ongoing writes to extents
of the block group, since it's not needed because in the initial setup
phase of a device replace operation, before starting to find all chunks
and their extents, we set the target device for replace while holding
fs_info->dev_replace->rwsem, which ensures that after releasing that
semaphore, any writes into the source device are made to the target device
as well (__btrfs_map_block() guarantees that). So while at
scrub_enumerate_chunks() we only need to worry about finding and copying
extents (from the source device to the target device) that were written
before we started the device replace operation.

Fixes: f0e9b7d6401959 ("Btrfs: fix race setting block group readonly during device replace")
Signed-off-by: Filipe Manana 
Signed-off-by: David Sterba
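
The three steps above can be modelled in user space. The following is a
minimal sketch, not the kernel code: struct block_group,
inc_block_group_ro(), dec_block_group_ro() and both process_chunk_*()
functions are hypothetical stand-ins, and the results of the wait and
commit steps are passed in as plain integers so the failing path can be
exercised.

```c
/* Hypothetical user-space model; none of these are the kernel's functions. */
struct block_group {
	int ro;	/* read-only count, playing the role of the ->ro field */
};

static int inc_block_group_ro(struct block_group *bg)
{
	bg->ro++;
	return 0;
}

static void dec_block_group_ro(struct block_group *bg)
{
	bg->ro--;
}

/*
 * Buggy shape: 'ret' from the RO step is overwritten by the wait step,
 * and a failing join/commit leaves early without undoing the increment.
 */
int process_chunk_buggy(struct block_group *bg, int nr_waited, int commit_result)
{
	int ret = inc_block_group_ro(bg);

	if (ret)
		return ret;

	ret = nr_waited;		/* overwrites the status of the RO step */
	if (ret > 0) {
		ret = commit_result;	/* join/commit of the transaction */
		if (ret)
			return ret;	/* models the 'break': ro stays elevated */
	}

	/* ... find and copy extents ... */
	dec_block_group_ro(bg);
	return 0;
}

/* Fixed shape: the wait step is gone, so no early exit skips the undo. */
int process_chunk_fixed(struct block_group *bg)
{
	int ret = inc_block_group_ro(bg);

	if (ret)
		return ret;

	/* ... find and copy extents ... */
	dec_block_group_ro(bg);
	return 0;
}
```

Calling process_chunk_buggy() with a positive wait result and a failing
commit leaves bg->ro at 1, which is the "RO forever" state described
above; process_chunk_fixed() always returns the count to its starting
value.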

btrfs: get rid of unique workqueue helper functions

2019-11-18T11:46:48+00:00

Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.

However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyways. So, let's just get rid of the extra indirection.

While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().

Reviewed-by: Nikolay Borisov 
Reviewed-by: Filipe Manana 
Signed-off-by: Omar Sandoval 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: move cond_wake_up functions out of ctree

2019-09-09T12:59:15+00:00

The file ctree.h serves as a header for everything and has become quite
bloated. Split some helpers that are generic and create a new file that
should be the catch-all for code that's not btrfs-specific.

Reviewed-by: Johannes Thumshirn 
Signed-off-by: David Sterba

btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range

2019-07-26T10:21:22+00:00

btrfs_lock_and_flush_ordered_range() loads the given "*cached_state" into
cachedp, which, in general, is NULL. Then, lock_extent_bits() updates
"cachedp", but the updated value never goes back to the caller. Thus the
caller still sees its "cached_state" as NULL and never frees the state
allocated under btrfs_lock_and_flush_ordered_range(). As a result, we
see a massive state leak with e.g. fstests btrfs/005. Fix this bug by
properly handling the pointers.

Fixes: bd80d94efb83 ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range")
Reviewed-by: Nikolay Borisov 
Signed-off-by: Naohiro Aota 
Signed-off-by: David Sterba
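
The pointer handling described above can be illustrated with a small
user-space sketch. Everything here is a hypothetical stand-in: struct
extent_state is reduced to one field, lock_range() plays the role of
lock_extent_bits() by allocating a state when none is cached, and
live_states counts allocations so the leak is observable.

```c
#include <stdlib.h>

struct extent_state {
	int refs;	/* stand-in; the real structure has more fields */
};

int live_states;	/* number of states currently allocated */

/* Stand-in for lock_extent_bits(): fills *cached if it is still NULL. */
static void lock_range(struct extent_state **cached)
{
	if (!*cached) {
		*cached = calloc(1, sizeof(**cached));
		live_states++;
	}
}

static void free_state(struct extent_state *state)
{
	if (state) {
		free(state);
		live_states--;
	}
}

/* Leaky shape: the caller's pointer is copied by value into a local. */
void lock_and_flush_leaky(struct extent_state **cached_state)
{
	struct extent_state *cachedp = NULL;

	if (cached_state)
		cachedp = *cached_state;	/* usually NULL */

	lock_range(&cachedp);	/* updates the local copy only */

	if (!cached_state)
		free_state(cachedp);
	/* otherwise the caller never learns about the allocation */
}

/* Fixed shape: operate through the caller's pointer when one is given. */
void lock_and_flush_fixed(struct extent_state **cached_state)
{
	struct extent_state *cache = NULL;
	struct extent_state **cachedp = &cache;

	if (cached_state)
		cachedp = cached_state;

	lock_range(cachedp);	/* the caller's pointer is updated */

	if (!cached_state)
		free_state(cache);
}
```

With the leaky variant the caller's pointer is still NULL afterwards
while live_states has grown by one; with the fixed variant the caller
receives the state and can release it.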

btrfs: migrate the delalloc space stuff to it's own home

2019-07-04T15:26:17+00:00

We have code for data and metadata reservations for delalloc.  There's
quite a bit of code here, and it's used in a lot of places so I've
separated it out to its own file.  inode.c and file.c are already
pretty large, and this code is complicated enough to live in its own
space.

Signed-off-by: Josef Bacik 
Signed-off-by: David Sterba

btrfs: don't assume ordered sums to be 4 bytes

2019-07-01T11:35:00+00:00

BTRFS has the implicit assumption that a checksum in btrfs_ordered_sum
is 4 bytes. While this is true for CRC32C, it is not for any other
checksum.

Change the data type to be a byte array and adjust loop index
calculation accordingly.

This includes moving the adjustment of 'index' by 'ins_size' in
btrfs_csum_file_blocks() before dividing 'ins_size' by the checksum
size, because before this patch the 'sums' member of 'struct
btrfs_ordered_sum' was 4 bytes in size and afterwards it is only one
byte.

Reviewed-by: Nikolay Borisov 
Signed-off-by: Johannes Thumshirn 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba
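
A rough sketch of why the order of the two operations matters once
'sums' is a byte array. The struct and both functions below are
hypothetical models rather than the kernel's definitions: 'index' is
treated as a byte offset into the array, so it has to be advanced by the
byte count before that count is converted into a number of checksums.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the checksum storage of an ordered sum. */
struct ordered_sum_model {
	size_t csum_size;	/* bytes per checksum, e.g. 4 for CRC32C */
	uint8_t sums[128];	/* raw checksum bytes, back to back */
};

/* Address of the checksum for the sector_nr-th sector. */
uint8_t *csum_at(struct ordered_sum_model *s, size_t sector_nr)
{
	return s->sums + sector_nr * s->csum_size;
}

/*
 * Copy ins_size bytes of checksums starting at byte offset *index and
 * return how many checksums that was.
 */
size_t consume_csums(const struct ordered_sum_model *s, size_t *index,
		     uint8_t *dst, size_t ins_size)
{
	memcpy(dst, s->sums + *index, ins_size);
	*index += ins_size;		/* advance in bytes first ... */
	return ins_size / s->csum_size;	/* ... then convert to a count */
}
```

If the division happened first, 'index' would advance by the number of
checksums rather than by the number of bytes, which only coincides when
each array element is as wide as one checksum.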

btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range

2019-07-01T11:34:59+00:00

In case no cached_state argument is passed to
btrfs_lock_and_flush_ordered_range use one locally in the function. This
optimises the case when an ordered extent is found since the unlock
function will be able to unlock that state directly without searching
for it again.

Reviewed-by: Josef Bacik 
Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: add new helper btrfs_lock_and_flush_ordered_range

2019-07-01T11:34:59+00:00

There is a certain idiom used in multiple places in btrfs' codebase,
dealing with flushing an ordered range. Factor this in a separate
function that can be reused. Future patches will replace the existing
code with that function.

Reviewed-by: Josef Bacik 
Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: track DIO bytes in flight

2019-04-29T17:25:37+00:00

When diagnosing a slowdown of generic/224 I noticed we were not doing
anything when calling into shrink_delalloc().  This is because all
writes in 224 are O_DIRECT, not delalloc, and thus our delalloc_bytes
counter is 0, which short circuits most of the work inside of
shrink_delalloc().  However O_DIRECT writes still consume metadata
resources and generate ordered extents, which we can still wait on.

Fix this by tracking outstanding DIO write bytes, and use this as well
as the delalloc bytes counter to decide if we need to look up and wait on
any ordered extents.  If we have more DIO writes than delalloc bytes
we'll go ahead and wait on any ordered extents regardless of our flush
state, as flushing delalloc is unlikely to gain us anything.

Signed-off-by: Josef Bacik 
[ use dio instead of odirect in identifiers ]
Signed-off-by: David Sterba
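
The decision described above can be sketched as two small predicates.
This is an illustrative model only: the function names and the plain
integer parameters are assumptions, and the real shrink_delalloc() reads
its counters from the filesystem state rather than taking them as
arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Is there anything to do at all? Before the change only delalloc_bytes
 * was consulted, so an all-O_DIRECT workload looked idle.
 */
bool shrink_has_work(uint64_t delalloc_bytes, uint64_t dio_bytes)
{
	return delalloc_bytes != 0 || dio_bytes != 0;
}

/*
 * Should we wait on ordered extents? If DIO writes outweigh delalloc,
 * wait regardless of what the flush state asked for.
 */
bool should_wait_ordered(uint64_t delalloc_bytes, uint64_t dio_bytes,
			 bool wait_requested)
{
	if (dio_bytes > delalloc_bytes)
		return true;
	return wait_requested;
}
```

For a workload like the one in generic/224, delalloc_bytes is 0 and
dio_bytes is non-zero, so both predicates return true and the ordered
extents are waited on.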

btrfs: Remove redundant inode argument from btrfs_add_ordered_sum

2019-04-29T17:02:40+00:00

Ordered csums are keyed off of a btrfs_ordered_extent, which already has
a reference to the inode. This implies that an explicit inode argument
is redundant. So remove it.

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba