| author | Johannes Thumshirn <johannes.thumshirn@wdc.com> | 2026-02-10 12:04:23 +0100 |
|---|---|---|
| committer | David Sterba <dsterba@suse.com> | 2026-04-07 18:55:55 +0200 |
| commit | e2a7fd22378f6500bcf979edc71e6837271eacfd (patch) | |
| tree | 47b46ff6eab5e98823d9e400bd0b7c86ae433907 /fs | |
| parent | 258e46a6385c57a3caef3fb1dc888e2efcfe5b18 (diff) | |
btrfs: zoned: add zone reclaim flush state for DATA space_info

On zoned block devices, DATA block groups can accumulate large amounts
of zone_unusable space (space between the write pointer and zone end).
When zone_unusable reaches high levels (e.g., 98% of total space), new
allocations fail with ENOSPC even though space could be reclaimed by
relocating data and resetting zones.
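
The accounting problem described above can be illustrated with a minimal userspace sketch. The struct and the `toy_*` helpers below are hypothetical and heavily simplified; they are not the kernel's `struct btrfs_space_info` or its reservation logic. They only show why a space_info whose capacity is mostly zone_unusable rejects new allocations even though little valid data is stored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified counters; NOT the kernel's struct btrfs_space_info. */
struct toy_space_info {
	uint64_t total_bytes;   /* capacity of all DATA block groups */
	uint64_t bytes_used;    /* bytes holding valid data */
	uint64_t zone_unusable; /* bytes that cannot be written until a zone reset */
};

/* Bytes still available for new allocations in this toy model. */
static uint64_t toy_free_bytes(const struct toy_space_info *si)
{
	uint64_t consumed = si->bytes_used + si->zone_unusable;

	return consumed >= si->total_bytes ? 0 : si->total_bytes - consumed;
}

/* Share of total space that is zone_unusable, in percent (rounded down). */
static unsigned int toy_unusable_percent(const struct toy_space_info *si)
{
	if (si->total_bytes == 0)
		return 0;
	/* Fine for small toy values; a real implementation must avoid overflow. */
	return (unsigned int)(si->zone_unusable * 100 / si->total_bytes);
}

/* True if a request of 'bytes' fits; false models the ENOSPC case. */
static bool toy_can_allocate(const struct toy_space_info *si, uint64_t bytes)
{
	return bytes <= toy_free_bytes(si);
}
```

With 1000 bytes of capacity, 10 bytes of valid data and 980 bytes of zone_unusable, only 10 bytes remain allocatable, although reclaiming the zones could return almost the whole capacity.
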

The existing flush states don't handle this scenario effectively - they
either try to free cached space (which doesn't exist for zone_unusable)
or reset empty zones (which doesn't help when zones contain valid data
mixed with zone_unusable space).

Add a new RECLAIM_ZONES flush state that triggers the block group
reclaim machinery. On zoned filesystems this state:

- Calls btrfs_reclaim_sweep() to identify reclaimable block groups
- Calls btrfs_delete_unused_bgs() to delete block groups that are unused
- Calls btrfs_reclaim_bgs() to queue reclaim work
- Waits for reclaim_bgs_work to complete via flush_work()
- Commits the transaction to finalize changes

On non-zoned filesystems the state is a no-op.

The reclaim work (btrfs_reclaim_bgs_work) safely relocates valid data
from fragmented block groups to other locations before resetting zones,
converting zone_unusable space back into usable space.
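
The effect of reclaiming one block group can be sketched in userspace as follows. This is a toy model under stated assumptions: `struct toy_bg` and `toy_reclaim_bg()` are hypothetical names, one block group is assumed to map to one zone, and relocation is reduced to moving a byte counter. The real work is done by btrfs_reclaim_bgs_work and the relocation code.

```c
#include <stdint.h>

/* Hypothetical toy block group; assumes one block group per zone. */
struct toy_bg {
	uint64_t length;        /* size of the block group */
	uint64_t used;          /* bytes of valid data */
	uint64_t zone_unusable; /* freed bytes that need a zone reset */
};

/* Bytes that can still be written sequentially into this block group. */
static uint64_t toy_bg_free(const struct toy_bg *bg)
{
	return bg->length - bg->used - bg->zone_unusable;
}

/*
 * Move the valid data of 'src' into 'dst', then model a zone reset of
 * 'src'. Returns 0 on success, or -1 if 'dst' has too little free
 * space, in which case 'src' and 'dst' are left untouched.
 */
static int toy_reclaim_bg(struct toy_bg *src, struct toy_bg *dst)
{
	if (src->used > toy_bg_free(dst))
		return -1;

	dst->used += src->used;  /* relocate the still valid data */
	src->used = 0;
	src->zone_unusable = 0;  /* zone reset: everything is writable again */
	return 0;
}
```

A block group of 100 bytes with 10 bytes used and 85 bytes zone_unusable has 5 free bytes before reclaim and 100 free bytes afterwards, at the cost of 10 bytes written to the destination.
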

Insert RECLAIM_ZONES before RESET_ZONES in data_flush_states so that
we attempt to reclaim partially-used block groups before falling back
to resetting completely empty ones.
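
Why the position in the array matters can be shown with a small sketch that walks an ordered state table and stops at the first state that frees enough space. All `toy_*` names are hypothetical, the table lists only the tail of data_flush_states, and the loop is a simplification. The kernel's flushing code is more involved than a single linear pass.

```c
#include <stddef.h>

/* Hypothetical mirror of the tail of data_flush_states after this patch. */
enum toy_flush_state {
	TOY_COMMIT_TRANS,
	TOY_RECLAIM_ZONES,
	TOY_RESET_ZONES,
	TOY_ALLOC_CHUNK_FORCE,
};

static const enum toy_flush_state toy_data_flush_states[] = {
	TOY_COMMIT_TRANS,
	TOY_RECLAIM_ZONES,	/* tried before TOY_RESET_ZONES */
	TOY_RESET_ZONES,
	TOY_ALLOC_CHUNK_FORCE,
};

/* Callback type: returns nonzero if the state freed enough space. */
typedef int (*toy_flush_fn)(enum toy_flush_state state, void *ctx);

/*
 * Try each state in table order. Returns the index of the first state
 * that freed enough space, or -1 if none did.
 */
static int toy_run_flush_states(toy_flush_fn fn, void *ctx)
{
	size_t n = sizeof(toy_data_flush_states) / sizeof(toy_data_flush_states[0]);

	for (size_t i = 0; i < n; i++) {
		if (fn(toy_data_flush_states[i], ctx))
			return (int)i;
	}
	return -1;
}

/* Example callback: only reclaim helps; counts the attempts in *ctx. */
static int toy_only_reclaim_helps(enum toy_flush_state state, void *ctx)
{
	int *attempts = ctx;

	(*attempts)++;
	return state == TOY_RECLAIM_ZONES;
}
```

With the example callback, the walk stops at the second entry, so the later fallback states are never reached when reclaim already freed enough space.
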

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs')
| -rw-r--r-- | fs/btrfs/space-info.c | 22 |
| -rw-r--r-- | fs/btrfs/space-info.h | 1 |
2 files changed, 23 insertions, 0 deletions
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 227178f8b589..e46c0d6ae862 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -129,6 +129,15 @@
  *   churn a lot and we can avoid making some extent tree modifications if we
  *   are able to delay for as long as possible.
  *
+ * RECLAIM_ZONES
+ *   This state only works for the zoned mode. In zoned mode, we cannot reuse
+ *   regions that have once been allocated and then been freed until we reset
+ *   the zone, due to the sequential write requirement. The RECLAIM_ZONES state
+ *   calls the reclaim machinery, evacuating the still valid data in these
+ *   block-groups and relocates it to the data_reloc_bg. Afterwards these
+ *   block-groups get deleted and the transaction is committed. This frees up
+ *   space to use for new allocations.
+ *
  * RESET_ZONES
  *   This state works only for the zoned mode. On the zoned mode, we cannot
  *   reuse once allocated then freed region until we reset the zone, due to
@@ -905,6 +914,18 @@ static void flush_space(struct btrfs_space_info *space_info, u64 num_bytes,
 		if (ret > 0 || ret == -ENOSPC)
 			ret = 0;
 		break;
+	case RECLAIM_ZONES:
+		if (btrfs_is_zoned(fs_info)) {
+			btrfs_reclaim_sweep(fs_info);
+			btrfs_delete_unused_bgs(fs_info);
+			btrfs_reclaim_bgs(fs_info);
+			flush_work(&fs_info->reclaim_bgs_work);
+			ASSERT(current->journal_info == NULL);
+			ret = btrfs_commit_current_transaction(root);
+		} else {
+			ret = 0;
+		}
+		break;
 	case RUN_DELAYED_IPUTS:
 		/*
 		 * If we have pending delayed iputs then we could free up a
@@ -1403,6 +1424,7 @@ static const enum btrfs_flush_state data_flush_states[] = {
 	FLUSH_DELALLOC_FULL,
 	RUN_DELAYED_IPUTS,
 	COMMIT_TRANS,
+	RECLAIM_ZONES,
 	RESET_ZONES,
 	ALLOC_CHUNK_FORCE,
 };
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 6f96cf48d7da..174b1ecf63be 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -113,6 +113,7 @@ enum btrfs_flush_state {
 	RUN_DELAYED_IPUTS	= 10,
 	COMMIT_TRANS		= 11,
 	RESET_ZONES		= 12,
+	RECLAIM_ZONES		= 13,
 };
 
 enum btrfs_space_info_sub_group {
