btrfs: zoned: add zone reclaim flush state for DATA space_info

On zoned block devices, DATA block groups can accumulate large amounts of zone_unusable space (space between the write pointer and zone end). When zone_unusable reaches high levels (e.g., 98% of total space), new allocations fail with ENOSPC even though space could be reclaimed by relocating data and resetting zones.

The existing flush states don't handle this scenario effectively - they either try to free cached space (which doesn't exist for zone_unusable) or reset empty zones (which doesn't help when zones contain valid data mixed with zone_unusable space).

Add a new RECLAIM_ZONES flush state that triggers the block group reclaim machinery. This state:

- Calls btrfs_reclaim_sweep() to identify reclaimable block groups
- Calls btrfs_reclaim_bgs() to queue reclaim work
- Waits for reclaim_bgs_work to complete via flush_work()
- Commits the transaction to finalize changes

The reclaim work (btrfs_reclaim_bgs_work) safely relocates valid data from fragmented block groups to other locations before resetting zones, converting zone_unusable space back into usable space.

Insert RECLAIM_ZONES before RESET_ZONES in data_flush_states so that we attempt to reclaim partially-used block groups before falling back to resetting completely empty ones.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
author: Johannes Thumshirn <johannes.thumshirn@wdc.com> 2026-02-10 12:04:23 +0100
committer: David Sterba <dsterba@suse.com> 2026-04-07 18:55:55 +0200
commit: e2a7fd22378f6500bcf979edc71e6837271eacfd (patch)
tree: 47b46ff6eab5e98823d9e400bd0b7c86ae433907 /fs
parent: 258e46a6385c57a3caef3fb1dc888e2efcfe5b18 (diff)
2 files changed, 23 insertions, 0 deletions
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 227178f8b589..e46c0d6ae862 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -129,6 +129,15 @@
  *     churn a lot and we can avoid making some extent tree modifications if we
  *     are able to delay for as long as possible.
  *
+ *   RECLAIM_ZONES
+ *     This state only works for the zoned mode. In zoned mode, we cannot reuse
+ *     regions that have once been allocated and then been freed until we reset
+ *     the zone, due to the sequential write requirement. The RECLAIM_ZONES state
+ *     calls the reclaim machinery, evacuating the still valid data in these
+ *     block-groups and relocates it to the data_reloc_bg. Afterwards these
+ *     block-groups get deleted and the transaction is committed. This frees up
+ *     space to use for new allocations.
+ *
  *   RESET_ZONES
  *     This state works only for the zoned mode. On the zoned mode, we cannot
  *     reuse once allocated then freed region until we reset the zone, due to
@@ -905,6 +914,18 @@ static void flush_space(struct btrfs_space_info *space_info, u64 num_bytes,
 		if (ret > 0 || ret == -ENOSPC)
 			ret = 0;
 		break;
+	case RECLAIM_ZONES:
+		if (btrfs_is_zoned(fs_info)) {
+			btrfs_reclaim_sweep(fs_info);
+			btrfs_delete_unused_bgs(fs_info);
+			btrfs_reclaim_bgs(fs_info);
+			flush_work(&fs_info->reclaim_bgs_work);
+			ASSERT(current->journal_info == NULL);
+			ret = btrfs_commit_current_transaction(root);
+		} else {
+			ret = 0;
+		}
+		break;
 	case RUN_DELAYED_IPUTS:
 		/*
 		 * If we have pending delayed iputs then we could free up a
@@ -1403,6 +1424,7 @@ static const enum btrfs_flush_state data_flush_states[] = {
 	FLUSH_DELALLOC_FULL,
 	RUN_DELAYED_IPUTS,
 	COMMIT_TRANS,
+	RECLAIM_ZONES,
 	RESET_ZONES,
 	ALLOC_CHUNK_FORCE,
 };
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 6f96cf48d7da..174b1ecf63be 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -113,6 +113,7 @@ enum btrfs_flush_state {
 	RUN_DELAYED_IPUTS	= 10,
 	COMMIT_TRANS		= 11,
 	RESET_ZONES		= 12,
+	RECLAIM_ZONES		= 13,
 };
 
 enum btrfs_space_info_sub_group {
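
Note for readers of the diff: RECLAIM_ZONES gets the enum value 13, which is higher than RESET_ZONES (12), yet the commit message says it is attempted first. The following userspace sketch illustrates why, assuming the flush order follows the position in data_flush_states rather than the numeric enum value. It is not kernel code: the enum values and entry order are copied from the hunks above, while `data_flush_slice`, `slice_index()` and `runs_before()` are hypothetical names introduced only for this illustration. FLUSH_DELALLOC_FULL and ALLOC_CHUNK_FORCE are left out because their values do not appear in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enum values copied from the space-info.h hunk of this commit. */
enum flush_state {
	RUN_DELAYED_IPUTS = 10,
	COMMIT_TRANS = 11,
	RESET_ZONES = 12,
	RECLAIM_ZONES = 13,
};

/*
 * Slice of data_flush_states[] in the order shown in the space-info.c
 * hunk. Entries whose enum values are not visible in the diff are
 * omitted. The name data_flush_slice is hypothetical.
 */
static const enum flush_state data_flush_slice[] = {
	RUN_DELAYED_IPUTS,
	COMMIT_TRANS,
	RECLAIM_ZONES,
	RESET_ZONES,
};

#define SLICE_LEN (sizeof(data_flush_slice) / sizeof(data_flush_slice[0]))

/* Hypothetical helper: position of a state in the slice, or -1 if absent. */
int slice_index(enum flush_state state)
{
	for (size_t i = 0; i < SLICE_LEN; i++) {
		if (data_flush_slice[i] == state)
			return (int)i;
	}
	return -1;
}

/* Hypothetical helper: true if state a is attempted before state b. */
bool runs_before(enum flush_state a, enum flush_state b)
{
	int ia = slice_index(a);
	int ib = slice_index(b);

	assert(ia >= 0 && ib >= 0); /* both states must be in the slice */
	return ia < ib;
}
```

Under that assumption, `runs_before(RECLAIM_ZONES, RESET_ZONES)` is true even though `RECLAIM_ZONES > RESET_ZONES` numerically, so appending the new state at the end of the enum does not affect when it runs.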