<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/xfs/Makefile, branch master</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>xfs: add media verification ioctl</title>
<updated>2026-01-21T02:06:52+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2026-01-21T02:06:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b8accfd65d31f25b9df15ec2419179b6fa0b21d5'/>
<id>b8accfd65d31f25b9df15ec2419179b6fa0b21d5</id>
<content type='text'>
Add a new privileged ioctl so that xfs_scrub can ask the kernel to
verify the media of the devices backing an xfs filesystem, and have any
resulting media errors reported to fsnotify and xfs_healer.

To accomplish this, the kernel allocates a folio between the base page
size and 1MB, and issues read IOs to a gradually incrementing range of
one of the storage devices underlying an xfs filesystem.  If any error
occurs, that raw error is reported to the calling process.  If the error
happens to be one of the ones that the kernel considers indicative of
data loss, then it will also be reported to xfs_healthmon and fsnotify.

Driving the verification from the kernel enables xfs (and by extension
xfs_scrub) to have precise control over the size and error handling of
IOs that are issued to the underlying block device, and to emit
notifications about problems to other relevant kernel subsystems
immediately.

Note that the caller is also allowed to reduce the size of the IO and
to ask for a relaxation period after each IO.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new privileged ioctl so that xfs_scrub can ask the kernel to
verify the media of the devices backing an xfs filesystem, and have any
resulting media errors reported to fsnotify and xfs_healer.

To accomplish this, the kernel allocates a folio between the base page
size and 1MB, and issues read IOs to a gradually incrementing range of
one of the storage devices underlying an xfs filesystem.  If any error
occurs, that raw error is reported to the calling process.  If the error
happens to be one of the ones that the kernel considers indicative of
data loss, then it will also be reported to xfs_healthmon and fsnotify.

Driving the verification from the kernel enables xfs (and by extension
xfs_scrub) to have precise control over the size and error handling of
IOs that are issued to the underlying block device, and to emit
notifications about problems to other relevant kernel subsystems
immediately.

Note that the caller is also allowed to reduce the size of the IO and
to ask for a relaxation period after each IO.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: start creating infrastructure for health monitoring</title>
<updated>2026-01-21T02:06:45+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2026-01-21T02:06:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a48373e7d35a89f6f9b39f0d0da9bf158af054ee'/>
<id>a48373e7d35a89f6f9b39f0d0da9bf158af054ee</id>
<content type='text'>
Start creating helper functions and infrastructure to pass filesystem
health events to a health monitoring file.  Since this is an
administrative interface, we only support a single health monitor
process per filesystem, so we don't need to use anything fancy such as
notifier chains (== tons of indirect calls).

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Start creating helper functions and infrastructure to pass filesystem
health events to a health monitoring file.  Since this is an
administrative interface, we only support a single health monitor
process per filesystem, so we don't need to use anything fancy such as
notifier chains (== tons of indirect calls).

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: export zone stats in /proc/*/mountstats</title>
<updated>2025-03-03T15:17:10+00:00</updated>
<author>
<name>Hans Holmberg</name>
<email>hans.holmberg@wdc.com</email>
</author>
<published>2024-11-30T04:53:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5443041b9c6308db91f58eaf401d8f7f81874b74'/>
<id>5443041b9c6308db91f58eaf401d8f7f81874b74</id>
<content type='text'>
Add the per-zone life time hint and the used block distribution
for fully written zones, grouping reclaimable zones in fixed-percentage
buckets spanning 0..9%, 10..19% and full zones as 100% used as well as a
few statistics about the zone allocator and open and reclaimable zones
in /proc/*/mountstats.

This gives good insight into data fragmentation and data placement
success rate.

Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Co-developed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the per-zone life time hint and the used block distribution
for fully written zones, grouping reclaimable zones in fixed-percentage
buckets spanning 0..9%, 10..19% and full zones as 100% used as well as a
few statistics about the zone allocator and open and reclaimable zones
in /proc/*/mountstats.

This gives good insight into data fragmentation and data placement
success rate.

Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Co-developed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: implement zoned garbage collection</title>
<updated>2025-03-03T15:17:07+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-02-15T18:37:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=080d01c41d44f0993f2c235a6bfdb681f0a66be6'/>
<id>080d01c41d44f0993f2c235a6bfdb681f0a66be6</id>
<content type='text'>
RT groups on a zoned file system need to be completely empty before their
space can be reused.  This means that partially empty groups need to be
emptied entirely to free up space if no entirely free groups are
available.

Add a garbage collection thread that moves all data out of the least used
zone when not enough free zones are available, and which resets all zones
that have been emptied.  To find empty zone a simple set of 10 buckets
based on the amount of space used in the zone is used.  To empty zones,
the rmap is walked to find the owners and the data is read and then
written to the new place.

To automatically defragment files the rmap records are sorted by inode
and logical offset.  This means defragmentation of parallel writes into
a single zone happens automatically when performing garbage collection.
Because holding the iolock over the entire GC cycle would inject very
noticeable latency for other accesses to the inodes, the iolock is not
taken while performing I/O.  Instead the I/O completion handler checks
that the mapping hasn't changed over the one recorded at the start of
the GC cycle and doesn't update the mapping if it change.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
RT groups on a zoned file system need to be completely empty before their
space can be reused.  This means that partially empty groups need to be
emptied entirely to free up space if no entirely free groups are
available.

Add a garbage collection thread that moves all data out of the least used
zone when not enough free zones are available, and which resets all zones
that have been emptied.  To find empty zone a simple set of 10 buckets
based on the amount of space used in the zone is used.  To empty zones,
the rmap is walked to find the owners and the data is read and then
written to the new place.

To automatically defragment files the rmap records are sorted by inode
and logical offset.  This means defragmentation of parallel writes into
a single zone happens automatically when performing garbage collection.
Because holding the iolock over the entire GC cycle would inject very
noticeable latency for other accesses to the inodes, the iolock is not
taken while performing I/O.  Instead the I/O completion handler checks
that the mapping hasn't changed over the one recorded at the start of
the GC cycle and doesn't update the mapping if it change.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: add support for zoned space reservations</title>
<updated>2025-03-03T15:17:07+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-02-13T08:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0bb2193056b5969e4148fc0909e89a5362da873e'/>
<id>0bb2193056b5969e4148fc0909e89a5362da873e</id>
<content type='text'>
For zoned file systems garbage collection (GC) has to take the iolock
and mmaplock after moving data to a new place to synchronize with
readers.  This means waiting for garbage collection with the iolock can
deadlock.

To avoid this, the worst case required blocks have to be reserved before
taking the iolock, which is done using a new RTAVAILABLE counter that
tracks blocks that are free to write into and don't require garbage
collection.  The new helpers try to take these available blocks, and
if there aren't enough available it wakes and waits for GC.  This is
done using a list of on-stack reservations to ensure fairness.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For zoned file systems garbage collection (GC) has to take the iolock
and mmaplock after moving data to a new place to synchronize with
readers.  This means waiting for garbage collection with the iolock can
deadlock.

To avoid this, the worst case required blocks have to be reserved before
taking the iolock, which is done using a new RTAVAILABLE counter that
tracks blocks that are free to write into and don't require garbage
collection.  The new helpers try to take these available blocks, and
if there aren't enough available it wakes and waits for GC.  This is
done using a list of on-stack reservations to ensure fairness.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: add the zoned space allocator</title>
<updated>2025-03-03T15:16:56+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-02-13T04:49:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4e4d52075577707f8393e3fc74c1ef79ca1d3ce6'/>
<id>4e4d52075577707f8393e3fc74c1ef79ca1d3ce6</id>
<content type='text'>
For zoned RT devices space is always allocated at the write pointer, that
is right after the last written block and only recorded on I/O completion.

Because the actual allocation algorithm is very simple and just involves
picking a good zone - preferably the one used for the last write to the
inode.  As the number of zones that can written at the same time is
usually limited by the hardware, selecting a zone is done as late as
possible from the iomap dio and buffered writeback bio submissions
helpers just before submitting the bio.

Given that the writers already took a reservation before acquiring the
iolock, space will always be readily available if an open zone slot is
available.  A new structure is used to track these open zones, and
pointed to by the xfs_rtgroup.  Because zoned file systems don't have
a rsum cache the space for that pointer can be reused.

Allocations are only recorded at I/O completion time.  The scheme used
for that is very similar to the reflink COW end I/O path.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For zoned RT devices space is always allocated at the write pointer, that
is right after the last written block and only recorded on I/O completion.

Because the actual allocation algorithm is very simple and just involves
picking a good zone - preferably the one used for the last write to the
inode.  As the number of zones that can written at the same time is
usually limited by the hardware, selecting a zone is done as late as
possible from the iomap dio and buffered writeback bio submissions
helpers just before submitting the bio.

Given that the writers already took a reservation before acquiring the
iolock, space will always be readily available if an open zone slot is
available.  A new structure is used to track these open zones, and
pointed to by the xfs_rtgroup.  Because zoned file systems don't have
a rsum cache the space for that pointer can be reused.

Allocations are only recorded at I/O completion time.  The scheme used
for that is very similar to the reflink COW end I/O path.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: parse and validate hardware zone information</title>
<updated>2025-03-03T15:16:46+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2024-11-17T05:28:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=720c2d58348329ce57bfa7ecef93ee0c9bf4b405'/>
<id>720c2d58348329ce57bfa7ecef93ee0c9bf4b405</id>
<content type='text'>
Add support to validate and parse reported hardware zone state.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support to validate and parse reported hardware zone state.

Co-developed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: online repair of the realtime refcount btree</title>
<updated>2024-12-23T21:06:16+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2024-11-21T00:21:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=83ccffc489975db625d7f2600d39faa8b0ff69d6'/>
<id>83ccffc489975db625d7f2600d39faa8b0ff69d6</id>
<content type='text'>
Port the data device's refcount btree repair code to the realtime
refcount btree.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Port the data device's refcount btree repair code to the realtime
refcount btree.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: scrub the realtime refcount btree</title>
<updated>2024-12-23T21:06:14+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2024-11-21T00:21:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c27929670de144ec76a0dab2f3a168cb4897b314'/>
<id>c27929670de144ec76a0dab2f3a168cb4897b314</id>
<content type='text'>
Add code to scrub realtime refcount btrees.  Similar to the refcount
btree checking code for the data device, we walk the rmap btree for each
refcount record to confirm that the reference counts are correct.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add code to scrub realtime refcount btrees.  Similar to the refcount
btree checking code for the data device, we walk the rmap btree for each
refcount record to confirm that the reference counts are correct.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfs: introduce realtime refcount btree ondisk definitions</title>
<updated>2024-12-23T21:06:10+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2024-11-21T00:20:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9abe03a0e4f978615a2b1b484b8d09ca84c16ea0'/>
<id>9abe03a0e4f978615a2b1b484b8d09ca84c16ea0</id>
<content type='text'>
Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation, though the changes to actually root the rtrefcount btree
in an inode come later.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation, though the changes to actually root the rtrefcount btree
in an inode come later.

Signed-off-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
