| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-22 18:44:48 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-22 18:44:48 -0700 |
| commit | 502d801f0ab03e4f32f9a33d203154ce84887921 (patch) | |
| tree | 8dd98de794f62fae7a0a5117ed232c6edc478fe2 | |
| parent | 4708cac0e22cfd217f48f7cec3c35e5922efcccd (diff) | |
| parent | 803d09a554055aba160a62abd1e4b1260b899dc1 (diff) | |
Merge tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:

"The most notable change is the removal of the fscache backend: it has
been deprecated for almost two years, mainly because EROFS file-backed
mounts and fanotify pre-content hooks (together with erofs-utils) now
provide better functionality and simpler codebase. In addition,
fscache has depended on netfslib for years, which is undesirable for
EROFS since it is a local filesystem. More details in [1].

In addition, sparse support has been added to the pcluster layout,
which is helpful for large sparse AI datasets, and map requests for
chunk-based inodes have been optimized to be more efficient as well.

There are also the usual fixes and cleanups.

Summary:

 - Report more consecutive chunks of the same type for
   each iomap request

 - Add sparse support for the pcluster layout

 - Update the EROFS documentation overview

 - Remove the deprecated fscache backend

 - Various fixes and cleanups"

Link: https://lore.kernel.org/r/20260622013622.934174-1-hsiangkao@linux.alibaba.com [1]

* tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: handle 48-bit blocks_hi for compressed inodes
  erofs: remove fscache backend entirely
  erofs: simplify RCU read critical sections
  erofs: add sparse support to pcluster layout
  erofs: add folio order to trace_erofs_read_folio
  erofs: introduce erofs_map_chunks()
  erofs: call erofs_exit_ishare() before rcu_barrier()
  erofs: update the overview of the documentation
  erofs: clean up erofs_ishare_fill_inode()
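
The "report more consecutive chunks" item can be illustrated with a small user-space sketch. It is not the kernel code: `merge_chunks`, `struct extent`, and `NULL_ADDR` are hypothetical names chosen for illustration, and the sketch assumes the per-chunk start addresses have already been decoded into an array. It merges following chunks into one extent while they are either physically contiguous or all holes, which is the idea behind `erofs_map_chunks()` in the diff further down.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's EROFS_NULL_ADDR (marks a hole). */
#define NULL_ADDR UINT64_MAX

struct extent {
	uint64_t start_blk;	/* first physical block, or NULL_ADDR for a hole */
	uint64_t nr_chunks;	/* number of chunks merged into this extent */
};

/*
 * Starting at chunk `first` (caller guarantees first < count), keep
 * merging the following chunks while each one either continues the
 * physical run or is another hole.
 */
static struct extent merge_chunks(const uint64_t *addrs, size_t count,
				  size_t first, uint64_t blks_per_chunk)
{
	struct extent ext = { .start_blk = addrs[first], .nr_chunks = 1 };
	uint64_t expected = addrs[first];
	size_t i;

	for (i = first + 1; i < count; i++) {
		/* a mapped run advances by one chunk; a hole run stays NULL_ADDR */
		if (expected != NULL_ADDR)
			expected += blks_per_chunk;
		if (addrs[i] != expected)
			break;
		ext.nr_chunks++;
	}
	return ext;
}
```

With the chunk addresses `{100, 104, 108, 200}` and four blocks per chunk, a request starting at chunk 0 would come back as a single three-chunk extent rather than three separate ones. The kernel version additionally stops scanning at the end of the current metadata block and clamps the resulting length to EOF, both of which this sketch leaves out.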
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | Documentation/filesystems/erofs.rst | 138 |
| -rw-r--r-- | fs/erofs/Kconfig | 21 |
| -rw-r--r-- | fs/erofs/Makefile | 1 |
| -rw-r--r-- | fs/erofs/data.c | 135 |
| -rw-r--r-- | fs/erofs/erofs_fs.h | 2 |
| -rw-r--r-- | fs/erofs/fscache.c | 664 |
| -rw-r--r-- | fs/erofs/inode.c | 7 |
| -rw-r--r-- | fs/erofs/internal.h | 72 |
| -rw-r--r-- | fs/erofs/ishare.c | 47 |
| -rw-r--r-- | fs/erofs/super.c | 98 |
| -rw-r--r-- | fs/erofs/zdata.c | 38 |
| -rw-r--r-- | fs/erofs/zmap.c | 33 |
| -rw-r--r-- | include/trace/events/erofs.h | 9 |

13 files changed, 231 insertions, 1034 deletions
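
As a reading aid for the fs/erofs/data.c hunk below, here is a minimal user-space sketch of how a chunk index entry appears to be folded into a single 64-bit value: the block address in the low 48 (or 32) bits and the device ID in the top 16 bits. The names `decode_chunk_addr` and `NULL_ADDR` are hypothetical, the inputs are assumed to be in CPU byte order already, and the kernel's extra masking of the device ID with `device_id_mask` is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's EROFS_NULL_ADDR (marks a hole). */
#define NULL_ADDR UINT64_MAX

/*
 * Combine the high 16 bits and low 32 bits of a start block, apply the
 * 48-bit or 32-bit address mask, and either report a hole or tag the
 * address with the device ID in bits 48..63.
 */
static uint64_t decode_chunk_addr(uint16_t startblk_hi, uint32_t startblk_lo,
				  uint16_t device_id, bool is_48bit)
{
	uint64_t mask = is_48bit ? (1ULL << 48) - 1 : (1ULL << 32) - 1;
	uint64_t addr = (((uint64_t)startblk_hi << 32) | startblk_lo) & mask;

	/* an all-ones address within the mask denotes an unmapped chunk */
	if (addr == (NULL_ADDR & mask))
		return NULL_ADDR;
	return addr | ((uint64_t)device_id << 48);
}
```

A caller can recover the two parts with `value & ((1ULL << 48) - 1)` for the block number and `value >> 48` for the device ID, which mirrors how the hunk fills `m_pa` and `m_deviceid` once a run of chunks has been merged.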
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst index fe06308e546c..4230884fb359 100644 --- a/Documentation/filesystems/erofs.rst +++ b/Documentation/filesystems/erofs.rst @@ -7,83 +7,90 @@ EROFS - Enhanced Read-Only File System Overview ======== -EROFS filesystem stands for Enhanced Read-Only File System. It aims to form a -generic read-only filesystem solution for various read-only use cases instead -of just focusing on storage space saving without considering any side effects -of runtime performance. - -It is designed to meet the needs of flexibility, feature extendability and user -payload friendly, etc. Apart from those, it is still kept as a simple -random-access friendly high-performance filesystem to get rid of unneeded I/O -amplification and memory-resident overhead compared to similar approaches. - -It is implemented to be a better choice for the following scenarios: - - - read-only storage media or - - - part of a fully trusted read-only solution, which means it needs to be +EROFS (Enhanced Read-Only File System) is a modern, efficient, and secure +read-only kernel filesystem designed for various use cases including immutable +system images, container images, application sandbox images, and dataset +distribution. + +An immutable image filesystem can be regarded as an enhanced archive format +which allows golden images to be built once and mounted everywhere -- images are +bit-for-bit identical across all deployments and can be verified, audited, or +shared without concerns about runtime modifications (in this model, all user +writes should be redirected into another trusted filesystem, for example, via +overlayfs for copy-on-write-style redirection, by design). + +EROFS is a dedicated implementation of the image filesystem idea above, with a +flexible, hierarchical on-disk design so that needed features can be enabled on +demand. 
Filesystem data in the core format is strictly block-aligned in order +to perform optimally on all kinds of storage media, including block devices and +memory-backed devices. The on-disk format is easy to parse and purposely avoids +the unnecessary metadata redundancy found in generic writable filesystems, which +can suffer from extra inconsistency issues -- making it ideal for security +auditing and untrusted remote access. In addition, designs such as inline data, +inline/shared extended attributes, and optimized (de)compression provide better +space efficiency while maintaining high performance. + +In short, EROFS aims to be a better fit for the following scenarios: + + - As part of a secure immutable storage solution, where it needs to be immutable and bit-for-bit identical to the official golden image for - their releases due to security or other considerations and - - - hope to minimize extra storage space with guaranteed end-to-end performance - by using compact layout, transparent file compression and direct access, - especially for those embedded devices with limited memory and high-density - hosts with numerous containers. + each individual copy, in order to meet security, data sharing, and/or + other requirements; -Here are the main features of EROFS: + - Minimizing storage overhead with guaranteed end-to-end performance + by using compact (meta)data layout, optimized transparent data compression, + deduplication and direct access, especially for those embedded devices with + limited memory and high-density hosts with numerous containers. 
- - Little endian on-disk design; +Here is the list of highlights: - - Block-based distribution and file-based distribution over fscache are - supported; + - Little endian on-disk design with 48-bit block addressing, supporting up + to 1 EiB filesystem capacity with 4 KiB block size; - - Support multiple devices to refer to external blobs, which can be used - for container images; + - Two compact inode metadata layouts for space and performance efficiency: - - 32-bit block addresses for each device, therefore 16TiB address space at - most with 4KiB block size for now; + ======================== ======== ====================================== + compact extended + ======================== ======== ====================================== + Inode core metadata size 32 bytes 64 bytes + Max file size 4 GiB 16 EiB (also limited by max. vol size) + Max uids/gids 65536 4294967296 + Nanosecond timestamps no yes + Max hardlinks 65536 4294967296 + ======================== ======== ====================================== - - Two inode layouts for different requirements: + - Support tailpacking inline data for better space efficiency and reduce + unneeded I/O amplification; - ===================== ============ ====================================== - compact (v1) extended (v2) - ===================== ============ ====================================== - Inode metadata size 32 bytes 64 bytes - Max file size 4 GiB 16 EiB (also limited by max. 
vol size) - Max uids/gids 65536 4294967296 - Per-inode timestamp no yes (64 + 32-bit timestamp) - Max hardlinks 65536 4294967296 - Metadata reserved 8 bytes 18 bytes - ===================== ============ ====================================== + - Block-based and file-backed distribution are both supported; - - Support extended attributes as an option; + - Multiple devices to reference external data blobs: inode data can be + optionally placed into external blobs, which enables image layering and data + sharing among different filesystems; - - Support a bloom filter that speeds up negative extended attribute lookups; + - Inline and shared extended attributes with an optional bloom filter that + speeds up negative extended attribute lookups; - - Support POSIX.1e ACLs by using extended attributes; + - POSIX.1e ACLs by using extended attributes; - - Support transparent data compression as an option: - LZ4, MicroLZMA, DEFLATE and Zstandard algorithms can be used on a per-file - basis; In addition, inplace decompression is also supported to avoid bounce - compressed buffers and unnecessary page cache thrashing. + - Transparent data compression as an option: Supported algorithms (LZ4, + MicroLZMA, DEFLATE and Zstandard) can be selected on a per-inode basis. + Both the on-disk metadata and decompression runtime have been heavily + optimized to minimize the overhead for better performance. - - Support chunk-based data deduplication and rolling-hash compressed data - deduplication; + - Merging tail-end data into a special inode as fragments; - - Support tailpacking inline compared to byte-addressed unaligned metadata - or smaller block size alternatives; + - Chunk-based deduplication and rolling-hash compressed data deduplication; - - Support merging tail-end data into a special inode as fragments. 
+ - Direct I/O and FSDAX support on uncompressed inodes for use cases such as + secure containers, loop devices, and ramdisks that do not need page caching; - - Support large folios to make use of THPs (Transparent Hugepages); + - Page cache sharing among inodes with identical content fingerprints on + the same machine. - - Support direct I/O on uncompressed files to avoid double caching for loop - devices; +For more detailed information, please refer to our documentation site: - - Support FSDAX on uncompressed images for secure containers and ramdisks in - order to get rid of unnecessary page cache. - - - Support file-based on-demand loading with the Fscache infrastructure. +- https://erofs.docs.kernel.org The following git tree provides the file system user-space tools under development, such as a formatting tool (mkfs.erofs), an on-disk consistency & @@ -91,10 +98,6 @@ compatibility checking tool (fsck.erofs), and a debugging tool (dump.erofs): - git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -For more information, please also refer to the documentation site: - -- https://erofs.docs.kernel.org - Bugs and patches are welcome, please kindly help us and send to the following linux-erofs mailing list: @@ -127,12 +130,9 @@ dax A legacy option which is an alias for ``dax=always``. device=%s Specify a path to an extra device to be used together. directio (For file-backed mounts) Use direct I/O to access backing files, and asynchronous I/O will be enabled if supported. -fsid=%s Specify a filesystem image ID for Fscache back-end. -domain_id=%s Specify a trusted domain ID for fscache mode so that - different images with the same blobs, identified by blob IDs, - can share storage within the same trusted domain. - Also used for different filesystems with inode page sharing - enabled to share page cache within the trusted domain. +domain_id=%s Specify a trusted domain ID. 
Filesystems sharing the same + domain ID can share page cache across mounts when inode + page sharing is enabled. (not shown in mountinfo output) fsoffset=%llu Specify block-aligned filesystem offset for the primary device. inode_share Enable inode page sharing for this filesystem. Inodes with identical content within the same domain ID can share the diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig index 97c48ebe8458..4789b1077d8c 100644 --- a/fs/erofs/Kconfig +++ b/fs/erofs/Kconfig @@ -3,13 +3,11 @@ config EROFS_FS tristate "EROFS filesystem support" depends on BLOCK - select CACHEFILES if EROFS_FS_ONDEMAND select CRC32 select CRYPTO if EROFS_FS_ZIP_ACCEL select CRYPTO_DEFLATE if EROFS_FS_ZIP_ACCEL select FS_IOMAP select LZ4_DECOMPRESS if EROFS_FS_ZIP - select NETFS_SUPPORT if EROFS_FS_ONDEMAND select XXHASH if EROFS_FS_XATTR select XZ_DEC if EROFS_FS_ZIP_LZMA select XZ_DEC_MICROLZMA if EROFS_FS_ZIP_LZMA @@ -109,9 +107,6 @@ config EROFS_FS_BACKED_BY_FILE be used to simplify error-prone lifetime management of unnecessary virtual block devices. - Note that this feature, along with ongoing fanotify pre-content - hooks, will eventually replace "EROFS over fscache." - If you don't want to enable this feature, say N. config EROFS_FS_ZIP @@ -172,20 +167,6 @@ config EROFS_FS_ZIP_ACCEL If unsure, say N. -config EROFS_FS_ONDEMAND - bool "EROFS fscache-based on-demand read support (deprecated)" - depends on EROFS_FS - select FSCACHE - select CACHEFILES_ONDEMAND - help - This permits EROFS to use fscache-backed data blobs with on-demand - read support. - - It is now deprecated and scheduled to be removed from the kernel - after fanotify pre-content hooks are landed. - - If unsure, say N. 
- config EROFS_FS_PCPU_KTHREAD bool "EROFS per-cpu decompression kthread workers" depends on EROFS_FS_ZIP @@ -207,7 +188,7 @@ config EROFS_FS_PCPU_KTHREAD_HIPRI config EROFS_FS_PAGE_CACHE_SHARE bool "EROFS page cache share support (experimental)" - depends on EROFS_FS && EROFS_FS_XATTR && !EROFS_FS_ONDEMAND + depends on EROFS_FS && EROFS_FS_XATTR help This enables page cache sharing among inodes with identical content fingerprints on the same machine. diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile index a80e1762b607..30423496786f 100644 --- a/fs/erofs/Makefile +++ b/fs/erofs/Makefile @@ -9,5 +9,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o erofs-$(CONFIG_EROFS_FS_ZIP_ACCEL) += decompressor_crypto.o erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o -erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o erofs-$(CONFIG_EROFS_FS_PAGE_CACHE_SHARE) += ishare.o diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 44da21c9d777..9aa48c8d67d1 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -80,9 +80,7 @@ int erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb, if (erofs_is_fileio_mode(sbi)) { buf->file = sbi->dif0.file; /* some fs like FUSE needs it */ buf->mapping = buf->file->f_mapping; - } else if (erofs_is_fscache_mode(sb)) - buf->mapping = sbi->dif0.fscache->inode->i_mapping; - else + } else buf->mapping = sb->s_bdev->bd_mapping; return 0; } @@ -98,17 +96,73 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, return erofs_bread(buf, offset, true); } -int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) +static int erofs_map_chunks(struct inode *inode, struct erofs_map_blocks *map) { struct erofs_buf buf = __EROFS_BUF_INITIALIZER; struct super_block *sb = inode->i_sb; - unsigned int unit, blksz = sb->s_blocksize; struct erofs_inode *vi = EROFS_I(inode); struct erofs_inode_chunk_index *idx; - erofs_blk_t startblk, addrmask; - bool 
tailpacking; + unsigned int unit = vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES ? + sizeof(*idx) : EROFS_BLOCK_MAP_ENTRY_SIZE; + erofs_blk_t addrmask = (vi->chunkformat & EROFS_CHUNK_FORMAT_48BIT) ? + BIT_ULL(48) - 1 : BIT_ULL(32) - 1; + u64 nr = map->m_la >> vi->chunkbits, chunksize = 1ULL << vi->chunkbits; + erofs_off_t pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + + vi->xattr_isize, unit) + unit * nr; + /* m_llen will be clamped to EOF in the end */ + erofs_off_t endpos = round_up(pos + 1, sb->s_blocksize); + u64 last, addr; + + idx = erofs_read_metabuf(&buf, sb, pos, erofs_inode_in_metabox(inode)); + if (IS_ERR(idx)) + return PTR_ERR(idx); + + map->m_la = nr << vi->chunkbits; + map->m_llen = 0; + nr = 0; + do { + if (unit == EROFS_BLOCK_MAP_ENTRY_SIZE) { + addr = le32_to_cpu(((__le32 *)idx)[nr]); + if (addr == (u32)EROFS_NULL_ADDR) + addr = EROFS_NULL_ADDR; + } else { + addr = (((u64)le16_to_cpu(idx[nr].startblk_hi) << 32) | + le32_to_cpu(idx[nr].startblk_lo)) & addrmask; + if (addr ^ (EROFS_NULL_ADDR & addrmask)) + addr |= (u64)(le16_to_cpu(idx[nr].device_id) & + EROFS_SB(sb)->device_id_mask) << 48; + else + addr = EROFS_NULL_ADDR; + } + if (!nr) { + last = addr; + continue; + } + /* expand and account the prior chunk here */ + map->m_llen += chunksize; + if (last != EROFS_NULL_ADDR) + last += erofs_blknr(sb, chunksize); + } while (addr == last && pos + (++nr) * unit < endpos); + + if (last != EROFS_NULL_ADDR) { + map->m_pa = erofs_pos(sb, last & addrmask) - map->m_llen; + map->m_deviceid = last >> 48; + map->m_flags = EROFS_MAP_MAPPED; + } + if (addr == last) + map->m_llen += chunksize; + map->m_llen = min_t(erofs_off_t, map->m_llen, + round_up(inode->i_size - map->m_la, sb->s_blocksize)); + erofs_put_metabuf(&buf); + return 0; +} + +int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) +{ + struct super_block *sb = inode->i_sb; + struct erofs_inode *vi = EROFS_I(inode); + bool tailinline = (vi->datalayout == EROFS_INODE_FLAT_INLINE); 
erofs_off_t pos; - u64 chunknr; int err = 0; trace_erofs_map_blocks_enter(inode, map, 0); @@ -116,13 +170,10 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) map->m_flags = 0; if (map->m_la >= inode->i_size) goto out; - - if (vi->datalayout != EROFS_INODE_CHUNK_BASED) { - tailpacking = (vi->datalayout == EROFS_INODE_FLAT_INLINE); - if (!tailpacking && vi->startblk == EROFS_NULL_ADDR) - goto out; - pos = erofs_pos(sb, erofs_iblks(inode) - tailpacking); - + if (vi->datalayout == EROFS_INODE_CHUNK_BASED) { + err = erofs_map_chunks(inode, map); + } else if (tailinline || vi->startblk != EROFS_NULL_ADDR) { + pos = erofs_pos(sb, erofs_iblks(inode) - tailinline); map->m_flags = EROFS_MAP_MAPPED; if (map->m_la < pos) { map->m_pa = erofs_pos(sb, vi->startblk) + map->m_la; @@ -132,57 +183,15 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) vi->xattr_isize + erofs_blkoff(sb, map->m_la); map->m_llen = inode->i_size - map->m_la; map->m_flags |= EROFS_MAP_META; - } - goto out; - } - - if (vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES) - unit = sizeof(*idx); /* chunk index */ - else - unit = EROFS_BLOCK_MAP_ENTRY_SIZE; /* block map */ - - chunknr = map->m_la >> vi->chunkbits; - pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + - vi->xattr_isize, unit) + unit * chunknr; - - idx = erofs_read_metabuf(&buf, sb, pos, erofs_inode_in_metabox(inode)); - if (IS_ERR(idx)) { - err = PTR_ERR(idx); - goto out; - } - map->m_la = chunknr << vi->chunkbits; - map->m_llen = min_t(erofs_off_t, 1UL << vi->chunkbits, - round_up(inode->i_size - map->m_la, blksz)); - if (vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES) { - addrmask = (vi->chunkformat & EROFS_CHUNK_FORMAT_48BIT) ? 
- BIT_ULL(48) - 1 : BIT_ULL(32) - 1; - startblk = (((u64)le16_to_cpu(idx->startblk_hi) << 32) | - le32_to_cpu(idx->startblk_lo)) & addrmask; - if ((startblk ^ EROFS_NULL_ADDR) & addrmask) { - map->m_deviceid = le16_to_cpu(idx->device_id) & - EROFS_SB(sb)->device_id_mask; - map->m_pa = erofs_pos(sb, startblk); - map->m_flags = EROFS_MAP_MAPPED; - } - } else { - startblk = le32_to_cpu(*(__le32 *)idx); - if (startblk != (u32)EROFS_NULL_ADDR) { - map->m_pa = erofs_pos(sb, startblk); - map->m_flags = EROFS_MAP_MAPPED; + if (erofs_blkoff(sb, map->m_pa) + map->m_llen > + sb->s_blocksize) { + erofs_err(sb, "inline data across blocks @ nid %llu", vi->nid); + return -EFSCORRUPTED; + } } } - erofs_put_metabuf(&buf); out: - if (!err) { - map->m_plen = map->m_llen; - /* inline data should be located in the same meta block */ - if ((map->m_flags & EROFS_MAP_META) && - erofs_blkoff(sb, map->m_pa) + map->m_plen > blksz) { - erofs_err(sb, "inline data across blocks @ nid %llu", vi->nid); - DBG_BUGON(1); - return -EFSCORRUPTED; - } - } + map->m_plen = err ? 0 : map->m_llen; trace_erofs_map_blocks_exit(inode, map, 0, err); return err; } diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index 7871b16c1d33..16ec4fd33ac6 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -396,6 +396,8 @@ enum { /* (noncompact only, HEAD) This pcluster refers to partial decompressed data */ #define Z_EROFS_LI_PARTIAL_REF (1 << 15) +/* (noncompact only, HEAD) This pcluster can also be regarded as a HOLE */ +#define Z_EROFS_LI_HOLE (1 << 14) /* Set on 1st non-head lcluster to store compressed block counti (in blocks) */ #define Z_EROFS_LI_D0_CBLKCNT (1 << 11) diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c deleted file mode 100644 index 685c68774379..000000000000 --- a/fs/erofs/fscache.c +++ /dev/null @@ -1,664 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * Copyright (C) 2022, Alibaba Cloud - * Copyright (C) 2022, Bytedance Inc. All rights reserved. 
- */ -#include <linux/fscache.h> -#include "internal.h" - -static DEFINE_MUTEX(erofs_domain_list_lock); -static DEFINE_MUTEX(erofs_domain_cookies_lock); -static LIST_HEAD(erofs_domain_list); -static LIST_HEAD(erofs_domain_cookies_list); -static struct vfsmount *erofs_pseudo_mnt; - -struct erofs_fscache_io { - struct netfs_cache_resources cres; - struct iov_iter iter; - netfs_io_terminated_t end_io; - void *private; - refcount_t ref; -}; - -struct erofs_fscache_rq { - struct address_space *mapping; /* The mapping being accessed */ - loff_t start; /* Start position */ - size_t len; /* Length of the request */ - size_t submitted; /* Length of submitted */ - short error; /* 0 or error that occurred */ - refcount_t ref; -}; - -static bool erofs_fscache_io_put(struct erofs_fscache_io *io) -{ - if (!refcount_dec_and_test(&io->ref)) - return false; - if (io->cres.ops) - io->cres.ops->end_operation(&io->cres); - kfree(io); - return true; -} - -static void erofs_fscache_req_complete(struct erofs_fscache_rq *req) -{ - struct folio *folio; - bool failed = req->error; - pgoff_t start_page = req->start / PAGE_SIZE; - pgoff_t last_page = ((req->start + req->len) / PAGE_SIZE) - 1; - - XA_STATE(xas, &req->mapping->i_pages, start_page); - - rcu_read_lock(); - xas_for_each(&xas, folio, last_page) { - if (xas_retry(&xas, folio)) - continue; - if (!failed) - folio_mark_uptodate(folio); - folio_unlock(folio); - } - rcu_read_unlock(); -} - -static void erofs_fscache_req_put(struct erofs_fscache_rq *req) -{ - if (!refcount_dec_and_test(&req->ref)) - return; - erofs_fscache_req_complete(req); - kfree(req); -} - -static struct erofs_fscache_rq *erofs_fscache_req_alloc(struct address_space *mapping, - loff_t start, size_t len) -{ - struct erofs_fscache_rq *req = kzalloc_obj(*req); - - if (!req) - return NULL; - req->mapping = mapping; - req->start = start; - req->len = len; - refcount_set(&req->ref, 1); - return req; -} - -static void erofs_fscache_req_io_put(struct erofs_fscache_io *io) -{ 
- struct erofs_fscache_rq *req = io->private; - - if (erofs_fscache_io_put(io)) - erofs_fscache_req_put(req); -} - -static void erofs_fscache_req_end_io(void *priv, ssize_t transferred_or_error) -{ - struct erofs_fscache_io *io = priv; - struct erofs_fscache_rq *req = io->private; - - if (IS_ERR_VALUE(transferred_or_error)) - req->error = transferred_or_error; - erofs_fscache_req_io_put(io); -} - -static struct erofs_fscache_io *erofs_fscache_req_io_alloc(struct erofs_fscache_rq *req) -{ - struct erofs_fscache_io *io = kzalloc_obj(*io); - - if (!io) - return NULL; - io->end_io = erofs_fscache_req_end_io; - io->private = req; - refcount_inc(&req->ref); - refcount_set(&io->ref, 1); - return io; -} - -/* - * Read data from fscache described by cookie at pstart physical address - * offset, and fill the read data into buffer described by io->iter. - */ -static int erofs_fscache_read_io_async(struct fscache_cookie *cookie, - loff_t pstart, struct erofs_fscache_io *io) -{ - enum netfs_io_source source; - struct netfs_cache_resources *cres = &io->cres; - struct iov_iter *iter = &io->iter; - int ret; - - ret = fscache_begin_read_operation(cres, cookie); - if (ret) - return ret; - - while (iov_iter_count(iter)) { - size_t orig_count = iov_iter_count(iter), len = orig_count; - unsigned long flags = 1 << NETFS_SREQ_ONDEMAND; - - source = cres->ops->prepare_ondemand_read(cres, - pstart, &len, LLONG_MAX, &flags, 0); - if (WARN_ON(len == 0)) - source = NETFS_INVALID_READ; - if (source != NETFS_READ_FROM_CACHE) { - erofs_err(NULL, "prepare_ondemand_read failed (source %d)", source); - return -EIO; - } - - iov_iter_truncate(iter, len); - refcount_inc(&io->ref); - ret = fscache_read(cres, pstart, iter, NETFS_READ_HOLE_FAIL, - io->end_io, io); - if (ret == -EIOCBQUEUED) - ret = 0; - if (ret) { - erofs_err(NULL, "fscache_read failed (ret %d)", ret); - return ret; - } - if (WARN_ON(iov_iter_count(iter))) - return -EIO; - - iov_iter_reexpand(iter, orig_count - len); - pstart += len; - } 
- return 0; -} - -struct erofs_fscache_bio { - struct erofs_fscache_io io; - struct bio bio; /* w/o bdev to share bio_add_page/endio() */ - struct bio_vec bvecs[BIO_MAX_VECS]; -}; - -static void erofs_fscache_bio_endio(void *priv, ssize_t transferred_or_error) -{ - struct erofs_fscache_bio *io = priv; - - if (IS_ERR_VALUE(transferred_or_error)) - io->bio.bi_status = errno_to_blk_status(transferred_or_error); - bio_endio(&io->bio); - BUILD_BUG_ON(offsetof(struct erofs_fscache_bio, io) != 0); - erofs_fscache_io_put(&io->io); -} - -struct bio *erofs_fscache_bio_alloc(struct erofs_map_dev *mdev) -{ - struct erofs_fscache_bio *io; - - io = kmalloc_obj(*io, GFP_KERNEL | __GFP_NOFAIL); - bio_init(&io->bio, NULL, io->bvecs, BIO_MAX_VECS, REQ_OP_READ); - io->io.private = mdev->m_dif->fscache->cookie; - io->io.end_io = erofs_fscache_bio_endio; - refcount_set(&io->io.ref, 1); - return &io->bio; -} - -void erofs_fscache_submit_bio(struct bio *bio) -{ - struct erofs_fscache_bio *io = container_of(bio, - struct erofs_fscache_bio, bio); - int ret; - - iov_iter_bvec(&io->io.iter, ITER_DEST, io->bvecs, bio->bi_vcnt, - bio->bi_iter.bi_size); - ret = erofs_fscache_read_io_async(io->io.private, - bio->bi_iter.bi_sector << 9, &io->io); - erofs_fscache_io_put(&io->io); - if (!ret) - return; - bio->bi_status = errno_to_blk_status(ret); - bio_endio(bio); -} - -static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio) -{ - struct erofs_fscache *ctx = folio->mapping->host->i_private; - int ret = -ENOMEM; - struct erofs_fscache_rq *req; - struct erofs_fscache_io *io; - - req = erofs_fscache_req_alloc(folio->mapping, - folio_pos(folio), folio_size(folio)); - if (!req) { - folio_unlock(folio); - return ret; - } - - io = erofs_fscache_req_io_alloc(req); - if (!io) { - req->error = ret; - goto out; - } - iov_iter_xarray(&io->iter, ITER_DEST, &folio->mapping->i_pages, - folio_pos(folio), folio_size(folio)); - - ret = erofs_fscache_read_io_async(ctx->cookie, 
folio_pos(folio), io); - if (ret) - req->error = ret; - - erofs_fscache_req_io_put(io); -out: - erofs_fscache_req_put(req); - return ret; -} - -static int erofs_fscache_data_read_slice(struct erofs_fscache_rq *req) -{ - struct address_space *mapping = req->mapping; - struct inode *inode = mapping->host; - struct super_block *sb = inode->i_sb; - struct erofs_fscache_io *io; - struct erofs_map_blocks map; - struct erofs_map_dev mdev; - loff_t pos = req->start + req->submitted; - size_t count; - int ret; - - map.m_la = pos; - ret = erofs_map_blocks(inode, &map); - if (ret) - return ret; - - if (map.m_flags & EROFS_MAP_META) { - struct erofs_buf buf = __EROFS_BUF_INITIALIZER; - struct iov_iter iter; - size_t size = map.m_llen; - void *src; - - src = erofs_read_metabuf(&buf, sb, map.m_pa, - erofs_inode_in_metabox(inode)); - if (IS_ERR(src)) - return PTR_ERR(src); - - iov_iter_xarray(&iter, ITER_DEST, &mapping->i_pages, pos, PAGE_SIZE); - if (copy_to_iter(src, size, &iter) != size) { - erofs_put_metabuf(&buf); - return -EFAULT; - } - iov_iter_zero(PAGE_SIZE - size, &iter); - erofs_put_metabuf(&buf); - req->submitted += PAGE_SIZE; - return 0; - } - - count = req->len - req->submitted; - if (!(map.m_flags & EROFS_MAP_MAPPED)) { - struct iov_iter iter; - - iov_iter_xarray(&iter, ITER_DEST, &mapping->i_pages, pos, count); - iov_iter_zero(count, &iter); - req->submitted += count; - return 0; - } - - count = min_t(size_t, map.m_llen - (pos - map.m_la), count); - DBG_BUGON(!count || count % PAGE_SIZE); - - mdev = (struct erofs_map_dev) { - .m_deviceid = map.m_deviceid, - .m_pa = map.m_pa, - }; - ret = erofs_map_dev(sb, &mdev); - if (ret) - return ret; - - io = erofs_fscache_req_io_alloc(req); - if (!io) - return -ENOMEM; - iov_iter_xarray(&io->iter, ITER_DEST, &mapping->i_pages, pos, count); - ret = erofs_fscache_read_io_async(mdev.m_dif->fscache->cookie, - mdev.m_pa + (pos - map.m_la), io); - erofs_fscache_req_io_put(io); - - req->submitted += count; - return ret; -} - -static 
int erofs_fscache_data_read(struct erofs_fscache_rq *req)
-{
-	int ret;
-
-	do {
-		ret = erofs_fscache_data_read_slice(req);
-		if (ret)
-			req->error = ret;
-	} while (!ret && req->submitted < req->len);
-	return ret;
-}
-
-static int erofs_fscache_read_folio(struct file *file, struct folio *folio)
-{
-	struct erofs_fscache_rq *req;
-	int ret;
-
-	req = erofs_fscache_req_alloc(folio->mapping,
-			folio_pos(folio), folio_size(folio));
-	if (!req) {
-		folio_unlock(folio);
-		return -ENOMEM;
-	}
-
-	ret = erofs_fscache_data_read(req);
-	erofs_fscache_req_put(req);
-	return ret;
-}
-
-static void erofs_fscache_readahead(struct readahead_control *rac)
-{
-	struct erofs_fscache_rq *req;
-
-	if (!readahead_count(rac))
-		return;
-
-	req = erofs_fscache_req_alloc(rac->mapping,
-			readahead_pos(rac), readahead_length(rac));
-	if (!req)
-		return;
-
-	/* The request completion will drop refs on the folios. */
-	while (readahead_folio(rac))
-		;
-
-	erofs_fscache_data_read(req);
-	erofs_fscache_req_put(req);
-}
-
-static const struct address_space_operations erofs_fscache_meta_aops = {
-	.read_folio = erofs_fscache_meta_read_folio,
-};
-
-const struct address_space_operations erofs_fscache_access_aops = {
-	.read_folio = erofs_fscache_read_folio,
-	.readahead = erofs_fscache_readahead,
-};
-
-static void erofs_fscache_domain_put(struct erofs_domain *domain)
-{
-	mutex_lock(&erofs_domain_list_lock);
-	if (refcount_dec_and_test(&domain->ref)) {
-		list_del(&domain->list);
-		if (list_empty(&erofs_domain_list)) {
-			kern_unmount(erofs_pseudo_mnt);
-			erofs_pseudo_mnt = NULL;
-		}
-		fscache_relinquish_volume(domain->volume, NULL, false);
-		mutex_unlock(&erofs_domain_list_lock);
-		kfree_sensitive(domain->domain_id);
-		kfree(domain);
-		return;
-	}
-	mutex_unlock(&erofs_domain_list_lock);
-}
-
-static int erofs_fscache_register_volume(struct super_block *sb)
-{
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
-	char *domain_id = sbi->domain_id;
-	struct fscache_volume *volume;
-	char *name;
-	int ret = 0;
-
-	name = kasprintf(GFP_KERNEL, "erofs,%s",
-			 domain_id ? domain_id : sbi->fsid);
-	if (!name)
-		return -ENOMEM;
-
-	volume = fscache_acquire_volume(name, NULL, NULL, 0);
-	if (IS_ERR_OR_NULL(volume)) {
-		erofs_err(sb, "failed to register volume for %s", name);
-		ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP;
-		volume = NULL;
-	}
-
-	sbi->volume = volume;
-	kfree(name);
-	return ret;
-}
-
-static int erofs_fscache_init_domain(struct super_block *sb)
-{
-	int err;
-	struct erofs_domain *domain;
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
-
-	domain = kzalloc_obj(struct erofs_domain);
-	if (!domain)
-		return -ENOMEM;
-
-	domain->domain_id = kstrdup(sbi->domain_id, GFP_KERNEL);
-	if (!domain->domain_id) {
-		kfree(domain);
-		return -ENOMEM;
-	}
-
-	err = erofs_fscache_register_volume(sb);
-	if (err)
-		goto out;
-
-	if (!erofs_pseudo_mnt) {
-		struct vfsmount *mnt = kern_mount(&erofs_anon_fs_type);
-		if (IS_ERR(mnt)) {
-			err = PTR_ERR(mnt);
-			goto out;
-		}
-		erofs_pseudo_mnt = mnt;
-	}
-
-	domain->volume = sbi->volume;
-	refcount_set(&domain->ref, 1);
-	list_add(&domain->list, &erofs_domain_list);
-	sbi->domain = domain;
-	return 0;
-out:
-	kfree_sensitive(domain->domain_id);
-	kfree(domain);
-	return err;
-}
-
-static int erofs_fscache_register_domain(struct super_block *sb)
-{
-	int err;
-	struct erofs_domain *domain;
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
-
-	mutex_lock(&erofs_domain_list_lock);
-	list_for_each_entry(domain, &erofs_domain_list, list) {
-		if (!strcmp(domain->domain_id, sbi->domain_id)) {
-			sbi->domain = domain;
-			sbi->volume = domain->volume;
-			refcount_inc(&domain->ref);
-			mutex_unlock(&erofs_domain_list_lock);
-			return 0;
-		}
-	}
-	err = erofs_fscache_init_domain(sb);
-	mutex_unlock(&erofs_domain_list_lock);
-	return err;
-}
-
-static struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb,
-		char *name, unsigned int flags)
-{
-	struct fscache_volume *volume = EROFS_SB(sb)->volume;
-	struct erofs_fscache *ctx;
-	struct fscache_cookie *cookie;
-	struct super_block *isb;
-	struct inode *inode;
-	int ret;
-
-	ctx = kzalloc_obj(*ctx);
-	if (!ctx)
-		return ERR_PTR(-ENOMEM);
-	INIT_LIST_HEAD(&ctx->node);
-	refcount_set(&ctx->ref, 1);
-
-	cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
-					name, strlen(name), NULL, 0, 0);
-	if (!cookie) {
-		erofs_err(sb, "failed to get cookie for %s", name);
-		ret = -EINVAL;
-		goto err;
-	}
-	fscache_use_cookie(cookie, false);
-
-	/*
-	 * Allocate anonymous inode in global pseudo mount for shareable blobs,
-	 * so that they are accessible among erofs fs instances.
-	 */
-	isb = flags & EROFS_REG_COOKIE_SHARE ? erofs_pseudo_mnt->mnt_sb : sb;
-	inode = new_inode(isb);
-	if (!inode) {
-		erofs_err(sb, "failed to get anon inode for %s", name);
-		ret = -ENOMEM;
-		goto err_cookie;
-	}
-
-	inode->i_size = OFFSET_MAX;
-	inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
-	mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL);
-	inode->i_blkbits = EROFS_SB(sb)->blkszbits;
-	inode->i_private = ctx;
-
-	ctx->cookie = cookie;
-	ctx->inode = inode;
-	return ctx;
-
-err_cookie:
-	fscache_unuse_cookie(cookie, NULL, NULL);
-	fscache_relinquish_cookie(cookie, false);
-err:
-	kfree(ctx);
-	return ERR_PTR(ret);
-}
-
-static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx)
-{
-	fscache_unuse_cookie(ctx->cookie, NULL, NULL);
-	fscache_relinquish_cookie(ctx->cookie, false);
-	iput(ctx->inode);
-	kfree(ctx->name);
-	kfree(ctx);
-}
-
-static struct erofs_fscache *erofs_domain_init_cookie(struct super_block *sb,
-		char *name, unsigned int flags)
-{
-	struct erofs_fscache *ctx;
-	struct erofs_domain *domain = EROFS_SB(sb)->domain;
-
-	ctx = erofs_fscache_acquire_cookie(sb, name, flags);
-	if (IS_ERR(ctx))
-		return ctx;
-
-	ctx->name = kstrdup(name, GFP_KERNEL);
-	if (!ctx->name) {
-		erofs_fscache_relinquish_cookie(ctx);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	refcount_inc(&domain->ref);
-	ctx->domain = domain;
-	list_add(&ctx->node, &erofs_domain_cookies_list);
-	return ctx;
-}
-
-static struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb,
-		char *name, unsigned int flags)
-{
-	struct erofs_fscache *ctx;
-	struct erofs_domain *domain = EROFS_SB(sb)->domain;
-
-	flags |= EROFS_REG_COOKIE_SHARE;
-	mutex_lock(&erofs_domain_cookies_lock);
-	list_for_each_entry(ctx, &erofs_domain_cookies_list, node) {
-		if (ctx->domain != domain || strcmp(ctx->name, name))
-			continue;
-		if (!(flags & EROFS_REG_COOKIE_NEED_NOEXIST)) {
-			refcount_inc(&ctx->ref);
-		} else {
-			erofs_err(sb, "%s already exists in domain %s", name,
-				  domain->domain_id);
-			ctx = ERR_PTR(-EEXIST);
-		}
-		mutex_unlock(&erofs_domain_cookies_lock);
-		return ctx;
-	}
-	ctx = erofs_domain_init_cookie(sb, name, flags);
-	mutex_unlock(&erofs_domain_cookies_lock);
-	return ctx;
-}
-
-struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
-						    char *name,
-						    unsigned int flags)
-{
-	if (EROFS_SB(sb)->domain_id)
-		return erofs_domain_register_cookie(sb, name, flags);
-	return erofs_fscache_acquire_cookie(sb, name, flags);
-}
-
-void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx)
-{
-	struct erofs_domain *domain = NULL;
-
-	if (!ctx)
-		return;
-	if (!ctx->domain)
-		return erofs_fscache_relinquish_cookie(ctx);
-
-	mutex_lock(&erofs_domain_cookies_lock);
-	if (refcount_dec_and_test(&ctx->ref)) {
-		domain = ctx->domain;
-		list_del(&ctx->node);
-		erofs_fscache_relinquish_cookie(ctx);
-	}
-	mutex_unlock(&erofs_domain_cookies_lock);
-	if (domain)
-		erofs_fscache_domain_put(domain);
-}
-
-int erofs_fscache_register_fs(struct super_block *sb)
-{
-	int ret;
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
-	struct erofs_fscache *fscache;
-	unsigned int flags = 0;
-
-	if (sbi->domain_id)
-		ret = erofs_fscache_register_domain(sb);
-	else
-		ret = erofs_fscache_register_volume(sb);
-	if (ret)
-		return ret;
-
-	/*
-	 * When shared domain is enabled, using NEED_NOEXIST to guarantee
-	 * the primary data blob (aka fsid) is unique in the shared domain.
-	 *
-	 * For non-shared-domain case, fscache_acquire_volume() invoked by
-	 * erofs_fscache_register_volume() has already guaranteed
-	 * the uniqueness of primary data blob.
-	 *
-	 * Acquired domain/volume will be relinquished in kill_sb() on error.
-	 */
-	if (sbi->domain_id)
-		flags |= EROFS_REG_COOKIE_NEED_NOEXIST;
-	fscache = erofs_fscache_register_cookie(sb, sbi->fsid, flags);
-	if (IS_ERR(fscache))
-		return PTR_ERR(fscache);
-
-	sbi->dif0.fscache = fscache;
-	return 0;
-}
-
-void erofs_fscache_unregister_fs(struct super_block *sb)
-{
-	struct erofs_sb_info *sbi = EROFS_SB(sb);
-
-	erofs_fscache_unregister_cookie(sbi->dif0.fscache);
-
-	if (sbi->domain)
-		erofs_fscache_domain_put(sbi->domain);
-	else
-		fscache_relinquish_volume(sbi->volume, NULL, false);
-
-	sbi->dif0.fscache = NULL;
-	sbi->volume = NULL;
-	sbi->domain = NULL;
-}
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index a188c570087a..45afe5c50de8 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -191,8 +191,9 @@ static int erofs_read_inode(struct inode *inode)
 		err = -EFSCORRUPTED;
 		goto err_out;
 	} else {
-		inode->i_blocks = le32_to_cpu(copied.i_u.blocks_lo) <<
-			(sb->s_blocksize_bits - 9);
+		inode->i_blocks = (le32_to_cpu(copied.i_u.blocks_lo) |
+			((u64)le16_to_cpu(copied.i_nb.blocks_hi) << 32)) <<
+				(sb->s_blocksize_bits - 9);
 	}
 
 	if (vi->datalayout == EROFS_INODE_CHUNK_BASED) {
@@ -255,7 +256,7 @@ static int erofs_fill_inode(struct inode *inode)
 	}
 
 	mapping_set_large_folios(inode->i_mapping);
-	aops = erofs_get_aops(inode, false);
+	aops = erofs_get_aops(inode);
 	if (IS_ERR(aops))
 		return PTR_ERR(aops);
 	inode->i_mapping->a_ops = aops;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4792490161ec..580f8d9f14e7 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -23,6 +23,8 @@ __printf(2, 3) void _erofs_printk(struct super_block *sb, const char *fmt, ...);
 
 #define erofs_err(sb, fmt, ...)	\
 	_erofs_printk(sb, KERN_ERR fmt "\n", ##__VA_ARGS__)
+#define erofs_warn(sb, fmt, ...)	\
+	_erofs_printk(sb, KERN_WARNING fmt "\n", ##__VA_ARGS__)
 #define erofs_info(sb, fmt, ...)	\
 	_erofs_printk(sb, KERN_INFO fmt "\n", ##__VA_ARGS__)
 
@@ -41,7 +43,6 @@ typedef u64 erofs_blk_t;
 
 struct erofs_device_info {
 	char *path;
-	struct erofs_fscache *fscache;
 	struct file *file;
 	struct dax_device *dax_dev;
 	u64 fsoff, dax_part_off;
@@ -78,24 +79,6 @@ struct erofs_sb_lz4_info {
 	u16 max_pclusterblks;
 };
 
-struct erofs_domain {
-	refcount_t ref;
-	struct list_head list;
-	struct fscache_volume *volume;
-	char *domain_id;
-};
-
-struct erofs_fscache {
-	struct fscache_cookie *cookie;
-	struct inode *inode;	/* anonymous inode for the blob */
-
-	/* used for share domain mode */
-	struct erofs_domain *domain;
-	struct list_head node;
-	refcount_t ref;
-	char *name;
-};
-
 struct erofs_xattr_prefix_item {
 	struct erofs_xattr_long_prefix *prefix;
 	u8 infix_len;
@@ -160,10 +143,6 @@ struct erofs_sb_info {
 	struct completion s_kobj_unregister;
 	erofs_off_t dir_ra_bytes;
 
-	/* fscache support */
-	struct fscache_volume *volume;
-	struct erofs_domain *domain;
-	char *fsid;
 	char *domain_id;
 };
 
@@ -189,12 +168,6 @@ static inline bool erofs_is_fileio_mode(struct erofs_sb_info *sbi)
 
 extern struct file_system_type erofs_anon_fs_type;
 
-static inline bool erofs_is_fscache_mode(struct super_block *sb)
-{
-	return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) &&
-			!erofs_is_fileio_mode(EROFS_SB(sb)) && !sb->s_bdev;
-}
-
 enum {
 	EROFS_ZIP_CACHE_DISABLED,
 	EROFS_ZIP_CACHE_READAHEAD,
@@ -411,11 +384,9 @@ struct erofs_map_dev {
 };
 
 extern const struct super_operations erofs_sops;
-
 extern const struct address_space_operations erofs_aops;
 extern const struct address_space_operations erofs_fileio_aops;
 extern const struct address_space_operations z_erofs_aops;
-extern const struct address_space_operations erofs_fscache_access_aops;
 
 extern const struct inode_operations erofs_generic_iops;
 extern const struct inode_operations erofs_symlink_iops;
@@ -428,10 +399,6 @@ extern const struct file_operations erofs_ishare_fops;
 
 extern const struct iomap_ops z_erofs_iomap_report_ops;
 
-/* flags for erofs_fscache_register_cookie() */
-#define EROFS_REG_COOKIE_SHARE		0x0001
-#define EROFS_REG_COOKIE_NEED_NOEXIST	0x0002
-
 void *erofs_read_metadata(struct super_block *sb, struct erofs_buf *buf,
 			  erofs_off_t *offset, int *lengthp);
 void erofs_unmap_metabuf(struct erofs_buf *buf);
@@ -471,7 +438,7 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count)
 }
 
 static inline const struct address_space_operations *
-erofs_get_aops(struct inode *realinode, bool no_fscache)
+erofs_get_aops(struct inode *realinode)
 {
 	if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) {
 		if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP))
@@ -481,9 +448,6 @@ erofs_get_aops(struct inode *realinode, bool no_fscache)
 			  "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!");
 		return &z_erofs_aops;
 	}
-	if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !no_fscache &&
-	    erofs_is_fscache_mode(realinode->i_sb))
-		return &erofs_fscache_access_aops;
 	if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) &&
 	    erofs_is_fileio_mode(EROFS_SB(realinode->i_sb)))
 		return &erofs_fileio_aops;
@@ -546,36 +510,6 @@ static inline struct bio *erofs_fileio_bio_alloc(struct erofs_map_dev *mdev) { r
 static inline void erofs_fileio_submit_bio(struct bio *bio) {}
 #endif
 
-#ifdef CONFIG_EROFS_FS_ONDEMAND
-int erofs_fscache_register_fs(struct super_block *sb);
-void erofs_fscache_unregister_fs(struct super_block *sb);
-
-struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
-					char *name, unsigned int flags);
-void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache);
-struct bio *erofs_fscache_bio_alloc(struct erofs_map_dev *mdev);
-void erofs_fscache_submit_bio(struct bio *bio);
-#else
-static inline int erofs_fscache_register_fs(struct super_block *sb)
-{
-	return -EOPNOTSUPP;
-}
-static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
-
-static inline
-struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
-					char *name, unsigned int flags)
-{
-	return ERR_PTR(-EOPNOTSUPP);
-}
-
-static inline void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache)
-{
-}
-static inline struct bio *erofs_fscache_bio_alloc(struct erofs_map_dev *mdev) { return NULL; }
-static inline void erofs_fscache_submit_bio(struct bio *bio) {}
-#endif
-
 #ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
 int __init erofs_init_ishare(void);
 void erofs_exit_ishare(void);
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
index 6ed66b17359b..0868c12fc15b 100644
--- a/fs/erofs/ishare.c
+++ b/fs/erofs/ishare.c
@@ -40,49 +40,42 @@ static int erofs_ishare_iget5_set(struct inode *inode, void *data)
 bool erofs_ishare_fill_inode(struct inode *inode)
 {
 	struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
-	struct erofs_inode *vi = EROFS_I(inode);
 	const struct address_space_operations *aops;
+	struct erofs_inode *vi = EROFS_I(inode);
 	struct erofs_inode_fingerprint fp;
-	struct inode *sharedinode;
-	unsigned long hash;
+	struct inode *si;
 
-	aops = erofs_get_aops(inode, true);
+	aops = erofs_get_aops(inode);
 	if (IS_ERR(aops))
 		return false;
 	if (erofs_xattr_fill_inode_fingerprint(&fp, inode, sbi->domain_id))
 		return false;
-	hash = xxh32(fp.opaque, fp.size, 0);
-	sharedinode = iget5_locked(erofs_ishare_mnt->mnt_sb, hash,
-				   erofs_ishare_iget5_eq, erofs_ishare_iget5_set,
-				   &fp);
-	if (!sharedinode) {
-		kfree(fp.opaque);
-		return false;
-	}
-	if (inode_state_read_once(sharedinode) & I_NEW) {
-		sharedinode->i_mapping->a_ops = aops;
-		sharedinode->i_size = vi->vfs_inode.i_size;
-		unlock_new_inode(sharedinode);
+	si = iget5_locked(erofs_ishare_mnt->mnt_sb,
+			  xxh32(fp.opaque, fp.size, 0),
+			  erofs_ishare_iget5_eq, erofs_ishare_iget5_set, &fp);
+	if (si && (inode_state_read_once(si) & I_NEW)) {
+		si->i_mapping->a_ops = aops;
+		si->i_size = inode->i_size;
+		unlock_new_inode(si);
 	} else {
 		kfree(fp.opaque);
-		if (aops != sharedinode->i_mapping->a_ops) {
-			iput(sharedinode);
+		if (!si || aops != si->i_mapping->a_ops) {
+			iput(si);
 			return false;
 		}
-		if (sharedinode->i_size != vi->vfs_inode.i_size) {
-			_erofs_printk(inode->i_sb, KERN_WARNING
-				"size(%lld:%lld) not matches for the same fingerprint\n",
-				vi->vfs_inode.i_size, sharedinode->i_size);
-			iput(sharedinode);
+		if (si->i_size != inode->i_size) {
+			erofs_warn(inode->i_sb, "i_size mismatch (%lld != %lld) for the same fingerprint",
+				   inode->i_size, si->i_size);
+			iput(si);
 			return false;
 		}
 	}
-	vi->sharedinode = sharedinode;
+	vi->sharedinode = si;
 	INIT_LIST_HEAD(&vi->ishare_list);
-	spin_lock(&EROFS_I(sharedinode)->ishare_lock);
-	list_add(&vi->ishare_list, &EROFS_I(sharedinode)->ishare_list);
-	spin_unlock(&EROFS_I(sharedinode)->ishare_lock);
+	spin_lock(&EROFS_I(si)->ishare_lock);
+	list_add(&vi->ishare_list, &EROFS_I(si)->ishare_list);
+	spin_unlock(&EROFS_I(si)->ishare_lock);
 	return true;
 }
 
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 802add6652fd..86fa5c6a0c70 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -126,7 +126,6 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 			     struct erofs_device_info *dif, erofs_off_t *pos)
 {
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
-	struct erofs_fscache *fscache;
 	struct erofs_deviceslot *dis;
 	struct file *file;
 	bool _48bit;
@@ -145,12 +144,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 			return -ENOMEM;
 	}
 
-	if (erofs_is_fscache_mode(sb)) {
-		fscache = erofs_fscache_register_cookie(sb, dif->path, 0);
-		if (IS_ERR(fscache))
-			return PTR_ERR(fscache);
-		dif->fscache = fscache;
-	} else if (!sbi->devs->flatdev) {
+	if (!sbi->devs->flatdev) {
 		file = erofs_is_fileio_mode(sbi) ?
 				filp_open(dif->path, O_RDONLY | O_LARGEFILE, 0) :
 				bdev_file_open_by_path(dif->path,
@@ -216,7 +210,7 @@ static int erofs_scan_devices(struct super_block *sb,
 	if (!ondisk_extradevs)
 		return 0;
 
-	if (!sbi->devs->extra_devices && !erofs_is_fscache_mode(sb))
+	if (!sbi->devs->extra_devices)
 		sbi->devs->flatdev = true;
 
 	sbi->device_id_mask = roundup_pow_of_two(ondisk_extradevs + 1) - 1;
@@ -372,8 +366,6 @@ static int erofs_read_superblock(struct super_block *sb)
 		erofs_info(sb, "EXPERIMENTAL 48-bit layout support in use. Use at your own risk!");
 	if (erofs_sb_has_metabox(sbi))
 		erofs_info(sb, "EXPERIMENTAL metadata compression support in use. Use at your own risk!");
-	if (erofs_is_fscache_mode(sb))
-		erofs_info(sb, "[deprecated] fscache-based on-demand read feature in use. Use at your own risk!");
 out:
 	erofs_put_metabuf(&buf);
 	return ret;
@@ -393,8 +385,7 @@ static void erofs_default_options(struct erofs_sb_info *sbi)
 
 enum {
 	Opt_user_xattr, Opt_acl, Opt_cache_strategy, Opt_dax, Opt_dax_enum,
-	Opt_device, Opt_fsid, Opt_domain_id, Opt_directio, Opt_fsoffset,
-	Opt_inode_share,
+	Opt_device, Opt_domain_id, Opt_directio, Opt_fsoffset, Opt_inode_share,
 };
 
 static const struct constant_table erofs_param_cache_strategy[] = {
@@ -418,7 +409,6 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
 	fsparam_flag("dax", Opt_dax),
 	fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
 	fsparam_string("device", Opt_device),
-	fsparam_string("fsid", Opt_fsid),
 	fsparam_string("domain_id", Opt_domain_id),
 	fsparam_flag_no("directio", Opt_directio),
 	fsparam_u64("fsoffset", Opt_fsoffset),
@@ -509,25 +499,14 @@ static int erofs_fc_parse_param(struct fs_context *fc,
 		}
 		++sbi->devs->extra_devices;
 		break;
-#ifdef CONFIG_EROFS_FS_ONDEMAND
-	case Opt_fsid:
-		kfree(sbi->fsid);
-		sbi->fsid = kstrdup(param->string, GFP_KERNEL);
-		if (!sbi->fsid)
-			return -ENOMEM;
-		break;
-#endif
-#if defined(CONFIG_EROFS_FS_ONDEMAND) || defined(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)
 	case Opt_domain_id:
-		kfree_sensitive(sbi->domain_id);
-		sbi->domain_id = no_free_ptr(param->string);
-		break;
-#else
-	case Opt_fsid:
-	case Opt_domain_id:
-		errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name);
+		if (!IS_ENABLED(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)) {
+			errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name);
+		} else {
+			kfree_sensitive(sbi->domain_id);
+			sbi->domain_id = no_free_ptr(param->string);
+		}
 		break;
-#endif
 	case Opt_directio:
 		if (!IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE))
 			errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name);
@@ -620,12 +599,7 @@ static void erofs_set_sysfs_name(struct super_block *sb)
 {
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
 
-	if (sbi->domain_id && sbi->fsid)
-		super_set_sysfs_name_generic(sb, "%s,%s", sbi->domain_id,
-					     sbi->fsid);
-	else if (sbi->fsid)
-		super_set_sysfs_name_generic(sb, "%s", sbi->fsid);
-	else if (erofs_is_fileio_mode(sbi))
+	if (erofs_is_fileio_mode(sbi))
 		super_set_sysfs_name_generic(sb, "%s",
 					     bdi_dev_name(sb->s_bdi));
 	else
@@ -680,11 +654,6 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
 
-	if (erofs_is_fscache_mode(sb)) {
-		err = erofs_fscache_register_fs(sb);
-		if (err)
-			return err;
-	}
 	err = super_setup_bdi(sb);
 	if (err)
 		return err;
@@ -703,11 +672,6 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 		return err;
 
 	if (sb->s_blocksize_bits != sbi->blkszbits) {
-		if (erofs_is_fscache_mode(sb)) {
-			errorfc(fc, "unsupported blksize for fscache mode");
-			return -EINVAL;
-		}
-
 		if (erofs_is_fileio_mode(sbi)) {
 			sb->s_blocksize = 1 << sbi->blkszbits;
 			sb->s_blocksize_bits = sbi->blkszbits;
@@ -716,14 +680,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 			return -EINVAL;
 		}
 	}
-
-	if (sbi->dif0.fsoff) {
-		if (sbi->dif0.fsoff & (sb->s_blocksize - 1))
-			return invalfc(fc, "fsoffset %llu is not aligned to block size %lu",
-				       sbi->dif0.fsoff, sb->s_blocksize);
-		if (erofs_is_fscache_mode(sb))
-			return invalfc(fc, "cannot use fsoffset in fscache mode");
-	}
+	if (sbi->dif0.fsoff & (sb->s_blocksize - 1))
+		return invalfc(fc, "fsoffset %llu is not aligned to block size %lu",
+			       sbi->dif0.fsoff, sb->s_blocksize);
 
 	if (test_opt(&sbi->opt, DAX_ALWAYS) && sbi->blkszbits != PAGE_SHIFT) {
 		erofs_info(sb, "unsupported blocksize for DAX");
@@ -793,16 +752,13 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
 static int erofs_fc_get_tree(struct fs_context *fc)
 {
-	struct erofs_sb_info *sbi = fc->s_fs_info;
 	int ret;
 
-	if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && sbi->fsid)
-		return get_tree_nodev(fc, erofs_fc_fill_super);
-
 	ret = get_tree_bdev_flags(fc, erofs_fc_fill_super,
 		IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) ?
 			GET_TREE_BDEV_QUIET_LOOKUP : 0);
 	if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) && ret == -ENOTBLK) {
+		struct erofs_sb_info *sbi = fc->s_fs_info;
 		struct file *file;
 
 		if (!fc->source)
@@ -827,8 +783,8 @@ static int erofs_fc_reconfigure(struct fs_context *fc)
 
 	DBG_BUGON(!sb_rdonly(sb));
 
-	if (new_sbi->fsid || new_sbi->domain_id)
-		erofs_info(sb, "ignoring reconfiguration for fsid|domain_id.");
+	if (new_sbi->domain_id)
+		erofs_info(sb, "ignoring reconfiguration for domain_id.");
 
 	if (test_opt(&new_sbi->opt, POSIX_ACL))
 		fc->sb_flags |= SB_POSIXACL;
@@ -848,8 +804,6 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
 	fs_put_dax(dif->dax_dev, NULL);
 	if (dif->file)
 		fput(dif->file);
-	erofs_fscache_unregister_cookie(dif->fscache);
-	dif->fscache = NULL;
 	kfree(dif->path);
 	kfree(dif);
 	return 0;
@@ -867,7 +821,6 @@ static void erofs_free_dev_context(struct erofs_dev_context *devs)
 static void erofs_sb_free(struct erofs_sb_info *sbi)
 {
 	erofs_free_dev_context(sbi->devs);
-	kfree(sbi->fsid);
 	kfree_sensitive(sbi->domain_id);
 	if (sbi->dif0.file)
 		fput(sbi->dif0.file);
@@ -928,14 +881,12 @@ static void erofs_kill_sb(struct super_block *sb)
 {
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
 
-	if ((IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && sbi->fsid) ||
-	    sbi->dif0.file)
+	if (sbi->dif0.file)
 		kill_anon_super(sb);
 	else
 		kill_block_super(sb);
 	erofs_drop_internal_inodes(sbi);
 	fs_put_dax(sbi->dif0.dax_dev, NULL);
-	erofs_fscache_unregister_fs(sb);
 	erofs_sb_free(sbi);
 	sb->s_fs_info = NULL;
 }
@@ -950,7 +901,6 @@ static void erofs_put_super(struct super_block *sb)
 	erofs_drop_internal_inodes(sbi);
 	erofs_free_dev_context(sbi->devs);
 	sbi->devs = NULL;
-	erofs_fscache_unregister_fs(sb);
 }
 
 static struct file_system_type erofs_fs_type = {
@@ -962,14 +912,12 @@ static struct file_system_type erofs_fs_type = {
 };
 MODULE_ALIAS_FS("erofs");
 
-#if defined(CONFIG_EROFS_FS_ONDEMAND) || defined(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
 static void erofs_free_anon_inode(struct inode *inode)
 {
 	struct erofs_inode *vi = EROFS_I(inode);
 
-#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
 	kfree(vi->fingerprint.opaque);
-#endif
 	kmem_cache_free(erofs_inode_cachep, vi);
 }
 
@@ -1048,11 +996,11 @@ shrinker_err:
 static void __exit erofs_module_exit(void)
 {
 	unregister_filesystem(&erofs_fs_type);
+	erofs_exit_ishare();
 
-	/* Ensure all RCU free inodes / pclusters are safe to be destroyed. */
+	/* ensure all delayed rcu free inodes & pclusters are flushed */
 	rcu_barrier();
 
-	erofs_exit_ishare();
 	erofs_exit_sysfs();
 	z_erofs_exit_subsystem();
 	erofs_exit_shrinker();
@@ -1099,12 +1047,6 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",dax=never");
 	if (erofs_is_fileio_mode(sbi) && test_opt(opt, DIRECT_IO))
 		seq_puts(seq, ",directio");
-	if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) {
-		if (sbi->fsid)
-			seq_printf(seq, ",fsid=%s", sbi->fsid);
-		if (sbi->domain_id)
-			seq_printf(seq, ",domain_id=%s", sbi->domain_id);
-	}
 	if (sbi->dif0.fsoff)
 		seq_printf(seq, ",fsoffset=%llu", sbi->dif0.fsoff);
 	if (test_opt(opt, INODE_SHARE))
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index c6240dccbb0f..74520e910259 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -806,6 +806,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_frontend *fe)
 	struct super_block *sb = fe->inode->i_sb;
 	struct z_erofs_pcluster *pcl = NULL;
 	void *ptr = NULL;
+	bool needretry;
 	int ret;
 
 	DBG_BUGON(fe->pcl);
@@ -825,19 +826,16 @@ static int z_erofs_pcluster_begin(struct z_erofs_frontend *fe)
 		}
 		ptr = map->buf.page;
 	} else {
-		while (1) {
+		do {
 			rcu_read_lock();
 			pcl = xa_load(&EROFS_SB(sb)->managed_pslots, map->m_pa);
-			if (!pcl || z_erofs_get_pcluster(pcl)) {
-				DBG_BUGON(pcl && map->m_pa != pcl->pos);
-				rcu_read_unlock();
-				break;
-			}
+			needretry = pcl && !z_erofs_get_pcluster(pcl);
 			rcu_read_unlock();
-		}
+		} while (needretry);
 	}
 
 	if (pcl) {
+		DBG_BUGON(map->m_pa != pcl->pos);
 		fe->pcl = pcl;
 		ret = -EEXIST;
 	} else {
@@ -1459,21 +1457,19 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
 		if (sbi->sync_decompress == EROFS_SYNC_DECOMPRESS_AUTO)
 			sbi->sync_decompress = EROFS_SYNC_DECOMPRESS_FORCE_ON;
 #ifdef CONFIG_EROFS_FS_PCPU_KTHREAD
-		struct kthread_worker *worker;
+		scoped_guard(rcu) {
+			struct kthread_worker *worker;
 
-		rcu_read_lock();
-		worker = rcu_dereference(
+			worker = rcu_dereference(
 				z_erofs_pcpu_workers[raw_smp_processor_id()]);
-		if (!worker) {
-			INIT_WORK(&io->u.work, z_erofs_decompressqueue_work);
-			queue_work(z_erofs_workqueue, &io->u.work);
-		} else {
-			kthread_queue_work(worker, &io->u.kthread_work);
+			if (worker) {
+				kthread_queue_work(worker, &io->u.kthread_work);
+				return;
+			}
 		}
-		rcu_read_unlock();
-#else
-		queue_work(z_erofs_workqueue, &io->u.work);
+		INIT_WORK(&io->u.work, z_erofs_decompressqueue_work);
 #endif
+		queue_work(z_erofs_workqueue, &io->u.work);
 		return;
 	}
 	gfp_flag = memalloc_noio_save();
@@ -1714,8 +1710,6 @@ static void z_erofs_submit_queue(struct z_erofs_frontend *f,
 drain_io:
 				if (erofs_is_fileio_mode(EROFS_SB(sb)))
 					erofs_fileio_submit_bio(bio);
-				else if (erofs_is_fscache_mode(sb))
-					erofs_fscache_submit_bio(bio);
 				else
 					submit_bio(bio);
 
@@ -1744,8 +1738,6 @@ drain_io:
 			if (!bio) {
 				if (erofs_is_fileio_mode(EROFS_SB(sb)))
 					bio = erofs_fileio_bio_alloc(&mdev);
-				else if (erofs_is_fscache_mode(sb))
-					bio = erofs_fscache_bio_alloc(&mdev);
 				else
 					bio = bio_alloc(mdev.m_bdev, BIO_MAX_VECS,
 							REQ_OP_READ, GFP_NOIO);
@@ -1774,8 +1766,6 @@ drain_io:
 	if (bio) {
 		if (erofs_is_fileio_mode(EROFS_SB(sb)))
 			erofs_fileio_submit_bio(bio);
-		else if (erofs_is_fscache_mode(sb))
-			erofs_fscache_submit_bio(bio);
 		else
 			submit_bio(bio);
 	}
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index e1a02a2c8406..bab521613552 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -15,8 +15,9 @@ struct z_erofs_maprecorder {
 	u8  type, headtype;
 	u16 clusterofs;
 	u16 delta[2];
-	erofs_blk_t pblk, compressedblks;
+	erofs_blk_t pblk;
 	erofs_off_t nextpackoff;
+	int compressedblks;
 	bool partialref, in_mbox;
 };
 
@@ -54,7 +55,12 @@ static int z_erofs_load_full_lcluster(struct z_erofs_maprecorder *m, u64 lcn)
 	} else {
 		m->partialref = !!(advise & Z_EROFS_LI_PARTIAL_REF);
 		m->clusterofs = le16_to_cpu(di->di_clusterofs);
-		m->pblk = le32_to_cpu(di->di_u.blkaddr);
+		if (advise & Z_EROFS_LI_HOLE) {
+			m->compressedblks = 0;
+			m->pblk = EROFS_NULL_ADDR;
+		} else {
+			m->pblk = le32_to_cpu(di->di_u.blkaddr);
+		}
 	}
 	return 0;
 }
@@ -309,9 +315,10 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 	    ((m->headtype == Z_EROFS_LCLUSTER_TYPE_PLAIN ||
 	      m->headtype == Z_EROFS_LCLUSTER_TYPE_HEAD2) && !bigpcl2) ||
 	    (lcn << vi->z_lclusterbits) >= inode->i_size)
-		m->compressedblks = 1;
+		if (m->compressedblks < 0)
+			m->compressedblks = 1;
 
-	if (m->compressedblks)
+	if (m->compressedblks >= 0)
 		goto out;
 
 	err = z_erofs_load_lcluster_from_disk(m, lcn, false);
@@ -329,19 +336,22 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 	DBG_BUGON(lcn == initial_lcn &&
 		  m->type == Z_EROFS_LCLUSTER_TYPE_NONHEAD);
 
-	if (m->type == Z_EROFS_LCLUSTER_TYPE_NONHEAD && m->delta[0] != 1) {
+	if (m->type != Z_EROFS_LCLUSTER_TYPE_NONHEAD) {
+		/*
+		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
+		 * rather than CBLKCNT, it's a 1 block-sized pcluster.
+		 */
+		if (m->compressedblks < 0)
+			m->compressedblks = 1;
+	} else if (m->delta[0] != 1 || m->compressedblks < 0) {
 		erofs_err(sb, "bogus CBLKCNT @ lcn %llu of nid %llu",
 			  lcn, vi->nid);
 		DBG_BUGON(1);
 		return -EFSCORRUPTED;
 	}
-	/*
-	 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type rather
-	 * than CBLKCNT, it's a 1 block-sized pcluster.
-	 */
-	if (m->type != Z_EROFS_LCLUSTER_TYPE_NONHEAD || !m->compressedblks)
-		m->compressedblks = 1;
 out:
+	if (!m->compressedblks)
+		m->map->m_flags &= ~EROFS_MAP_MAPPED;
 	m->map->m_plen = erofs_pos(sb, m->compressedblks);
 	return 0;
 }
@@ -395,6 +405,7 @@ static int z_erofs_map_blocks_fo(struct inode *inode,
 		.inode = inode,
 		.map = map,
 		.in_mbox = erofs_inode_in_metabox(inode),
+		.compressedblks = -1,
 	};
 	unsigned int endoff;
 	unsigned long initial_lcn;
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
index cd0e3fd8c23f..0a178cb10fb1 100644
--- a/include/trace/events/erofs.h
+++ b/include/trace/events/erofs.h
@@ -90,7 +90,7 @@ TRACE_EVENT(erofs_read_folio,
 		__field(erofs_nid_t,	nid		)
 		__field(int,		dir		)
 		__field(pgoff_t,	index		)
-		__field(int,		uptodate	)
+		__field(unsigned int,	order		)
 		__field(bool,		raw		)
 	),
 
@@ -99,16 +99,15 @@ TRACE_EVENT(erofs_read_folio,
 		__entry->nid	= EROFS_I(inode)->nid;
 		__entry->dir	= S_ISDIR(inode->i_mode);
 		__entry->index	= folio->index;
-		__entry->uptodate = folio_test_uptodate(folio);
+		__entry->order	= folio_order(folio);
 		__entry->raw = raw;
 	),
 
-	TP_printk("dev = (%d,%d), nid = %llu, %s, index = %lu, uptodate = %d "
-		"raw = %d",
+	TP_printk("dev = (%d,%d), nid = %llu, %s, index = %lu, order = %u, raw = %d",
 		show_dev_nid(__entry),
 		show_file_type(__entry->dir),
 		(unsigned long)__entry->index,
-		__entry->uptodate,
+		__entry->order,
 		__entry->raw)
 );
 
