path: root/drivers/md/dm-pcache
Age  Commit message  (Author)
2026-02-22  Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses  (Kees Cook)

Conversion performed via this Coccinelle script:

    // SPDX-License-Identifier: GPL-2.0-only
    // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
    virtual patch

    @gfp depends on patch && !(file in "tools") && !(file in "samples")@
    identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
                        kzalloc_obj,kzalloc_objs,kzalloc_flex,
                        kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
                        kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
    @@

      ALLOC(...
    -   , GFP_KERNEL
      )

    $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

    Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
    Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2026-02-21  Convert 'alloc_obj' family to use the new default GFP_KERNEL argument  (Linus Torvalds)

This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2026-02-21  treewide: Replace kmalloc with kmalloc_obj for non-scalar types  (Kees Cook)

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations:      kmalloc(sizeof(TYPE), ...)
are replaced with:       kmalloc_obj(TYPE, ...)

Array allocations:       kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:       kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:  kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:       kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>

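
The practical effect of returning "TYPE *" rather than "void *" can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: `demo_alloc_obj()`, `demo_zalloc_objs()` and `struct demo_seg` are hypothetical names invented here, built on `malloc()`/`calloc()`, and they only mimic the "pass the type, get a typed pointer back" shape described above.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-ins, NOT the kernel macros: they only
 * mimic the "pass the type, get a typed pointer back" shape.
 */
#define demo_alloc_obj(TYPE)          ((TYPE *)malloc(sizeof(TYPE)))
#define demo_zalloc_objs(TYPE, COUNT) ((TYPE *)calloc((COUNT), sizeof(TYPE)))

struct demo_seg {
	unsigned int id;
	unsigned long long gen;
};

/* Returns 0 on success, -1 if an allocation failed. */
static int demo_typed_alloc(void)
{
	/* Untyped form: struct demo_seg *seg = malloc(sizeof(struct demo_seg)); */
	struct demo_seg *seg = demo_alloc_obj(struct demo_seg);
	/* Array form: the element type and the count are separate arguments. */
	struct demo_seg *segs = demo_zalloc_objs(struct demo_seg, 4);
	int ret = -1;

	if (seg && segs) {
		seg->id = 1;
		seg->gen = 0;
		/* calloc() zeroed the array, so every element starts at id 0. */
		assert(segs[3].id == 0);
		ret = 0;
	}
	free(seg);
	free(segs);
	return ret;
}
```

Because the macro result already has the element type, assigning it to a pointer of a different type (for example `int *p = demo_alloc_obj(struct demo_seg);`) draws an incompatible-pointer diagnostic from the compiler, which a plain "void *" return would silently accept.
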
2025-12-11  Merge tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm  (Linus Torvalds)

Pull device mapper updates from Mikulas Patocka:

 - convert crypto_shash users to direct crypto library use with simpler and faster code and reduced stack usage (Eric Biggers):

     - the dm-verity SHA-256 conversion also teaches it to do two-way interleaved hashing for added performance

     - dm-crypt MD5 conversion (used for Loop-AES compatibility)

 - add documentation for takeover/reshape raid1 -> raid5 examples (Heinz Mauelshagen)

 - fix dm-vdo kerneldoc warnings (Matthew Sakai)

 - various random fixes and cleanups

* tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
  dm pcache: fix segment info indexing
  dm pcache: fix cache info indexing
  dm-pcache: advance slot index before writing slot
  dm raid: add documentation for takeover/reshape raid1 -> raid5 table line examples
  dm log-writes: Add missing set_freezable() for freezable kthread
  dm-raid: fix possible NULL dereference with undefined raid type
  dm-snapshot: fix 'scheduling while atomic' on real-time kernels
  dm: ignore discard return value
  MAINTAINERS: add Benjamin Marzinski as a device mapper maintainer
  dm-mpath: Simplify the setup_scsi_dh code
  dm vdo: fix kerneldoc warnings
  dm-bufio: align write boundary on physical block size
  dm-crypt: enable DM_TARGET_ATOMIC_WRITES
  dm: test for REQ_ATOMIC in dm_accept_partial_bio()
  dm-verity: remove useless mempool
  dm-verity: disable recursive forward error correction
  dm-ebs: Mark full buffer dirty even on partial write
  dm mpath: enable DM_TARGET_ATOMIC_WRITES
  dm verity fec: Expose corrected block count via status
  dm: Don't warn if IMA_DISABLE_HTABLE is not enabled
  ...

2025-12-10  dm pcache: fix segment info indexing  (Li Chen)

Segment info indexing also used sizeof(struct) instead of the 4K metadata stride, so info_index could point between slots and subsequent writes would advance incorrectly. Derive info_index from the pointer returned by the segment meta search using PCACHE_SEG_INFO_SIZE and advance to the next slot for future updates.

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18

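
The pointer-to-index arithmetic described in this fix can be sketched in plain C. This is an illustration under stated assumptions, not the driver code: `DEMO_SEG_INFO_STRIDE` stands in for the 4K stride named in the message, `DEMO_META_COPIES` assumes two slots, and `struct demo_seg_info` is a made-up structure that is much smaller than a slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SEG_INFO_STRIDE 4096u /* assumed 4K slot stride, per the message */
#define DEMO_META_COPIES     2u    /* assumed number of metadata slots */

struct demo_seg_info {             /* hypothetical; far smaller than one slot */
	uint64_t seq;
	uint32_t crc;
	uint32_t flags;
};

/* Fixed form: divide the byte offset by the slot stride. */
static uint32_t demo_slot_of(const uint8_t *base, const uint8_t *found)
{
	assert(found >= base);
	return (uint32_t)((size_t)(found - base) / DEMO_SEG_INFO_STRIDE);
}

/* Buggy form: dividing by sizeof(struct) yields an index far past the slots. */
static uint32_t demo_slot_of_buggy(const uint8_t *base, const uint8_t *found)
{
	assert(found >= base);
	return (uint32_t)((size_t)(found - base) / sizeof(struct demo_seg_info));
}

/* The next update goes to the slot after the one that was found. */
static uint32_t demo_next_slot(uint32_t index)
{
	return (index + 1) % DEMO_META_COPIES;
}
```

With a pointer that sits exactly one stride past the base, `demo_slot_of()` returns slot 1, while the sizeof-based variant returns the number of structures that would fit in 4096 bytes, which is not a valid slot number.
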
2025-12-10  dm pcache: fix cache info indexing  (Li Chen)

The on-media cache_info index used sizeof(struct) instead of the 4K metadata stride, so gc_percent updates from dmsetup message were written between slots and lost after reboot. Use PCACHE_CACHE_INFO_SIZE in get_cache_info_addr() and align info_index with the slot returned by pcache_meta_find_latest().

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18

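
Why a write "between slots" is lost can be seen from the offset computation alone. The sketch below is a hypothetical userspace model: `DEMO_CACHE_INFO_STRIDE` assumes the 4K stride from the message, and `struct demo_cache_info` and both helper functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_INFO_STRIDE 4096u /* assumed 4K slot stride */
#define DEMO_CACHE_INFO_COPIES 2u    /* assumed number of slots */

struct demo_cache_info {             /* hypothetical on-media record */
	uint64_t seq;
	uint32_t crc;
	uint32_t gc_percent;
};

/* Fixed form: slot N starts N strides from the start of the region. */
static size_t demo_info_offset(uint32_t index)
{
	assert(index < DEMO_CACHE_INFO_COPIES);
	return (size_t)index * DEMO_CACHE_INFO_STRIDE;
}

/* Buggy form: slot N lands sizeof(struct) * N bytes in, inside slot 0. */
static size_t demo_info_offset_buggy(uint32_t index)
{
	assert(index < DEMO_CACHE_INFO_COPIES);
	return (size_t)index * sizeof(struct demo_cache_info);
}
```

A reader that scans slots at stride boundaries never looks at the offset produced by the buggy form, so whatever was written there is invisible on the next load.
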
2025-12-10  dm-pcache: advance slot index before writing slot  (Dongsheng Yang)

In dm-pcache, in order to ensure crash-consistency, a dual-copy scheme is used to alternately update metadata, and there is a slot index that records the current slot.

However, in the write path the current implementation writes directly to the current slot indexed by slot index, and then advances the slot, which ends up overwriting the existing slot and violating the crash-consistency guarantee.

This patch fixes that behavior, preventing metadata from being overwritten incorrectly.

In addition, this patch adds a missing pmem_wmb() after memcpy_flushcache().

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18

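
The ordering requirement is easy to model with two in-memory slots. The following is a minimal sketch under assumptions, not the driver's write path: `struct demo_meta`, `struct demo_store` and both functions are hypothetical, and ordinary `memcpy()` stands in for the persistent-memory copy and barrier that the message mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_COPIES 2u /* dual-copy scheme, per the message */

struct demo_meta {
	uint64_t seq;
	uint32_t value;
};

struct demo_store {
	struct demo_meta slots[DEMO_COPIES];
	uint32_t index; /* slot holding the newest valid copy */
};

/* Fixed ordering: advance the index first, then write the new copy. */
static void demo_meta_write(struct demo_store *s, uint32_t value)
{
	struct demo_meta m = { .seq = s->slots[s->index].seq + 1, .value = value };

	assert(s->index < DEMO_COPIES);
	s->index = (s->index + 1) % DEMO_COPIES;
	/* In the kernel this copy would be flushed and fenced; here it is a plain copy. */
	memcpy(&s->slots[s->index], &m, sizeof(m));
}

/* Returns 0 if the previous newest copy survives every update. */
static int demo_check(void)
{
	struct demo_store s = { .slots = { { .seq = 1, .value = 10 } }, .index = 0 };

	demo_meta_write(&s, 20); /* goes to slot 1; slot 0 stays intact */
	if (s.index != 1 || s.slots[0].value != 10 || s.slots[1].value != 20)
		return -1;

	demo_meta_write(&s, 30); /* goes to slot 0; slot 1 stays intact */
	if (s.index != 0 || s.slots[0].value != 30 || s.slots[1].value != 20)
		return -1;

	return 0;
}
```

If the write happened before the index advanced, each update would land on the slot that holds the newest valid copy, so a crash in the middle of the copy could leave no fully valid newest copy to fall back on.
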
2025-11-18  dm-pcache: zero cache_info before default init  (Li Chen)

pcache_meta_find_latest() leaves whatever it last copied into the caller’s buffer even when it returns NULL. For cache_info_init(), that meant cache->cache_info could still contain CRC-bad garbage when no valid metadata exists, leading later initialization paths to read bogus flags.

Explicitly memset cache->cache_info in cache_info_init_default() so new-cache paths start from a clean slate. The default sequence number assignment becomes redundant with this reset, so it drops out.

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

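
The failure mode can be reproduced with a toy search helper that uses the caller's buffer as scratch space. Everything in this sketch is hypothetical: `demo_find_latest()` only imitates the behaviour described above, and `demo_valid()` is a stand-in check, not a real CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_cache_info {
	uint64_t seq;
	uint32_t flags;
	uint32_t crc;
};

/* Toy validity check standing in for a real CRC comparison. */
static int demo_valid(const struct demo_cache_info *i)
{
	return i->crc == (uint32_t)(i->seq ^ i->flags);
}

/* Copies each candidate into *out while checking it; returns NULL if none is valid. */
static const struct demo_cache_info *
demo_find_latest(const struct demo_cache_info *media, size_t n,
		 struct demo_cache_info *out)
{
	const struct demo_cache_info *latest = NULL;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		memcpy(out, &media[i], sizeof(*out)); /* may leave garbage behind */
		if (demo_valid(out) && (!latest || out->seq > latest->seq))
			latest = &media[i];
	}
	if (latest)
		memcpy(out, latest, sizeof(*out));
	return latest;
}

/* The fix: start defaults from an all-zero structure. */
static void demo_init_default(struct demo_cache_info *info)
{
	memset(info, 0, sizeof(*info));
}

/* Returns the flags a later initialization path would observe. */
static uint32_t demo_init(const struct demo_cache_info *media, size_t n,
			  struct demo_cache_info *info)
{
	if (!demo_find_latest(media, n, info))
		demo_init_default(info);
	return info->flags;
}
```

Without the `memset()` in `demo_init_default()`, `info->flags` would still hold the flags of the last invalid candidate that was copied in.
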
2025-11-18  dm-pcache: reuse meta_addr in pcache_meta_find_latest  (Li Chen)

pcache_meta_find_latest() already computes the metadata address as meta_addr. Reuse that instead of recomputing.

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

2025-11-18  dm-pcache: allow built-in build and rename flush helper  (Li Chen)

CONFIG_BCACHE is tristate, so dm-pcache can also be built-in. Switch the Makefile to use obj-$(CONFIG_DM_PCACHE) so the target can be linked into vmlinux instead of always being a loadable module.

Also rename cache_flush() to pcache_cache_flush() to avoid a global symbol clash with sunrpc/cache.c's cache_flush().

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

2025-09-02  dm-pcache: use int type to store negative error codes  (Qianfeng Rong)

Change the 'ret' variable from u32 to int so that it can store the negative error codes or zero returned by cache_kset_close().

Storing negative error codes in an unsigned type does not cause an issue at runtime, but it is ugly. Additionally, assigning negative error codes to an unsigned type may trigger a GCC warning when the -Wsign-conversion flag is enabled.

No effect on runtime.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

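
A small userspace sketch shows both halves of this argument: the value survives a round trip through a 32-bit unsigned variable, yet the unsigned type misrepresents it as soon as it is widened. `demo_close()` is a hypothetical callee, and the round-trip behaviour assumes a two's-complement target with the wrapping conversion that GCC and Clang implement.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical callee that fails with a negative errno value. */
static int demo_close(void)
{
	return -EINVAL;
}

/* u32 round trip: converting back to int recovers -EINVAL on GCC/Clang. */
static int demo_roundtrip_u32(void)
{
	uint32_t ret = (uint32_t)demo_close();

	return (int)ret;
}

/* Widening a u32 zero-extends, so the error turns into a large positive number. */
static int64_t demo_widen_u32(void)
{
	uint32_t ret = (uint32_t)demo_close();

	return (int64_t)ret;
}

/* Widening an int sign-extends, so the error code is preserved. */
static int64_t demo_widen_int(void)
{
	int ret = demo_close();

	return (int64_t)ret;
}

/* Self-check tying the three cases together. */
static void demo_self_check(void)
{
	assert(demo_roundtrip_u32() == -EINVAL);
	assert(demo_widen_u32() > 0);
	assert(demo_widen_int() == -EINVAL);
}
```

The first case is why the original code behaved correctly at runtime; the second is the kind of surprise that keeping error codes in `int` rules out.
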
2025-09-01  dm-pcache: cleanup: fix coding style report by checkpatch.pl  (Dongsheng Yang)

A patch from a few days ago fixed the division issue on 32-bit machines, but it introduced a coding style problem.

    WARNING: Missing a blank line after declarations
    +	u32 rem;
    +	div_u64_rem(off >> PCACHE_CACHE_SUBTREE_SIZE_SHIFT, cache->n_ksets, &rem);

    total: 0 errors, 1 warnings, 634 lines checked

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

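
For context, the flagged lines take a 64-bit offset, shift it, and keep only the remainder of a division by a 32-bit count. The sketch below models that in userspace with the blank line after the declaration that checkpatch asks for. `demo_div_u64_rem()` is a stand-in with the same argument shape as the kernel helper, while `DEMO_SUBTREE_SIZE_SHIFT` and `demo_kset_of()` are hypothetical and their values are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SUBTREE_SIZE_SHIFT 22 /* hypothetical shift, not the driver's value */

/* Userspace stand-in: 64-bit dividend, 32-bit divisor, remainder via pointer. */
static uint64_t demo_div_u64_rem(uint64_t dividend, uint32_t divisor,
				 uint32_t *remainder)
{
	assert(divisor != 0);
	*remainder = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

/* Maps a byte offset to a kset number by taking the remainder. */
static uint32_t demo_kset_of(uint64_t off, uint32_t n_ksets)
{
	uint32_t rem;

	demo_div_u64_rem(off >> DEMO_SUBTREE_SIZE_SHIFT, n_ksets, &rem);
	return rem;
}
```

In kernel code a helper of this kind is used instead of the plain `%` operator because a 64-bit division on a 32-bit target would otherwise require a compiler support routine that the kernel does not link against.
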
2025-09-01  dm-pcache: remove ctrl_lock for pcache_cache_segment  (Dongsheng Yang)

The smatch checker reports a "scheduler in atomic context" problem in the following call chain:

    miss_read_end_req()
      -> cache_seg_put()
        -> cache_seg_invalidate()
          -> cache_seg_gen_increase()
            -> mutex_lock(&cache_seg->ctrl_lock);

In practice, this `mutex_lock` will not actually schedule, because it is only called when `cache_seg_put()` drops the last reference, which is single-threaded. That is also why the issue never shows up during real testing. However, the code is still buggy.

The original purpose of `ctrl_lock` was to prevent read/write conflicts on the cache segment control information. Looking at the current usage, all control information accesses are single-threaded: reads only occur during the init phase, where no conflicts are possible, and writes happen once in the init phase (also single-threaded) and once when `cache_seg_put()` drops the last reference (again single-threaded).

Therefore, this patch removes `ctrl_lock` entirely and adds comments in the appropriate places to document this logic.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

2025-08-25  dm-pcache: add persistent cache target in device-mapper  (Dongsheng Yang)

This patch introduces dm-pcache, a new DM target that places a DAX-capable persistent-memory device in front of any slower block device and uses it as a high-throughput, low-latency cache.

Design highlights
-----------------
- DAX data path: data is copied directly between DRAM and the pmem mapping, bypassing the block layer's overhead.

- Segmented, crash-consistent layout
  - all layout metadata are dual-replicated and CRC-protected.
  - atomic kset flushes; key replay on mount guarantees cache integrity even after power loss.

- Striped multi-tree index
  - multi-tree indexing for high parallelism.
  - overlap-resolution logic ensures non-intersecting cached extents.

- Background services
  - write-back worker flushes dirty keys in order, preserving backing-device crash consistency. This is important for checkpoint in cloud storage.
  - garbage collector reclaims clean segments when utilisation exceeds a tunable threshold.

- Data integrity: optional CRC32 on cached payload; metadata always protected.

Comparison with existing block-level caches
-------------------------------------------
| Feature                         | pcache (this patch)                    | bcache                              | dm-writecache           |
|---------------------------------|----------------------------------------|-------------------------------------|-------------------------|
| pmem access method              | DAX                                    | bio (block I/O)                     | DAX                     |
| Write latency (4K rand-write)   | ~5 µs                                  | ~20 µs                              | ~5 µs                   |
| Concurrency                     | multi subtree index                    | global index tree                   | single tree + wc_lock   |
| IOPS (4K randwrite, 32 numjobs) | 2.1 M                                  | 352 K                               | 283 K                   |
| Read-cache support              | YES                                    | YES                                 | NO                      |
| Deployment                      | no re-format of backend                | backend devices must be reformatted | no re-format of backend |
| Write-back ordering             | log-structured; preserves app-IO-order | no ordering guarantee               | no ordering guarantee   |
| Data integrity checks           | metadata + data CRC (optional)         | metadata CRC only                   | none                    |

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>