| Age | Commit message | Author |
|
Conversion performed via this Coccinelle script:
// SPDX-License-Identifier: GPL-2.0-only
// Options: --include-headers-for-types --all-includes --include-headers --keep-comments
virtual patch
@gfp depends on patch && !(file in "tools") && !(file in "samples")@
identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
kzalloc_obj,kzalloc_objs,kzalloc_flex,
kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
@@
ALLOC(...
- , GFP_KERNEL
)
$ make coccicheck MODE=patch COCCI=gfp.cocci
Build and boot tested x86_64 with Fedora 42's GCC and Clang:
Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
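
As a rough illustration of why a typed return value helps, here is a minimal userspace sketch. The `demo_alloc_obj()` and `demo_alloc_objs()` macros are hypothetical stand-ins built on malloc()/calloc(); they are not the kernel's definitions, which also deal with GFP flags and overflow checking. The sketch assumes a compiler that supports `__typeof__` (GCC or Clang).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-ins, NOT the kernel macros.
 * T may be a type name or an expression such as *ptr; neither
 * sizeof nor __typeof__ evaluates its operand.
 */
#define demo_alloc_obj(T)	((__typeof__(T) *)malloc(sizeof(T)))
#define demo_alloc_objs(T, n)	((__typeof__(T) *)calloc((n), sizeof(T)))

struct point {
	int x, y;
};

/* Returns 0 on success, -1 if an allocation failed or a check did not hold. */
int demo_typed_alloc(void)
{
	struct point *one = demo_alloc_obj(struct point);	/* TYPE form */
	struct point *many = demo_alloc_objs(*one, 4);		/* *VAR form */
	int ok = one && many;

	/*
	 * Because the result is "struct point *" rather than "void *",
	 * a line such as
	 *	int *bad = demo_alloc_obj(struct point);
	 * draws an incompatible-pointer diagnostic from the compiler.
	 */
	if (ok) {
		one->x = 1;
		many[3].y = 2;
		ok = (one->x + many[3].y == 3);
	}

	free(many);
	free(one);
	return ok ? 0 : -1;
}
```

Calling `demo_typed_alloc()` from a test harness is enough to exercise both forms.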
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- convert crypto_shash users to direct crypto library use with simpler
and faster code and reduced stack usage (Eric Biggers):
- the dm-verity SHA-256 conversion also teaches it to do two-way
interleaved hashing for added performance
- dm-crypt MD5 conversion (used for Loop-AES compatibility)
- added documentation for takeover/reshape raid1 -> raid5 examples (Heinz Mauelshagen)
- fix dm-vdo kerneldoc warnings (Matthew Sakai)
- various random fixes and cleanups
* tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
dm pcache: fix segment info indexing
dm pcache: fix cache info indexing
dm-pcache: advance slot index before writing slot
dm raid: add documentation for takeover/reshape raid1 -> raid5 table line examples
dm log-writes: Add missing set_freezable() for freezable kthread
dm-raid: fix possible NULL dereference with undefined raid type
dm-snapshot: fix 'scheduling while atomic' on real-time kernels
dm: ignore discard return value
MAINTAINERS: add Benjamin Marzinski as a device mapper maintainer
dm-mpath: Simplify the setup_scsi_dh code
dm vdo: fix kerneldoc warnings
dm-bufio: align write boundary on physical block size
dm-crypt: enable DM_TARGET_ATOMIC_WRITES
dm: test for REQ_ATOMIC in dm_accept_partial_bio()
dm-verity: remove useless mempool
dm-verity: disable recursive forward error correction
dm-ebs: Mark full buffer dirty even on partial write
dm mpath: enable DM_TARGET_ATOMIC_WRITES
dm verity fec: Expose corrected block count via status
dm: Don't warn if IMA_DISABLE_HTABLE is not enabled
...
|
|
Segment info indexing also used sizeof(struct) instead of the
4K metadata stride, so info_index could point between slots and
subsequent writes would advance incorrectly. Derive info_index
from the pointer returned by the segment meta search using
PCACHE_SEG_INFO_SIZE and advance to the next slot for future
updates.
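
The difference between indexing by struct size and indexing by the metadata stride can be sketched as follows. All names here (`DEMO_INFO_STRIDE`, `DEMO_INFO_SLOTS`, `struct demo_info`, and the helpers) are hypothetical; the 4K stride is taken from the description above, and two slots are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INFO_STRIDE 4096u	/* assumed 4K slot stride */
#define DEMO_INFO_SLOTS  2u	/* assumed number of replica slots */

/* Hypothetical on-media record, much smaller than one slot. */
struct demo_info {
	uint32_t crc;
	uint32_t seq;
	uint64_t flags;
};

/* Correct: slots are laid out one stride apart. */
size_t demo_slot_offset(uint32_t index)
{
	return (size_t)index * DEMO_INFO_STRIDE;
}

/* Buggy variant, for contrast: lands inside slot 0 for index 1. */
size_t demo_slot_offset_buggy(uint32_t index)
{
	return (size_t)index * sizeof(struct demo_info);
}

/* Derive the slot index from the address a metadata search returned. */
uint32_t demo_index_of(const char *base, const char *found)
{
	return (uint32_t)((size_t)(found - base) / DEMO_INFO_STRIDE);
}

/* Slot to use for the next update. */
uint32_t demo_next_index(uint32_t index)
{
	return (index + 1) % DEMO_INFO_SLOTS;
}
```

With the buggy helper, the offset for index 1 is only `sizeof(struct demo_info)` bytes into the area, so a write would land between slot boundaries, which matches the failure mode described above.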
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18
|
|
The on-media cache_info index used sizeof(struct) instead of the
4K metadata stride, so gc_percent updates from dmsetup message
were written between slots and lost after reboot. Use
PCACHE_CACHE_INFO_SIZE in get_cache_info_addr() and align
info_index with the slot returned by pcache_meta_find_latest().
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18
|
|
In dm-pcache, in order to ensure crash-consistency, a dual-copy scheme
is used to alternately update metadata, and there is a slot index that
records the current slot. However, in the write path the current
implementation writes directly to the current slot indexed by slot
index, and then advances the slot — which ends up overwriting the
existing slot, violating the crash-consistency guarantee.
This patch fixes that behavior, preventing metadata from being
overwritten incorrectly.
In addition, this patch adds a missing pmem_wmb() after memcpy_flushcache().
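
A minimal userspace sketch of the intended ordering, assuming two slots and that the slot index names the latest valid copy. `struct demo_store`, `struct demo_meta`, and `demo_meta_write()` are hypothetical names; plain memcpy() stands in for the memcpy_flushcache() plus pmem_wmb() pair mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS 2u	/* dual-copy scheme */

/* Hypothetical metadata record. */
struct demo_meta {
	uint64_t seq;
	char payload[24];
};

struct demo_store {
	struct demo_meta slot[DEMO_SLOTS];
	uint32_t cur;	/* index of the slot holding the latest valid copy */
};

/*
 * Pick the other slot first, then write there, so the copy that `cur`
 * pointed to stays intact if the write is interrupted.
 */
void demo_meta_write(struct demo_store *st, const struct demo_meta *m)
{
	uint32_t next = (st->cur + 1) % DEMO_SLOTS;

	/* Stand-in for memcpy_flushcache() followed by pmem_wmb(). */
	memcpy(&st->slot[next], m, sizeof(*m));
	st->cur = next;
}
```

Writing to `slot[cur]` and only then advancing, as the old code is described as doing, would overwrite the single copy known to be good.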
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Cc: stable@vger.kernel.org # 6.18
|
|
pcache_meta_find_latest() leaves whatever it last copied into the
caller’s buffer even when it returns NULL. For cache_info_init(),
that meant cache->cache_info could still contain CRC-bad garbage when
no valid metadata exists, leading later initialization paths to read
bogus flags.
Explicitly memset cache->cache_info in cache_info_init_default()
so new-cache paths start from a clean slate. The default sequence
number assignment becomes redundant with this reset, so it drops out.
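
The stale-buffer problem can be reproduced in a small userspace sketch. Everything below is hypothetical: `struct demo_cache_info`, the toy checksum in `demo_valid()`, and `demo_find_latest()` only mimic the behaviour described above (copy first, validate afterwards, return NULL when nothing is valid).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata record. */
struct demo_cache_info {
	uint32_t crc;
	uint32_t flags;
	uint64_t seq;
};

/* Toy checksum, for illustration only. */
static int demo_valid(const struct demo_cache_info *ci)
{
	return ci->crc == (ci->flags ^ (uint32_t)ci->seq);
}

/*
 * Copies each slot into *out before validating it, so *out keeps the
 * last slot's bytes even when no slot is valid and NULL is returned.
 */
const struct demo_cache_info *
demo_find_latest(const struct demo_cache_info *slots, size_t n,
		 struct demo_cache_info *out)
{
	const struct demo_cache_info *latest = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		memcpy(out, &slots[i], sizeof(*out));
		if (!demo_valid(out))
			continue;
		if (!latest || out->seq > latest->seq)
			latest = &slots[i];
	}
	if (latest)
		memcpy(out, latest, sizeof(*out));
	return latest;
}

/* New-cache path: start from zeroed state, which also resets seq to 0. */
void demo_init_default(struct demo_cache_info *ci)
{
	memset(ci, 0, sizeof(*ci));
}
```

After a failed lookup the caller's buffer still holds the last slot's contents; calling `demo_init_default()` clears them, and the zeroing also covers the sequence number.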
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
pcache_meta_find_latest() already computes the metadata address as
meta_addr. Reuse that instead of recomputing.
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
CONFIG_DM_PCACHE is tristate, so dm-pcache can also be built-in.
Switch the Makefile to use obj-$(CONFIG_DM_PCACHE) so the target can be
linked into vmlinux instead of always being a loadable module.
Also rename cache_flush() to pcache_cache_flush() to avoid a global
symbol clash with sunrpc/cache.c's cache_flush().
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Change the 'ret' variable from u32 to int to store negative error codes or
zero returned by cache_kset_close().
Storing negative error codes in an unsigned type doesn't cause an issue
at runtime, but it's ugly. Additionally, assigning negative error codes
to an unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.
No effect on runtime.
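
A small userspace sketch of why the old code still worked, and why `int` is the cleaner choice. `demo_close()` is a hypothetical stand-in for a function that returns zero or a negative errno value; the conversion from `uint32_t` back to `int` is implementation-defined in older C standards, and the comment assumes the wrap-around behaviour that GCC and Clang document.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: returns 0 or a negative errno value. */
int demo_close(int fail)
{
	return fail ? -EINVAL : 0;
}

/*
 * Old pattern: the error code passes through an unsigned variable.
 * With GCC and Clang the value wraps on the way in and wraps back on
 * the way out, so the caller still sees -EINVAL.
 */
int demo_via_u32(int fail)
{
	uint32_t ret = (uint32_t)demo_close(fail);

	return (int)ret;
}

/* New pattern: the variable has the same signed type as the result. */
int demo_via_int(int fail)
{
	int ret = demo_close(fail);

	return ret;
}
```

Both helpers return the same value; the signed variant simply avoids the implicit sign conversion that -Wsign-conversion would flag.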
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
A patch from a few days ago fixed the division issue on 32-bit machines,
but it introduced a coding style problem.
WARNING: Missing a blank line after declarations
+ u32 rem;
+ div_u64_rem(off >> PCACHE_CACHE_SUBTREE_SIZE_SHIFT,
cache->n_ksets, &rem);
total: 0 errors, 1 warnings, 634 lines checked
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
The smatch checker reports a “scheduler in atomic context” problem in
the following call chain:
miss_read_end_req()
-> cache_seg_put()
-> cache_seg_invalidate()
-> cache_seg_gen_increase()
-> mutex_lock(&cache_seg->ctrl_lock);
In practice, this `mutex_lock` will not actually schedule, because it is
only called when `cache_seg_put()` drops the last reference, which is
single-threaded. That is also why the issue never shows up during real
testing.
However, the code is still buggy. The original purpose of `ctrl_lock`
was to prevent read/write conflicts on the cache segment control
information. Looking at the current usage, all control information
accesses are single-threaded: reads only occur during the init phase,
where no conflicts are possible, and writes happen once in the init
phase (also single-threaded) and once when `cache_seg_put()` drops the
last reference (again single-threaded).
Therefore, this patch removes `ctrl_lock` entirely and adds comments in
the appropriate places to document this logic.
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
This patch introduces dm-pcache, a new DM target that places a DAX-
capable persistent-memory device in front of any slower block device and
uses it as a high-throughput, low-latency cache.
Design highlights
-----------------
- DAX data path – data is copied directly between DRAM and the pmem
mapping, bypassing the block layer’s overhead.
- Segmented, crash-consistent layout
- all layout metadata are dual-replicated and CRC-protected.
- atomic kset flushes; key replay on mount guarantees cache integrity
even after power loss.
- Striped multi-tree index
- multi-tree indexing for high parallelism.
- overlap-resolution logic ensures non-intersecting cached extents.
- Background services
- write-back worker flushes dirty keys in order, preserving backing-device
crash consistency. This is important for checkpointing in cloud storage.
- garbage collector reclaims clean segments when utilisation exceeds a
tunable threshold.
- Data integrity – optional CRC32 on cached payload; metadata always protected.
Comparison with existing block-level caches
---------------------------------------------------------------------------------------------------------------------------------
| Feature | pcache (this patch) | bcache | dm-writecache |
|----------------------------------|---------------------------------|------------------------------|---------------------------|
| pmem access method | DAX | bio (block I/O) | DAX |
| Write latency (4 K rand-write) | ~5 µs | ~20 µs | ~5 µs |
| Concurrency | multi subtree index | global index tree | single tree + wc_lock |
| IOPS (4K randwrite, 32 numjobs) | 2.1 M | 352 K | 283 K |
| Read-cache support | YES | YES | NO |
| Deployment | no re-format of backend | backend devices must be | no re-format of backend |
| | | reformatted | |
| Write-back ordering | log-structured; | no ordering guarantee | no ordering guarantee |
| | preserves app-IO-order | | |
| Data integrity checks | metadata + data CRC(optional) | metadata CRC only | none |
---------------------------------------------------------------------------------------------------------------------------------
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|