| Age | Commit message (Collapse) | Author |
|
The kernel's AES library currently has the following issues:
- It doesn't take advantage of the architecture-optimized AES code,
including the implementations using AES instructions.
- It's much slower than even the other software AES implementations: 2-4
times slower than "aes-generic", "aes-arm", and "aes-arm64".
- It requires that both the encryption and decryption round keys be
computed and cached. This is wasteful for users that need only the
forward (encryption) direction of the cipher: the key struct is 484
bytes when only 244 are actually needed. This missed optimization is
very common, as many AES modes (e.g. GCM, CFB, CTR, CMAC, and even the
tweak key in XTS) use the cipher only in the forward (encryption)
direction even when doing decryption.
- It doesn't provide the flexibility to customize the prepared key
format. The API is defined to do key expansion, and several callers
in drivers/crypto/ use it specifically to expand the key. This is an
issue when integrating the existing powerpc, s390, and sparc code,
which is necessary to provide full parity with the traditional API.
To resolve these issues, I'm proposing the following changes:
1. New structs 'aes_key' and 'aes_enckey' are introduced, with
corresponding functions aes_preparekey() and aes_prepareenckey().
Generally these structs will include the encryption+decryption round
keys and the encryption round keys, respectively. However, the exact
format will be under control of the architecture-specific AES code.
(The verb "prepare" is chosen over "expand" since key expansion isn't
necessarily done. It's also consistent with hmac*_preparekey().)
2. aes_encrypt() and aes_decrypt() will be changed to operate on the new
structs instead of struct crypto_aes_ctx.
3. aes_encrypt() and aes_decrypt() will use architecture-optimized code
when available, or else fall back to a new generic AES implementation
that unifies the existing two fragmented generic AES implementations.
The new generic AES implementation uses tables for both SubBytes and
MixColumns, making it almost as fast as "aes-generic". However,
instead of aes-generic's huge 8192-byte tables per direction, it uses
only 1024 bytes for encryption and 1280 bytes for decryption (similar
to "aes-arm"). The cost is just some extra rotations.
The new generic AES implementation also includes table prefetching,
making it have some "constant-time hardening". That's an improvement
from aes-generic which has no constant-time hardening.
It does slightly regress in constant-time hardening vs. the old
lib/crypto/aes.c which had smaller tables, and from aes-fixed-time
which disabled IRQs on top of that. But I think this is tolerable.
The real solutions for constant-time AES are AES instructions or
bit-slicing. The table-based code remains a best-effort fallback for
the increasingly-rare case where a real solution is unavailable.
4. crypto_aes_ctx and aes_expandkey() will remain for now, but only for
callers that are using them specifically for the AES key expansion
(as opposed to en/decrypting data with the AES library).
This commit begins the migration process by introducing the new structs
and functions, backed by the new generic AES implementation.
To allow callers to be incrementally converted, aes_encrypt() and
aes_decrypt() are temporarily changed into macros that use a _Generic
expression to call either the old functions (which take crypto_aes_ctx)
or the new functions (which take the new types). Once all callers have
been updated, these macros will go away, the old functions will be
removed, and the "_new" suffix will be dropped from the new functions.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Since ML-DSA is FIPS-approved, add the boot-time self-test which is
apparently required.
Just add a test vector manually for now, borrowed from
lib/crypto/tests/mldsa-testvecs.h (where in turn it's borrowed from
leancrypto). The SHA-* FIPS test vectors are generated by
scripts/crypto/gen-fips-testvecs.py instead, but the common Python
libraries don't support ML-DSA yet.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20260107044215.109930-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Since the architecture-specific implementations of NH initialize memory
in assembly code, they aren't compatible with KMSAN as-is.
Fixes: 382de740759a ("lib/crypto: nh: Add NH library")
Link: https://lore.kernel.org/r/20260105053652.1708299-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
For the bitwise left rotation in MD5STEP, use rol32() from
<linux/bitops.h> instead of open-coding it.
Signed-off-by: Rusydi H. Makarim <rusydi.makarim@kriptograf.id>
Link: https://lore.kernel.org/r/20251214-rol32_in_md5-v1-1-20f5f11a92b2@kriptograf.id
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Migrate the x86_64 implementations of NH into lib/crypto/. This makes
the nh() function be optimized on x86_64 kernels.
Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code. This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Migrate the arm64 NEON implementation of NH into lib/crypto/. This
makes the nh() function be optimized on arm64 kernels.
Note: this temporarily makes the adiantum template not utilize the arm64
optimized NH code. This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Migrate the arm32 NEON implementation of NH into lib/crypto/. This
makes the nh() function be optimized on arm32 kernels.
Note: this temporarily makes the adiantum template not utilize the arm32
optimized NH code. This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add some simple KUnit tests for the nh() function.
These replace the test coverage which will be lost by removing the
nhpoly1305 crypto_shash.
Note that the NH code also continues to be tested indirectly as well,
via the tests for the "adiantum(xchacha12,aes)" crypto_skcipher.
Link: https://lore.kernel.org/r/20251211011846.8179-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add support for the NH "almost-universal hash function" to lib/crypto/,
specifically the variant of NH used in Adiantum.
This will replace the need for the "nhpoly1305" crypto_shash algorithm.
All the implementations of "nhpoly1305" use architecture-optimized code
only for the NH stage; they just use the generic C Poly1305 code for the
Poly1305 stage. We can achieve the same result in a simpler way using
an (architecture-optimized) nh() function combined with code in
crypto/adiantum.c that passes the results to the Poly1305 library.
This commit begins this cleanup by adding the nh() function. The code
is derived from crypto/nhpoly1305.c and include/crypto/nhpoly1305.h.
Link: https://lore.kernel.org/r/20251211011846.8179-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a KUnit test suite for ML-DSA verification, including the following
for each ML-DSA parameter set (ML-DSA-44, ML-DSA-65, and ML-DSA-87):
- Positive test (valid signature), using vector imported from leancrypto
- Various negative tests:
- Wrong length for signature, message, or public key
- Out-of-range coefficients in z vector
- Invalid encoded hint vector
- Any bit flipped in signature, message, or public key
- Unit test for the internal function use_hint()
- A benchmark
ML-DSA inputs and outputs are very large. To keep the size of the tests
down, use just one valid test vector per parameter set, and generate the
negative tests at runtime by mutating the valid test vector.
I also considered importing the test vectors from Wycheproof. I've
tested that mldsa_verify() indeed passes all of Wycheproof's ML-DSA test
vectors that use an empty context string. However, importing these
permanently would add over 6 MB of source. That's too much to be a
reasonable addition to the Linux kernel tree for one algorithm. It also
wouldn't actually provide much better test coverage than this commit.
Another potential issue is that Wycheproof uses the Apache license.
Similarly, this also differs from the earlier proposal to import a long
list of test vectors from leancrypto. I retained only one valid
signature for each algorithm, and I also added (runtime-generated)
negative tests which were missing. I think this is a better tradeoff.
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20251214181712.29132-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add support for verifying ML-DSA signatures.
ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is specified
in FIPS 204 and is the standard version of Dilithium. Unlike RSA and
elliptic-curve cryptography, ML-DSA is believed to be secure even
against adversaries in possession of a large-scale quantum computer.
Compared to the earlier patch
(https://lore.kernel.org/r/20251117145606.2155773-3-dhowells@redhat.com/)
that was based on "leancrypto", this implementation:
- Is about 700 lines of source code instead of 4800.
- Generates about 4 KB of object code instead of 28 KB.
- Uses 9-13 KB of memory to verify a signature instead of 31-84 KB.
- Is at least about the same speed, with a microbenchmark showing 3-5%
improvements on one x86_64 CPU and -1% to 1% changes on another.
When memory is a bottleneck, it's likely much faster.
- Correctly implements the RejNTTPoly step of the algorithm.
The API just consists of a single function mldsa_verify(), supporting
pure ML-DSA with any standard parameter set (ML-DSA-44, ML-DSA-65, or
ML-DSA-87) as selected by an enum. That's all that's actually needed.
The following four potential features are unneeded and aren't included.
However, any that ever become needed could fairly easily be added later,
as they only affect how the message representative mu is calculated:
- Nonempty context strings
- Incremental message hashing
- HashML-DSA
- External mu
Signing support would, of course, be a larger and more complex addition.
However, the kernel doesn't, and shouldn't, need ML-DSA signing support.
Note that mldsa_verify() allocates memory, so it can sleep and can fail
with ENOMEM. Unfortunately we don't have much choice about that, since
ML-DSA needs a lot of memory. At least callers have to check for errors
anyway, since the signature could be invalid.
Note that verification doesn't require constant-time code, and in fact
some steps are inherently variable-time. I've used constant-time
patterns in some places anyway, but technically they're not needed.
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20251214181712.29132-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
- A couple more fixes for the lib/crypto KUnit tests
- Fix missing MMU protection for the AES S-box
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: aes: Fix missing MMU protection for AES S-box
MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
lib/crypto: tests: Fix syntax error for old python versions
lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
|
|
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Sometimes the user needs to split each entry on the mapped scatter list
due to DMA length constrains. This helper returns a number of entities
assuming that each of them is not bigger than supplied maximum length.
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260108105619.3513561-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
__cacheline_aligned puts the data in the ".data..cacheline_aligned"
section, which isn't marked read-only i.e. it doesn't receive MMU
protection. Replace it with ____cacheline_aligned which does the right
thing and just aligns the data while keeping it in ".rodata".
Fixes: b5e0b032b6c3 ("crypto: aes - add generic time invariant AES cipher")
Cc: stable@vger.kernel.org
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://lore.kernel.org/r/20260105074712.498-1-dqfext@gmail.com/
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260107052023.174620-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
On my development machine the generic, memcpy()-only implementation of
polyval_preparekey() is too fast for the IRQ workers to actually fire.
The test fails.
Increase the iterations to make the test more robust.
The test will run for a maximum of one second in any case.
[EB: This failure was already fixed by commit c31f4aa8fed0 ("kunit:
Enforce task execution in {soft,hard}irq contexts"). I'm still applying
this patch too, since the iteration count in this test made its running
time much shorter than the other similar ones.]
Fixes: b3aed551b3fc ("lib/crypto: tests: Add KUnit tests for POLYVAL")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20260102-kunit-polyval-fix-v1-1-5313b5a65f35@linutronix.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
There is no indication in the history that such an option was merged to
mainline.
Fixes: c637693b20da ("ubsan: remove UBSAN_MISC in favor of individual options")
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Link: https://patch.msgid.link/20260107114833.2030995-1-stefan.wiehler@nokia.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
In the internal cmp function, a const pointer is cast out to a non-const
pointer by using container_of(). This is probably not what is intended
at all, so fix up the const marking to properly preserve what is really
happening (i.e. the const should flow through the container_of() call)
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/all/2025121751-backtrack-manifesto-7c57@gregkh/#r
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
In many kunit assert functions a const pointer is passed to
container_of() and out pops a non-const pointer, which really isn't the
correct thing to do at all. Fix this up by correctly marking the
casted-to pointer as const to preserve the marking.
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <raemoar63@gmail.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Link: https://lore.kernel.org/r/2025121746-result-staleness-5a68@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Enable context analysis for rhashtable, which was used as an initial
test as it contains a combination of RCU, mutex, and bit_spinlock usage.
Users of rhashtable now also benefit from annotations on the API, which
will now warn if the RCU read lock is not held where required.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-33-elver@google.com
|
|
Enable context analysis for stackdepot.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-32-elver@google.com
|
|
Many patterns that involve data-racy accesses often deliberately ignore
normal synchronization rules to avoid taking a lock.
If we have a lock-guarded variable on which we do a lock-less data-racy
access, rather than having to write context_unsafe(data_race(..)),
simply make the data_race(..) macro imply context-unsafety. The
data_race() macro already denotes the intent that something subtly
unsafe is about to happen, so it should be clear enough as-is.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-27-elver@google.com
|
|
As discussed in [1], removing __cond_lock() will improve the readability
of trylock code. Now that Sparse context tracking support has been
removed, we can also remove __cond_lock().
Change existing APIs to either drop __cond_lock() completely, or make
use of the __cond_acquires() function attribute instead.
In particular, spinlock and rwlock implementations required switching
over to inline helpers rather than statement-expressions for their
trylock_* variants.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250207082832.GU7145@noisy.programming.kicks-ass.net/ [1]
Link: https://patch.msgid.link/20251219154418.3592607-25-elver@google.com
|
|
Add support for Clang's context analysis for ww_mutex.
The programming model for ww_mutex is subtly more complex than other
locking primitives when using ww_acquire_ctx. Encoding the respective
pre-conditions for ww_mutex lock/unlock based on ww_acquire_ctx state
using Clang's context analysis makes incorrect use of the API harder.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-21-elver@google.com
|
|
Add support for Clang's context analysis for local_lock_t and
local_trylock_t.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-20-elver@google.com
|
|
Add support for Clang's context analysis for rw_semaphore.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-18-elver@google.com
|
|
Add support for Clang's context analysis for SRCU.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://patch.msgid.link/20251219154418.3592607-16-elver@google.com
|
|
Improve the existing annotations to properly support Clang's context
analysis.
The old annotations distinguished between RCU, RCU_BH, and RCU_SCHED;
however, to more easily be able to express that "hold the RCU read lock"
without caring if the normal, _bh(), or _sched() variant was used we'd
have to remove the distinction of the latter variants: change the _bh()
and _sched() variants to also acquire "RCU".
When (and if) we introduce context locks to denote more generally that
"IRQ", "BH", "PREEMPT" contexts are disabled, it would make sense to
acquire these instead of RCU_BH and RCU_SCHED respectively.
The above change also simplified introducing __guarded_by support, where
only the "RCU" context lock needs to be held: introduce __rcu_guarded,
where Clang's context analysis warns if a pointer is dereferenced
without any of the RCU locks held, or updated without the appropriate
helpers.
The primitives rcu_assign_pointer() and friends are wrapped with
context_unsafe(), which enforces using them to update RCU-protected
pointers marked with __rcu_guarded.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://patch.msgid.link/20251219154418.3592607-15-elver@google.com
|
|
The annotations for bit_spinlock.h have simply been using "bitlock" as
the token. For Sparse, that was likely sufficient in most cases. But
Clang's context analysis is more precise, and we need to ensure we
can distinguish different bitlocks.
To do so, add a token context, and a macro __bitlock(bitnum, addr)
that is used to construct unique per-bitlock tokens.
Add the appropriate test.
<linux/list_bl.h> is implicitly included through other includes, and
requires 2 annotations to indicate that acquisition (without release)
and release (without prior acquisition) of its bitlock is intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-14-elver@google.com
|
|
Add support for Clang's context analysis for seqlock_t.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-12-elver@google.com
|
|
Add support for Clang's context analysis for mutex.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-11-elver@google.com
|
|
Add support for Clang's context analysis for raw_spinlock_t,
spinlock_t, and rwlock. This wholesale conversion is required because
all three of them are interdependent.
To avoid warnings in constructors, the initialization functions mark a
lock as acquired when initialized before guarded variables.
The test verifies that common patterns do not generate false positives.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-9-elver@google.com
|
|
Add a simple test stub where we will add common supported patterns that
should not generate false positives for each new supported context lock.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-4-elver@google.com
|
|
Context Analysis is a language extension, which enables statically
checking that required contexts are active (or inactive), by acquiring
and releasing user-definable "context locks". An obvious application is
lock-safety checking for the kernel's various synchronization primitives
(each of which represents a "context lock"), and checking that locking
rules are not violated.
Clang originally called the feature "Thread Safety Analysis" [1]. This
was later changed and the feature became more flexible, gaining the
ability to define custom "capabilities". Its foundations can be found in
"Capability Systems" [2], used to specify the permissibility of
operations to depend on some "capability" being held (or not held).
Because the feature is not just able to express "capabilities" related
to synchronization primitives, and "capability" is already overloaded in
the kernel, the naming chosen for the kernel departs from Clang's
"Thread Safety" and "capability" nomenclature; we refer to the feature
as "Context Analysis" to avoid confusion. The internal implementation
still makes references to Clang's terminology in a few places, such as
`-Wthread-safety` being the warning option that also still appears in
diagnostic messages.
[1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[2] https://www.cs.cornell.edu/talc/papers/capabilities.pdf
See more details in the kernel-doc documentation added in this and
subsequent changes.
Clang version 22+ is required.
[peterz: disable the thing for __CHECKER__ builds]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-3-elver@google.com
|
|
If you use an IDR with a non-zero base, and specify a range that lies
entirely below the base, 'max - base' becomes very large and
idr_get_free() can return an ID that lies outside of the requested range.
Link: https://lkml.kernel.org/r/20251128161853.3200058-1-willy@infradead.org
Fixes: 6ce711f27500 ("idr: Make 1-based IDRs more efficient")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6449
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"Drop unused parameter from kunit_device_register_internal and make
FAULT_TEST default to n when PANIC_ON_OOPS"
* tag 'linux_kselftest-kunit-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: make FAULT_TEST default to n when PANIC_ON_OOPS
kunit: Drop unused parameter from kunit_device_register_internal
|
|
Subsequent patches in the series change vmlinux linking scripts to
unconditionally pass --btf_encode_detached to pahole, which was
introduced in v1.22 [1][2].
This change allows to remove PAHOLE_HAS_SPLIT_BTF Kconfig option and
other checks of older pahole versions.
[1] https://github.com/acmel/dwarves/releases/tag/v1.22
[2] https://lore.kernel.org/bpf/cbafbf4e-9073-4383-8ee6-1353f9e5869c@oracle.com/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-1-ihor.solodrai@linux.dev
|
|
The IRQ timing tracking infrastructure was merged in 2019, but was never
plumbed in, is not selectable, and is therefore never used.
As Daniel agrees that there is little hope for this infrastructure to be
completed in the near term, drop it altogether.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/87zf7vex6h.wl-maz@kernel.org
Link: https://patch.msgid.link/20251210082242.360936-2-maz@kernel.org
|
|
As describe in the help string, the user might want to disable these
tests if they don't like to see stacktraces/BUG etc in their kernel log.
However, if they enable PANIC_ON_OOPS, these tests also crash the
machine, which it's safe to assume _almost_ nobody wants.
One might argue that _absolutely_ nobody ever wants their kernel to
crash so this should just be a hard dependency instead of a default.
However, since this is rather special code that's anyway concerned with
deliberately doing "bad" things, the normal rules don't seem to apply,
hence prefer flexibility and allow users to set up a crashing Kconfig if
they so choose.
Link: https://lore.kernel.org/r/20251207-kunit-fault-no-panic-v1-1-2ac932f26864@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The passed driver isn't used, so just drop this parameter.
Link: https://lore.kernel.org/r/20251210065839.482608-2-u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
poly1305-core.S is an auto-generated file, so it should be ignored.
Fixes: bef9c7559869 ("lib/crypto: riscv/poly1305: Import OpenSSL/CRYPTOGAMS implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Link: https://lore.kernel.org/r/20251212184717.133701-1-cmirabil@redhat.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
- Improve bug reporting
- Suppress W=1 format warning
- Improve rseq scalability on Clang builds
* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Always inline rseq_debug_syscall_return()
bug: Hush suggest-attribute=format for __warn_printf()
bug: Let report_bug_entry() provide the correct bugaddr
|
|
Recent additions to this function cause GCC 14.3.0 to get excited
(W=1) and suggest a missing attribute:
lib/bug.c: In function '__warn_printf':
lib/bug.c:187:25: error: function '__warn_printf' be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
187 | vprintk(fmt, *args);
| ^~~~~~~
Disable the diagnostic locally, following the pattern used for stuff
like va_format().
Fixes: 5c47b7f3d1a9 ("bug: Add BUG_FORMAT_ARGS infrastructure")
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251207-warn-printf-gcc-v1-1-b597d612b94b@google.com
|
|
report_bug_entry() always provides zero for bugaddr but could easily
extract the correct address from the provided bug_entry. Just do that to
have proper warning messages.
E.g. adding an artificial:
void foo(void) { WARN_ONCE(1, "bar"); }
function generates this warning message:
WARNING: arch/s390/kernel/setup.c:1017 at 0x0, CPU#0: swapper/0/0
^^^
With the correct bug address this changes to:
WARNING: arch/s390/kernel/setup.c:1017 at foo+0x1c/0x40, CPU#0: swapper/0/0
^^^^^^^^^^^^^
Fixes: 7d2c27a0ec5e ("bug: Add report_bug_entry()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251208200658.3431511-1-hca@linux.ibm.com
|
|
As we're doing in the BLAKE2b code, use unrolled_full to make the
compiler handle the loop unrolling. This simplifies the code slightly.
The generated object code is nearly the same with both gcc and clang.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205051155.25274-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
BLAKE2b has a state of 16 64-bit words. Add the message data in and
there are 32 64-bit words. With the current code where all the rounds
are unrolled to enable constant-folding of the blake2b_sigma values,
this results in a very large code size on 32-bit kernels, including a
recurring issue where gcc uses a large amount of stack.
There's just not much benefit to this unrolling when the code is already
so large. Let's roll up the rounds when !CONFIG_64BIT.
To avoid having to duplicate the code, just write the code once using a
loop, and conditionally use 'unrolled_full' from <linux/unroll.h>.
Then, fold the now-unneeded ROUND() macro into the loop. Finally, also
remove the now-unneeded override of the stack frame size warning.
Code size improvements for blake2b_compress_generic():
Size before (bytes) Size after (bytes)
------------------- ------------------
i386, gcc 27584 3632
i386, clang 18208 3248
arm32, gcc 19912 2860
arm32, clang 21336 3344
Running the BLAKE2b benchmark on a !CONFIG_64BIT kernel on an x86_64
processor shows a 16384B throughput change of 351 => 340 MB/s (gcc) or
442 MB/s => 375 MB/s (clang). So clearly not much of a slowdown either.
But also that microbenchmark also effectively disregards cache usage,
which is important in practice and is far better in the smaller code.
Note: If we rolled up the loop on x86_64 too, the change would be
7024 bytes => 1584 bytes and 1960 MB/s => 1396 MB/s (gcc), or
6848 bytes => 1696 bytes and 1920 MB/s => 1263 MB/s (clang).
Maybe still worth it, though not quite as clearly beneficial.
Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205050330.89704-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as
well as vector unaligned accesses being efficient.
This is necessary because this code assumes that vector unaligned
accesses are supported and are efficient. (It does so to avoid having
to use lots of extra vsetvli instructions to switch the element width
back and forth between 8 and either 32 or 64.)
This was omitted from the code originally just because the RISC-V kernel
support for detecting this feature didn't exist yet. Support has now
been added, but it's fragmented into per-CPU runtime detection, a
command-line parameter, and a kconfig option. The kconfig option is the
only reasonable way to do it, though, so let's just rely on that.
Fixes: eb24af5d7a05 ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}")
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Fixes: 600a3853dfa0 ("crypto: riscv - add vector crypto accelerated GHASH")
Fixes: 8c8e40470ffe ("crypto: riscv - add vector crypto accelerated SHA-{256,224}")
Fixes: b3415925a08b ("crypto: riscv - add vector crypto accelerated SHA-{512,384}")
Fixes: 563a5255afa2 ("crypto: riscv - add vector crypto accelerated SM3")
Fixes: b8d06352bbf3 ("crypto: riscv - add vector crypto accelerated SM4")
Cc: stable@vger.kernel.org
Reported-by: Vivian Wang <wangruikang@iscas.ac.cn>
Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Reviewed-by: Jerry Shih <jerry.shih@sifive.com>
Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
In chacha_zvkb, avoid using the s0 register, which is the frame pointer,
by reallocating KEY0 to t5. This makes stack traces available if e.g. a
crash happens in chacha_zvkb.
No frame pointer maintenance is otherwise required since this is a leaf
function.
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20251202-riscv-chacha_zvkb-fp-v2-1-7bd00098c9dc@iscas.ac.cn
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Add a cond_lock annotation for lockref_put_or_lock to make sparse
happy with using it. Note that for this the return value has to be
double-inverted as the return value convention of lockref_put_or_lock
is inverted compared to _trylock conventions expected by __cond_lock,
as lockref_put_or_lock returns true when it did not need to take the
lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev updates from Helge Deller:
"The Termius 10x18 console bitmap font has been added. It is good
match for modern 13-16 inch laptop displays with resolutions like
1280x800 and 1440x900 pixels.
The gbefb and tcx.c drivers got some fixes to restore X11 support,
pxafb was not actually clamping input values and the ssd1307fb driver
leaked memory in the failure path.
The other patches convert some common drivers to use dev_info() and
dev_dbg() instead of printk(). Summary:
Framework updates:
- fonts: Add Terminus 10x18 console font [Neilay Kharwadkar]
Driver fixes:
- gbefb: fix to use physical address instead of dma address [René Rebe]
- tcx.c fix mem_map to correct smem_start offset [René Rebe]
- pxafb: Fix multiple clamped values in pxafb_adjust_timing [Thorsten Blum]
- ssd1307fb: fix potential page leak in ssd1307fb_probe() [Abdun Nihaal]
Cleanups:
- vga16fb: Request memory region [Javier Garcia]
- vga16fb: replace printk() with dev_*() in probe [Vivek BalachandharTN]
- vesafb, gxt4500fb, tridentfb: Use dev_dbg() instead of printk() [Javier Garcia]
- i810: use dev_info() [Shi Hao]"
* tag 'fbdev-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: ssd1307fb: fix potential page leak in ssd1307fb_probe()
fbdev: i810: use appopriate log interface dev_info
fbdev: tridentfb: replace printk() with dev_*() in probe
lib/fonts: Add Terminus 10x18 console font
fbdev: pxafb: Fix multiple clamped values in pxafb_adjust_timing
fbdev: tcx.c fix mem_map to correct smem_start offset
fbdev: gxt4500fb: Use dev_err instead of printk
fbdev: gbefb: fix to use physical address instead of dma address
fbdev: vesafb: Use dev_* fn's instead printk
fbdev: vga16fb: Request memory region
fbdev: vga16fb: replace printk() with dev_*() in probe
|