<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/x86/crypto/Makefile, branch master</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge tag 'v7.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6</title>
<updated>2026-04-15T22:22:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-15T22:22:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aec2f682d47c54ef434b2d440992626d80b1ebdc'/>
<id>aec2f682d47c54ef434b2d440992626d80b1ebdc</id>
<content type='text'>
Pull crypto update from Herbert Xu:
 "API:
   - Replace crypto_get_default_rng with crypto_stdrng_get_bytes
   - Remove simd skcipher support
   - Allow algorithm types to be disabled when CRYPTO_SELFTESTS is off

  Algorithms:
   - Remove CPU-based des/3des acceleration
   - Add test vectors for authenc(hmac(md5),cbc({aes,des})) and
     authenc(hmac({md5,sha1,sha224,sha256,sha384,sha512}),rfc3686(ctr(aes)))
   - Replace spin lock with mutex in jitterentropy

  Drivers:
   - Add authenc algorithms to safexcel
   - Add support for zstd in qat
   - Add wireless mode support for QAT GEN6
   - Add anti-rollback support for QAT GEN6
   - Add support for ctr(aes), gcm(aes), and ccm(aes) in dthev2"

* tag 'v7.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (129 commits)
  crypto: af_alg - use sock_kmemdup in alg_setkey_by_key_serial
  crypto: vmx - remove CRYPTO_DEV_VMX from Kconfig
  crypto: omap - convert reqctx buffer to fixed-size array
  crypto: atmel-sha204a - add Thorsten Blum as maintainer
  crypto: atmel-ecc - add Thorsten Blum as maintainer
  crypto: qat - fix IRQ cleanup on 6xxx probe failure
  crypto: geniv - Remove unused spinlock from struct aead_geniv_ctx
  crypto: qce - simplify qce_xts_swapiv()
  crypto: hisilicon - Fix dma_unmap_single() direction
  crypto: talitos - rename first/last to first_desc/last_desc
  crypto: talitos - fix SEC1 32k ahash request limitation
  crypto: jitterentropy - replace long-held spinlock with mutex
  crypto: hisilicon - remove unused and non-public APIs for qm and sec
  crypto: hisilicon/qm - drop redundant variable initialization
  crypto: hisilicon/qm - remove else after return
  crypto: hisilicon/qm - add const qualifier to info_name in struct qm_cmd_dump_item
  crypto: hisilicon - fix the format string type error
  crypto: ccree - fix a memory leak in cc_mac_digest()
  crypto: qat - add support for zstd
  crypto: qat - use swab32 macro
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull crypto update from Herbert Xu:
 "API:
   - Replace crypto_get_default_rng with crypto_stdrng_get_bytes
   - Remove simd skcipher support
   - Allow algorithm types to be disabled when CRYPTO_SELFTESTS is off

  Algorithms:
   - Remove CPU-based des/3des acceleration
   - Add test vectors for authenc(hmac(md5),cbc({aes,des})) and
     authenc(hmac({md5,sha1,sha224,sha256,sha384,sha512}),rfc3686(ctr(aes)))
   - Replace spin lock with mutex in jitterentropy

  Drivers:
   - Add authenc algorithms to safexcel
   - Add support for zstd in qat
   - Add wireless mode support for QAT GEN6
   - Add anti-rollback support for QAT GEN6
   - Add support for ctr(aes), gcm(aes), and ccm(aes) in dthev2"

* tag 'v7.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (129 commits)
  crypto: af_alg - use sock_kmemdup in alg_setkey_by_key_serial
  crypto: vmx - remove CRYPTO_DEV_VMX from Kconfig
  crypto: omap - convert reqctx buffer to fixed-size array
  crypto: atmel-sha204a - add Thorsten Blum as maintainer
  crypto: atmel-ecc - add Thorsten Blum as maintainer
  crypto: qat - fix IRQ cleanup on 6xxx probe failure
  crypto: geniv - Remove unused spinlock from struct aead_geniv_ctx
  crypto: qce - simplify qce_xts_swapiv()
  crypto: hisilicon - Fix dma_unmap_single() direction
  crypto: talitos - rename first/last to first_desc/last_desc
  crypto: talitos - fix SEC1 32k ahash request limitation
  crypto: jitterentropy - replace long-held spinlock with mutex
  crypto: hisilicon - remove unused and non-public APIs for qm and sec
  crypto: hisilicon/qm - drop redundant variable initialization
  crypto: hisilicon/qm - remove else after return
  crypto: hisilicon/qm - add const qualifier to info_name in struct qm_cmd_dump_item
  crypto: hisilicon - fix the format string type error
  crypto: ccree - fix a memory leak in cc_mac_digest()
  crypto: qat - add support for zstd
  crypto: qat - use swab32 macro
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: x86 - Remove des and des3_ede code</title>
<updated>2026-04-03T00:56:12+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-26T20:12:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9a73869cb55051a2cdd4b039d75298e32014b25f'/>
<id>9a73869cb55051a2cdd4b039d75298e32014b25f</id>
<content type='text'>
Since DES and Triple DES are obsolete, there is very little point in
maintaining architecture-optimized code for them.  Remove it.

Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since DES and Triple DES are obsolete, there is very little point in
maintaining architecture-optimized code for them.  Remove it.

Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crypto: x86/sm3: Migrate optimized code into library</title>
<updated>2026-03-24T00:50:59+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-21T04:09:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=17ba6108d3e084652807826cc49c851c00976f1a'/>
<id>17ba6108d3e084652807826cc49c851c00976f1a</id>
<content type='text'>
Instead of exposing the x86-optimized SM3 code via an x86-specific
crypto_shash algorithm, just implement the sm3_blocks() library
function.  This is much simpler, it makes the SM3 library functions be
x86-optimized, and it fixes the longstanding issue where the
x86-optimized SM3 code was disabled by default.  SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Tweak the prototype of sm3_transform_avx() to match what the library
expects, including changing the block count to size_t.  Note that the
assembly code actually already treated this argument as size_t.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260321040935.410034-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of exposing the x86-optimized SM3 code via an x86-specific
crypto_shash algorithm, just implement the sm3_blocks() library
function.  This is much simpler, it makes the SM3 library functions be
x86-optimized, and it fixes the longstanding issue where the
x86-optimized SM3 code was disabled by default.  SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.

Tweak the prototype of sm3_transform_avx() to match what the library
expects, including changing the block count to size_t.  Note that the
assembly code actually already treated this argument as size_t.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260321040935.410034-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crypto: x86/ghash: Migrate optimized code into library</title>
<updated>2026-03-23T23:44:29+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2026-03-19T06:17:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3e79c8ec49596288c4460029c4971b9c838103b9'/>
<id>3e79c8ec49596288c4460029c4971b9c838103b9</id>
<content type='text'>
Remove the "ghash-pclmulqdqni" crypto_shash algorithm.  Move the
corresponding assembly code into lib/crypto/, and wire it up to the
GHASH library.

This makes the GHASH library be optimized with x86's carryless
multiplication instructions.  It also greatly reduces the amount of
x86-specific glue code that is needed, and it fixes the issue where this
GHASH optimization was disabled by default.

Rename and adjust the prototypes of the assembly functions to make them
fit better with the library.  Remove the byte-swaps (pshufb
instructions) that are no longer necessary because the library keeps the
accumulator in POLYVAL format rather than GHASH format.

Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it
really does a POLYVAL style multiplication.  Wire it up to both
ghash_mul_arch() and polyval_mul_arch().

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260319061723.1140720-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the "ghash-pclmulqdqni" crypto_shash algorithm.  Move the
corresponding assembly code into lib/crypto/, and wire it up to the
GHASH library.

This makes the GHASH library be optimized with x86's carryless
multiplication instructions.  It also greatly reduces the amount of
x86-specific glue code that is needed, and it fixes the issue where this
GHASH optimization was disabled by default.

Rename and adjust the prototypes of the assembly functions to make them
fit better with the library.  Remove the byte-swaps (pshufb
instructions) that are no longer necessary because the library keeps the
accumulator in POLYVAL format rather than GHASH format.

Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it
really does a POLYVAL style multiplication.  Wire it up to both
ghash_mul_arch() and polyval_mul_arch().

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20260319061723.1140720-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crypto: x86/nh: Migrate optimized code into library</title>
<updated>2026-01-12T19:07:50+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-12-11T01:18:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a229d83235c7627c490deb7dd4744a72567cea12'/>
<id>a229d83235c7627c490deb7dd4744a72567cea12</id>
<content type='text'>
Migrate the x86_64 implementations of NH into lib/crypto/.  This makes
the nh() function be optimized on x86_64 kernels.

Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code.  This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Migrate the x86_64 implementations of NH into lib/crypto/.  This makes
the nh() function be optimized on x86_64 kernels.

Note: this temporarily makes the adiantum template not utilize the
x86_64 optimized NH code.  This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".

Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux</title>
<updated>2025-12-03T02:24:35+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-12-03T02:24:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8f4c9978de91a9a3b37df1e74d6201acfba6cefd'/>
<id>8f4c9978de91a9a3b37df1e74d6201acfba6cefd</id>
<content type='text'>
Pull AES-GCM optimizations from Eric Biggers:
 "More optimizations and cleanups for the x86_64 AES-GCM code:

   - Add a VAES+AVX2 optimized implementation of AES-GCM. This is very
     helpful on CPUs that have VAES but not AVX512, such as AMD Zen 3.

   - Make the VAES+AVX512 optimized implementation of AES-GCM handle
     large amounts of associated data efficiently.

   - Remove the "avx10_256" implementation of AES-GCM. It's superseded
     by the VAES+AVX2 optimized implementation.

   - Rename the "avx10_512" implementation to "avx512"

  Overall, this fills in a gap where AES-GCM wasn't fully optimized on
  some recent CPUs. It also drops code that won't be as useful as
  initially expected due to AVX10/256 being dropped from the AVX10 spec"

* tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  crypto: x86/aes-gcm-vaes-avx2 - initialize full %rax return register
  crypto: x86/aes-gcm - optimize long AAD processing with AVX512
  crypto: x86/aes-gcm - optimize AVX512 precomputation of H^2 from H^1
  crypto: x86/aes-gcm - revise some comments in AVX512 code
  crypto: x86/aes-gcm - reorder AVX512 precompute and aad_update functions
  crypto: x86/aes-gcm - clean up AVX512 code to assume 512-bit vectors
  crypto: x86/aes-gcm - rename avx10 and avx10_512 to avx512
  crypto: x86/aes-gcm - remove VAES+AVX10/256 optimized code
  crypto: x86/aes-gcm - add VAES+AVX2 optimized code
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull AES-GCM optimizations from Eric Biggers:
 "More optimizations and cleanups for the x86_64 AES-GCM code:

   - Add a VAES+AVX2 optimized implementation of AES-GCM. This is very
     helpful on CPUs that have VAES but not AVX512, such as AMD Zen 3.

   - Make the VAES+AVX512 optimized implementation of AES-GCM handle
     large amounts of associated data efficiently.

   - Remove the "avx10_256" implementation of AES-GCM. It's superseded
     by the VAES+AVX2 optimized implementation.

   - Rename the "avx10_512" implementation to "avx512"

  Overall, this fills in a gap where AES-GCM wasn't fully optimized on
  some recent CPUs. It also drops code that won't be as useful as
  initially expected due to AVX10/256 being dropped from the AVX10 spec"

* tag 'aes-gcm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  crypto: x86/aes-gcm-vaes-avx2 - initialize full %rax return register
  crypto: x86/aes-gcm - optimize long AAD processing with AVX512
  crypto: x86/aes-gcm - optimize AVX512 precomputation of H^2 from H^1
  crypto: x86/aes-gcm - revise some comments in AVX512 code
  crypto: x86/aes-gcm - reorder AVX512 precompute and aad_update functions
  crypto: x86/aes-gcm - clean up AVX512 code to assume 512-bit vectors
  crypto: x86/aes-gcm - rename avx10 and avx10_512 to avx512
  crypto: x86/aes-gcm - remove VAES+AVX10/256 optimized code
  crypto: x86/aes-gcm - add VAES+AVX2 optimized code
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/crypto: x86/polyval: Migrate optimized code into library</title>
<updated>2025-11-11T19:03:38+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-11-09T23:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4d8da35579daad0392d238460ed7e9629d49ca35'/>
<id>4d8da35579daad0392d238460ed7e9629d49ca35</id>
<content type='text'>
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on x86_64.

This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, because polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned.  Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it
up to the POLYVAL library interface.  This makes the POLYVAL library be
properly optimized on x86_64.

This drops the x86_64 optimizations of polyval in the crypto_shash API.
That's fine, because polyval will be removed from crypto_shash entirely
since it is unneeded there.  But even if it comes back, the crypto_shash
API could just be implemented on top of the library API, as usual.

Adjust the names and prototypes of the assembly functions to align more
closely with the rest of the library code.

Also replace a movaps instruction with movups to remove the assumption
that the key struct is 16-byte aligned.  Users can still align the key
if they want (and at least in this case, movups is just as fast as
movaps), but it's inconvenient to require it.

Reviewed-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: x86/aes-gcm - rename avx10 and avx10_512 to avx512</title>
<updated>2025-10-27T03:37:40+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-10-02T02:31:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=12beec21c50950cc9a1907750200af4eb99a8aca'/>
<id>12beec21c50950cc9a1907750200af4eb99a8aca</id>
<content type='text'>
With the "avx10_256" code removed and the AVX10 specification having
been changed to basically just be a re-packaged AVX512, the "avx10_512"
name no longer makes sense.  Replace it with "avx512".

While doing this, also add the "vaes_" prefix in places that didn't
already have it.  The result is that the two VAES optimized
implementations are consistently called vaes_avx2 and vaes_avx512.
(Also drop the "-x86_64" part of the assembly filename, to keep it from
getting too long.  There's no 32-bit version of this code, and the fact
that it's 64-bit is unremarkable; it's the norm for new code.)

Note: although aes_gcm_aad_update_vaes_avx512() (previously called
aes_gcm_aad_update_vaes_avx10()) uses at most 256-bit vectors, it still
depends on the AVX512 CPU feature.  So its new name is still accurate.
Also, a later commit will make it sometimes use 512-bit vectors anyway.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Tested-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251002023117.37504-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the "avx10_256" code removed and the AVX10 specification having
been changed to basically just be a re-packaged AVX512, the "avx10_512"
name no longer makes sense.  Replace it with "avx512".

While doing this, also add the "vaes_" prefix in places that didn't
already have it.  The result is that the two VAES optimized
implementations are consistently called vaes_avx2 and vaes_avx512.
(Also drop the "-x86_64" part of the assembly filename, to keep it from
getting too long.  There's no 32-bit version of this code, and the fact
that it's 64-bit is unremarkable; it's the norm for new code.)

Note: although aes_gcm_aad_update_vaes_avx512() (previously called
aes_gcm_aad_update_vaes_avx10()) uses at most 256-bit vectors, it still
depends on the AVX512 CPU feature.  So its new name is still accurate.
Also, a later commit will make it sometimes use 512-bit vectors anyway.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Tested-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251002023117.37504-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: x86/aes-gcm - add VAES+AVX2 optimized code</title>
<updated>2025-10-27T03:37:40+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@kernel.org</email>
</author>
<published>2025-10-02T02:31:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fae3b96ba6015c35a973da09bf313d90e4e4bb94'/>
<id>fae3b96ba6015c35a973da09bf313d90e4e4bb94</id>
<content type='text'>
Add an implementation of AES-GCM that uses 256-bit vectors and the
following CPU features: Vector AES (VAES), Vector Carryless
Multiplication (VPCLMULQDQ), and AVX2.

It doesn't require AVX512.  So unlike the existing VAES+AVX512 code, it
works on CPUs that support VAES but not AVX512, specifically:

    - AMD Zen 3, both client and server
    - Intel Alder Lake, Raptor Lake, Meteor Lake, Arrow Lake, and Lunar
      Lake.  (These are client CPUs.)
    - Intel Sierra Forest.  (This is a server CPU.)

On these CPUs, this VAES+AVX2 code is much faster than the existing
AES-NI code.  The AES-NI code uses only 128-bit vectors.

These CPUs are widely deployed, making VAES+AVX2 code worthwhile even
though hopefully future x86_64 CPUs will uniformly support AVX512.

This implementation will also serve as the fallback 256-bit
implementation for older Intel CPUs (Ice Lake and Tiger Lake) that
support AVX512 but downclock too eagerly when 512-bit vectors are used.
Currently, the VAES+AVX10/256 implementation serves that purpose.  A
later commit will remove that and just use the VAES+AVX2 one.  (Note
that AES-XTS and AES-CTR already successfully use this approach.)

I originally wrote this AES-GCM implementation for BoringSSL.  It's been
in BoringSSL for a while now, including in Chromium.  This is a port of
it to the Linux kernel.  The main changes in the Linux version include:

- Port from "perlasm" to a standard .S file.
- Align all assembly functions with what aesni-intel_glue.c expects,
  including adding support for lengths not a multiple of 16 bytes.
- Rework the en/decryption of the final 1 to 127 bytes.

This commit increases AES-256-GCM throughput on AMD Milan (Zen 3) by up
to 74%, as shown by the following tables:

Table 1: AES-256-GCM encryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   67% |   59% |   61% |   39% |   23% |   27% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   14% |   12% |    7% |    7% |    0% |

Table 2: AES-256-GCM decryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   74% |   65% |   65% |   44% |   23% |   26% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   12% |   11% |    3% |    2% |   -3% |

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Tested-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251002023117.37504-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an implementation of AES-GCM that uses 256-bit vectors and the
following CPU features: Vector AES (VAES), Vector Carryless
Multiplication (VPCLMULQDQ), and AVX2.

It doesn't require AVX512.  So unlike the existing VAES+AVX512 code, it
works on CPUs that support VAES but not AVX512, specifically:

    - AMD Zen 3, both client and server
    - Intel Alder Lake, Raptor Lake, Meteor Lake, Arrow Lake, and Lunar
      Lake.  (These are client CPUs.)
    - Intel Sierra Forest.  (This is a server CPU.)

On these CPUs, this VAES+AVX2 code is much faster than the existing
AES-NI code.  The AES-NI code uses only 128-bit vectors.

These CPUs are widely deployed, making VAES+AVX2 code worthwhile even
though hopefully future x86_64 CPUs will uniformly support AVX512.

This implementation will also serve as the fallback 256-bit
implementation for older Intel CPUs (Ice Lake and Tiger Lake) that
support AVX512 but downclock too eagerly when 512-bit vectors are used.
Currently, the VAES+AVX10/256 implementation serves that purpose.  A
later commit will remove that and just use the VAES+AVX2 one.  (Note
that AES-XTS and AES-CTR already successfully use this approach.)

I originally wrote this AES-GCM implementation for BoringSSL.  It's been
in BoringSSL for a while now, including in Chromium.  This is a port of
it to the Linux kernel.  The main changes in the Linux version include:

- Port from "perlasm" to a standard .S file.
- Align all assembly functions with what aesni-intel_glue.c expects,
  including adding support for lengths not a multiple of 16 bytes.
- Rework the en/decryption of the final 1 to 127 bytes.

This commit increases AES-256-GCM throughput on AMD Milan (Zen 3) by up
to 74%, as shown by the following tables:

Table 1: AES-256-GCM encryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   67% |   59% |   61% |   39% |   23% |   27% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   14% |   12% |    7% |    7% |    0% |

Table 2: AES-256-GCM decryption throughput change,
         CPU vs. message length in bytes:

                      | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
----------------------+-------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   74% |   65% |   65% |   44% |   23% |   26% |

                      |   300 |   200 |    64 |    63 |    16 |
----------------------+-------+-------+-------+-------+-------+
AMD Milan (Zen 3)     |   12% |   11% |    3% |    2% |   -3% |

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Tested-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20251002023117.37504-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-10-11T17:51:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-10-11T17:51:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2f0a7504530c24f55daec7d2364d933bb1a1fa68'/>
<id>2f0a7504530c24f55daec7d2364d933bb1a1fa68</id>
<content type='text'>
Pull x86 cleanups from Borislav Petkov:

 - Simplify inline asm flag output operands now that the minimum
   compiler version supports the =@ccCOND syntax

 - Remove a bunch of AS_* Kconfig symbols which detect assembler support
   for various instruction mnemonics now that the minimum assembler
   version supports them all

 - The usual cleanups all over the place

* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
  x86/sgx: Use ENCLS mnemonic in &lt;kernel/cpu/sgx/encls.h&gt;
  x86/mtrr: Remove license boilerplate text with bad FSF address
  x86/asm: Use RDPKRU and WRPKRU mnemonics in &lt;asm/special_insns.h&gt;
  x86/idle: Use MONITORX and MWAITX mnemonics in &lt;asm/mwait.h&gt;
  x86/entry/fred: Push __KERNEL_CS directly
  x86/kconfig: Remove CONFIG_AS_AVX512
  crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
  crypto: X86 - Remove CONFIG_AS_VAES
  crypto: x86 - Remove CONFIG_AS_GFNI
  x86/kconfig: Drop unused and needless config X86_64_SMP
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 cleanups from Borislav Petkov:

 - Simplify inline asm flag output operands now that the minimum
   compiler version supports the =@ccCOND syntax

 - Remove a bunch of AS_* Kconfig symbols which detect assembler support
   for various instruction mnemonics now that the minimum assembler
   version supports them all

 - The usual cleanups all over the place

* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
  x86/sgx: Use ENCLS mnemonic in &lt;kernel/cpu/sgx/encls.h&gt;
  x86/mtrr: Remove license boilerplate text with bad FSF address
  x86/asm: Use RDPKRU and WRPKRU mnemonics in &lt;asm/special_insns.h&gt;
  x86/idle: Use MONITORX and MWAITX mnemonics in &lt;asm/mwait.h&gt;
  x86/entry/fred: Push __KERNEL_CS directly
  x86/kconfig: Remove CONFIG_AS_AVX512
  crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
  crypto: X86 - Remove CONFIG_AS_VAES
  crypto: x86 - Remove CONFIG_AS_GFNI
  x86/kconfig: Drop unused and needless config X86_64_SMP
</pre>
</div>
</content>
</entry>
</feed>
