<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/arm64/crypto, branch v5.18</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>crypto: arm64 - cleanup comments</title>
<updated>2022-03-09T03:12:32+00:00</updated>
<author>
<name>Tom Rix</name>
<email>trix@redhat.com</email>
</author>
<published>2022-03-06T13:34:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cd6714f94091308f07a32b79e0a43f983587da3f'/>
<id>cd6714f94091308f07a32b79e0a43f983587da3f</id>
<content type='text'>
For spdx, use // for *.c files

Replacements
significanty to significantly

Signed-off-by: Tom Rix &lt;trix@redhat.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For spdx, use // for *.c files

Replacements
significanty to significantly

Signed-off-by: Tom Rix &lt;trix@redhat.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-neonbs-xts - use plain NEON for non-power-of-2 input sizes</title>
<updated>2022-02-05T04:10:51+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2022-01-27T11:35:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dfc6031ec917b7c34a7549c3120f841b2e2be6db'/>
<id>dfc6031ec917b7c34a7549c3120f841b2e2be6db</id>
<content type='text'>
Even though the kernel's implementations of AES-XTS were updated to
implement ciphertext stealing and can operate on inputs of any size
larger than or equal to the AES block size, this feature is rarely used
in practice.

In fact, in the kernel, AES-XTS is only used to operate on 4096 or 512
byte blocks, which means that not only the ciphertext stealing is
effectively dead code, the logic in the bit sliced NEON implementation
to deal with fewer than 8 blocks at a time is also never used.

Since the bit-sliced NEON driver already depends on the plain NEON
version, which is slower but can operate on smaller data quantities more
straightforwardly, let's fallback to the plain NEON implementation of
XTS for any residual inputs that are not multiples of 128 bytes. This
allows us to remove a lot of complicated logic that rarely gets
exercised in practice.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Even though the kernel's implementations of AES-XTS were updated to
implement ciphertext stealing and can operate on inputs of any size
larger than or equal to the AES block size, this feature is rarely used
in practice.

In fact, in the kernel, AES-XTS is only used to operate on 4096 or 512
byte blocks, which means that not only the ciphertext stealing is
effectively dead code, the logic in the bit sliced NEON implementation
to deal with fewer than 8 blocks at a time is also never used.

Since the bit-sliced NEON driver already depends on the plain NEON
version, which is slower but can operate on smaller data quantities more
straightforwardly, let's fallback to the plain NEON implementation of
XTS for any residual inputs that are not multiples of 128 bytes. This
allows us to remove a lot of complicated logic that rarely gets
exercised in practice.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk</title>
<updated>2022-02-05T04:10:51+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2022-01-27T11:35:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fc074e130051015e39245a4241956ff122e2f465'/>
<id>fc074e130051015e39245a4241956ff122e2f465</id>
<content type='text'>
Instead of processing the entire input with the 8-way bit sliced
algorithm, which is sub-optimal for inputs that are not a multiple of
128 bytes in size, invoke the plain NEON version of CTR for the
remainder of the input after processing the bulk using 128 byte strides.

This allows us to greatly simplify the asm code that implements CTR, and
get rid of all the branches and special code paths. It also gains us a
couple of percent of performance.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of processing the entire input with the 8-way bit sliced
algorithm, which is sub-optimal for inputs that are not a multiple of
128 bytes in size, invoke the plain NEON version of CTR for the
remainder of the input after processing the bulk using 128 byte strides.

This allows us to greatly simplify the asm code that implements CTR, and
get rid of all the branches and special code paths. It also gains us a
couple of percent of performance.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-neon-ctr - improve handling of single tail block</title>
<updated>2022-02-05T04:10:51+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2022-01-27T09:52:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8daa399edeed4cfa792ccea12beda50d445ab6a0'/>
<id>8daa399edeed4cfa792ccea12beda50d445ab6a0</id>
<content type='text'>
Instead of falling back to C code to do a memcpy of the output of the
last block, handle this in the asm code directly if possible, which is
the case if the entire input is longer than 16 bytes.

Cc: Nathan Huckleberry &lt;nhuck@google.com&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of falling back to C code to do a memcpy of the output of the
last block, handle this in the asm code directly if possible, which is
the case if the entire input is longer than 16 bytes.

Cc: Nathan Huckleberry &lt;nhuck@google.com&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/sm3-ce - make dependent on sm3 library</title>
<updated>2022-01-28T05:51:10+00:00</updated>
<author>
<name>Tianjia Zhang</name>
<email>tianjia.zhang@linux.alibaba.com</email>
</author>
<published>2022-01-07T12:06:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f3a03d319dbdbb206530ebfce977c334ee2f8765'/>
<id>f3a03d319dbdbb206530ebfce977c334ee2f8765</id>
<content type='text'>
SM3 generic library is stand-alone implementation, sm3-ce can depend
on the SM3 library instead of sm3-generic.

Signed-off-by: Tianjia Zhang &lt;tianjia.zhang@linux.alibaba.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SM3 generic library is stand-alone implementation, sm3-ce can depend
on the SM3 library instead of sm3-generic.

Signed-off-by: Tianjia Zhang &lt;tianjia.zhang@linux.alibaba.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: Add macro version of the BTI instruction</title>
<updated>2021-12-14T18:12:58+00:00</updated>
<author>
<name>Mark Brown</name>
<email>broonie@kernel.org</email>
</author>
<published>2021-12-14T15:27:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9be34be87cc8d1afe3c3bc2e645b4dee512d9eda'/>
<id>9be34be87cc8d1afe3c3bc2e645b4dee512d9eda</id>
<content type='text'>
BTI is only available from v8.5 so we need to encode it using HINT in
generic code and for older toolchains. Add an assembler macro based on
one written by Mark Rutland which lets us use the mnemonic and update
the existing users.

Suggested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Mark Brown &lt;broonie@kernel.org&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20211214152714.2380849-2-broonie@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BTI is only available from v8.5 so we need to encode it using HINT in
generic code and for older toolchains. Add an assembler macro based on
one written by Mark Rutland which lets us use the mnemonic and update
the existing users.

Suggested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Mark Brown &lt;broonie@kernel.org&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20211214152714.2380849-2-broonie@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-ccm - avoid by-ref argument for ce_aes_ccm_auth_data</title>
<updated>2021-09-17T03:05:11+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2021-08-27T07:03:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=898387e40cf538b7d1605e05d456699fe418a77f'/>
<id>898387e40cf538b7d1605e05d456699fe418a77f</id>
<content type='text'>
With the SIMD code path removed, we can clean up the CCM auth-only path
a bit further, by passing the 'macp' input buffer pointer by value,
rather than by reference, and taking the output value from the
function's return value.

This way, the compiler is no longer forced to allocate macp on the
stack. This is not expected to make any difference in practice, it just
makes for slightly cleaner code.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the SIMD code path removed, we can clean up the CCM auth-only path
a bit further, by passing the 'macp' input buffer pointer by value,
rather than by reference, and taking the output value from the
function's return value.

This way, the compiler is no longer forced to allocate macp on the
stack. This is not expected to make any difference in practice, it just
makes for slightly cleaner code.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-ccm - reduce NEON begin/end calls for common case</title>
<updated>2021-09-17T03:05:11+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2021-08-27T07:03:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=741691c44606b1903e674d12f3e4a4b68ade69ad'/>
<id>741691c44606b1903e674d12f3e4a4b68ade69ad</id>
<content type='text'>
AES-CCM (as used in WPA2 CCMP, for instance) typically involves
authenticate-only data, and operates on a single network packet, and so
the common case is for the authenticate, en/decrypt and finalize SIMD
helpers to all be called exactly once in sequence. Since
kernel_neon_end() now involves manipulation of the preemption state as
well as the softirq mask state, let's reduce the number of times we are
forced to call it to only once if we are handling this common case.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
AES-CCM (as used in WPA2 CCMP, for instance) typically involves
authenticate-only data, and operates on a single network packet, and so
the common case is for the authenticate, en/decrypt and finalize SIMD
helpers to all be called exactly once in sequence. Since
kernel_neon_end() now involves manipulation of the preemption state as
well as the softirq mask state, let's reduce the number of times we are
forced to call it to only once if we are handling this common case.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-ccm - remove non-SIMD fallback path</title>
<updated>2021-09-17T03:05:11+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2021-08-27T07:03:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3482635e5d69c8a40288bd025f61a994b3b1126'/>
<id>b3482635e5d69c8a40288bd025f61a994b3b1126</id>
<content type='text'>
AES/CCM on arm64 is implemented as a synchronous AEAD, and so it is
guaranteed by the API that it is only invoked in task or softirq
context. Since softirqs are now only handled when the SIMD is not
being used in the task context that was interrupted to service the
softirq, we no longer need a fallback path. Let's remove it.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
AES/CCM on arm64 is implemented as a synchronous AEAD, and so it is
guaranteed by the API that it is only invoked in task or softirq
context. Since softirqs are now only handled when the SIMD is not
being used in the task context that was interrupted to service the
softirq, we no longer need a fallback path. Let's remove it.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crypto: arm64/aes-ccm - yield NEON when processing auth-only data</title>
<updated>2021-09-17T03:05:10+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2021-08-27T07:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=36a916af641dc71ef7d4b98417bf4019ddeb4ebe'/>
<id>36a916af641dc71ef7d4b98417bf4019ddeb4ebe</id>
<content type='text'>
In SIMD accelerated crypto drivers, we typically yield the SIMD unit
after processing 4 KiB of input, to avoid scheduling blackouts caused by
the fact that claiming the SIMD unit disables preemption as well as
softirq processing.

The arm64 CCM driver does this implicitly for the ciphertext, due to the
fact that the skcipher API never processes more than a single page at a
time. However, the scatterwalk performed by this driver when processing
the authenticate-only data will keep the SIMD unit occupied until it
completes.

So cap the scatterwalk steps to 4 KiB.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In SIMD accelerated crypto drivers, we typically yield the SIMD unit
after processing 4 KiB of input, to avoid scheduling blackouts caused by
the fact that claiming the SIMD unit disables preemption as well as
softirq processing.

The arm64 CCM driver does this implicitly for the ciphertext, due to the
fact that the skcipher API never processes more than a single page at a
time. However, the scatterwalk performed by this driver when processing
the authenticate-only data will keep the SIMD unit occupied until it
completes.

So cap the scatterwalk steps to 4 KiB.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
