[PATCH 00/36] AES library improvements
Ard Biesheuvel
ardb at kernel.org
Fri Jan 9 01:08:02 PST 2026
On Fri, 9 Jan 2026 at 02:27, Eric Biggers <ebiggers at kernel.org> wrote:
>
> On Thu, Jan 08, 2026 at 12:26:18PM -0800, Eric Biggers wrote:
> > On Thu, Jan 08, 2026 at 12:32:00PM +0100, Ard Biesheuvel wrote:
> > > On Mon, 5 Jan 2026 at 06:14, Eric Biggers <ebiggers at kernel.org> wrote:
> > > >
> > > > This series applies to libcrypto-next. It can also be retrieved from:
> > > >
> > > > git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git aes-lib-v1
> > > >
> > > > This series makes three main improvements to the kernel's AES library:
> > > >
> > > > 1. Make it use the kernel's existing architecture-optimized AES code,
> > > >    including AES instructions, when available. Previously, only the
> > > >    traditional crypto API gave access to the optimized AES code.
> > > >    (As a reminder, AES instructions typically make AES over 10 times
> > > >    as fast as the generic code. They also make it constant-time.)
> > > >
> > > > 2. Support preparing an AES key for only the forward direction of the
> > > >    block cipher, using about half as much memory. This is a helpful
> > > >    optimization for many common AES modes of operation. It also helps
> > > >    keep structs small enough to be allocated on the stack, especially
> > > >    considering potential future library APIs for AES modes.
> > > >
> > > > 3. Replace the library's generic AES implementation with a much faster
> > > >    one that is almost as fast as "aes-generic", while still keeping
> > > >    the table size reasonably small and maintaining some constant-time
> > > >    hardening. This allows removing "aes-generic", unifying the
> > > >    current two generic AES implementations in the kernel tree.
> > > >
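To put rough numbers on point 2 above: an expanded AES key holds one
16-byte round key per round plus one, so the sizes follow from FIPS 197.
The sketch below is only that arithmetic. The function name is made up
for illustration, and the actual structs in the series may carry extra
fields or architecture-specific layouts.

```python
AES_BLOCK_SIZE = 16  # bytes; each round key is one block wide

# Number of rounds per key size, as specified in FIPS 197.
ROUNDS = {128: 10, 192: 12, 256: 14}


def round_key_bytes(key_bits, both_directions):
    """Bytes of round keys for one expanded AES key (hypothetical helper)."""
    per_direction = (ROUNDS[key_bits] + 1) * AES_BLOCK_SIZE
    return per_direction * (2 if both_directions else 1)


if __name__ == "__main__":
    for bits in (128, 192, 256):
        enc_only = round_key_bytes(bits, both_directions=False)
        enc_dec = round_key_bytes(bits, both_directions=True)
        print(f"AES-{bits}: encrypt-only {enc_only} B, both directions {enc_dec} B")
```

For AES-256 this gives 240 bytes of round keys for the forward direction
versus 480 bytes when the decryption round keys are stored as well.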
> > >
> > > Architectures that support memory operands will be impacted by
> > > dropping the pre-rotated lookup tables, especially if they have few
> > > GPRs.
> > >
> > > I suspect that doesn't really matter in practice: if your pre-AESNI
> > > IA-32 workload has a bottleneck on "aes-generic", you would have
> > > probably moved it to a different machine by now. But the performance
> > > delta will likely be noticeable so it is something that deserves a
> > > mention.
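For readers unfamiliar with the trade-off, here is a minimal sketch of
the general table-lookup technique, not the code from the series. It
assumes a single 256-entry table of 32-bit words with the lowest byte
first; all names are illustrative. With four pre-rotated tables each
lookup can be XORed in directly (on x86, as a memory operand), while a
single table needs an extra rotate per lookup, and typically a scratch
register to hold the loaded word.

```python
MASK32 = 0xFFFFFFFF


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def rol32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def make_sbox():
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for n in (1, 2, 3, 4):
            s ^= rotl8(inv, n)
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()

# T0[x] packs one MixColumns column (2*S, S, S, 3*S), lowest byte first.
T0 = [gf_mul(s, 2) | (s << 8) | (s << 16) | (gf_mul(s, 3) << 24) for s in SBOX]

# Pre-rotated variant: four tables, T[i] is T0 rotated left by 8*i bits.
T = [[rol32(w, 8 * i) for w in T0] for i in range(4)]


def column_prerotated(a0, a1, a2, a3):
    """One output column using four tables: lookups and XORs only."""
    return T[0][a0] ^ T[1][a1] ^ T[2][a2] ^ T[3][a3]


def column_single_table(a0, a1, a2, a3):
    """Same column using one table plus a rotate per lookup."""
    return T0[a0] ^ rol32(T0[a1], 8) ^ rol32(T0[a2], 16) ^ rol32(T0[a3], 24)
```

Both functions return the same word for every input. The difference is
4 KiB of tables with no rotates versus 1 KiB with three rotates per
column, which is where few-GPR architectures would feel the change.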
> >
> > Sure. I only claimed that the new implementation is "almost as fast" as
> > aes-generic, not "as fast".
> >
> > By the way, these are the results I get for crypto_cipher_encrypt_one()
> > and crypto_cipher_decrypt_one() (averaged together) in a loop on an i386
> > kernel patched to not use AES-NI:
> >
> >     aes-fixed-time:  77 MB/s
> >     aes-generic:    192 MB/s
> >     aes-lib:        185 MB/s
> >
> > I'm not sure how relevant these are, considering that this was collected
> > on a modern CPU, not one of the (very) old ones that would actually be
> > running i386 non-AESNI code. But if they are even vaguely
> > representative, this suggests the new code does quite well: little
> > slowdown over aes-generic, while adding some constant-time hardening
> > (whose earlier omission was arguably an undeserved shortcut) and
> > also using a lot less dcache.
> >
> > At the same time, there's clearly a large speedup vs. aes-fixed-time.
> > So this will actually be a significant performance improvement on
> > systems that were using aes-fixed-time. Many people may have been doing
> > that unintentionally, due to it being set to a higher priority than
> > aes-generic in the crypto_cipher API.
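The relative differences implied by the three figures quoted above can
be worked out as follows. This is just arithmetic on the reported
numbers, which were measured on a modern CPU and may not carry over to
older hardware; the helper names are illustrative.

```python
# Throughput figures as reported above (i386 kernel, AES-NI disabled).
throughput_mb_s = {
    "aes-fixed-time": 77,
    "aes-generic": 192,
    "aes-lib": 185,
}


def percent_change(new, old):
    """Relative change from old to new, in percent (negative = slower)."""
    return (new - old) / old * 100.0


def speedup(new, old):
    """How many times faster new is than old."""
    return new / old


if __name__ == "__main__":
    lib = throughput_mb_s["aes-lib"]
    generic = throughput_mb_s["aes-generic"]
    fixed = throughput_mb_s["aes-fixed-time"]
    print(f"aes-lib vs aes-generic:    {percent_change(lib, generic):+.1f}%")
    print(f"aes-lib vs aes-fixed-time: {speedup(lib, fixed):.2f}x")
```

On these figures, aes-lib is roughly 3.6% slower than aes-generic and
about 2.4 times as fast as aes-fixed-time.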
> >
> > I'll also note that the state of the art for parallelizable AES modes on
> > CPUs without AES instructions is bit-slicing with vector registers. The
> > kernel has such code for arm and arm64, but not for x86. If x86 without
> > AES-NI were actually important, we should be adding that. But it seems
> > clear that x86 CPUs have moved on, and hardly anyone cares anymore. If
> > for now we can just provide something that's almost as fast as before
> > (and maybe even a lot faster in some cases!), that seems fine.
>
> It's also worth emphasizing that there are likely to be systems that
> support AES instructions but are not using them due to the corresponding
> kconfig options (e.g. CONFIG_CRYPTO_AES_NI_INTEL) not being set to 'y'.
> As we know, missing the crypto optimization kconfig options is a common
> mistake. This series fixes that for single-block AES.
>
> So (in addition to the aes-fixed-time case) that's another case that
> just gets faster, and where the difference between aes-generic and the
> new generic code isn't actually relevant.
>
Fair enough. Thanks for the elaboration.