[PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way interleave of aes and ghash

Xiaokang Qian Xiaokang.Qian at arm.com
Mon Dec 13 17:39:48 PST 2021


Hi Will:
I will post the updated version (v2) of this patch today or tomorrow.
Sorry for the delay.

> -----Original Message-----
> From: Will Deacon <will at kernel.org>
> Sent: Tuesday, December 14, 2021 2:29 AM
> To: Ard Biesheuvel <ardb at kernel.org>
> Cc: Eric Biggers <ebiggers at kernel.org>; Xiaokang Qian
> <Xiaokang.Qian at arm.com>; Herbert Xu <herbert at gondor.apana.org.au>;
> David S. Miller <davem at davemloft.net>; Catalin Marinas
> <Catalin.Marinas at arm.com>; nd <nd at arm.com>; Linux Crypto Mailing List
> <linux-crypto at vger.kernel.org>; Linux ARM
> <linux-arm-kernel at lists.infradead.org>; Linux Kernel Mailing List
> <linux-kernel at vger.kernel.org>
> Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way
> interleave of aes and ghash
> 
> On Tue, Sep 28, 2021 at 11:04:03PM +0200, Ard Biesheuvel wrote:
> > On Tue, 28 Sept 2021 at 08:27, Eric Biggers <ebiggers at kernel.org> wrote:
> > >
> > > On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote:
> > > > > To improve performance on cores with deep pipelines such as A72 and N1,
> > > > > implement gcm(aes) using a 4-way interleave of aes and ghash
> > > > > (8 blocks in parallel in total), which makes full use of the
> > > > > pipelines, rather than the 4-way interleave we currently use. It
> > > > > gains about 20% for big data sizes such as 8k.
> > > >
> > > > > This is a completely new version of the GCM part of the combined
> > > > > GCM/GHASH driver. It will co-exist with the old driver and only
> > > > > serve big data sizes. Instead of interleaving four invocations of
> > > > > AES where each chunk of 64 bytes is encrypted first and then
> > > > > ghashed, the new version uses a more coarse-grained approach where
> > > > > one chunk of 64 bytes is encrypted while, at the same time, another
> > > > > chunk of 64 bytes is ghashed (or ghashed and decrypted in the
> > > > > converse case).
> > > >
> > > > The table below compares the performance of the old driver and the
> > > > new one on various micro-architectures and running in various
> > > > modes with various data sizes.
> > > >
> > > >             |     AES-128       |     AES-192       |     AES-256       |
> > > >      #bytes | 1024 | 1420 |  8k | 1024 | 1420 |  8k | 1024 | 1420 |  8k |
> > > >      -------+------+------+-----+------+------+-----+------+------+-----+
> > > >         A72 | 5.5% |  12% | 25% | 2.2% |  9.5%|  23%| -1%  |  6.7%| 19% |
> > > >         A57 |-0.5% |  9.3%| 32% | -3%  |  6.3%|  26%| -6%  |  3.3%| 21% |
> > > > >         N1  | 0.4% |  7.6%|24.5%| -2%  |  5%  |  22%| -4%  |  2.7%| 20% |
> > > >
> > > > Signed-off-by: XiaokangQian <xiaokang.qian at arm.com>
> > >
> > > Does this pass the self-tests, including the fuzz tests which are
> > > enabled by CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
> > >
> >
> > Please test both little-endian and big-endian. (Note that you don't
> > need a big-endian user space for this - the self tests are executed
> > before the rootfs is mounted)
> >
> > Also, you will have to rebase this onto the latest cryptodev tree,
> > which carries some changes I made recently to this driver.
> 
> XiaokangQian -- did you post an updated version of this? It would end up
> going via Herbert, but I was keeping half an eye on it and it all seems to have
> gone quiet.
> 
> Thanks,
> 
> Will


