[PATCH v2 00/15] SHA-3 library

Eric Biggers ebiggers at kernel.org
Wed Nov 5 20:33:40 PST 2025


On Wed, Nov 05, 2025 at 04:39:01PM +0100, Harald Freudenberger wrote:
> On 2025-11-03 18:34, Eric Biggers wrote:
> > On Sat, Oct 25, 2025 at 10:50:17PM -0700, Eric Biggers wrote:
> > > This series is targeting libcrypto-next.  It can also be retrieved
> > > from:
> > > 
> > >     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git sha3-lib-v2
> > > 
> > > This series adds SHA-3 support to lib/crypto/.  This includes support
> > > for the digest algorithms SHA3-224, SHA3-256, SHA3-384, and SHA3-512,
> > > and also support for the extendable-output functions SHAKE128 and
> > > SHAKE256.  The SHAKE128 and SHAKE256 support will be needed by ML-DSA.
> > > 
> > > The architecture-optimized SHA-3 code for arm64 and s390 is migrated
> > > into lib/crypto/.  (The existing s390 code couldn't really be reused,
> > > so really I rewrote it from scratch.)  This makes the SHA-3 library
> > > functions be accelerated on these architectures.
> > > 
> > > Finally, the sha3-224, sha3-256, sha3-384, and sha3-512 crypto_shash
> > > algorithms are reimplemented on top of the library API.
> > 
> > I've applied this series to
> > https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next,
> > excluding the following 2 patches which are waiting on benchmark results
> > from the s390 folks:
> > 
> >     lib/crypto: sha3: Support arch overrides of one-shot digest functions
> >     lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
> > 
> > I'd be glad to apply those too if they're shown to be worthwhile.
> > 
> > Note: I also reordered the commits in libcrypto-next to put the new
> > KUnit test suites (blake2b and sha3) last, and to put the AES-GCM
> > improvements on a separate branch that's merged in.  This will allow
> > making separate pull requests for the tests and the AES-GCM
> > improvements, which I think aligns with what Linus had requested before
> > (https://lore.kernel.org/linux-crypto/CAHk-=wi5d4K+sF2L=tuRW6AopVxO1DDXzstMQaECmU2QHN13KA@mail.gmail.com/).
> > 
> > - Eric
> 
> Here are some measurements on an LPAR, with 500 runs each: once with the
> full sha3-lib-v2 branch ("with") and once with only the patch
> b2e169dd8ca5 lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
> reverted ("without"). With the help of gnuplot I generated distribution
> charts of the results for the len=16, 64, 256, 1024 and 4096 benchmarks.
> See the attached pictures. Sorry, but I see no other way to provide this
> data than using an attachment.
> 
> Clearly the patch brings a boost - especially for the 256 byte case.
> 
> Harald Freudenberger

Thanks.  I applied "lib/crypto: sha3: Support arch overrides of one-shot
digest functions" and "lib/crypto: s390/sha3: Add optimized one-shot
SHA-3 digest functions" to libcrypto-next.  For the latter, I improved
the commit message to mention your benchmark results:

commit 862445d3b9e74f58360a7a89787da4dca783e6dd
Author: Eric Biggers <ebiggers at kernel.org>
Date:   Sat Oct 25 22:50:29 2025 -0700

    lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions
    
    Some z/Architecture processors can compute a SHA-3 digest in a single
    instruction.  arch/s390/crypto/ already uses this capability to optimize
    the SHA-3 crypto_shash algorithms.
    
    Use this capability to implement the sha3_224(), sha3_256(), sha3_384(),
    and sha3_512() library functions too.
    
    SHA3-256 benchmark results provided by Harald Freudenberger
    (https://lore.kernel.org/r/4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com/)
    on a z/Architecture machine with "facility 86" (MSA level 12):
    
        Length (bytes)    Before (MB/s)   After (MB/s)
        ==============    =============   ============
              16                212             225
              64                820             915
             256               1850            3350
            1024               5400            8300
            4096              11200           11300
    
    Note: the original data from Harald was given in the form of a graph for
    each length, showing the distribution of throughputs from 500 runs.  I
    guesstimated the peak of each one.
    
    Harald also reported that the generic SHA-3 code was at most 259 MB/s
    (https://lore.kernel.org/r/c39f6b6c110def0095e5da5becc12085@linux.ibm.com/).
    So as expected, the earlier commit that optimized sha3_absorb_blocks()
    and sha3_keccakf() is the more important one; it optimized the Keccak
    permutation which is the most performance-critical part of SHA-3.
    Still, this additional commit does notably improve performance further
    on some lengths.
    
    Reviewed-by: Ard Biesheuvel <ardb at kernel.org>
    Tested-by: Harald Freudenberger <freude at linux.ibm.com>
    Link: https://lore.kernel.org/r/20251026055032.1413733-13-ebiggers@kernel.org
    Signed-off-by: Eric Biggers <ebiggers at kernel.org>
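
For reference, the relative gain at each length can be worked out from the
table in the commit message with a few lines of Python.  This is only a
sketch: the throughput values are copied from the table (which were
themselves eyeballed from the distribution plots, so they are approximate),
and the names `results` and `speedups` are made up for illustration.

```python
# Throughputs in MB/s, copied from the table in the commit message:
# length in bytes -> (before, after).  These are approximate peaks read off
# distribution plots, not exact measurements.
results = {
    16: (212, 225),
    64: (820, 915),
    256: (1850, 3350),
    1024: (5400, 8300),
    4096: (11200, 11300),
}


def speedups(table):
    """Return {length: after / before} for each message length."""
    return {length: after / before for length, (before, after) in table.items()}


if __name__ == "__main__":
    # Print one line per length with the ratio rounded to two decimals.
    for length, ratio in speedups(results).items():
        print(f"{length:>5} bytes: {ratio:.2f}x")
```

With these numbers the largest ratio is at 256 bytes, which matches Harald's
observation, and the ratio at 4096 bytes is close to 1.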
