[PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1

Ard Biesheuvel ard.biesheuvel at linaro.org
Wed Apr 8 06:40:56 PDT 2015


On 8 April 2015 at 15:30, Herbert Xu <herbert at gondor.apana.org.au> wrote:
> On Wed, Apr 08, 2015 at 03:25:14PM +0200, Ard Biesheuvel wrote:
>>
>> Not having to call the function twice is the whole point. In the arm64
>> case, all the SHA-256 round keys can be kept in registers (it has 32
>> 16-byte SIMD registers), and that is what motivates this pattern. By
>> passing a head block, a pointer to the source and the generic pointer
>> (which arm64 uses to finalize the block), we can process all data in a
>> single invocation of the block transform.
>
> Does this really make any difference? With IPsec the partial code
> path is never even going to get executed.
>

This is not the partial code path, it is the .finup path, in fact.
Anything that hashes data that is often a multiple of the block size
(which is more likely for block based applications than for IPsec, I
think) should benefit from this. But even if it is not, using a head
block and a pointer to the src eliminates one call of the block
transform.
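
To illustrate the point about the head block, here is a minimal
userspace sketch. It is not the code from the patch: toy_ctx,
toy_update, toy_transform and the exact argument order are names
made up for this example, and the "transform" only counts calls and
mixes bytes. It shows an update that hands a previously buffered
block (head) and the remaining full blocks (src) to the transform in
one call, where a transform without a head argument would need one
call for the buffered block and another for the rest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* SHA-1 and SHA-256 both use 64-byte blocks */

/* Hypothetical prototype: an optional head block plus a source pointer. */
typedef void (block_fn)(int blocks, const uint8_t *src, uint32_t *state,
			const uint8_t *head);

static int transform_calls;	/* how often the transform was entered */

/* Placeholder mixing step, NOT a real compression function. */
static void toy_mix(uint32_t *state, const uint8_t *block)
{
	for (int i = 0; i < BLOCK_SIZE; i++)
		state[i % 5] = state[i % 5] * 31u + block[i];
}

/* Stand-in transform: consumes the head block first (if any), then src. */
static void toy_transform(int blocks, const uint8_t *src, uint32_t *state,
			  const uint8_t *head)
{
	transform_calls++;
	if (head)
		toy_mix(state, head);
	for (int i = 0; i < blocks; i++)
		toy_mix(state, src + (size_t)i * BLOCK_SIZE);
}

struct toy_ctx {
	uint32_t state[5];
	uint8_t buf[BLOCK_SIZE];
	size_t buffered;	/* bytes currently held in buf */
};

static void toy_update(struct toy_ctx *ctx, const uint8_t *data, size_t len,
		       block_fn *fn)
{
	if (ctx->buffered + len >= BLOCK_SIZE) {
		const uint8_t *head = NULL;
		int blocks;

		if (ctx->buffered) {
			/* Top up the partial block; it becomes the head. */
			size_t fill = BLOCK_SIZE - ctx->buffered;

			memcpy(ctx->buf + ctx->buffered, data, fill);
			data += fill;
			len -= fill;
			head = ctx->buf;
		}
		blocks = (int)(len / BLOCK_SIZE);

		/* One call covers the head block and all full blocks. */
		fn(blocks, data, ctx->state, head);

		data += (size_t)blocks * BLOCK_SIZE;
		len -= (size_t)blocks * BLOCK_SIZE;
		ctx->buffered = 0;
	}
	/* Keep whatever is left for the next update or the final step. */
	memcpy(ctx->buf + ctx->buffered, data, len);
	ctx->buffered += len;
}
```

Feeding 10 bytes and then 118 bytes into a zeroed toy_ctx leaves the
buffer empty after a single entry into the transform, which is the
saving described above.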

Note that, in the arm64 case, calling a SHA-256 block transform in
non-process context involves:
- stacking the contents of 28 SIMD registers (28 x 16 = 448 bytes)
- loading the SHA-256 constants (16 x 16 = 256 bytes)
- processing the data
- unstacking the contents of 28 SIMD registers (448 bytes)

so anything that can prevent needlessly calling these functions
multiple times in quick succession is going to help, and 'just
calling it twice' just doesn't cut it.
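
As a rough back-of-the-envelope sketch, the figures in the list above
can be added up per call. The helper names are invented for this
example, and it only counts bytes moved for register save/restore and
constant loading, not cycles or anything measured.

```c
enum {
	SIMD_REG_BYTES = 16,	/* one 128-bit SIMD register */
	STACKED_REGS   = 28,	/* registers saved and restored per call */
	ROUND_KEY_REGS = 16,	/* registers holding the SHA-256 constants */
};

/* Bytes moved per call on top of the data being hashed. */
static unsigned int per_call_overhead(void)
{
	unsigned int stack   = STACKED_REGS * SIMD_REG_BYTES;	/* 448 */
	unsigned int keys    = ROUND_KEY_REGS * SIMD_REG_BYTES;	/* 256 */
	unsigned int unstack = STACKED_REGS * SIMD_REG_BYTES;	/* 448 */

	return stack + keys + unstack;
}

/* Total overhead when the transform is entered 'calls' times. */
static unsigned int overhead_for_calls(unsigned int calls)
{
	return calls * per_call_overhead();
}
```

With these numbers one call moves 1152 bytes of overhead and two
calls move 2304, against a block size of only 64 bytes.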

>> Do note that these are only used by static inline functions, so the
>> unused arguments are all eliminated from the binary anyway. In fact,
>> looking at the generated code, the function calls don't use function
>> pointers at all anymore, but just call the block transform directly,
>> so the typedef is only used as a prototype, really.
>
> It's not just the generated code.  The next guy that comes along
> and writes a SHA implementation is going to go WTH is this p
> argument.  I'm not going to add crap to the generic layer just
> because ARM needs it.  In fact ARM doesn't even need it.
>

OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
- the head block
- the generic pointer

The generic pointer is used in the arm64 case to convey the
information that the current invocation of the block transform is the
final one, and the core code can apply the padding and finalize /and/
pass back whether it has done so or not (the latter can easily be
done in the C code as well). I used a generic pointer to allow other
uses, but if you have a better idea for this particular use case, I'd
be happy to hear it.
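
For the sake of discussion, the use of the generic pointer described
above could look roughly like the sketch below. All names here
(final_hint, toy_core_transform, toy_finup) are hypothetical, the
prototype is only an approximation of what is being proposed, and the
"compression" and "padding" are placeholders so that the control flow
is visible.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64

/* Hypothetical prototype with a head block and a generic pointer. */
typedef void (block_fn)(int blocks, const uint8_t *src, uint32_t *state,
			const uint8_t *head, void *p);

/* What the generic pointer refers to in this sketch. */
struct final_hint {
	int is_final;	/* in:  this is the last call for the message */
	int finalized;	/* out: the core code applied the padding itself */
};

/* Stand-in for the assembler core; it does no real hashing. */
static void toy_core_transform(int blocks, const uint8_t *src,
			       uint32_t *state, const uint8_t *head, void *p)
{
	struct final_hint *hint = p;

	(void)src;
	(void)head;
	state[0] += (uint32_t)blocks;	/* placeholder for compression */

	if (hint && hint->is_final) {
		state[1] = 0x80;	/* placeholder for the padding block */
		hint->finalized = 1;	/* report back that it was done */
	}
}

/*
 * Returns 1 if the core finalized in the same call, 0 if the caller
 * still has to pad and run the last block(s) from C code.
 */
static int toy_finup(uint32_t *state, const uint8_t *data, size_t len,
		     block_fn *fn)
{
	struct final_hint hint = {
		/* Only ask the core to finalize for block-aligned input. */
		.is_final = (len % BLOCK_SIZE == 0),
		.finalized = 0,
	};

	fn((int)(len / BLOCK_SIZE), data, state, NULL, &hint);
	return hint.finalized;
}
```

In this sketch a 128-byte input is hashed and finalized in one entry
into the core, while a 100-byte input returns 0 and leaves the tail
and padding to the generic C code.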


