[PATCH v3 01/16] crypto: sha1: implement base layer for SHA-1

Wed Apr 8 07:18:56 PDT 2015

On 8 April 2015 at 16:06, Herbert Xu <herbert at gondor.apana.org.au> wrote:
> On Wed, Apr 08, 2015 at 03:40:56PM +0200, Ard Biesheuvel wrote:
>>
>> This is not the partial code path, it is the .finup path, in fact.
>> Anything that hashes data that is often a multiple of the block size
>> (which is more likely for block based applications than for IPsec, I
>> think) should benefit from this. But even if it is not, using a head
>> block and a pointer to the src eliminates one call of the block
>> transform.
>
> This would only ever happen if you call update on a partial block
> which should be the slow path anyway.
>
> IPsec only calls digest so it can never do a partial block update
> unless you have a badly fragmented skb.
>

OK, so using digest() (which uses finup()), there will never be any
data stashed in the struct::buf[], so the possible invocations of the
block transform are
- first call with all data
- second call with the unaligned chunk at the end + padding
- third call with the end of the padding + count

where the second call is only made in a minority of cases where there
is no room for the count and single padding byte after the data.
And the head block only helps with partial data, not with reducing the
potential number of invocations done for padding and count etc.

> Incidentally if you did want to deal with such cases in light
> of the high overhead that you have you may want to switch over
> to ahash which would allow you to do the saving and restoring
> once per operation rather than once per SG entry.
>

OK I will look into that.

>> OK, so there are 2 pieces of crap [sic] in this proposed generic layer:
>> - the head block
>> - the generic pointer
>
> Yes your patches look great otherwise.  I just want to keep things
> simple because you're creating a template for all future SHA authors.
>
>> The generic pointer is used in the arm64 case to convey the
>> information that the current invocation of the block transform is the
>> final one, and the core code can apply the padding and finalize /and/
>> pass back whether it has done so or not. (the latter can easily be
>> done in the C code as well)  I used a generic pointer to allow other
>> uses, but if you have a better idea for this particular use case, I'd
>> be happy to hear it.
>
> As I said in the original email, this appears to be equivalent to
>
>         (sctx->count + len) % SHA1_BLOCK_SIZE != 0
>
> So why not just do that check in C? It'll get compiled into a
> simple mask so it should be no slower than a branch on finalize.
>
> In fact if your assembly function always updates sctx->count
> then you could simplify that to
>
>         sctx->count % SHA1_BLOCK_SIZE != 0
>

The problem is that the block transform needs to be informed of the
fact that the current invocation is the final one, and it can perform
the padding, add the count etc. The C code can figure out by looking
at the count whether the asm code has applied the padding or not, but
it cannot really tell the asm code whether it should apply the padding
without an additional argument.

So what about

> +typedef void (sha1_block_fn)(int blocks, u8 const *src, u32 *state,
> +                          bool final);

?