[PATCH v2 09/11] arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
Catalin Marinas
catalin.marinas at arm.com
Thu May 15 14:47:04 PDT 2014
On 15 May 2014, at 22:35, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
> On 15 May 2014 10:24, Catalin Marinas <catalin.marinas at arm.com> wrote:
>> On Wed, May 14, 2014 at 07:17:29PM +0100, Ard Biesheuvel wrote:
>>> +static u8 const *sha1_do_update(struct shash_desc *desc, const u8 *data,
>>> +                                int blocks, u8 *head, unsigned int len)
>>> +{
>>> +        struct sha1_state *sctx = shash_desc_ctx(desc);
>>> +        struct thread_info *ti = NULL;
>>> +
>>> +        /*
>>> +         * Pass current's thread info pointer to sha1_ce_transform()
>>> +         * below if we want it to play nice under preemption.
>>> +         */
>>> +        if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) || IS_ENABLED(CONFIG_PREEMPT))
>>> +            && (desc->flags & CRYPTO_TFM_REQ_MAY_SLEEP))
>>> +                ti = current_thread_info();
>>> +
>>> +        do {
>>> +                int rem;
>>> +
>>> +                kernel_neon_begin_partial(16);
>>> +                rem = sha1_ce_transform(blocks, data, sctx->state, head, 0, ti);
>>> +                kernel_neon_end();
>>> +
>>> +                data += (blocks - rem) * SHA1_BLOCK_SIZE;
>>> +                blocks = rem;
>>> +                head = NULL;
>>> +        } while (unlikely(ti && blocks > 0));
>>> +        return data;
>>> +}
>>
>> What latencies are we talking about? Would it make sense to always
>> call cond_resched() even if preemption is disabled?
>
> Calculating the SHA256 hash (the most striking example) of a 4k
> buffer takes ~10 microseconds, so perhaps this is all a bit moot, as
> I don't think there will generally be many in-kernel users hashing
> substantially larger quantities of data at a time.
And what’s the maximum size? Could we do it in blocks of 4K?
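Something along these lines (completely untested, and reusing the
names from your patch) is what I have in mind. It caps each
kernel_neon_begin/end section at 4K of input and passes a NULL
thread_info pointer so that sha1_ce_transform() consumes each chunk
in full (assuming that's what the NULL case does in your patch):

static u8 const *sha1_do_update(struct shash_desc *desc, const u8 *data,
                                int blocks, u8 *head, unsigned int len)
{
        struct sha1_state *sctx = shash_desc_ctx(desc);
        bool may_sleep = desc->flags & CRYPTO_TFM_REQ_MAY_SLEEP;

        do {
                /* at most 4K (64 blocks) of input per NEON section */
                int chunk = min_t(int, blocks, SZ_4K / SHA1_BLOCK_SIZE);

                kernel_neon_begin_partial(16);
                sha1_ce_transform(chunk, data, sctx->state, head, 0, NULL);
                kernel_neon_end();

                data += chunk * SHA1_BLOCK_SIZE;
                blocks -= chunk;
                head = NULL;

                /* reschedule between chunks if the caller allows it */
                if (may_sleep)
                        cond_resched();
        } while (blocks > 0);
        return data;
}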
>> But I think we should have this loop always rescheduling if
>> CRYPTO_TFM_REQ_MAY_SLEEP. I can see there is a crypto_yield() function
>> that conditionally reschedules. What's the overhead of calling
>> sha1_ce_transform() in a loop vs a single call for the entire data?
>
> For SHA256 we lose at least 10% if we call into the core function for
> each 64 byte block rather than letting it chew away at all the data
> itself.
I was thinking more along the lines of calling it on bigger buffers
(e.g. a page) at a time, with a crypto_yield() in between.
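IIRC crypto_yield() is just this (from include/crypto/algapi.h):

static inline void crypto_yield(u32 flags)
{
        if (flags & CRYPTO_TFM_REQ_MAY_SLEEP)
                cond_resched();
}

so calling it unconditionally between page-sized chunks costs nothing
for callers that cannot sleep and bounds the scheduling latency for
everyone else.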
> (This is because it needs to load 64 32-bit round keys.) But perhaps
> (considering the above) it is better to just shelve the last 4
> patches in this series until it becomes a problem for anyone.
For the time being I would drop the last 4 patches. We can revisit them
for the next kernel version.
Catalin