[PATCH v2 09/11] arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
Ard Biesheuvel
ard.biesheuvel at linaro.org
Thu May 15 15:10:20 PDT 2014
On 15 May 2014 14:47, Catalin Marinas <catalin.marinas at arm.com> wrote:
> On 15 May 2014, at 22:35, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
>> On 15 May 2014 10:24, Catalin Marinas <catalin.marinas at arm.com> wrote:
>>> On Wed, May 14, 2014 at 07:17:29PM +0100, Ard Biesheuvel wrote:
>>>> +static u8 const *sha1_do_update(struct shash_desc *desc, const u8 *data,
>>>> +				int blocks, u8 *head, unsigned int len)
>>>> +{
>>>> +	struct sha1_state *sctx = shash_desc_ctx(desc);
>>>> +	struct thread_info *ti = NULL;
>>>> +
>>>> +	/*
>>>> +	 * Pass current's thread info pointer to sha1_ce_transform()
>>>> +	 * below if we want it to play nice under preemption.
>>>> +	 */
>>>> +	if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) || IS_ENABLED(CONFIG_PREEMPT))
>>>> +	    && (desc->flags & CRYPTO_TFM_REQ_MAY_SLEEP))
>>>> +		ti = current_thread_info();
>>>> +
>>>> +	do {
>>>> +		int rem;
>>>> +
>>>> +		kernel_neon_begin_partial(16);
>>>> +		rem = sha1_ce_transform(blocks, data, sctx->state, head, 0, ti);
>>>> +		kernel_neon_end();
>>>> +
>>>> +		data += (blocks - rem) * SHA1_BLOCK_SIZE;
>>>> +		blocks = rem;
>>>> +		head = NULL;
>>>> +	} while (unlikely(ti && blocks > 0));
>>>> +	return data;
>>>> +}
>>>
>>> What latencies are we talking about? Would it make sense to always
>>> call cond_resched() even if preemption is disabled?
>>
>> Calculating the SHA256 (which is the most striking example) hash of a
>> 4k buffer takes ~10 microseconds, so perhaps this is all a bit moot as
>> I don't think there will generally be many in-kernel users hashing
>> substantially larger quantities of data at a time.
>
> And what’s the maximum size? Could we do it in blocks of 4K?
>
Yes, that should be doable as well. Once we feel we actually need this
feature, that is probably a better approach.
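Purely for illustration, the page-at-a-time idea could look roughly like the userspace sketch below. Here do_transform(), yield_if_needed(), and CHUNK_SIZE are hypothetical stand-ins for sha1_ce_transform(), crypto_yield(), and the page-sized limit; the point is only the loop structure, not the real kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_BLOCK_SIZE	64
#define CHUNK_SIZE	4096	/* hash at most one page between yields */

/* Toy stand-in for sha1_ce_transform(): mixes each block into state[0]. */
static void do_transform(const unsigned char *data, int blocks,
			 unsigned int *state)
{
	for (int i = 0; i < blocks * SHA1_BLOCK_SIZE; i++)
		state[0] += data[i];
}

/* Stand-in for crypto_yield(); just counts how often we would reschedule. */
static int yields;
static void yield_if_needed(void) { yields++; }

/*
 * Chunked update: hand the transform at most CHUNK_SIZE bytes at a time,
 * with a yield point between chunks, so a long-running hash stays
 * preemptible without paying the per-64-byte-block call overhead.
 */
static size_t sha1_update_chunked(const unsigned char *data, size_t len,
				  unsigned int *state)
{
	size_t done = 0;

	while (len - done >= SHA1_BLOCK_SIZE) {
		size_t chunk = len - done;
		int blocks;

		if (chunk > CHUNK_SIZE)
			chunk = CHUNK_SIZE;
		blocks = chunk / SHA1_BLOCK_SIZE;

		do_transform(data + done, blocks, state);
		done += (size_t)blocks * SHA1_BLOCK_SIZE;

		if (len - done >= SHA1_BLOCK_SIZE)
			yield_if_needed();
	}
	return done;	/* bytes consumed; remainder is a partial block */
}
```

Each call into the transform still gets a full page of blocks, so the round-constant reload cost is amortized, while the worst-case non-preemptible stretch is bounded by one page.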
>>> But I think we should have this loop always rescheduling if
>>> CRYPTO_TFM_REQ_MAY_SLEEP. I can see there is a crypto_yield() function
>>> that conditionally reschedules. What's the overhead of calling
>>> sha1_ce_transform() in a loop vs a single call for the entire data?
>>
>> For SHA256 we lose at least 10% if we call into the core function for
>> each 64 byte block rather than letting it chew away at all the data
>> itself.
>
> I was thinking more like calling it for bigger buffers (e.g. page) at
> once and a crypto_yield() in between.
>
>> (This is due to the fact that it needs to load 64 32-bit round
>> keys) But perhaps (considering the above) it is better to just shelve
>> the last 4 patches in this series until it becomes a problem for
>> anyone.
>
> For the time being I would drop the last 4 patches. We can revisit them
> for the next kernel version.
>
Agreed.
Should I send an updated pull request?
--
Ard.