[PATCH v2 09/11] arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
Ard Biesheuvel
ard.biesheuvel at linaro.org
Thu May 15 15:10:20 PDT 2014
On 15 May 2014 14:47, Catalin Marinas <catalin.marinas at arm.com> wrote:
> On 15 May 2014, at 22:35, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
>> On 15 May 2014 10:24, Catalin Marinas <catalin.marinas at arm.com> wrote:
>>> On Wed, May 14, 2014 at 07:17:29PM +0100, Ard Biesheuvel wrote:
>>>> +static u8 const *sha1_do_update(struct shash_desc *desc, const u8 *data,
>>>> +				int blocks, u8 *head, unsigned int len)
>>>> +{
>>>> +	struct sha1_state *sctx = shash_desc_ctx(desc);
>>>> +	struct thread_info *ti = NULL;
>>>> +
>>>> +	/*
>>>> +	 * Pass current's thread info pointer to sha1_ce_transform()
>>>> +	 * below if we want it to play nice under preemption.
>>>> +	 */
>>>> +	if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) || IS_ENABLED(CONFIG_PREEMPT))
>>>> +	    && (desc->flags & CRYPTO_TFM_REQ_MAY_SLEEP))
>>>> +		ti = current_thread_info();
>>>> +
>>>> +	do {
>>>> +		int rem;
>>>> +
>>>> +		kernel_neon_begin_partial(16);
>>>> +		rem = sha1_ce_transform(blocks, data, sctx->state, head, 0, ti);
>>>> +		kernel_neon_end();
>>>> +
>>>> +		data += (blocks - rem) * SHA1_BLOCK_SIZE;
>>>> +		blocks = rem;
>>>> +		head = NULL;
>>>> +	} while (unlikely(ti && blocks > 0));
>>>> +	return data;
>>>> +}
>>>
>>> What latencies are we talking about? Would it make sense to always
>>> call cond_resched() even if preemption is disabled?
>>
>> Calculating the SHA256 (which is the most striking example) hash of a
>> 4k buffer takes ~10 microseconds, so perhaps this is all a bit moot as
>> I don't think there will generally be many in-kernel users hashing
>> substantially larger quantities of data at a time.
>
> And what’s the maximum size? Could we do it in blocks of 4K?
>
Yes, that should be doable as well. Once we feel we actually need this
feature, that is probably a better approach.
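Purely for illustration, the page-at-a-time idea could look roughly like the userspace sketch below. Here do_transform(), yield_if_needed(), and CHUNK_SIZE are hypothetical stand-ins for sha1_ce_transform(), crypto_yield(), and the page-sized limit; the point is only the loop structure, not the real kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_BLOCK_SIZE	64
#define CHUNK_SIZE	4096	/* hash at most one page between yields */

/* Toy stand-in for sha1_ce_transform(): mixes each block into state[0]. */
static void do_transform(const unsigned char *data, int blocks,
			 unsigned int *state)
{
	for (int i = 0; i < blocks * SHA1_BLOCK_SIZE; i++)
		state[0] += data[i];
}

/* Stand-in for crypto_yield(); just counts how often we would reschedule. */
static int yields;
static void yield_if_needed(void) { yields++; }

/*
 * Chunked update: hand the transform at most CHUNK_SIZE bytes at a time,
 * with a yield point between chunks, so a long-running hash stays
 * preemptible without paying the per-64-byte-block call overhead.
 */
static size_t sha1_update_chunked(const unsigned char *data, size_t len,
				  unsigned int *state)
{
	size_t done = 0;

	while (len - done >= SHA1_BLOCK_SIZE) {
		size_t chunk = len - done;
		int blocks;

		if (chunk > CHUNK_SIZE)
			chunk = CHUNK_SIZE;
		blocks = chunk / SHA1_BLOCK_SIZE;

		do_transform(data + done, blocks, state);
		done += (size_t)blocks * SHA1_BLOCK_SIZE;

		if (len - done >= SHA1_BLOCK_SIZE)
			yield_if_needed();
	}
	return done;	/* bytes consumed; remainder is a partial block */
}
```

Each call into the transform still gets a full page of blocks, so the round-constant reload cost is amortized, while the worst-case non-preemptible stretch is bounded by one page.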
>>> But I think we should have this loop always rescheduling if
>>> CRYPTO_TFM_REQ_MAY_SLEEP. I can see there is a crypto_yield() function
>>> that conditionally reschedules. What's the overhead of calling
>>> sha1_ce_transform() in a loop vs a single call for the entire data?
>>
>> For SHA256 we lose at least 10% if we call into the core function for
>> each 64 byte block rather than letting it chew away at all the data
>> itself.
>
> I was thinking more like calling it for bigger buffers (e.g. page) at
> once and a crypto_yield() in between.
>
>> (This is due to the fact that it needs to load 64 32-bit round
>> keys) But perhaps (considering the above) it is better to just shelve
>> the last 4 patches in this series until it becomes a problem for
>> anyone.
>
> For the time being I would drop the last 4 patches. We can revisit them
> for the next kernel version.
>
Agreed.
Should I send an updated pull request?
--
Ard.