[PATCH v2 09/11] arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
Catalin Marinas
catalin.marinas at arm.com
Thu May 15 14:47:04 PDT 2014
On 15 May 2014, at 22:35, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
> On 15 May 2014 10:24, Catalin Marinas <catalin.marinas at arm.com> wrote:
>> On Wed, May 14, 2014 at 07:17:29PM +0100, Ard Biesheuvel wrote:
>>> +static u8 const *sha1_do_update(struct shash_desc *desc, const u8 *data,
>>> +                                int blocks, u8 *head, unsigned int len)
>>> +{
>>> +        struct sha1_state *sctx = shash_desc_ctx(desc);
>>> +        struct thread_info *ti = NULL;
>>> +
>>> +        /*
>>> +         * Pass current's thread info pointer to sha1_ce_transform()
>>> +         * below if we want it to play nice under preemption.
>>> +         */
>>> +        if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) || IS_ENABLED(CONFIG_PREEMPT))
>>> +            && (desc->flags & CRYPTO_TFM_REQ_MAY_SLEEP))
>>> +                ti = current_thread_info();
>>> +
>>> +        do {
>>> +                int rem;
>>> +
>>> +                kernel_neon_begin_partial(16);
>>> +                rem = sha1_ce_transform(blocks, data, sctx->state, head, 0, ti);
>>> +                kernel_neon_end();
>>> +
>>> +                data += (blocks - rem) * SHA1_BLOCK_SIZE;
>>> +                blocks = rem;
>>> +                head = NULL;
>>> +        } while (unlikely(ti && blocks > 0));
>>> +        return data;
>>> +}
>>
>> What latencies are we talking about? Would it make sense to always
>> call cond_resched() even if preemption is disabled?
>
> Calculating the SHA256 hash (the most striking example) of a 4k
> buffer takes ~10 microseconds, so perhaps this is all a bit moot, as
> I don't think there will generally be many in-kernel users hashing
> substantially larger quantities of data at a time.
And what’s the maximum size? Could we do it in blocks of 4K?
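Something along these lines (completely untested, and reusing the
names from your patch) is what I have in mind. It caps each
kernel_neon_begin/end section at 4K of input and passes a NULL
thread_info pointer so that sha1_ce_transform() consumes each chunk
in full (assuming that's what the NULL case does in your patch):

static u8 const *sha1_do_update(struct shash_desc *desc, const u8 *data,
                                int blocks, u8 *head, unsigned int len)
{
        struct sha1_state *sctx = shash_desc_ctx(desc);
        bool may_sleep = desc->flags & CRYPTO_TFM_REQ_MAY_SLEEP;

        do {
                /* at most 4K (64 blocks) of input per NEON section */
                int chunk = min_t(int, blocks, SZ_4K / SHA1_BLOCK_SIZE);

                kernel_neon_begin_partial(16);
                sha1_ce_transform(chunk, data, sctx->state, head, 0, NULL);
                kernel_neon_end();

                data += chunk * SHA1_BLOCK_SIZE;
                blocks -= chunk;
                head = NULL;

                /* reschedule between chunks if the caller allows it */
                if (may_sleep)
                        cond_resched();
        } while (blocks > 0);
        return data;
}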
>> But I think we should have this loop always rescheduling if
>> CRYPTO_TFM_REQ_MAY_SLEEP. I can see there is a crypto_yield() function
>> that conditionally reschedules. What's the overhead of calling
>> sha1_ce_transform() in a loop vs a single call for the entire data?
>
> For SHA256 we lose at least 10% if we call into the core function for
> each 64 byte block rather than letting it chew away at all the data
> itself.
I was thinking more along the lines of calling it on bigger buffers
(e.g. a page) at a time, with a crypto_yield() in between.
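IIRC crypto_yield() is just this (from include/crypto/algapi.h):

static inline void crypto_yield(u32 flags)
{
        if (flags & CRYPTO_TFM_REQ_MAY_SLEEP)
                cond_resched();
}

so calling it unconditionally between page-sized chunks costs nothing
for callers that cannot sleep and bounds the scheduling latency for
everyone else.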
> (This is because it needs to load 64 32-bit round keys.) But perhaps
> (considering the above) it is better to just shelve the last 4
> patches in this series until it becomes a problem for anyone.
For the time being I would drop the last 4 patches. We can revisit them
for the next kernel version.
Catalin