[PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT

Ard Biesheuvel ard.biesheuvel at linaro.org
Wed Dec 6 03:57:12 PST 2017


On 6 December 2017 at 11:51, Dave Martin <Dave.Martin at arm.com> wrote:
> On Tue, Dec 05, 2017 at 06:04:34PM +0000, Ard Biesheuvel wrote:
>> On 5 December 2017 at 12:45, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
>> >
>> >
>> >> On 5 Dec 2017, at 12:28, Dave Martin <Dave.Martin at arm.com> wrote:
>> >>
>> >>> On Mon, Dec 04, 2017 at 12:26:37PM +0000, Ard Biesheuvel wrote:
>> >>> Add a support macro to conditionally yield the NEON (and thus the CPU)
>> >>> that may be called from the assembler code. Given that especially the
>> >>> instruction based accelerated crypto code may use very tight loops, add
>> >>> some parametrization so that the TIF_NEED_RESCHED flag test is only
>> >>> executed every so many loop iterations.
>> >>>
>> >>> In some cases, yielding the NEON involves saving and restoring a non
>> >>> trivial amount of context (especially in the CRC folding algorithms),
>> >>> and so the macro is split into two, and the code in between is only
>> >>> executed when the yield path is taken, allowing the contex to be preserved.
>> >>> The second macro takes a label argument that marks the resume-from-yield
>> >>> path, which should restore the preserved context again.
>> >>>
>> >>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>> >>> ---
>> >>> arch/arm64/include/asm/assembler.h | 50 ++++++++++++++++++++
>> >>> 1 file changed, 50 insertions(+)
>> >>>
>> >>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> >>> index aef72d886677..917b026d3e00 100644
>> >>> --- a/arch/arm64/include/asm/assembler.h
>> >>> +++ b/arch/arm64/include/asm/assembler.h
>> >>> @@ -512,4 +512,54 @@ alternative_else_nop_endif
>> >>> #endif
>> >>>    .endm
>> >>>
>> >>> +/*
>> >>> + * yield_neon - check whether to yield to another runnable task from
>> >>> + *        kernel mode NEON code (running with preemption disabled)
>> >>> + *
>> >>> + * - Check whether the preempt count is exactly 1, in which case disabling
>> >>> + *   preemption once will make the task preemptible. If this is not the case,
>> >>> + *   yielding is pointless.
>> >>> + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable
>> >>> + *   kernel mode NEON (which will trigger a reschedule), and branch to the
>> >>> + *   yield fixup code at @lbl.
>> >>> + */
>> >>> +    .macro        yield_neon, lbl:req, ctr, order, stride, loop
>> >>> +    yield_neon_pre    \ctr, \order, \stride, \loop
>> >>> +    yield_neon_post    \lbl
>> >>> +    .endm
>> >>> +
>> >>> +    .macro        yield_neon_pre, ctr, order=0, stride, loop=4444f
>> >>> +#ifdef CONFIG_PREEMPT
>> >>> +    /*
>> >>> +     * With some algorithms, it makes little sense to poll the
>> >>> +     * TIF_NEED_RESCHED flag after every iteration, so only perform
>> >>> +     * the check every 2^order strides.
>> >>> +     */
>> >>> +    .if        \order > 1
>> >>> +    .if        (\stride & (\stride - 1)) != 0
>> >>> +    .error        "stride should be a power of 2"
>> >>> +    .endif
>> >>> +    tst        \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
>> >>> +    b.ne        \loop
>> >>> +    .endif
>> >>
>> >> I'm not sure what baking in this check gives us, and this seems to
>> >> conflate two rather different things: yielding and defining a
>> >> "heartbeat" frequency for the calling code.
>> >>
>> >> Can we separate out the crypto-loop-helper semantics from the yield-
>> >> neon stuff?
>> >>
>> >
>> > Fair enough. I incorporated the check here so it disappears from the code entirely when !CONFIG_PREEMPT, because otherwise, you end up with a sequence that is mispredicted every # iterations without any benefit.
>> > I guess i could macroise that separately though.
>> >
>> >> If we had
>> >>
>> >>    if_cond_yield_neon
>> >>    // patchup code
>> >>    endif_yield_neon
>> >>
>>
>> I like this, but ...
>>
>> >> then the caller is free to conditionally branch over that as appropriate
>> >> like
>> >>
>> >> loop:
>> >>    // crypto stuff
>> >>    tst x0, #0xf
>> >>    b.ne    loop
>> >>
>> >>    if_cond_yield_neon
>> >>    // patchup code
>> >>    endif_cond_yield_neon
>> >>
>>
>> I need to put the post patchup code somewhere too. Please refer to the
>> CRC patches for the best examples of this.
>
> I'm not sure I follow.  If you need to do something different after
> patchup, can't you either pull that code into the if...endif too, or
> otherwise put a branch before the endif?
>

No, not really.

What I currently have is

>    if_cond_yield_neon
>    // patchup code
>    endif_cond_yield_neon

which is being turned into

    get_thread_info x0
    ldr w1, [x0, #TSK_TI_PREEMPT]
    ldr x0, [x0, #TSK_TI_FLAGS]
    cmp w1, #1 // == PREEMPT_OFFSET
    csel x0, x0, xzr, eq
    tbnz x0, #TIF_NEED_RESCHED, 5555f // needs rescheduling?
4444:

    .subsection 1
5555:
    // patchup code
    bl kernel_neon_end
    bl kernel_neon_begin
    // post patchup code goes here
    b 4444b
    .subsection

so what I basically need is a place to put the post patchup code,
unless I open code it and branch to a different label right after
kernel_neon_begin



More information about the linux-arm-kernel mailing list