[PATCH v5] arm64: fpsimd: improve stacking logic in non-interruptible context
Dave Martin
Dave.Martin at arm.com
Tue Dec 13 03:11:20 PST 2016
On Mon, Dec 12, 2016 at 05:56:27PM +0000, Ard Biesheuvel wrote:
> Currently, we allow kernel mode NEON in softirq or hardirq context by
> stacking and unstacking a slice of the NEON register file for each call
> to kernel_neon_begin() and kernel_neon_end(), respectively.
>
> Given that
> a) a CPU typically spends most of its time in userland, during which time
> no kernel mode NEON in process context is in progress,
> b) when a CPU is interrupted to perform kernel mode NEON in softirq
> context, it is usually executing kernel code other than kernel mode
> NEON at that point,
>
> the stacking and subsequent unstacking is only necessary if we are
> interrupting a thread while it is performing kernel mode NEON in process
> context, which means that in all other cases, we can simply preserve the
> userland FPSIMD state once, and only restore it upon return to userland,
> even if we are being invoked from softirq or hardirq context.
>
> So instead of checking whether we are running in interrupt context, keep
> track of the level of nested kernel mode NEON calls in progress, and only
> perform the eager stack/unstack if the level exceeds 1.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
> ---
> arch/arm64/kernel/fpsimd.c | 64 +++++++++++++++++++++++++++++++++-------------
> 1 file changed, 46 insertions(+), 18 deletions(-)
>
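(As an aside for anyone reading along, rather than a comment on the
change itself: a caller of the kernel_neon_begin_partial() /
kernel_neon_end() API described above looks roughly like the sketch
below. The helper name is made up for illustration; only the
kernel_neon_* calls and <asm/neon.h> are real.)

#include <asm/neon.h>

static void do_neon_work(void)
{
	/*
	 * Only q0-q3 are used here, so ask for just 4 registers to be
	 * preserved.  begin/end sections like this one may nest via
	 * softirq/hardirq context, which is the case the patch below
	 * is reworking.
	 */
	kernel_neon_begin_partial(4);

	/* ... NEON code (inline asm or intrinsics) goes here ... */

	kernel_neon_end();
}
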
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 394c61db5566..c19363775436 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -220,45 +220,73 @@ void fpsimd_flush_task_state(struct task_struct *t)
>
> #ifdef CONFIG_KERNEL_MODE_NEON
>
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
> +/*
> + * Although unlikely, it is possible for three kernel mode NEON contexts to
> + * be live at the same time: process context, softirq context and hardirq
> + * context. So while the userland context is stashed in the thread's fpsimd
> + * state structure, we need two additional levels of storage.
> + */
> +static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
> +static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
>
> /*
> * Kernel-side NEON support functions
> */
> void kernel_neon_begin_partial(u32 num_regs)
> {
> - if (in_interrupt()) {
> - struct fpsimd_partial_state *s = this_cpu_ptr(
> - in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> + struct fpsimd_partial_state *s;
> + int level;
>
> - BUG_ON(num_regs > 32);
> - fpsimd_save_partial_state(s, roundup(num_regs, 2));
> - } else {
> + preempt_disable();
> +
> + if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> /*
> * Save the userland FPSIMD state if we have one and if we
> * haven't done so already. Clear fpsimd_last_state to indicate
> * that there is no longer userland FPSIMD state in the
> * registers.
> */
> - preempt_disable();
> - if (current->mm &&
> - !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> - fpsimd_save_state(&current->thread.fpsimd_state);
> + if (current->mm) {
> + unsigned long flags;
> +
> + local_irq_save(flags);
For non-SVE hardware (i.e., all hardware for now), this raises mean IRQ
latency if KERNEL_MODE_NEON is used, to fix a bug that doesn't exist.
For SVE hardware, we would be disabling interrupts around typically a
few KB of stores. I don't know what the actual hardware numbers would
look like, but this feels like a cost that is disproportionate to the
problem being solved...
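(Back-of-envelope for the architectural maximum vector length of 2048
bits: 32 Z registers x 256 bytes = 8 KB of stores, before the predicate
registers are counted.)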
After all, why do this here...
> + if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> + fpsimd_save_state(&current->thread.fpsimd_state);
> + local_irq_restore(flags);
> + } else {
> + set_thread_flag(TIF_FOREIGN_FPSTATE);
> + }
> this_cpu_write(fpsimd_last_state, NULL);
> }
> +
> + level = this_cpu_inc_return(kernel_neon_nesting_level);
...and yet go to all this trouble to avoid disabling IRQs for a _nested_
kernel_neon_begin()?
Did I miss something, or would bumping the nesting count around the
outermost case as well, without disabling IRQs, not work in the non-SVE
case?
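Something along these lines, perhaps -- completely untested, just
moving the increment ahead of the save and dropping the IRQ disabling,
and reusing the declarations from your patch (nested_fpsimdstate,
kernel_neon_nesting_level):

void kernel_neon_begin_partial(u32 num_regs)
{
	struct fpsimd_partial_state *s;
	int level;

	preempt_disable();

	/*
	 * Bump the nesting level before touching the task's state, so
	 * that a kernel_neon_begin() from IRQ context that lands in the
	 * middle of the save below takes the level > 1 path and uses
	 * the per-CPU buffers instead of the task's fpsimd_state.
	 */
	level = this_cpu_inc_return(kernel_neon_nesting_level);
	BUG_ON(level > 3);

	if (level == 1) {
		/* Outermost call: stash the userland state once. */
		if (current->mm) {
			if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
				fpsimd_save_state(&current->thread.fpsimd_state);
		} else {
			set_thread_flag(TIF_FOREIGN_FPSTATE);
		}
		this_cpu_write(fpsimd_last_state, NULL);
	} else {
		s = this_cpu_ptr(nested_fpsimdstate);

		WARN_ON_ONCE(num_regs > 32);
		num_regs = min(roundup(num_regs, 2), 32U);

		fpsimd_save_partial_state(&s[level - 2], num_regs);
	}
}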
Cheers
---Dave
> + BUG_ON(level > 3);
> +
> + if (level > 1) {
> + s = this_cpu_ptr(nested_fpsimdstate);
> +
> + WARN_ON_ONCE(num_regs > 32);
> + num_regs = min(roundup(num_regs, 2), 32U);
> +
> + fpsimd_save_partial_state(&s[level - 2], num_regs);
> + }
> }
> EXPORT_SYMBOL(kernel_neon_begin_partial);
>
> void kernel_neon_end(void)
> {
> - if (in_interrupt()) {
> - struct fpsimd_partial_state *s = this_cpu_ptr(
> - in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> - fpsimd_load_partial_state(s);
> - } else {
> - preempt_enable();
> + struct fpsimd_partial_state *s;
> + int level;
> +
> + level = this_cpu_read(kernel_neon_nesting_level);
> + BUG_ON(level < 1);
> +
> + if (level > 1) {
> + s = this_cpu_ptr(nested_fpsimdstate);
> + fpsimd_load_partial_state(&s[level - 2]);
> }
> + this_cpu_dec(kernel_neon_nesting_level);
> + preempt_enable();
> }
> EXPORT_SYMBOL(kernel_neon_end);
>
> --
> 2.7.4
>
>