[RFC v3 PATCH 3/7] ARM64: defer reloading a task's FPSIMD state to userland resume
Catalin Marinas
catalin.marinas at arm.com
Mon Oct 28 18:29:52 EDT 2013
On 28 Oct 2013, at 20:32, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
> On 28 October 2013 11:12, Catalin Marinas <catalin.marinas at arm.com> wrote:
>> On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
>>> --- a/arch/arm64/kernel/fpsimd.c
>>> +++ b/arch/arm64/kernel/fpsimd.c
>>> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>>> void fpsimd_thread_switch(struct task_struct *next)
>>> {
>>> /* check if not kernel threads */
>>> - if (current->mm)
>>> + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>> fpsimd_save_state(¤t->thread.fpsimd_state);
>>
>> Why does it need test_and_set_thread_flag() here? Some comments would be
>> useful as it looks strange to check a reload flag to decide whether to
>> save a state. Or change the name to something like 'dirty'.
>
> Actually, it's test and clear.
Yes, just a typo.
> If the userland register file has
> already been preserved for the purpose of performing kernel mode NEON,
> it should not be saved again when the task gets scheduled out. The
> clearing could also be deferred to the time when the task gets
> scheduled in again.
The above should be turned into a comment in the code.
> Or perhaps, it would be even better to always
> defer loading the userland state for the next task when that task in
> fact enters userland.
It needs some more thinking, its usually cleaner in the context
switching code.
>>> if (next->mm)
>>> fpsimd_load_state(&next->thread.fpsimd_state);
>>
>> This function could be optimised a bit more to avoid saving/restoring if
>> the switch only happened between a user thread and a kernel one (and
>> back again) since the FP state may not have been dirtied. But what I had
>> in mind was per-CPU fpstate (possibly pointer or some flag) rather than
>> per-thread.
>
> Well, then we are entering the realm of lazy restore, imo. There were
> some patches proposed for that already, I think? But I do agree that
> at his point, there is no need to restore the userland register
> contents yet, it can be deferred to the point when the task reenters
> userland (as mentioned above).
Not entirely lazy. What I dont want to see (without proper benchmarks)
is disabling the FP at context switch and restoring the registers lazily
via the fault mechanism. What Im proposing above is not lazy, just an
optimisation for a clear case where the FP is not used in a kernel
thread.
>>> --- a/arch/arm64/kernel/signal.c
>>> +++ b/arch/arm64/kernel/signal.c
>>> @@ -416,4 +416,6 @@ asmlinkage void do_noti fy_resume(struct pt_regs *regs,
>>> clear_thread_flag(TIF_NOTIFY_RESUME);
>>> tracehook_notify_resume(regs);
>>> }
>>> + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>> + fpsimd_load_state(¤t->thread.fpsimd_state);
>>
>> I think this code can be preempted, it is run with IRQs enabled. And
>> there is a small window where we cleared the flag but haven't loaded the
>> state.
>
> If we are preempted at this point, the fpstate will be loaded in the
> normal way the next time this task runs, so I think this is harmless.
> Although I guess we may be restoring the fp state twice in that case?
Lets say task A does an svc and gets into kernel mode followed by
kernel_neon_begin/end(). Before returning to user, the kernel runs the
above test_and_clear_thread_flag(). Immediately after, an interrupt
happens the task A is preempted. The fpsimd_thread_switch() function
finds that TIF_RELOAD_FPSTATE is cleared and save the current FP state.
But the FP regs contain whatever the kernel neon code did, so you
corrupt the existing data.
> So in summary, what I need to do is:
> - rework to use a per_cpu flag rather than a TIF;
> - preserve the userland state (if it has one) when a task gets scheduled out;
> - restore the userland state when a task enters userland;
Happy to discuss the algorithm before you code it (unless you prefer to
write the code quickly).
> I will propose an updated patch after I wrap up the work on the
> in_interrupt() kernel mode NEON, as these topics are really orthogonal
> and there is no reason to keep them combined in a single series.
Sounds fine.
Catalin
More information about the linux-arm-kernel
mailing list