[RFT PATCH v1 0/7] enable FPSIMD lazy save and restore for arm64

Jiang Liu liuj97 at gmail.com
Fri Sep 27 11:20:15 EDT 2013


On 09/27/2013 07:23 PM, Will Deacon wrote:
> On Fri, Sep 27, 2013 at 11:50:46AM +0100, Catalin Marinas wrote:
>> On Fri, Sep 27, 2013 at 09:04:40AM +0100, Jiang Liu wrote:
>>> From: Jiang Liu <jiang.liu at huawei.com>
>>>
>>> This patchset enables FPSIMD lazy save and restore for ARM64; you can
>>> apply it against v3.12-rc2.
>>>
>>> We have done basic functional tests on the ARM fast model, but still lack
>>> detailed performance benchmarks on real hardware platforms. We would
>>> appreciate it if you could help test it on real hardware platforms!
>>
>> That's my issue as well, I would like to see some benchmarks before
>> merging such patches.
> 
> Furthermore, with GCC's register allocator starting to use vector registers to
> optimise *integer* code instead of spilling to the stack, it's going to become
> more and more common for tasks to have live FP state at context switch. Lazy
> switching might simply introduce overhead in the form of additional trapping.
> 
> Will
> 
Hi Will,
	The patchset actually includes three optimizations.

The first one uses PF_USED_MATH to track whether the thread has
accessed FPSIMD registers since it was created. If the thread
hasn't accessed FPSIMD registers since its birth, we don't need to
save and restore its FPSIMD context on context switch.
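A minimal C sketch of the first optimization, modeling the kernel structures in user space. The struct layouts, the flag value and the helper names are hypothetical. It also assumes that first use is detected by trapping FPSIMD access (CPACR_EL1.FPEN on arm64) while the flag is clear, which the description above does not state explicitly.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for kernel structures; values are illustrative. */
#define PF_USED_MATH 0x00002000u

struct fpsimd_state {
	unsigned long long vregs[32][2];	/* V0-V31, 128 bits each */
	unsigned int fpsr, fpcr;
};

struct task {
	unsigned int flags;
	struct fpsimd_state fpsimd;
};

/* Models whether FPSIMD access is enabled (CPACR_EL1.FPEN on arm64). */
static bool fpsimd_access_enabled;
static int nr_saves, nr_restores;	/* count hardware save/restore ops */

static void fpsimd_save(struct fpsimd_state *st) { (void)st; nr_saves++; }
static void fpsimd_restore(const struct fpsimd_state *st) { (void)st; nr_restores++; }

/* Context switch: skip save/restore for threads that never used FPSIMD. */
static void fpsimd_switch(struct task *prev, struct task *next)
{
	if (prev->flags & PF_USED_MATH)
		fpsimd_save(&prev->fpsimd);

	if (next->flags & PF_USED_MATH) {
		fpsimd_restore(&next->fpsimd);
		fpsimd_access_enabled = true;
	} else {
		/* Trap on first use so PF_USED_MATH can be set at that point. */
		fpsimd_access_enabled = false;
	}
}

/* Trap handler for a thread's first-ever FPSIMD access. */
static void fpsimd_first_use(struct task *t)
{
	memset(&t->fpsimd, 0, sizeof(t->fpsimd));	/* fresh, zeroed state */
	t->flags |= PF_USED_MATH;
	fpsimd_restore(&t->fpsimd);
	fpsimd_access_enabled = true;
}
```

Switching between two threads that have never touched FPSIMD costs no save or restore at all; the cost is paid only once a thread actually uses the registers.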

The second one uses a percpu variable to track the owner of the
FPSIMD hardware. When switching to a thread that already owns the
FPSIMD hardware, we don't need to load its FPSIMD registers again.
This is useful when context switching between user threads and
kernel (idle) threads.
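A sketch of the second optimization. fpsimd_owner is an array standing in for a per-CPU variable, and the last_cpu field is an assumption not mentioned above: the per-CPU owner alone cannot tell whether the thread ran, and changed its state, on another CPU in the meantime. The sketch also assumes the outgoing thread's state is still saved to memory on switch-out, so only the reload is skipped.

```c
#define NR_CPUS 4

struct task {
	int last_cpu;	/* CPU that last loaded this task's state, -1 if none */
	/* saved FPSIMD state omitted for brevity */
};

/* Stand-in for a per-CPU variable: whose state is in each CPU's registers. */
static struct task *fpsimd_owner[NR_CPUS];
static int nr_loads;

static void fpsimd_load(struct task *t, int cpu)
{
	nr_loads++;		/* would copy t's saved state into V0-V31 */
	fpsimd_owner[cpu] = t;
	t->last_cpu = cpu;
}

/* Called when a user thread is switched in; kernel threads such as the
 * idle task never call it, so they leave fpsimd_owner untouched. */
static void fpsimd_switch_in(struct task *t, int cpu)
{
	if (fpsimd_owner[cpu] == t && t->last_cpu == cpu)
		return;		/* registers still hold t's state: skip reload */
	fpsimd_load(t, cpu);
}
```

In the user -> idle -> same user pattern the second switch-in is free, while migrating to another CPU and back forces a reload because the registers on the original CPU may be stale.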

The third one disables access to FPSIMD registers when switching
threads. When the thread first accesses FPSIMD registers after being
switched in, an exception is raised and we then load its FPSIMD
context onto the hardware.
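A sketch of the third optimization under the same user-space modeling assumptions: fpsimd_access_enabled stands in for the CPACR_EL1.FPEN trap control, and the helper names are hypothetical. The outgoing thread is saved only if it actually trapped in and loaded its state during its time slice.

```c
#include <stdbool.h>

/* Hypothetical model of lazy FPSIMD restore via an access trap. */
struct task {
	int pid;	/* identity only; saved state omitted for brevity */
};

static bool fpsimd_access_enabled;	/* models CPACR_EL1.FPEN */
static int nr_saves, nr_restores, nr_traps;

static void fpsimd_save(struct task *t)    { (void)t; nr_saves++; }
static void fpsimd_restore(struct task *t) { (void)t; nr_restores++; }

/* Context switch: save prev only if it loaded its state in this time
 * slice, then disable access so next's first FPSIMD instruction traps. */
static void lazy_switch(struct task *prev, struct task *next)
{
	(void)next;		/* nothing is loaded for next up front */
	if (fpsimd_access_enabled)
		fpsimd_save(prev);
	fpsimd_access_enabled = false;
}

/* Access trap handler: load the current thread's state and re-enable
 * access; the faulting instruction is then retried. */
static void fpsimd_access_trap(struct task *cur)
{
	nr_traps++;
	fpsimd_restore(cur);
	fpsimd_access_enabled = true;
}
```

A thread that runs only integer code never traps and is never saved; a thread that touches FPSIMD in every slice pays the trap on top of the save and restore, which is the overhead Will describes.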

The overhead (penalty) of the first and second optimizations is
relatively small, so we can always enable them. The overhead
of the third one is relatively high, and its benefit depends on
many factors, such as the workload, glibc, etc. So we
provide a kernel boot option "eagerfpu" to enable/disable the
third optimization.
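A back-of-the-envelope C model of the trade-off between the third optimization and Will's concern. The costs are hypothetical inputs, not measurements, and the model simplifies: every thread touches FPSIMD with the same probability p per time slice, and the savings from the first two optimizations are ignored. Lazy switching then pays p * (trap + restore + save) per slice, eager switching always pays save + restore, so lazy only wins when p < (save + restore) / (save + restore + trap). prefer_eager is an illustrative heuristic, not what the "eagerfpu" option actually does.

```c
#include <stdbool.h>

/* Hypothetical per-switch costs in arbitrary units. */
struct fp_costs {
	double save;	/* save V0-V31, FPSR, FPCR */
	double restore;	/* restore them */
	double trap;	/* take and return from the access trap */
};

/* Eager switching always saves and restores, whatever p is. */
static double eager_cost(const struct fp_costs *c, double p)
{
	(void)p;
	return c->save + c->restore;
}

/* With probability p the thread traps, restores, and is saved later. */
static double lazy_cost(const struct fp_costs *c, double p)
{
	return p * (c->trap + c->restore + c->save);
}

/* Lazy switching wins only when p is below this value. */
static double lazy_break_even(const struct fp_costs *c)
{
	return (c->save + c->restore) / (c->save + c->restore + c->trap);
}

static bool prefer_eager(const struct fp_costs *c, double p)
{
	return eager_cost(c, p) <= lazy_cost(c, p);
}
```

With save = restore = 100 and trap = 200 units, the break-even point is p = 0.5; at p = 1, which Will's GCC observation makes more likely, lazy switching costs one extra trap per switch.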

So what are your thoughts on the first and second optimizations?
Should we always enable them? I do need to run some benchmarks for
this, but I still lack hardware.

Thanks!
Gerry



More information about the linux-arm-kernel mailing list