[PATCH 11/18] arm64: fpsimd: Split FPSR/FPCR from SVE save/restore

Mark Brown broonie at kernel.org
Wed May 27 07:13:04 PDT 2026


On Wed, May 27, 2026 at 02:51:13PM +0100, Mark Rutland wrote:
> On Tue, May 26, 2026 at 05:28:21PM +0100, Mark Brown wrote:
> > On Thu, May 21, 2026 at 02:25:49PM +0100, Mark Rutland wrote:

> > > ... so I've assumed that this doesn't actually matter in practice, and
> > > implemented the C version matching the existing SVE assembly.

> > There is a possibility that it only matters for older, FPSIMD only CPUs
> > or just that nobody got round to benchmarking this on physical CPUs with
> > SVE and in fact a similar optimisation is also useful there.

> All of that might be true, but that doesn't change my assessment that
> this doesn't seem to matter in practice, and given that the overall goal
> of this series is to *simplify* things, I'd much rather err towards that
> than hypothetical performance concerns.

This could do with more clarification in the commit log.  Right now it
just points to us not having done this for SVE, but that's running on a
rather shinier set of CPUs (and was likely written without any physical
implementation available), so that's a rather large jump.

> > I'm a bit wary of dropping the optimisation without any verification
> > of the performance impact, but equally I'm not aware of a specific
> > benchmark that showed the impact or even if there was one in the first
> > place.  The changelog sounds like the optimisation might've been
> > written based on inspection alone, I don't know if anyone will
> > remember more than a decade later.

> From what I remember, the changes in commit 5959e25729a5 were made based
> on intuition, inspired by a contemporary retrospective change to the
> architecture that made FPCR self-synchronizing. Previously the
> architecture required a context synchronization event for the write to
> take effect, but implementations happened to be stronger.

> The conditional write isn't necessarily a win, because the cost of
> recovering from a branch mispredict can be much larger than the cost of
> micro-architectural mechanisms to ensure that FPCR is
> self-synchronizing.

That seems likely from my read of the commit log there: it smells like
something done from inspection rather than because there was an observed
performance change.  It'd be good to put something like the above in the
commit log, since it's a much more relevant analysis than the comparison
with the SVE path.
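
For anyone following along, the two restore strategies being compared
can be modelled roughly as below.  This is only a user-space sketch, not
the kernel code: fake_fpcr, fpcr_writes and the helper names are
hypothetical stand-ins so the shape of the trade-off is visible without
touching the real register.

```c
#include <stdint.h>

/* Hypothetical stand-in for the FPCR system register (assumption). */
static uint64_t fake_fpcr;
/* Counts modelled register writes so the two variants can be compared. */
static unsigned long fpcr_writes;

static uint64_t read_fpcr(void)
{
	return fake_fpcr;
}

static void write_fpcr(uint64_t val)
{
	fake_fpcr = val;
	fpcr_writes++;
}

/*
 * Conditional variant: read the current value and skip the write when it
 * already matches.  Saves a write in the common case, at the cost of a
 * compare and a branch that may mispredict.
 */
static void restore_fpcr_conditional(uint64_t saved)
{
	if (read_fpcr() != saved)
		write_fpcr(saved);
}

/* Unconditional variant: always write, no read and no branch. */
static void restore_fpcr_unconditional(uint64_t saved)
{
	write_fpcr(saved);
}
```

The conditional version only helps if skipping the write saves more than
the occasional mispredicted branch costs, which is the point made above;
the sketch says nothing about which wins on real hardware.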