[PATCH v7 0/2] arm64/sve: Improve performance when handling SVE access traps
broonie at kernel.org
Tue Feb 9 13:22:02 EST 2021
On Mon, Feb 08, 2021 at 05:26:10PM +0000, Dave Martin wrote:
> On Mon, Feb 01, 2021 at 12:28:59PM +0000, Mark Brown wrote:
> > v7:
> > - A few minor cosmetic updates and one bugfix for
> > fpsimd_update_current_state().
> I see the following delta in this function:
> @@ -1272,7 +1272,8 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
> current->thread.uw.fpsimd_state = *state;
> - clear_thread_flag(TIF_SVE_FULL_REGS);
> + if (test_thread_flag(TIF_SVE_FULL_REGS))
> + fpsimd_to_sve(current);
> Can you elaborate on that?
This restores the behaviour we have in mainline, I'd messed up the
update here when going through doing the changes to split the flags.
The function is called by the signal code to load both the SVE and
FPSIMD state, it can enable SVE.
> I think the implementation flow in this series is roughly:
> a) Split TIF_SVE into two flags, but keep the flags in lockstep, and
> don't pretend to handle cases where they are unequal (no functional
> b) Add handling for cases where the flags are unequal (the only
> meaningful case being TIF_SVE_EXEC && !TIF_SVE_FULL_REGS) (purely
> dead code addition; no functional change).
> c) Add code that causes / resolves inequality of the flags in order to
> achieve the goal of the series (functional change).
> Does that sound about right? Patch 2 just does (c), broadly speaking,
> I couldn't convince myself whether patch 1 is introducing functional
> changes or not.
Yes, that's roughly the idea.
> The KVM parts of (b) could maybe benefit from their own commit message.
> If (b) were all in a separate patch by itself, it would be
> straightfoward to split the host and KVM changes up.
> Maybe splitting patch 1 into two would help clarify the situation.
I did start off splitting it up more but doing it in a way that doesn't
result in messier changes due to needing to work with two flags and
manages to preserve bisectability ended up pushing me towards the
smaller set here. It'd be easy to add a bit more verbiage about KVM to
the commit log though.
> Second, TIF_SVE_EXEC is advertised as being a pure indication of whether
> the task is to be allowed to use SVE in userspace. I find that this
> jars a little with the fact that we leave TIF_SVE_EXEC set in the
> !FULL_REGS case -- where we cannot enter userspace without doing some
> additional work. This is not necessarily a problem, but it was one
> extra thing to get my head around.
> There seems to be a preexisting convention that a set thread flag
> implies a "dirtier" condition than when the flag is clear, though
> obviously it works either way. TIF_SVE_FULL_REGS here is really a work-
> not-needed flag (whereas TIF_FOREIGN_FPSTATE and various others are
I have to say I was completely unaware of that convention, it didn't
jump out at me from the code :/ . I suppose you could also say that
it's saying that it's a work needed in that it means that the code has
to work with the full register set rather than just the FPSIMD subset
but it I guess it really just depends on which particular bit of code
you're looking at at the specific moment - on the save paths _FULL_REGS
means more work as you have to save all the SVE state rather than just
the FPSIMD subset. As you say it doesn't *really* matter.
> work-needed flags). This also is not necessarily a problem, but I find
> it a bit confusing. Inverting the sense of TIF_SVE_FULL_REGS, or
> a more explicit name such as TIF_SVE_FULL_REGS_VALID might make it
> clearer that there is work to do in the "not valid" case and we cannot
> blindly enter userspace or save the task state without taking any notice
> of the flag.
I'm not at all attached to the name, if people find _FULL_REGS_VALID
or some other name clearer that works for me.
> Ultimately, we want a decision point somewhere about whether to fall
> back to FPSIMD-only mode for a task or not, even if the decision is
> trivial for now. Where do you think that sits? I'd generally assumed
> that it would go in fpsimd_save() since that's where we take the hit for
> saving (and later restoring) the state. That function isn't preemptible
> though, so it might also make sense to make the decision elsewhere if
> there are suitable places to do it.
To me fpsimd_save() is more of a "do the thing we already decided to do"
function - it doesn't have enough context to make any decisions, and one
of the areas where we can benefit is when we never need to save at all,
for example when we take a SVE access trap and can initialize the SVE
state from the FPSIMD state already in registers and return to userspace
without saving to memory.
We definitely should optimize fpsimd_save() here, it shouldn't need to
set TIF_SVE_FULL_REGS or spill to the SVE state, but I wouldn't call it
a *decision* exactly.
> I say this because it is important for this series to make a foundation
> that such a decision can be slotted naturally into -- that's part of the
> motiviation of making these changes in the first place (IIUC, anyway).
To me the goal was mainly about making the access trap case more
efficient and potentially also enabling further work like having an
equivalent to fpu_counter instead of disabling the SVE access trap on
The way I was thinking about this was more that the decision points were
on kernel entry and exit. We need to generate the full register state
any time it's observed by something outside the kernel and there are a
set of defined kernel entry points where the SVE state can be discarded
but these are all individual points where an individual thing happens.
I'm therefore not 100% sure I'm fully understanding what you're thinking
of here. It doesn't really seem like a single point thing to me but
I've got a feeling that I'm not exactly talking about the same thing as
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 488 bytes
Desc: not available
More information about the linux-arm-kernel