[PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch

Marc Zyngier maz at kernel.org
Sat Mar 9 03:01:56 PST 2024


On Thu, 07 Mar 2024 14:26:30 +0000,
Mark Brown <broonie at kernel.org> wrote:
> 
> On Thu, Mar 07, 2024 at 11:10:40AM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie at kernel.org> wrote:
> > > On Wed, Mar 06, 2024 at 09:43:13AM +0000, Marc Zyngier wrote:
> > > > Mark Brown <broonie at kernel.org> wrote:
> > > > > On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:
> 
> > > > > The SME patch series proposes adding an additional state to this
> > > > > enumeration which would say if the registers are stored in a format
> > > > > suitable for exchange with userspace, that would make this state part of
> > > > > the vCPU state.  With the addition of SME we can have two vector lengths
> > > > > in play so the series proposes picking the larger to be the format for
> > > > > userspace registers.
> 
> > > > What does this addition have anything to do with the ownership of the
> > > > physical register file? Not a lot, it seems.
> 
> > > > Specially as there better be no state resident on the CPU when
> > > > userspace messes up with it.
> 
> > > If we have a situation where the state might be stored in memory in
> > > multiple formats it seems reasonable to consider the metadata which
> > > indicates which format is currently in use as part of the state.
> 
> > There is no reason why the state should be in multiple formats
> > *simultaneously*. All the FP/SIMD/SVE/SME state is largely
> > overlapping, and we only need to correctly invalidate the state that
> > isn't relevant to writes from userspace.
> 
> I agree that we don't want to store multiple copies, the proposed
> implementation for SME does that - when converting between guest native
> and userspace formats we discard the original format after conversion
> and updates the metadata which says which format is stored.  That
> metadata is modeled as adding a new owner.

What conversion? If userspace writes to FP, the upper bits of the SVE
registers should get zeroed. If it writes to SVE, it writes using the
current VL for the current streaming/non-streaming mode.

If the current userspace API doesn't give us the tools to do so, we
rev it.

>
> > > > > We could store this separately to fp_state/owner but it'd still be a
> > > > > value stored in the vCPU.
> 
> > > > I totally disagree.
> 
> > > Where would you expect to see the state stored?
> 
> > Sorry, that came out wrong. I expect *some* vcpu state to describe the
> > current use of the FP/vector registers, and that's about it. Not the
> > ownership information.
> 
> Ah, I think the distinction here is that you're modeling the state
> tracking updated by this patch as purely tracking the registers in which
> case yes, we should just add a separate variable in the vCPU for
> tracking the format of the data.  I was modeling the state tracking as
> covering both the state in the registers and the state of the storage
> backing them (since the guest state being in the registers means that
> the state in memory is invalid).
> 
> > > > > Storing in a format suitable for userspace
> > > > > usage all the time when we've got SME would most likely result in
> > > > > performance overhead
> 
> > > > What performance overhead? Why should we care?
> > > 
> > > Since in situations where we're not using the larger VL we would need to
> > > load and store the registers using a vector length other than the
> 
> ...
> 
> > > and would instead need to manually compute the memory locations where
> > > values are stored.  As well as the extra instructions when using the
> > > smaller vector length we'd also be working with sparser data likely over
> > > more cache lines.
> 
> > Are you talking about a context switch? or userspace accesses? I don't
> > give a damn about the latter, as it statistically never happens. The
> > former is of course of interest, but you still don't explain why the
> > above is a problem.
> 
> Well, there will be a cost in any rewriting so I'm trying to shift it
> away from context switch to userspace access since as you say they are
> negligably frequent.

So we're in violent agreement.

> > Nothing prevent you from storing the registers using the *current* VL,
> > since there is no data sharing between the SVE registers and the
> > streaming-SVE ones. All you need to do is to make sure you don't mix
> > the two.
> 
> You seem to be suggesting modeling the streaming mode registers as a
> completely different set of registers in the KVM ABI.

Where are you reading this? I *never* said anything of the sort. There
is only one set of SVE registers. The fact that they change size and
get randomly zeroed when userspace flips bits is a property of the
architecture. The HW *can* implements them as two sets of registers if
SME is a separate accelerator, but that's not architectural.

All I'm saying is that you can have a *single* register file, spanning
both FPSIMD and SVE, using the maximum VL achievable in the guest, and
be done with it. Yes, you'll have to refactor the code so that FPSIMD
lands at the right spot in the SVE register file. Big deal.

> That would
> certainly be another option, though it's a bit unclear from an
> architecture point of view and it didn't seem to entirely fit with the
> handling of the FPSIMD registers when SVE is in use.  From an
> architecture point of view the streaming mode registers aren't really a
> separate set of registers.  When in streaming mode it is not possible to
> observe the non-streaming register state and vice versa, and entering or
> exiting streaming mode zeros the register state so functionally you just
> have floating point registers.
> 
> You'd need to handle what happens with access to the inactive register
> set from userspace, the simplest thing to implement would be to read the
> logical value of zero and either discard or return an error on writes.
> 
> > > We would also need to consider if we need to zero the holes in the data
> > > when saving, we'd only potentially be leaking information from the guest
> > > but it might cause nasty surprises given that transitioning to/from
> > > streaming mode is expected to zero values.  If we do need to zero then
> > > that would be additional work that would need doing.
> 
> > The zeroing is mandated by the architecture, AFAIU. That's not optional.
> 
> Yes, we need to zero at some point - we could just do it on userspace
> read though I think rather than having to do it when saving.

As long as the guest only observes an architecturally compliant state,
I'm find with that.

> 
> > > Spending more effort to implement something which also has more runtime
> > > performance overhead for the case of saving and restoring guest state
> > > which I expect to be vastly more common than the VMM accessing the guest
> > > registers just doesn't seem like an appealing choice.
> 
> > I don't buy the runtime performance aspect at all. As long as you have
> > the space to dump the largest possible VL, you can always dump it in
> > the native format.
> 
> Sure, my point was that you appeared to be asking to dump in a
> non-native format.  

Again, what makes you think that? You keep putting words in my mouth.

>
> > > If we were storing the data in the native format for the guest then that
> > > format will change if streaming mode is changed via a write to SVCR.
> > > This would mean that the host would need to understand that when writing
> > > values SVCR needs to be written before the Z and P registers.  To be
> > > clear I don't think this is a good idea.
> 
> > The architecture is crystal clear: you flip SVCR.SM, you loose all
> > data in both Z and P regs. If userspace doesn't understand the
> > architecture, that's their problem. The only thing we need to provide
> > is a faithful emulation of the architecture.
> 
> It is not clear to me that the intention behind the KVM register ABI is
> that it should have all the ordering requirements that the architecture
> has, and decisions like hiding the V registers when SVE is active do
> take us away from just a natural architectural point of view.

My intention is that userspace accesses should be as close as possible
as that of a guest. If the current ABI doesn't allow this, I'm all for
changing it. Won't be the first time, won't be the last.

> My
> understanding was that it was intended that userspace should be able to
> do something like just enumerate all the registers, save them and then
> later on restore them without really understanding them.

And this statement doesn't get in the way of anything above. We own
the ABI, and can change it as we see fit when SME is enabled.

	M.

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list