[PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch

Marc Zyngier maz at kernel.org
Thu Mar 7 03:10:40 PST 2024


On Wed, 06 Mar 2024 22:19:03 +0000,
Mark Brown <broonie at kernel.org> wrote:
> 
> On Wed, Mar 06, 2024 at 09:43:13AM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie at kernel.org> wrote:
> > > On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:
> 
> > > > Move the ownership tracking into the host data structure, and
> > > > rename it from fp_state to fp_owner, which is a better description
> > > > (name suggested by Mark Brown).
> 
> > > The SME patch series proposes adding an additional state to this
> > > enumeration which would say whether the registers are stored in a format
> > > suitable for exchange with userspace; that would make this state part of
> > > the vCPU state.  With the addition of SME we can have two vector lengths
> > > in play, so the series proposes picking the larger to be the format for
> > > userspace registers.
> 
> > What does this addition have to do with the ownership of the
> > physical register file? Not a lot, it seems.
> 
> > Especially as there had better be no state resident on the CPU when
> > userspace messes with it.
> 
> If we have a situation where the state might be stored in memory in
> multiple formats it seems reasonable to consider the metadata which
> indicates which format is currently in use as part of the state.

There is no reason why the state should be in multiple formats
*simultaneously*. All the FP/SIMD/SVE/SME state is largely
overlapping, and we only need to correctly invalidate the state that
isn't relevant to writes from userspace.

> 
> > > We could store this separately to fp_state/owner but it'd still be a
> > > value stored in the vCPU.
> 
> > I totally disagree.
> 
> Where would you expect to see the state stored?

Sorry, that came out wrong. I expect *some* vcpu state to describe the
current use of the FP/vector registers, and that's about it. Not the
ownership information.
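
To make the split concrete, here is a sketch only: the FP_STATE_*
constants are assumed to keep their existing names, and every structure
and field name other than fp_owner is invented for illustration rather
than taken from the actual kvm_host.h layout.

/* Sketch: ownership of the physical FP/SVE registers lives with the
 * host data (the fp_owner this patch introduces), while whatever
 * describes the contents of the vcpu's saved image stays in the vcpu. */
enum fp_owner_state {
        FP_STATE_FREE,
        FP_STATE_HOST_OWNED,
        FP_STATE_GUEST_OWNED,
};

struct example_host_data {
        enum fp_owner_state fp_owner;   /* who owns the HW registers */
};

struct example_vcpu_arch {
        unsigned int saved_vl;          /* illustrative: describes the
                                         * in-memory image, not ownership */
        unsigned char saved_streaming;  /* SVCR.SM of the saved image */
};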

> 
> > > Storing in a format suitable for userspace
> > > usage all the time when we've got SME would most likely result in
> > > performance overhead
> 
> > What performance overhead? Why should we care?
> 
> In situations where we're not using the larger VL we would need to
> load and store the registers using a vector length other than the one
> currently configured, so we would not be able to use the architecture's
> ability to load and store to a location based on a multiple of the
> vector length:
> 
>    LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
>    LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
>    
>    STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
>    STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
> 
> and would instead need to manually compute the memory locations where
> values are stored.  As well as needing extra instructions when using the
> smaller vector length, we'd also be working with sparser data, likely
> spread over more cache lines.
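
For contrast with the MUL VL forms above, a store at a stride other
than the live VL looks something like the sketch below; illustration
only, assuming the file is built with SVE enabled (e.g.
-march=armv8.2-a+sve), and showing just the addressing rather than a
complete save path.

/* Sketch: when the slot stride is not the live VL, the CPU cannot
 * scale the offset for us, so each slot address is computed by hand,
 * and the unwritten tail of each slot still has to be dealt with. */
static void store_z0_z1_at_stride(unsigned char *buf,
                                  unsigned long slot_bytes)
{
        asm volatile("str z0, [%0]\n\t"
                     "str z1, [%1]"
                     :
                     : "r" (buf), "r" (buf + slot_bytes)
                     : "memory");
}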

Are you talking about a context switch, or about userspace accesses? I
don't give a damn about the latter, as it statistically never happens.
The former is of course of interest, but you still don't explain why the
above is a problem.

Nothing prevents you from storing the registers using the *current* VL,
since there is no data sharing between the SVE registers and the
streaming-SVE ones. All you need to do is make sure you don't mix the
two.
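
Roughly the following, as a sketch with invented helper names
(read_svcr_sm(), read_live_vl() and sve_save_zregs_live_vl() are all
made up, not an existing KVM or fpsimd interface): save at the VL of
whatever mode the vcpu is currently in and tag the image accordingly.

/* Sketch only: save in the vcpu's current mode, at that mode's VL,
 * and record which register file the image holds so the restore path
 * never mixes SVE and streaming-SVE data. */
extern unsigned char read_svcr_sm(void);        /* invented */
extern unsigned int read_live_vl(void);         /* invented */
extern void sve_save_zregs_live_vl(void *buf);  /* invented */

struct vec_image {
        unsigned char streaming;        /* SVCR.SM at save time */
        unsigned int vl;                /* VL the image was saved at */
        void *regs;                     /* storage sized for the largest VL */
};

static void save_guest_vectors(struct vec_image *img)
{
        img->streaming = read_svcr_sm();
        img->vl = read_live_vl();
        sve_save_zregs_live_vl(img->regs);      /* "#n, mul vl" stores */
}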

> We would also need to consider whether we need to zero the holes in the
> data when saving; we'd only potentially be leaking information from the
> guest, but it might cause nasty surprises given that transitioning to/from
> streaming mode is expected to zero values.  If we do need to zero, that
> would be additional work.

The zeroing is mandated by the architecture, AFAIU. That's not optional.

> 
> Exactly what the performance hit would be will be system and use case
> dependent.  *Hopefully* we don't need to save and load the guest state
> too often, but I would be very surprised if we didn't have people who
> consider any cost in the guest context switch path worth paying
> attention to.

These people can come out of the woodwork with numbers and reproducible
workloads. Until they do, their concerns do not really exist.

> As well as the performance overhead there would be some code complexity
> cost; if nothing else, we'd not be using the same format as fpsimd_save()
> and would need to rearrange how we handle saving the register state.

And I'm fine with that. The way we store things is nobody's business
but ours, and I'm not sentimentally attached to 15-year-old code.

> Spending more effort to implement something which also has more runtime
> performance overhead for saving and restoring guest state, which I
> expect to be vastly more common than the VMM accessing the guest
> registers, just doesn't seem like an appealing choice.

I don't buy the runtime performance aspect at all. As long as you have
the space to dump the largest possible VL, you can always dump it in
the native format.
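
As a sizing sketch (not the kernel's actual layout, with the constants
spelled out by hand rather than taken from the uapi headers): slots
sized for the architectural maximum VL of 2048 bits hold a native-format
dump whichever mode the vcpu happens to be in.

#define MAX_VL_BYTES    256     /* 2048-bit vectors */
#define NR_ZREGS        32      /* Z0-Z31 */
#define NR_PREGS        16      /* P0-P15; FFR kept separately */

struct vec_save_area {
        unsigned char zregs[NR_ZREGS][MAX_VL_BYTES];
        unsigned char pregs[NR_PREGS][MAX_VL_BYTES / 8];
        unsigned char ffr[MAX_VL_BYTES / 8];
        unsigned int vl;                /* VL of the saved image */
        unsigned char streaming;        /* which register file it holds */
};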

> > > if nothing else, and feels more complicated than
> > > rewriting the data in the relatively unusual case where userspace looks
> > > at it.  Trying to convert userspace writes into the current layout would
> > > have issues if the current layout uses the smaller vector length, and
> > > would create fragility with ordering issues when loading the guest state.
> 
> > What ordering issues? If userspace manipulates the guest state, the
> > guest isn't running. If it is, all bets are off.
> 
> If we were storing the data in the native format for the guest then that
> format would change if streaming mode were changed via a write to SVCR.
> This would mean that the host would need to understand that, when writing
> values, SVCR needs to be written before the Z and P registers.  To be
> clear, I don't think this is a good idea.

The architecture is crystal clear: you flip SVCR.SM, you lose all
data in both Z and P regs. If userspace doesn't understand the
architecture, that's their problem. The only thing we need to provide
is a faithful emulation of the architecture.
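
The rule to emulate fits in a few lines; a minimal sketch, with an
invented vec_state layout rather than the real vcpu structures (SM is
bit 0 of SVCR):

#include <string.h>

/* Sketch: a change of PSTATE.SM (here via an emulated SVCR write)
 * zeroes Z0-Z31, P0-P15 and FFR. Structure and sizes are illustrative. */
#define SVCR_SM         (1UL << 0)

struct vec_state {
        unsigned long svcr;
        unsigned char zregs[32][256];
        unsigned char pregs[16][32];
        unsigned char ffr[32];
};

static void emulate_svcr_write(struct vec_state *vs, unsigned long val)
{
        if ((vs->svcr ^ val) & SVCR_SM) {
                memset(vs->zregs, 0, sizeof(vs->zregs));
                memset(vs->pregs, 0, sizeof(vs->pregs));
                memset(vs->ffr, 0, sizeof(vs->ffr));
        }
        vs->svcr = val;
}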

> 
> > > The proposal is not the most lovely idea ever but given the architecture
> > > I think some degree of clunkiness would be unavoidable.
> 
> > It is only unavoidable if we decide to make a bad job of it.
> 
> I don't think the handling of the vector registers for KVM with SME is
> something where there is a clear good or bad job we can do; I don't
> see how we can reasonably avoid, at some point, needing to translate
> vector lengths or convert to/from FPSIMD format (in the case of a system
> with SME but not SVE), which is just inherently a sharp edge.  It's just
> a question of when and how we do that.

My point is that there is no reason to translate the vector registers.
As long as your vcpu is in a given mode, all storage is done in that
mode. You switch mode, you lose data, as per the architecture. And
yes, there is some zeroing and invalidation to perform if the vcpu has
switched mode behind your back.

	M.

-- 
Without deviation from the norm, progress is not possible.


