KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

Tue Apr 19 07:10:13 PDT 2022

On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote:
> The approach I've taken so far in adding support for SPE in KVM [1] relies
> on pinning the entire VM memory to avoid SPE triggering stage 2 faults
> altogether. I've taken this approach because:
> 
> 1. SPE reports the guest VA on a stage 2 fault, similar to stage 1 faults,
> and at the moment KVM has no way to resolve the VA to IPA translation.  The
> AT instruction is not useful here, because PAR_EL1 doesn't report the IPA
> in the case of a stage 2 fault on a stage 1 translation table walk.
> 
> 2. The stage 2 fault is reported asynchronously via an interrupt, which
> means there will be a window where profiling is stopped between the moment
> SPE triggers the fault and the moment the PE takes the interrupt. This
> blackout window
> is obviously not present when running on bare metal, as there is no second
> stage of address translation being performed.

Are these faults actually recoverable? My memory is a bit hazy here, but I
thought SPE buffer data could be written out in whacky ways such that even
a bog-standard page fault could result in unrecoverable data loss (i.e. DL=1),
and so pinning is the only game in town.

A funkier approach might be to defer pinning of the buffer until the SPE is
enabled and avoid pinning all of VM memory that way, although I can't
immediately tell how flexible the architecture is in allowing you to cache
the base/limit values.
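
To make the deferred-pinning idea concrete, here is a rough userspace model
in C, not KVM code. It assumes that writes to the buffer limit register can
be trapped, that bit 0 of that register is the enable bit, and that the
buffer spans from the current write pointer up to the page-aligned limit.
Every name in it (spe_model, spe_write_limitr, pin_range, unpin) is
hypothetical, "pinning" is just bookkeeping, and it sidesteps both the
VA-to-IPA translation problem and whatever caching the architecture permits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define LIMITR_E   (1ULL << 0)  /* assumed enable bit of the limit register */

/* Hypothetical per-vCPU state; not a real KVM structure. */
struct spe_model {
	uint64_t ptr;           /* guest-programmed buffer write pointer */
	uint64_t limitr;        /* last value written to the limit register */
	bool pinned;            /* is a buffer range currently pinned? */
	uint64_t pin_start;     /* page-aligned start of the pinned range */
	uint64_t pin_end;       /* page-aligned, exclusive end of the range */
	size_t pinned_pages;    /* number of pages in the pinned range */
};

/* Record the range as pinned; a real implementation would pin pages here. */
static void pin_range(struct spe_model *s, uint64_t start, uint64_t end)
{
	s->pin_start = start & PAGE_MASK;
	s->pin_end = end;
	s->pinned_pages = (size_t)((s->pin_end - s->pin_start) >> PAGE_SHIFT);
	s->pinned = true;
}

/* Drop the bookkeeping for the currently pinned range. */
static void unpin(struct spe_model *s)
{
	assert(s->pinned);
	s->pin_start = 0;
	s->pin_end = 0;
	s->pinned_pages = 0;
	s->pinned = false;
}

/*
 * Model a trapped guest write to the limit register: pin only the buffer
 * when profiling is enabled, unpin when it is disabled or reprogrammed.
 * Returns 0 on success, -1 if the enabled buffer would be empty.
 */
int spe_write_limitr(struct spe_model *s, uint64_t val)
{
	uint64_t limit = val & PAGE_MASK;

	if (s->pinned)
		unpin(s);

	s->limitr = val;
	if (!(val & LIMITR_E))
		return 0;

	if (limit <= s->ptr)
		return -1;

	pin_range(s, s->ptr, limit);
	return 0;
}
```

Note that the model only reacts to limit register writes; it does not cover
the guest moving the write pointer while profiling is enabled, which is part
of the open question about caching the base/limit values.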

Will