[PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation

Marc Zyngier maz at kernel.org
Wed Jul 31 03:18:06 PDT 2024


On Wed, 31 Jul 2024 10:53:14 +0100,
Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> 
> Hi,
> 
> On Wed, Jul 31, 2024 at 09:55:28AM +0100, Marc Zyngier wrote:
> > On Mon, 29 Jul 2024 16:26:00 +0100,
> > Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> > > > In order to plug the brokenness of our current AT implementation,
> > > > we need a SW walker that is going to... err.. walk the S1 tables
> > > > and tell us what it finds.
> > > > 
> > > > Of course, it builds on top of our S2 walker, and shares similar
> > > > concepts. The beauty of it is that since it uses kvm_read_guest(),
> > > > it is able to bring back pages that have been otherwise evicted.
> > > > 
> > > > This is then plugged in the two AT S1 emulation functions as
> > > > a "slow path" fallback. I'm not sure it is that slow, but hey.
> > > > 
> > > > Signed-off-by: Marc Zyngier <maz at kernel.org>
> > > > ---
> > > >  arch/arm64/kvm/at.c | 538 ++++++++++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 520 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> > > > index 71e3390b43b4c..8452273cbff6d 100644
> > > > --- a/arch/arm64/kvm/at.c
> > > > +++ b/arch/arm64/kvm/at.c
> > > > @@ -4,9 +4,305 @@
> > > >   * Author: Jintack Lim <jintack.lim at linaro.org>
> > > >   */
> > > >  
> > > > +#include <linux/kvm_host.h>
> > > > +
> > > > +#include <asm/esr.h>
> > > >  #include <asm/kvm_hyp.h>
> > > >  #include <asm/kvm_mmu.h>
> > > >  
> > > > +struct s1_walk_info {
> > > > +	u64	     baddr;
> > > > +	unsigned int max_oa_bits;
> > > > +	unsigned int pgshift;
> > > > +	unsigned int txsz;
> > > > +	int 	     sl;
> > > > +	bool	     hpd;
> > > > +	bool	     be;
> > > > +	bool	     nvhe;
> > > > +	bool	     s2;
> > > > +};
> > > > +
> > > > +struct s1_walk_result {
> > > > +	union {
> > > > +		struct {
> > > > +			u64	desc;
> > > > +			u64	pa;
> > > > +			s8	level;
> > > > +			u8	APTable;
> > > > +			bool	UXNTable;
> > > > +			bool	PXNTable;
> > > > +		};
> > > > +		struct {
> > > > +			u8	fst;
> > > > +			bool	ptw;
> > > > +			bool	s2;
> > > > +		};
> > > > +	};
> > > > +	bool	failed;
> > > > +};
> > > > +
> > > > +static void fail_s1_walk(struct s1_walk_result *wr, u8 fst, bool ptw, bool s2)
> > > > +{
> > > > +	wr->fst		= fst;
> > > > +	wr->ptw		= ptw;
> > > > +	wr->s2		= s2;
> > > > +	wr->failed	= true;
> > > > +}
> > > > +
> > > > +#define S1_MMU_DISABLED		(-127)
> > > > +
> > > > +static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
> > > > +			 struct s1_walk_result *wr, const u64 va, const int el)
> > > > +{
> > > > +	u64 sctlr, tcr, tg, ps, ia_bits, ttbr;
> > > > +	unsigned int stride, x;
> > > > +	bool va55, tbi;
> > > > +
> > > > +	wi->nvhe = el == 2 && !vcpu_el2_e2h_is_set(vcpu);
> > > 
> > > Where 'el' is computed in handle_at_slow() as:
> > > 
> > > 	/*
> > > 	 * We only get here from guest EL2, so the translation regime
> > > 	 * AT applies to is solely defined by {E2H,TGE}.
> > > 	 */
> > > 	el = (vcpu_el2_e2h_is_set(vcpu) &&
> > > 	      vcpu_el2_tge_is_set(vcpu)) ? 2 : 1;
> > > 
> > > I think 'nvhe' will always be false ('el' is 2 only when E2H is
> > > set).
> > 
> > Yeah, there are a number of problems here. el should depend on both the
> > instruction (some are EL2-specific) and the HCR control bits. I'll
> > tackle that now.
> 
> Yeah, I also noticed that the way sctlr, tcr and ttbr are chosen in
> setup_s1_walk() doesn't look quite right for the nvhe case.

Are you sure? Assuming the 'el' value is correct (and I think I fixed
that on my local branch), they seem correct to me (we check for va55
early in the function to avoid a later issue).

Can you point out what exactly fails in that logic?

>
> > 
> > > I'm curious about what 'el' represents. The translation regime for the AT
> > > instruction?
> > 
> > Exactly that.
> 
> Might I make a suggestion here? I was thinking about dropping the (el, wi->nvhe)
> tuple to represent the translation regime and have a wi->regime (or similar) to
> unambiguously encode the regime. The value can be an enum with three values to
> represent the three possible regimes (REGIME_EL10, REGIME_EL2, REGIME_EL20).

I've been thinking of that, but I'm wondering whether that just
results in pretty awful code in the end, because we go from 2 cases
(el==1 or el==2) to 3. But most of the time, we don't care about the
E2H=0 case, because we can handle it just like E2H=1.

I'll give it a go and see what it looks like.
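
For illustration only, here is a minimal, untested sketch of what a
three-value regime could look like. Every name in it (enum s1_regime,
enum at_op, compute_regime(), regime_is_el2()) is hypothetical and not
taken from the patch. The selection rule is an assumption based on the
discussion above: the regime depends on both the instruction and the
{E2H,TGE} control bits.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three regimes named in the suggestion. */
enum s1_regime {
	REGIME_EL10,	/* EL1&0 */
	REGIME_EL2,	/* EL2 with E2H=0 */
	REGIME_EL20,	/* EL2&0 with E2H=1 */
};

/* Hypothetical stand-in for the class of AT S1 instruction being emulated. */
enum at_op {
	AT_S1E0,
	AT_S1E1,
	AT_S1E2,
};

/*
 * Assumed rule: EL2-specific instructions always target an EL2 regime,
 * and E2H picks which one. The others target EL2&0 only when
 * {E2H,TGE} == {1,1}, and EL1&0 otherwise.
 */
enum s1_regime compute_regime(enum at_op op, bool e2h, bool tge)
{
	if (op == AT_S1E2)
		return e2h ? REGIME_EL20 : REGIME_EL2;

	return (e2h && tge) ? REGIME_EL20 : REGIME_EL10;
}

/*
 * Most call sites would only need the old two-way split (el == 1 vs
 * el == 2), which a helper like this one could preserve.
 */
bool regime_is_el2(enum s1_regime regime)
{
	return regime != REGIME_EL10;
}
```

With a helper such as regime_is_el2(), only the few places that really
differ between E2H=0 and E2H=1 would have to test for REGIME_EL2
explicitly.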

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


