[PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation

Marc Zyngier maz at kernel.org
Thu Jul 11 01:05:02 PDT 2024


On Wed, 10 Jul 2024 16:12:53 +0100,
Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> 
> Hi Marc,
> 
> On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> > In order to plug the brokenness of our current AT implementation,
> > we need a SW walker that is going to... err.. walk the S1 tables
> > and tell us what it finds.
> > 
> > Of course, it builds on top of our S2 walker, and shares similar
> > concepts. The beauty of it is that since it uses kvm_read_guest(),
> > it is able to bring back pages that have been otherwise evicted.
> > 
> > This is then plugged in the two AT S1 emulation functions as
> > a "slow path" fallback. I'm not sure it is that slow, but hey.
> > 
> > Signed-off-by: Marc Zyngier <maz at kernel.org>
> > [..]
> > @@ -331,18 +801,17 @@ void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> >  	}
> >  
> >  	if (!fail)
> > -		par = read_sysreg(par_el1);
> > +		par = read_sysreg_par();
> >  	else
> >  		par = SYS_PAR_EL1_F;
> >  
> > +	retry_slow = !fail;
> > +
> >  	vcpu_write_sys_reg(vcpu, par, PAR_EL1);
> >  
> >  	/*
> > -	 * Failed? let's leave the building now.
> > -	 *
> > -	 * FIXME: how about a failed translation because the shadow S2
> > -	 * wasn't populated? We may need to perform a SW PTW,
> > -	 * populating our shadow S2 and retry the instruction.
> > +	 * Failed? let's leave the building now, unless we retry on
> > +	 * the slow path.
> >  	 */
> >  	if (par & SYS_PAR_EL1_F)
> >  		goto nopan;
> 
> This is what follows after the 'if' statement above, and before the 'switch'
> below:
> 
>         /* No PAN? No problem. */
>         if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
>                 goto nopan;
> 
> When KVM is executing this statement, the following is true:
> 
> 1. SYS_PAR_EL1_F is clear => the hardware translation table walk was successful.
> 2. retry_slow = true;
>
> Then if the PAN bit is not set, the function jumps to the nopan label, and
> performs a software translation table walk, even though the hardware walk
> performed by AT was successful.

Hmmm. Are you being polite and trying to avoid saying that this code
is broken and that I should look for a retirement home instead?
There, I've said it for you! ;-)

The more I stare at this code, the more I hate it. Trying to
interleave the replay condition with the many potential failure modes
of the HW walker feels completely wrong, and I feel that I'd better
split the whole thing in two:

void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
{
	__kvm_at_s1e01_hw(vcpu, vaddr);
	if (vcpu_read_sys_reg(vcpu, PAR_EL1) & SYS_PAR_EL1_F)
		__kvm_at_s1e01_sw(vcpu, vaddr);
}

and completely stop messing with things. This is AT S1 we're talking
about, not something that is used at any sort of high frequency. Apart
from Xen. But as Butch said: "Xen's dead, baby. Xen's dead.".
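
To make the intended control flow concrete, here is a minimal userspace
sketch of the "fast path first, slow path only on failure" split. It is
not KVM code: struct mock_vcpu, the hw_walk_faults knob, the sw_walks
counter and both helper bodies are hypothetical stand-ins, and the only
architectural facts assumed are that PAR_EL1.F is bit 0 and that the
output address lives in bits [47:12].

```c
#include <stdbool.h>
#include <stdint.h>

/* PAR_EL1.F is bit 0; the output address sits in bits [47:12]. */
#define PAR_F        (UINT64_C(1) << 0)
#define PAR_PA_MASK  UINT64_C(0x0000fffffffff000)

/* Hypothetical stand-in for the vcpu: only what the sketch needs. */
struct mock_vcpu {
	uint64_t par_el1;        /* guest view of PAR_EL1 */
	bool hw_walk_faults;     /* knob: make the fast path report a fault */
	unsigned int sw_walks;   /* number of times the slow path ran */
};

/* Fast path stand-in: pretends to issue AT and records the result. */
static void at_s1e01_hw(struct mock_vcpu *vcpu, uint64_t vaddr)
{
	if (vcpu->hw_walk_faults)
		vcpu->par_el1 = PAR_F;
	else
		vcpu->par_el1 = vaddr & PAR_PA_MASK; /* identity map, F clear */
}

/* Slow path stand-in: a real SW walker would read the guest's S1 tables. */
static void at_s1e01_sw(struct mock_vcpu *vcpu, uint64_t vaddr)
{
	vcpu->sw_walks++;
	vcpu->par_el1 = vaddr & PAR_PA_MASK;     /* identity map, F clear */
}

/* The split: the slow path runs only if the fast path left F set. */
static void at_s1e01(struct mock_vcpu *vcpu, uint64_t vaddr)
{
	at_s1e01_hw(vcpu, vaddr);
	if (vcpu->par_el1 & PAR_F)
		at_s1e01_sw(vcpu, vaddr);
}
```

With this shape, a successful fast path never reaches the SW walker,
which is exactly the property the interleaved version fails to provide.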

> 
> > @@ -354,29 +823,58 @@ void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> >  	switch (op) {
> >  	case OP_AT_S1E1RP:
> >  	case OP_AT_S1E1WP:
> > +		retry_slow = false;
> >  		fail = check_at_pan(vcpu, vaddr, &par);
> >  		break;
> >  	default:
> >  		goto nopan;
> >  	}
> >  
> > +	if (fail) {
> > +		vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1);
> > +		goto nopan;
> > +	}
> > +
> >  	/*
> >  	 * If the EL0 translation has succeeded, we need to pretend
> >  	 * the AT operation has failed, as the PAN setting forbids
> >  	 * such a translation.
> > -	 *
> > -	 * FIXME: we hardcode a Level-3 permission fault. We really
> > -	 * should return the real fault level.
> >  	 */
> > -	if (fail || !(par & SYS_PAR_EL1_F))
> > -		vcpu_write_sys_reg(vcpu, (0xf << 1) | SYS_PAR_EL1_F, PAR_EL1);
> > -
> > +	if (par & SYS_PAR_EL1_F) {
> > +		u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);
> > +
> > +		/*
> > +		 * If we get something other than a permission fault, we
> > +		 * need to retry, as we're likely to have missed in the PTs.
> > +		 */
> > +		if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM)
> > +			retry_slow = true;
> 
> Shouldn't VCPU's PAR_EL1 register be updated here? As far as I can tell, at this
> point the VCPU PAR_EL1 register has the result from the successful walk
> performed by AT S1E1R or AT S1E1W in the first 'switch' statement.

Yup, yet another sign that this flow is broken. I'll apply my last few
grey cells to it, and hopefully the next iteration will be a bit
better.
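
For reference, the fault-status test in the quoted hunk can be modelled
in plain C as below. This is only a sketch: the macro and function names
are made up for illustration, and the values assume PAR_EL1.F in bit 0,
PAR_EL1.FST in bits [6:1], and the 0x3c/0x0c fault-type mask and
permission-fault encoding that ESR_ELx_FSC_TYPE and ESR_ELx_FSC_PERM
stand for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: PAR_EL1.F is bit 0, PAR_EL1.FST is bits [6:1]. */
#define PAR_F          (UINT64_C(1) << 0)
#define PAR_FST_SHIFT  1
#define PAR_FST_MASK   (UINT64_C(0x3f) << PAR_FST_SHIFT)

/* Assumed encodings: top four FST bits give the type, low two the level. */
#define FSC_TYPE       0x3c
#define FSC_PERM       0x0c

/* Extract the fault status code from a PAR_EL1 value. */
static uint8_t par_fst(uint64_t par)
{
	return (uint8_t)((par & PAR_FST_MASK) >> PAR_FST_SHIFT);
}

/*
 * A failed walk that is not a permission fault may simply have missed
 * in the page tables, so it is a candidate for the slow path. A
 * successful walk or a genuine permission fault is not.
 */
static bool par_wants_slow_retry(uint64_t par)
{
	if (!(par & PAR_F))
		return false;

	return (par_fst(par) & FSC_TYPE) != FSC_PERM;
}
```

The value that the old code hardcoded, (0xf << 1) | SYS_PAR_EL1_F,
decodes to FST 0x0f, i.e. a level 3 permission fault, so it would not
trigger a retry under this test.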

Thanks a lot for having a look!

	M.

-- 
Without deviation from the norm, progress is not possible.


