[PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation

Alexandru Elisei alexandru.elisei at arm.com
Thu Jul 11 03:56:13 PDT 2024


Hi,

On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> In order to plug the brokenness of our current AT implementation,
> we need a SW walker that is going to... err.. walk the S1 tables
> and tell us what it finds.
> 
> Of course, it builds on top of our S2 walker, and share similar
> concepts. The beauty of it is that since it uses kvm_read_guest(),
> it is able to bring back pages that have been otherwise evicted.
> 
> This is then plugged in the two AT S1 emulation functions as
> a "slow path" fallback. I'm not sure it is that slow, but hey.
> [..]
>  	switch (op) {
>  	case OP_AT_S1E1RP:
>  	case OP_AT_S1E1WP:
> +		retry_slow = false;
>  		fail = check_at_pan(vcpu, vaddr, &par);
>  		break;
>  	default:
>  		goto nopan;
>  	}

For context, this is what check_at_pan() does:

static int check_at_pan(struct kvm_vcpu *vcpu, u64 vaddr, u64 *res)
{
        u64 par_e0;
        int error;

        /*
         * For PAN-involved AT operations, perform the same translation,
         * using EL0 this time. Twice. Much fun.
         */
        error = __kvm_at(OP_AT_S1E0R, vaddr);
        if (error)
                return error;

        par_e0 = read_sysreg_par();
        if (!(par_e0 & SYS_PAR_EL1_F))
                goto out;

        error = __kvm_at(OP_AT_S1E0W, vaddr);
        if (error)
                return error;

        par_e0 = read_sysreg_par();
out:
        *res = par_e0;
        return 0;
}

I'm having a hard time understanding why KVM is doing both AT S1E0R and AT S1E0W
regardless of the type of the access (read/write) in the PAN-aware AT
instruction. Would you mind elaborating on that?

> +	if (fail) {
> +		vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1);
> +		goto nopan;
> +	}
> [..]
> +	if (par & SYS_PAR_EL1_F) {
> +		u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);
> +
> +		/*
> +		 * If we get something other than a permission fault, we
> +		 * need to retry, as we're likely to have missed in the PTs.
> +		 */
> +		if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM)
> +			retry_slow = true;
> +	} else {
> +		/*
> +		 * The EL0 access succeded, but we don't have the full
> +		 * syndrom information to synthetize the failure. Go slow.
> +		 */
> +		retry_slow = true;
> +	}

This is what PSTATE.PAN controls:

If the Effective value of PSTATE.PAN is 1, then a privileged data access from
any of the following Exception levels to a virtual memory address that is
accessible to data accesses at EL0 generates a stage 1 Permission fault:

- A privileged data access from EL1.
- If HCR_EL2.E2H is 1, then a privileged data access from EL2.

With that in mind, I am really struggling to understand the logic.

If AT S1E0{R,W} (from check_at_pan()) failed, doesn't that mean that the virtual
memory address is not accessible to EL0? Add that to the fact that the AT
S1E1{R,W} (from the beginning of __kvm_at_s1e01()) succeeded, doesn't that mean
that AT S1E1{R,W}P should succeed, and furthermore the PAR_EL1 value should be
the one KVM got from AT S1E1{R,W}?

Thanks,
Alex

>  nopan:
>  	__mmu_config_restore(&config);
>  out:
>  	local_irq_restore(flags);
>  
>  	write_unlock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * If retry_slow is true, then we either are missing shadow S2
> +	 * entries, have paged out guest S1, or something is inconsistent.
> +	 *
> +	 * Either way, we need to walk the PTs by hand so that we can either
> +	 * fault things back, in or record accurate fault information along
> +	 * the way.
> +	 */
> +	if (retry_slow) {
> +		par = handle_at_slow(vcpu, op, vaddr);
> +		vcpu_write_sys_reg(vcpu, par, PAR_EL1);
> +	}
>  }
>  
>  void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> @@ -433,6 +931,10 @@ void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>  
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  
> +	/* We failed the translation, let's replay it in slow motion */
> +	if (!fail && (par & SYS_PAR_EL1_F))
> +		par = handle_at_slow(vcpu, op, vaddr);
> +
>  	vcpu_write_sys_reg(vcpu, par, PAR_EL1);
>  }
>  
> -- 
> 2.39.2
> 
> 



More information about the linux-arm-kernel mailing list