[PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation
Alexandru Elisei
alexandru.elisei at arm.com
Thu Jul 11 03:56:13 PDT 2024
Hi,
On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> In order to plug the brokenness of our current AT implementation,
> we need a SW walker that is going to... err.. walk the S1 tables
> and tell us what it finds.
>
> Of course, it builds on top of our S2 walker, and share similar
> concepts. The beauty of it is that since it uses kvm_read_guest(),
> it is able to bring back pages that have been otherwise evicted.
>
> This is then plugged in the two AT S1 emulation functions as
> a "slow path" fallback. I'm not sure it is that slow, but hey.
> [..]
> switch (op) {
> case OP_AT_S1E1RP:
> case OP_AT_S1E1WP:
> + retry_slow = false;
> fail = check_at_pan(vcpu, vaddr, &par);
> break;
> default:
> goto nopan;
> }
For context, this is what check_at_pan() does:
static int check_at_pan(struct kvm_vcpu *vcpu, u64 vaddr, u64 *res)
{
u64 par_e0;
int error;
/*
* For PAN-involved AT operations, perform the same translation,
* using EL0 this time. Twice. Much fun.
*/
error = __kvm_at(OP_AT_S1E0R, vaddr);
if (error)
return error;
par_e0 = read_sysreg_par();
if (!(par_e0 & SYS_PAR_EL1_F))
goto out;
error = __kvm_at(OP_AT_S1E0W, vaddr);
if (error)
return error;
par_e0 = read_sysreg_par();
out:
*res = par_e0;
return 0;
}
I'm having a hard time understanding why KVM is doing both AT S1E0R and AT S1E0W
regardless of the type of the access (read/write) in the PAN-aware AT
instruction. Would you mind elaborating on that?
> + if (fail) {
> + vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1);
> + goto nopan;
> + }
> [..]
> + if (par & SYS_PAR_EL1_F) {
> + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);
> +
> + /*
> + * If we get something other than a permission fault, we
> + * need to retry, as we're likely to have missed in the PTs.
> + */
> + if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM)
> + retry_slow = true;
> + } else {
> + /*
> + * The EL0 access succeded, but we don't have the full
> + * syndrom information to synthetize the failure. Go slow.
> + */
> + retry_slow = true;
> + }
This is what PSTATE.PAN controls:
If the Effective value of PSTATE.PAN is 1, then a privileged data access from
any of the following Exception levels to a virtual memory address that is
accessible to data accesses at EL0 generates a stage 1 Permission fault:
- A privileged data access from EL1.
- If HCR_EL2.E2H is 1, then a privileged data access from EL2.
With that in mind, I am really struggling to understand the logic.
If AT S1E0{R,W} (from check_at_pan()) failed, doesn't that mean that the virtual
memory address is not accessible to EL0? Add that to the fact that the AT
S1E1{R,W} (from the beginning of __kvm_at_s1e01()) succeeded, doesn't that mean
that AT S1E1{R,W}P should succeed, and furthermore the PAR_EL1 value should be
the one KVM got from AT S1E1{R,W}?
Thanks,
Alex
> nopan:
> __mmu_config_restore(&config);
> out:
> local_irq_restore(flags);
>
> write_unlock(&vcpu->kvm->mmu_lock);
> +
> + /*
> + * If retry_slow is true, then we either are missing shadow S2
> + * entries, have paged out guest S1, or something is inconsistent.
> + *
> + * Either way, we need to walk the PTs by hand so that we can either
> + * fault things back, in or record accurate fault information along
> + * the way.
> + */
> + if (retry_slow) {
> + par = handle_at_slow(vcpu, op, vaddr);
> + vcpu_write_sys_reg(vcpu, par, PAR_EL1);
> + }
> }
>
> void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> @@ -433,6 +931,10 @@ void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>
> write_unlock(&vcpu->kvm->mmu_lock);
>
> + /* We failed the translation, let's replay it in slow motion */
> + if (!fail && (par & SYS_PAR_EL1_F))
> + par = handle_at_slow(vcpu, op, vaddr);
> +
> vcpu_write_sys_reg(vcpu, par, PAR_EL1);
> }
>
> --
> 2.39.2
>
>
More information about the linux-arm-kernel
mailing list