[PATCH v2 4/4] KVM: arm64: Remove PMSWINC_EL0 shadow register

Alexandru Elisei alexandru.elisei at arm.com
Tue Jul 20 09:44:32 PDT 2021


Hi Marc,

On 7/19/21 5:56 PM, Marc Zyngier wrote:
> Hi Alex,
>
> On 2021-07-19 17:35, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> On 7/19/21 1:39 PM, Marc Zyngier wrote:
>>> We keep an entry for the PMSWINC_EL0 register in the vcpu structure,
>>> while *never* writing anything there outside of reset.
>>>
>>> Given that the register is defined as write-only, that we always
>>> trap when this register is accessed, there is little point in saving
>>> anything anyway.
>>>
>>> Get rid of the entry, and save a mighty 8 bytes per vcpu structure.
>>>
>>> We still need to keep it exposed to userspace in order to preserve
>>> backward compatibility with previously saved VMs. Since userspace
>>> cannot expect any effect of writing to PMSWINC_EL0, treat the
>>> register as RAZ/WI for the purpose of userspace access.
>>>
>>> Signed-off-by: Marc Zyngier <maz at kernel.org>
>>> ---
>>>  arch/arm64/include/asm/kvm_host.h |  1 -
>>>  arch/arm64/kvm/sys_regs.c         | 21 ++++++++++++++++++++-
>>>  2 files changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index 41911585ae0c..afc169630884 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -185,7 +185,6 @@ enum vcpu_sysreg {
>>>      PMCNTENSET_EL0,    /* Count Enable Set Register */
>>>      PMINTENSET_EL1,    /* Interrupt Enable Set Register */
>>>      PMOVSSET_EL0,    /* Overflow Flag Status Set Register */
>>> -    PMSWINC_EL0,    /* Software Increment Register */
>>>      PMUSERENR_EL0,    /* User Enable Register */
>>>
>>>      /* Pointer Authentication Registers in a strict increasing order. */
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index f22139658e48..a1f5101f49a3 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -1286,6 +1286,20 @@ static int set_raz_id_reg(struct kvm_vcpu *vcpu, const
>>> struct sys_reg_desc *rd,
>>>      return __set_id_reg(vcpu, rd, uaddr, true);
>>>  }
>>>
>>> +static int set_wi_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
>>> +              const struct kvm_one_reg *reg, void __user *uaddr)
>>> +{
>>> +    int err;
>>> +    u64 val;
>>> +
>>> +    /* Perform the access even if we are going to ignore the value */
>>> +    err = reg_from_user(&val, uaddr, sys_reg_to_index(rd));
>>
>> I don't understand why the read still happens if the value is ignored.
>> Just so KVM
>> preserves the previous behaviour and tells userspace there was an error?
>
> If userspace has given us a duff pointer, it needs to know about it.

Makes sense, thanks.

>
>>> +    if (err)
>>> +        return err;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>  static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>>                 const struct sys_reg_desc *r)
>>>  {
>>> @@ -1629,8 +1643,13 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>        .access = access_pmcnten, .reg = PMCNTENSET_EL0 },
>>>      { PMU_SYS_REG(SYS_PMOVSCLR_EL0),
>>>        .access = access_pmovs, .reg = PMOVSSET_EL0 },
>>> +    /*
>>> +     * PM_SWINC_EL0 is exposed to userspace as RAZ/WI, as it was
>>> +     * previously (and pointlessly) advertised in the past...
>>> +     */
>>>      { PMU_SYS_REG(SYS_PMSWINC_EL0),
>>> -      .access = access_pmswinc, .reg = PMSWINC_EL0 },
>>> +      .get_user = get_raz_id_reg, .set_user = set_wi_reg,
>>
>> In my opinion, the call chain to return 0 looks pretty confusing to me, as the
>> functions seemed made for ID register accesses, and the leaf function,
>> read_id_reg(), tries to match this register with a list of ID
>> registers. Since we
>> have already added a new function just for PMSWINC_EL0, I was
>> wondering if adding
>> another function, something like get_raz_reg(), would make more sense.
>
> In that case, I'd rather just kill get_raz_id_reg() and replace it with
> this get_raz_reg(). If we trat something as RAZ, who cares whether it is
> an idreg or not?

I agree, the Arm ARM doesn't make the distinction between ID registers and other
system registers in the definition of RAZ, I don't think KVM should either. And
the way read_id_reg() is written allows returning a value different than 0 even if
raz is true, which in my opinion could only happen because of a bug in KVM.

I can have a go at writing the patch(es) on top of this series, if you want. At
the moment I'm rewriting the KVM SPE series, so it will be a few weeks until I get
around to doing it though.

Thanks,

Alex




More information about the linux-arm-kernel mailing list