[PATCH v6 3/6] KVM: arm64: Enable writable for ID_AA64DFR0_EL1 and ID_DFR0_EL1

Jing Zhang jingzhangos at google.com
Mon Jul 31 13:51:18 PDT 2023


On Sat, Jul 29, 2023 at 3:36 AM Marc Zyngier <maz at kernel.org> wrote:
>
> On Fri, 21 Jul 2023 10:48:27 +0100,
> Cornelia Huck <cohuck at redhat.com> wrote:
> >
> > On Fri, Jul 21 2023, Marc Zyngier <maz at kernel.org> wrote:
> >
> > > On Fri, 21 Jul 2023 09:38:23 +0100,
> > > Cornelia Huck <cohuck at redhat.com> wrote:
> > >>
> > >> On Thu, Jul 20 2023, Jing Zhang <jingzhangos at google.com> wrote:
> > >> > No mechanism was provided to userspace to discover if a given idreg or
> > >> > any fields of a given idreg is writable. The write to a readonly idreg
> > >> > can also succeed (write ignored) without any error if what's written
> > >> > is exactly the same as what the idreg holds or if it is a write to
> > >> > AArch32 idregs on an AArch64-only system.
> > >>
> > >> Hm, I'm not sure that's a good thing for the cases where we want to
> > >> support mix-and-match userspace and kernels. Userspace may want to know
> > >> upfront whether it can actually tweak the contents of an idreg or not
> > >> (for example, in the context of using CPU models for compatibility), so
> > >> that it can reject or warn about certain configurations that may not
> > >> turn out as the user expects.
> > >>
> > >> > Not sure if it is worth adding an API to return the writable mask for
> > >> > idregs, since we want to enable the writable for all allocated
> > >> > unhidden idregs eventually.
> > >>
> > >> We'd enable any new idregs for writing from the start in the future, I
> > >> guess?
> > >>
> > >> I see two approaches here:
> > >> - add an API to get a list of idregs with their writable masks
> > >> - add a capability "you can write to all idregs whatever you'd expect to
> > >>   be able to write there architecture wise", which would require to add
> > >>   support for all idregs prior to exposing that cap
> > >>
> > >> The second option would be the easier one (if we don't manage to break
> > >> it in the future :)
> > >
> > > I'm not sure the last option is even possible. The architecture keeps
> > > allocating new ID registers in the op0==3, op1=={0, 1, 3}, CRn==0,
> > > CRm=={0-7}, op2=={0-7} space, so fields that were RES0 until then
> > > start having a non-0 value.
> > >
> > > This could lead to a situation where you move from a system that
> > > didn't know about ID_AA64MMFR6_EL1.XYZ to a system that advertises it,
> > > and for which the XYZ instruction has another behaviour. Bad things
> > > follow.
> >
> > Hrm :(
> >
> > >
> > > My preference would be a single ioctl that returns the full list of
> > > writeable masks in the ID reg range. It is big, but not crazy big
> > > (1536 bytes, if I haven't messed up), and includes the non ID_*_EL1
> > > sysreg such as MPIDR_EL1, CTR_EL1, SMIDR_EL1.
> > >
> > > It would allow the VMM to actively write zeroes to any writable ID
> > > register it doesn't know about, or for which it doesn't have anything
> > > to restore. It is also relatively future proof, as it covers
> > > *everything* the architecture has provisioned for the future (by the
> > > time that space is exhausted, I hope none of us will still be involved
> > > with this crap).
> >
> > Famous last words :)
> >
> > But yes, that should work. This wouldn't be the first ioctl returning a
> > long list, and the VMM would just call it once on VM startup to figure
> > things out anyway.
>
> To be clear, see below for what I had in mind. It is of course
> untested, and is probably overlooking a number of details, but you'll
> hopefully get my drift. I think this has some benefit over the
> per-sysreg ioctl, as it covers everything in one go, and is guaranteed
> to be exhaustive (until the architecture grows another range of ID
> crap).
>
> Note that we don't necessarily need to restrict ourselves to a single
> range either. We could also return some other ranges depending on
> additional parameters (Oliver mentioned offline the case of the
> PCMEIDx_EL0 registers).
>
> Thank,
>
>         M.
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2ca2973abe66..fa79f3651423 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -3589,3 +3589,91 @@ int __init kvm_sys_reg_table_init(void)
>
>         return 0;
>  }
> +
> +/*
> + * From DDI0487J.a, D19.2.66 ("ID_AA64MMFR2_EL1, AArch64 Memory Model
> + * Feature Register 2"):
> + *
> + * "The Feature ID space is defined as the System register space in
> + * AArch64 with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7},
> + * op2=={0-7}."
> + *
> + * This covers all R/O registers that indicate anything useful feature
> + * wise, including the ID registers.
> + */
> +
> +/* Userspace-visible definitions */
> +#define ARM64_FEATURE_ID_SPACE_SIZE    (3 * 8 * 8)
> +#define __ARM64_FEATURE_ID_SPACE_IDX(op0, op1, crn, crm, op2)          \
> +       ({                                                              \
> +               __u64 __op1 = op1 & 3;                                  \
> +               __op1 -= (__op1 == 3);                                  \
> +               ((ARM64_SYS_REG_SHIFT_MASK(3, OP0) |                    \
> +                 ARM64_SYS_REG_SHIFT_MASK(__op1, OP1) |                \
> +                 ARM64_SYS_REG_SHIFT_MASK(0, CRN) |                    \
> +                 ARM64_SYS_REG_SHIFT_MASK(crm & 7, CRM) |              \
> +                 ARM64_SYS_REG_SHIFT_MASK(op2, OP2)) -                 \
> +                (ARM64_SYS_REG_SHIFT_MASK(3, OP0) |                    \
> +                 ARM64_SYS_REG_SHIFT_MASK(0, OP1) |                    \
> +                 ARM64_SYS_REG_SHIFT_MASK(0, CRN) |                    \
> +                 ARM64_SYS_REG_SHIFT_MASK(0, CRM) |                    \
> +                 ARM64_SYS_REG_SHIFT_MASK(0, OP2)));                   \
> +       })
> +
> +#define ARM64_FEATURE_ID_SPACE_INDEX(r)                                        \
> +       __ARM64_FEATURE_ID_SPACE_IDX(sys_reg_Op0(r),                    \
> +                                    sys_reg_Op1(r),                    \
> +                                    sys_reg_CRn(r),                    \
> +                                    sys_reg_CRm(r),                    \
> +                                    sys_reg_Op2(r))
> +
> +struct feature_id_writeable_masks {
> +       u64     mask[ARM64_FEATURE_ID_SPACE_SIZE];
> +};
> +
> +static bool is_feature_id_reg(u32 encoding)
> +{
> +       return (sys_reg_Op0(encoding) == 3 &&
> +               (sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
> +               sys_reg_CRn(encoding) == 0 &&
> +               sys_reg_CRm(encoding) <= 7);
> +}
> +
> +int kvm_get_writeable_feature_regs(struct kvm *kvm, u64 __user *masks)
> +{
> +       /* Wipe the whole thing first */
> +       for (int i = 0; i < ARM64_FEATURE_ID_SPACE_SIZE; i++)
> +               if (put_user(0, masks + i))
> +                       return -EFAULT;
> +
> +       for (int i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) {
> +               const struct sys_reg_desc *reg = &sys_reg_descs[i];
> +               u32 encoding = reg_to_encoding(reg);
> +               u64 val;
> +
> +               if (!is_feature_id_reg(encoding) || !reg->set_user)
> +                       continue;
> +
> +               /*
> +                * For ID registers, we return the writable mask.
> +                * Other feature registers return a full 64bit mask.
> +                * That's not necessarily compliant with a given
> +                * revision of the architecture, but the RES0/RES1
> +                * definitions allow us to do that
> +                */
> +               if (is_id_reg(encoding)) {
> +                       if (!reg->val)
> +                               continue;
> +
> +                       val = reg->val;
> +               } else {
> +                       val = ~0UL;
> +               }
> +
> +               if (put_user(val,
> +                            (masks + ARM64_FEATURE_ID_SPACE_INDEX(encoding))))
> +                       return -EFAULT;
> +       }
> +
> +       return 0;
> +}
Thanks Marc.
The whole idea is clear to me now. I'll implement this in the next version.

Jing
>
>
>
> --
> Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list