[PATCH v6 3/6] KVM: arm64: Enable writable for ID_AA64DFR0_EL1 and ID_DFR0_EL1
Marc Zyngier
maz at kernel.org
Sat Jul 29 03:36:50 PDT 2023
On Fri, 21 Jul 2023 10:48:27 +0100,
Cornelia Huck <cohuck at redhat.com> wrote:
>
> On Fri, Jul 21 2023, Marc Zyngier <maz at kernel.org> wrote:
>
> > On Fri, 21 Jul 2023 09:38:23 +0100,
> > Cornelia Huck <cohuck at redhat.com> wrote:
> >>
> >> On Thu, Jul 20 2023, Jing Zhang <jingzhangos at google.com> wrote:
> >> > No mechanism was provided to userspace to discover if a given idreg or
> >> > any fields of a given idreg is writable. The write to a readonly idreg
> >> > can also succeed (write ignored) without any error if what's written
> >> > is exactly the same as what the idreg holds or if it is a write to
> >> > AArch32 idregs on an AArch64-only system.
> >>
> >> Hm, I'm not sure that's a good thing for the cases where we want to
> >> support mix-and-match userspace and kernels. Userspace may want to know
> >> upfront whether it can actually tweak the contents of an idreg or not
> >> (for example, in the context of using CPU models for compatibility), so
> >> that it can reject or warn about certain configurations that may not
> >> turn out as the user expects.
> >>
> >> > Not sure if it is worth adding an API to return the writable mask for
> >> > idregs, since we want to enable the writable for all allocated
> >> > unhidden idregs eventually.
> >>
> >> We'd enable any new idregs for writing from the start in the future, I
> >> guess?
> >>
> >> I see two approaches here:
> >> - add an API to get a list of idregs with their writable masks
> >> - add a capability "you can write to all idregs whatever you'd expect to
> >> be able to write there architecture wise", which would require to add
> >> support for all idregs prior to exposing that cap
> >>
> >> The second option would be the easier one (if we don't manage to break
> >> it in the future :)
> >
> > I'm not sure the last option is even possible. The architecture keeps
> > allocating new ID registers in the op0==3, op1=={0, 1, 3}, CRn==0,
> > CRm=={0-7}, op2=={0-7} space, so fields that were RES0 until then
> > start having a non-0 value.
> >
> > This could lead to a situation where you move from a system that
> > didn't know about ID_AA64MMFR6_EL1.XYZ to a system that advertises it,
> > and for which the XYZ instruction has another behaviour. Bad things
> > follow.
>
> Hrm :(
>
> >
> > My preference would be a single ioctl that returns the full list of
> > writeable masks in the ID reg range. It is big, but not crazy big
> > (1536 bytes, if I haven't messed up), and includes the non ID_*_EL1
> > sysreg such as MPIDR_EL1, CTR_EL1, SMIDR_EL1.
> >
> > It would allow the VMM to actively write zeroes to any writable ID
> > register it doesn't know about, or for which it doesn't have anything
> > to restore. It is also relatively future proof, as it covers
> > *everything* the architecture has provisioned for the future (by the
> > time that space is exhausted, I hope none of us will still be involved
> > with this crap).
>
> Famous last words :)
>
> But yes, that should work. This wouldn't be the first ioctl returning a
> long list, and the VMM would just call it once on VM startup to figure
> things out anyway.
To be clear, see below for what I had in mind. It is of course
untested, and is probably overlooking a number of details, but you'll
hopefully get my drift. I think this has some benefit over the
per-sysreg ioctl, as it covers everything in one go, and is guaranteed
to be exhaustive (until the architecture grows another range of ID
crap).
Note that we don't necessarily need to restrict ourselves to a single
range either. We could also return some other ranges depending on
additional parameters (Oliver mentioned offline the case of the
PCMEIDx_EL0 registers).
Thank,
M.
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2ca2973abe66..fa79f3651423 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3589,3 +3589,91 @@ int __init kvm_sys_reg_table_init(void)
return 0;
}
+
+/*
+ * From DDI0487J.a, D19.2.66 ("ID_AA64MMFR2_EL1, AArch64 Memory Model
+ * Feature Register 2"):
+ *
+ * "The Feature ID space is defined as the System register space in
+ * AArch64 with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7},
+ * op2=={0-7}."
+ *
+ * This covers all R/O registers that indicate anything useful feature
+ * wise, including the ID registers.
+ */
+
+/* Userspace-visible definitions */
+#define ARM64_FEATURE_ID_SPACE_SIZE (3 * 8 * 8)
+#define __ARM64_FEATURE_ID_SPACE_IDX(op0, op1, crn, crm, op2) \
+ ({ \
+ __u64 __op1 = op1 & 3; \
+ __op1 -= (__op1 == 3); \
+ ((ARM64_SYS_REG_SHIFT_MASK(3, OP0) | \
+ ARM64_SYS_REG_SHIFT_MASK(__op1, OP1) | \
+ ARM64_SYS_REG_SHIFT_MASK(0, CRN) | \
+ ARM64_SYS_REG_SHIFT_MASK(crm & 7, CRM) | \
+ ARM64_SYS_REG_SHIFT_MASK(op2, OP2)) - \
+ (ARM64_SYS_REG_SHIFT_MASK(3, OP0) | \
+ ARM64_SYS_REG_SHIFT_MASK(0, OP1) | \
+ ARM64_SYS_REG_SHIFT_MASK(0, CRN) | \
+ ARM64_SYS_REG_SHIFT_MASK(0, CRM) | \
+ ARM64_SYS_REG_SHIFT_MASK(0, OP2))); \
+ })
+
+#define ARM64_FEATURE_ID_SPACE_INDEX(r) \
+ __ARM64_FEATURE_ID_SPACE_IDX(sys_reg_Op0(r), \
+ sys_reg_Op1(r), \
+ sys_reg_CRn(r), \
+ sys_reg_CRm(r), \
+ sys_reg_Op2(r))
+
+struct feature_id_writeable_masks {
+ u64 mask[ARM64_FEATURE_ID_SPACE_SIZE];
+};
+
+static bool is_feature_id_reg(u32 encoding)
+{
+ return (sys_reg_Op0(encoding) == 3 &&
+ (sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
+ sys_reg_CRn(encoding) == 0 &&
+ sys_reg_CRm(encoding) <= 7);
+}
+
+int kvm_get_writeable_feature_regs(struct kvm *kvm, u64 __user *masks)
+{
+ /* Wipe the whole thing first */
+ for (int i = 0; i < ARM64_FEATURE_ID_SPACE_SIZE; i++)
+ if (put_user(0, masks + i))
+ return -EFAULT;
+
+ for (int i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) {
+ const struct sys_reg_desc *reg = &sys_reg_descs[i];
+ u32 encoding = reg_to_encoding(reg);
+ u64 val;
+
+ if (!is_feature_id_reg(encoding) || !reg->set_user)
+ continue;
+
+ /*
+ * For ID registers, we return the writable mask.
+ * Other feature registers return a full 64bit mask.
+ * That's not necessarily compliant with a given
+ * revision of the architecture, but the RES0/RES1
+ * definitions allow us to do that
+ */
+ if (is_id_reg(encoding)) {
+ if (!reg->val)
+ continue;
+
+ val = reg->val;
+ } else {
+ val = ~0UL;
+ }
+
+ if (put_user(val,
+ (masks + ARM64_FEATURE_ID_SPACE_INDEX(encoding))))
+ return -EFAULT;
+ }
+
+ return 0;
+}
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel
mailing list