[RFC v2 REPOST] arm64: KVM: KVM API extensions for SVE
Christoffer Dall
christoffer.dall at linaro.org
Wed Feb 7 06:58:31 PST 2018
On Wed, Feb 07, 2018 at 11:33:28AM +0000, Dave Martin wrote:
> On Tue, Feb 06, 2018 at 02:17:57PM +0100, Christoffer Dall wrote:
> > On Tue, Feb 06, 2018 at 11:43:16AM +0000, Dave Martin wrote:
> > > On Mon, Feb 05, 2018 at 05:13:08PM +0100, Christoffer Dall wrote:
> > > > Hi Dave,
> > > >
> > > > On Fri, Jan 26, 2018 at 05:28:49PM +0000, Dave Martin wrote:
>
> [...]
>
> > > Exposing variable-size regs directly through the ioctl interface,
> > > or growing the ioctl view of the regs much beyond 2048 bits seems
> > > impractical, so the sliced view seemed reasonable. In practice, we
> > > might never need more than one slice anyway.
> > >
> > > > > The slice sizes allow each register to be read/written in exactly
> > > > > one slice for SVE.
> > > > >
> > > > > Surplus bits (beyond the maximum VL supported by the vcpu) will
> > > > > be read-as-zero write-ignore.
> > > >
> > > > This implies that there is some ordering fun to be had between
> > > > accessing SVE registers and changing the maximum vector length for the
> > > > vcpu. Why not loosen the API slightly and say that anything written to
> > > > the surplus bits is not guaranteed to be preserved when read back? (In
> > > > other words, we make no guarantees).
> > >
> > > I'm happy enough to say that, since it may remove some burden from the
> > > implementation; though it's only memzeroing, so perhaps not too exciting.
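(To make the slice model concrete, a userspace read of one Z-register slice
would presumably look something like the sketch below. The KVM_REG_ARM_SVE_Z()
register ID encoding and the 2048-bit/256-byte slice size are the ones proposed
in this RFC, not something that exists in <linux/kvm.h> today.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Read slice 0 of Z[n] into a 256-byte buffer with KVM_GET_ONE_REG.
 * Bits beyond the vcpu's maximum VL read as zero (or, per the above,
 * are simply not guaranteed to be preserved).
 */
static int read_zreg_slice0(int vcpu_fd, unsigned int n, uint8_t buf[256])
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM_SVE_Z(n, 0),	/* proposed encoding */
		.addr = (uint64_t)(uintptr_t)buf,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}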
>
> [...]
>
> > > > > * KVM_REG_ARM_SVE_Z(n, 0) bits [127:0] alias Vn in
> > > > > KVM_REG_ARM_CORE(fp_regs.vregs[n]) .. KVM_REG_ARM_CORE(fp_regs.vregs[n])+3.
> > > >
> > > > nit: KVM_REG_ARM_CORE_REG doesn't specify the size, so I think this
> > > > would be more clearly stated as
> > > > "aliases KVM_REG_SIZE_U128 | KVM_REG_ARM_CORE(fp_regs.vregs[n])" or as
> > > > "aliases kvm_regs.regs.fp_regs.vregs[n]".
> > >
> > > I thought the size was effectively hard-coded to be 64 bits for
> > > KVM_REG_ARM_CORE(). Or did I misunderstand?
> > >
> >
> > It's not. KVM_REG_ARM_CORE only encodes the 'group' of registers.
> > Userspace then has to OR in the size. So the 128-bit FP regs are
> > accessed with the following logic in QEMU:
> >
> > #define AARCH64_SIMD_CORE_REG(x) \
> >         (KVM_REG_ARM64 | KVM_REG_SIZE_U128 | \
> >          KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >
> > reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
> >
> > Looking back, the macros in kvm.h are potentially a bit silly, and
> > kvm.h should probably carry a full AARCH64_SIMD_CORE_REG() definition
> > itself (maybe we should add that). The idea is to use the offset into
> > the kvm_regs structure to indicate *which* register to retrieve, to
> > encode the size of the register in the id at the same time, and to
> > provide a pointer to a value of that size, which should avoid any
> > endianness problems etc.
>
> After another look at arch/arm64/kvm/guest.c:get_core_reg(), I get
> this now.
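For the record, the way the pieces fit together from userspace is roughly
this (macro bodies paraphrased from the arm64 uapi header and QEMU, so
treat it as a sketch rather than a copy of the headers):

/*
 * KVM_REG_ARM_CORE_REG() encodes *which* register as a __u32-granule
 * offset into struct kvm_regs:
 *
 *	#define KVM_REG_ARM_CORE_REG(name) \
 *		(offsetof(struct kvm_regs, name) / sizeof(__u32))
 *
 * and userspace ORs in the access size and passes a pointer to a value
 * of that size:
 */
#define AARCH64_SIMD_CORE_REG(x) \
	(KVM_REG_ARM64 | KVM_REG_SIZE_U128 | \
	 KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))

__uint128_t v;
struct kvm_one_reg reg = {
	.id   = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]),
	.addr = (__u64)(uintptr_t)&v,		/* 128-bit destination */
};
ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);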
>
> > > > > It's simplest for userspace if the two views always appear to be
> > > > > in sync, but it's unclear whether this is really useful. Perhaps
> > > > > this can be relaxed if it's a big deal for the KVM implementation;
> > > > > I don't know yet.
> > > >
> > > > It's not going to be a big deal, and it's the only sensible thing to do;
> > > > otherwise we'll have no clear semantics of where the values are. If KVM
> > > > chooses to duplicate the 128 bits of state in memory, then it must also
> > > > know how to synchronize that duplicated state when running the guest,
> > > > and therefore can also do this in software on SET/GET.
> > >
> > > I tried to remember my full rationale and describe what the rules
> > > would be if the two views are allowed to be incoherent ... and
> > > concluded that you are probably correct. This also matches the
> > > architectural view of the registers, which is what users would
> > > naturally expect.
> >
> > Some early experiences taught us that anything we expose to user space
> > should really be as close to the architectural concepts as possible,
> > otherwise we quickly venture into the land of ambiguity, which is a terrible
> > place to be for a kernel<->userspace ABI.
>
> Agreed!
>
> > > Software intentionally modifying the registers would need to know
> > > about this aliasing, but since software is explicitly required to
> > > enable the KVM_ARM_VCPU_SVE feature, this shouldn't pose problems
> > > for backwards compatibility.
> > >
> >
> > Agreed.
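(Spelling out what "always in sync" means from userspace's point of view,
something like the following should hold. set_one_reg()/get_one_reg() are
assumed to be trivial wrappers around the KVM_SET_ONE_REG/KVM_GET_ONE_REG
ioctls, and KVM_REG_ARM_SVE_Z() is the encoding proposed in this RFC.)

__uint128_t v = 42;		/* any value */
uint8_t z[256];

/* write V[n] through the core-register view... */
set_one_reg(vcpu_fd, AARCH64_SIMD_CORE_REG(fp_regs.vregs[n]), &v);

/* ...and read Z[n] slice 0 back through the SVE view */
get_one_reg(vcpu_fd, KVM_REG_ARM_SVE_Z(n, 0), z);

/* bits [127:0] of Z[n] alias V[n], so the low 16 bytes must match */
assert(memcmp(z, &v, 16) == 0);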
> >
> > > > > Vector length control:
>
> [...]
>
> > > > > ioctl(vcpu_fd, KVM_ARM_SVE_SET_VLS, __u64 vqs[SVE_VQ_MAX / 64]);
> > > >
> > > > I don't understand the proposed semantics here. What is the vqs array
> > > > and how should the values be set?
> > >
> > > vqs is a bitmap, where bit (n - 1) indicates support for VL = 128n bits.
> > >
> > > Since the hardware is not required to support all possible vector
> > > lengths up to the maximum, a bitmap seemed the most straightforward way
> > > of representing the possibly sparse set of supported VLs.
> > >
> > > Calling it vq_bitmap[] would make this clearer.
> >
> > Either way, I just think the details need to be spelled out.
>
> Sure. This won't be the final documentation, but I'll bear it in
> mind. The vq bitmap concept is the same as used to track the supported
> vector lengths in fpsimd.c, but sane people are probably not aware of
> that.
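(For the record, the layout is: bit (vq - 1) of the array corresponds to
vector length vq * 128 bits. A couple of illustrative helpers, not existing
kernel API:)

static inline void vq_set(__u64 *vqs, unsigned int vq)		/* vq >= 1 */
{
	vqs[(vq - 1) / 64] |= (__u64)1 << ((vq - 1) % 64);
}

static inline int vq_test(const __u64 *vqs, unsigned int vq)
{
	return (vqs[(vq - 1) / 64] >> ((vq - 1) % 64)) & 1;
}

/* e.g. requesting only the 128-, 256- and 512-bit vector lengths: */
vq_set(vqs, 1);		/* VL = 128 bits */
vq_set(vqs, 2);		/* VL = 256 bits */
vq_set(vqs, 4);		/* VL = 512 bits */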
>
> > > Describing only the maximum supported VL turns out to be insufficient.
> > >
> > > [...]
> > >
> > > > > At present, other than initialising each vcpu to the maximum
> > > > > supportable set of VLs, I don't propose having a way to probe for
> > > > > what sets of VLs are supportable: the above call either succeeds or
> > > > > fails.
> > > > >
> > > >
> > > > I think we should have such a way. Libvirt can do things like figure
> > > > out compatible features across nodes, and trying to set all possible
> > > > vector lengths doesn't seem like a great approach.
> > > >
> > > > How about basing the interface on a bitmap of all the possible
> > > > lengths, so that setting the bits for all lengths and reading back
> > > > the value returns the actual set of supported lengths?
> > >
> > > Would it be sufficient for an initial call to KVM_ARM_VCPU_GET_SVE_VLS
> > > to return the default (i.e., maximum supported) set?
> > >
> > > This would mean that once KVM_ARM_VCPU_SET_SVE_VLS has been called to
> > > set a different set, it would no longer be possible to find out the
> > > maximum set. I'm not sure this would matter, though.
> > >
> >
> > I don't particularly like this, because the init sequence in QEMU can be
> > pretty complicated, and I think it can become pretty nasty to have to
> > remember if we 'once upon a time' called some ioctl, because then
> > calling it now will mean something else.
> >
> > In case my suggestion wasn't clear, this is what I meant:
> >
> > #define NR_VQS ((32 * 2048 / 128) / 64)
> >
> > __u64 vqs[NR_VQS] = { [0 ... NR_VQS - 1] = ~0 };
> >
> > ioctl(vcpu_fd, KVM_ARM_SVE_SET_VLS, vqs);
> > ioctl(vcpu_fd, KVM_ARM_SVE_GET_VLS, vqs);
> >
> > for (i = 0; i < 16; i++)
> > 	if (vqs[0] & BIT(i))
> > 		printf("vector length %d supported by KVM\n", (i+1) * 128);
>
> This does mean that if the caller doesn't get the set they asked for
> in KVM_ARM_SVE_SET_VLS, this is not treated as an error by the kernel.
> I was concerned this would encourage userspace to silently miss such
> failures.
Good point.
>
> What if KVM_ARM_SVE_SET_VLS() were to yield 0 if the exact requested set
> of VLs was configured, -ERANGE if some subset was configured successfully
> (but not exactly the set requested), and the usual -EINVAL/-EIO/whatever
> if the set of VLs couldn't be configured at all?
Sounds good to me.
>
> Then the probe would go like this:
>
> __u64 vqs[SVE_VQ_MAX / 64] = { [0 ... SVE_VQ_MAX / 64 - 1] = ~(__u64)0 };
>
> if (ioctl(vcpu_fd, KVM_ARM_SVE_SET_VLS, vqs) && errno != ERANGE)
> 	goto error;
>
> ioctl(vcpu_fd, KVM_ARM_SVE_GET_VLS, vqs);
>
> /* ... */
>
> Another option would be for KVM_ARM_SVE_SET_VLS to write the resulting
> set back to its argument, possibly with the same 0/ERANGE/EINVAL semantics.
>
>
> Alternatively, userspace would be required to do a KVM_ARM_SVE_GET_VLS,
> and check the resulting set:
>
> /* ... */
>
> __u64 newvqs[SVE_VQ_MAX / 64];
> ioctl(vcpu_fd, KVM_ARM_SVE_GET_VLS, newvqs);
>
> if (memcmp(vqs, newvqs, sizeof vqs))
> 	goto mismatch;
>
> vcpu restore would need to treat any mismatch as an error:
> the exact requested set must be configurable, or the VM may go wrong.
I'm not sure I follow what this means in practice; could you clarify?
>
>
> Any opinion on which approach is best?
>
I think I prefer letting KVM_ARM_SVE_SET_VLS change the supplied vqs,
since having two separate ioctls always potentially leaves room for
some other thread modifying the set in the meantime (or at least makes
a programmer wonder whether that can happen), whereas a single ioctl()
will look atomic.
The user can always call KVM_ARM_SVE_GET_VLS afterwards and should get
the same result.
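Roughly, the flow I have in mind would then be (a sketch, assuming
KVM_ARM_SVE_SET_VLS writes the configured set back into its argument and
returns -ERANGE when it had to drop some of the requested lengths):

__u64 vqs[SVE_VQ_MAX / 64] = { [0 ... SVE_VQ_MAX / 64 - 1] = ~(__u64)0 };

if (ioctl(vcpu_fd, KVM_ARM_SVE_SET_VLS, vqs) && errno != ERANGE)
	goto error;

/* vqs now describes exactly the set KVM configured for this vcpu */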
I think it's worth trying to write this up as patches to the KVM
documentation and to the kernel and see what people say on that.
> I was fighting with this in December, trying to come up with a way
> for userspace to specify which VLs it requires/permits/forbids and
> letting the kernel come up with something matching these constraints.
> But this seemed too expressive and complex, so I had dropped the idea
> in favour of something simpler.
>
Thanks,
-Christoffer