[RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
Chen, Jason CJ
jason.cj.chen at intel.com
Fri Feb 3 00:39:41 PST 2023
> -----Original Message-----
> From: Tian, Kevin <kevin.tian at intel.com>
> Sent: Friday, February 3, 2023 10:05 AM
> To: Jean-Philippe Brucker <jean-philippe at linaro.org>
> Cc: maz at kernel.org; catalin.marinas at arm.com; will at kernel.org;
> joro at 8bytes.org; robin.murphy at arm.com; james.morse at arm.com;
> suzuki.poulose at arm.com; oliver.upton at linux.dev; yuzenghui at huawei.com;
> smostafa at google.com; dbrazdil at google.com; ryan.roberts at arm.com;
> linux-arm-kernel at lists.infradead.org; kvmarm at lists.linux.dev;
> iommu at lists.linux.dev; Chen, Jason CJ <jason.cj.chen at intel.com>; Zhang,
> Tina <tina.zhang at intel.com>
> Subject: RE: [RFC PATCH 00/45] KVM: Arm SMMUv3 driver for pKVM
>
> > From: Jean-Philippe Brucker <jean-philippe at linaro.org>
> > Sent: Thursday, February 2, 2023 6:05 PM
> >
> > On Thu, Feb 02, 2023 at 07:07:55AM +0000, Tian, Kevin wrote:
> > > > From: Jean-Philippe Brucker <jean-philippe at linaro.org>
> > > > Sent: Wednesday, February 1, 2023 8:53 PM
> > > >
> > > > 3. Private I/O page tables
> > > >
> > > > A flexible alternative uses private page tables in the SMMU,
> > > > entirely disconnected from the CPU page tables. With this the SMMU
> > > > can implement a reduced set of features, even shed a stage of
> > > > translation. This also provides a virtual I/O address space to the
> > > > host, which allows more efficient memory allocation for large
> > > > buffers, and for devices with limited addressing abilities.
> > > >
> > > > This is the solution implemented in this series. The host creates
> > > > IOVA->HPA mappings with two hypercalls map_pages() and
> > > > unmap_pages(), and the hypervisor populates the page tables. Page
> > > > tables are abstracted into IOMMU domains, which allow multiple
> > > > devices to share the same address space. Another four hypercalls,
> > > > alloc_domain(), attach_dev(), detach_dev() and free_domain(),
> > > > manage the domains.
> > > >
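
Just to check my understanding of the interface described above: on the
host side I imagine the six hypercalls end up wrapped roughly like the
sketch below. Everything except the six operation names, in particular
the pkvm_iommu_* wrappers and all signatures, is my guess for
illustration and not taken from the series.

#include <linux/types.h>

typedef u32 pkvm_domain_id_t;

/* Create or destroy an I/O address space (a set of private page tables). */
int pkvm_iommu_alloc_domain(pkvm_domain_id_t *out_domain);
int pkvm_iommu_free_domain(pkvm_domain_id_t domain);

/* Attach or detach a device (an opaque ID here) to/from a domain. */
int pkvm_iommu_attach_dev(pkvm_domain_id_t domain, u32 device_id);
int pkvm_iommu_detach_dev(pkvm_domain_id_t domain, u32 device_id);

/*
 * Ask the hypervisor to install or remove IOVA -> HPA mappings; the
 * hypervisor validates page ownership and populates the private tables.
 */
int pkvm_iommu_map_pages(pkvm_domain_id_t domain, unsigned long iova,
			 phys_addr_t hpa, size_t pgsize, size_t pgcount,
			 int prot);
int pkvm_iommu_unmap_pages(pkvm_domain_id_t domain, unsigned long iova,
			   size_t pgsize, size_t pgcount);
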
> > >
> > > Out of curiosity. Does virtio-iommu fit in this usage?
> >
> > I don't think so, because you still need a driver for the physical
> > IOMMU in the hypervisor. virtio-iommu would only replace the hypercall
> > interface with queues, and I don't think that buys us anything.
> >
> > Maybe virtio on the guest side could be advantageous, because that
> > interface has to be stable and virtio comes with stable APIs for
> > several classes of devices. But implementing virtio in pkvm means a
> > lot of extra code so it needs to be considered carefully.
> >
>
> this makes sense.
>
> > > If yes then there is no need to add specific enlightenment in
> > > existing iommu drivers. If not, is it probably because, as mentioned
> > > at the start, a full-fledged iommu driver doesn't fit nVHE, so lots
> > > of smmu driver logic has to be kept in the host?
> >
> > To minimize the attack surface of the hypervisor, we don't want to
> > load any superfluous code, so the hypervisor part of the SMMUv3 driver
> > only contains code to populate tables and send commands (which is
> > still too much for my taste but seems unavoidable to isolate host
> > DMA). Left in the host are things like ACPI/DT parser, interrupts,
> > possibly the event queue (which informs of DMA errors), extra features
> > and complex optimizations.
> > The host also has to implement IOMMU ops to liaise between the DMA API
> > and the hypervisor.
> >
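
If I read this right, the host-side flow is then roughly DMA API ->
IOMMU ops -> hypercall, with no page table walk in the host at all.
Something like the sketch below, reusing the hypothetical
pkvm_iommu_map_pages() wrapper from my earlier sketch; the structure
and function names here are made up for illustration, only the
forwarding idea comes from the description above.

struct host_smmu_domain {
	pkvm_domain_id_t id;	/* domain handle returned by the hypervisor */
};

static int host_smmu_map_pages(struct host_smmu_domain *dom,
			       unsigned long iova, phys_addr_t paddr,
			       size_t pgsize, size_t pgcount, int prot)
{
	/* The private tables live in pKVM, so just forward the request. */
	return pkvm_iommu_map_pages(dom->id, iova, paddr, pgsize,
				    pgcount, prot);
}
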
> > > anyway just want to check your thoughts on the possibility.
> > >
> > > btw some of my colleagues are porting pKVM to the Intel platform. I
> > > believe they will post their work shortly, and it might require some
> > > common framework in the pKVM hypervisor, like iommu domains,
> > > hypercalls, etc., similar to what we have in the host iommu
> > > subsystem. CC'ing them in case of any early thoughts they want to
> > > throw in. 😊
> >
> > Cool! The hypervisor part contains iommu/iommu.c which deals with
> > hypercalls and domains and doesn't contain anything specific to Arm
> > (it's only in arch/arm64 because that's where pkvm currently sits). It
> > does rely on io-pgtable at the moment which is not used by VT-d but
> > that can be abstracted as well. It's possible however that on Intel an
> > entirely different set of hypercalls will be needed, if a simpler
> > solution such as sharing page tables fits better because VT-d
> > implementations are more homogeneous.
> >
>
> yes, depending on the choices made for VT-d there could be different
> degrees of sharing possible. I'll let Jason/Tina comment on their design
> choice.

Thanks, Kevin, for bringing us in here. Our current POC solution for VT-d
is based on nested translation: there are two levels of io-pgtable, and we
keep the first-level page table fully controlled by the host VM
(IOVA -> host_GPA) while the second-level page table is managed by pKVM
(host_GPA -> HPA). This solution is simple and straightforward, but pKVM
still needs to provide vIOMMU emulation for the host (e.g., shadowing the
root/context/PASID tables, emulating IOTLB flushes, etc.).
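
To illustrate what the hardware ends up doing for host DMA in this POC,
the two levels compose roughly as in the pseudo-code below (all helper
and type names are made up for illustration; the actual nested walk is
performed by VT-d):

typedef unsigned long long iova_t, gpa_t, hpa_t;

/* First-level walk: page table fully controlled by the host VM. */
gpa_t first_level_walk(iova_t iova);
/* Second-level walk: page table managed by pKVM. */
hpa_t second_level_walk(gpa_t gpa);

static hpa_t effective_translation(iova_t iova)
{
	/* IOVA -> host_GPA (host-owned), then host_GPA -> HPA (pKVM-owned). */
	return second_level_walk(first_level_walk(iova));
}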

As far as I know, SMMU also supports a nested translation mode; may I ask
which mode is used for pKVM?

We faced a similar choice about whether to share the second-level
io-pgtable with the CPU page table, and in the end we also decided to
introduce a new page table. This increases the complexity of page state
management, as the io-pgtable and the CPU page table need to agree on
page ownership.
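
For example, before pKVM installs a mapping in the new io-pgtable it has
to check the ownership state that the CPU stage-2 tracking maintains.
Roughly like the sketch below (all names are made up for illustration,
not from our code or from this series):

#include <linux/errno.h>
#include <linux/types.h>

enum page_owner { PAGE_OWNED_BY_HOST, PAGE_OWNED_BY_HYP, PAGE_SHARED };

/* Assumed helpers: ownership lookup and the actual io-pgtable update. */
enum page_owner lookup_page_owner(u64 hpa);
int install_io_mapping(u64 iova, u64 hpa);

static int io_pgtable_map_one(u64 iova, u64 hpa)
{
	/* The host must not be able to DMA into hypervisor or guest memory. */
	if (lookup_page_owner(hpa) != PAGE_OWNED_BY_HOST)
		return -EPERM;

	return install_io_mapping(iova, hpa);
}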

For now our solution is based on vIOMMU emulation in pKVM; an enlightened
(paravirtualized) method could also be an alternative.

Thanks
Jason CJ Chen