[PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
Tian, Kevin
kevin.tian at intel.com
Tue Mar 21 01:34:00 PDT 2023
> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Tuesday, March 21, 2023 2:01 AM
>
> On Mon, Mar 20, 2023 at 09:12:06AM -0700, Nicolin Chen wrote:
> > On Mon, Mar 20, 2023 at 09:59:23AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg at nvidia.com>
> > > > > Sent: Saturday, March 11, 2023 12:20 AM
> > > > >
> > > > > What I'm broadly thinking is if we have to make the infrastructure for
> > > > > VCMDQ HW accelerated invalidation then it is not a big step to also
> > > > > have the kernel SW path use the same infrastructure just with a CPU
> > > > > wake up instead of a MMIO poke.
> > > > >
> > > > > I.e. we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > > > > support.
> > > > >
> > > >
> > > > I thought about this in the VT-d context. It looks like there are some difficulties.
> > > >
> > > > The most prominent one is that the head and tail pointers of the VT-d
> > > > invalidation queue are in MMIO registers. Handling it in the kernel iommu
> > > > driver implies reading the virtual tail register and updating the virtual
> > > > head register, i.e. moving some vIOMMU awareness into the kernel, which,
> > > > iirc, is not a welcome model.
> > >
> > > qemu would trap the MMIO and generate an IOCTL with the written head
> > > pointer. It isn't as efficient as having the kernel do the trap, but
> > > does give batching.
> >
> > Rephrasing that to put into a design: the IOCTL would pass a
> > user pointer to the queue, the size of the queue, then a head
> > pointer and a tail pointer? Then the kernel reads out all the
> > commands between the head and the tail and handles all those
> > invalidation commands only?
>
> Yes, that is one possible design
>
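A minimal sketch of the head/tail design described above, written as standalone userspace C so it compiles on its own. Everything here is hypothetical: struct hyp_inv_queue_args, hyp_consume_queue() and hyp_count_cmd() are illustrative names, not iommufd uAPI. It assumes a VT-d-style ring where head == tail means empty, a power-of-two slot count, and that the VMM fills in tail from the trapped MMIO write. A real kernel implementation would copy_from_user() each entry rather than dereference the user pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uAPI argument; NOT an existing iommufd structure. */
struct hyp_inv_queue_args {
	uint64_t queue_uptr;    /* user VA of the guest's invalidation queue */
	uint32_t queue_entries; /* number of slots, assumed a power of two */
	uint32_t entry_size;    /* bytes per native command */
	uint32_t head;          /* first slot not yet consumed */
	uint32_t tail;          /* value the guest wrote to the trapped tail register */
};

/* Per-command callback; returns nonzero to stop consumption. */
typedef int (*hyp_handle_cmd_fn)(const void *cmd, void *ctx);

/*
 * Walk every command in [head, tail), wrapping at queue_entries.
 * head == tail means the queue is empty (VT-d style convention).
 * Returns the new head: equal to tail on success, or the slot of
 * the first command whose handler failed.
 */
static uint32_t hyp_consume_queue(const struct hyp_inv_queue_args *args,
				  hyp_handle_cmd_fn handle, void *ctx)
{
	const uint8_t *base = (const uint8_t *)(uintptr_t)args->queue_uptr;
	uint32_t mask = args->queue_entries - 1;
	uint32_t head, tail;

	assert(args->queue_entries != 0 && (args->queue_entries & mask) == 0);
	head = args->head & mask;
	tail = args->tail & mask;

	while (head != tail) {
		/* A kernel version would copy_from_user() this entry instead. */
		if (handle(base + (size_t)head * args->entry_size, ctx))
			break;
		head = (head + 1) & mask;
	}
	return head;
}

/* Example handler: just counts the commands it is given. */
static int hyp_count_cmd(const void *cmd, void *ctx)
{
	(void)cmd;
	++*(unsigned int *)ctx;
	return 0;
}
```

The returned head is what the kernel would hand back so the VMM can update the virtual head register; stopping at the first failing command lets the VMM attribute an error to a specific slot.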
If we cannot have the short path in the kernel, then I'm not sure of the
value of using the native format and queue in the uAPI. Batching can
be enabled with any format.
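To illustrate that batching does not depend on a native command format, here is a minimal sketch assuming a hypothetical generic entry (struct hyp_inv_entry) and batch header (struct hyp_inv_batch); neither is an existing uAPI, and hyp_process_batch()/hyp_sum_size() are illustrative names. The entry_len field is an assumed forward-compatibility knob: the kernel copies at most the entry size it knows and zero-fills anything userspace did not supply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical non-native invalidation entry (not an existing uAPI struct). */
struct hyp_inv_entry {
	uint64_t iova;
	uint64_t size;
	uint32_t flags;
	uint32_t reserved;
};

/* Hypothetical batch header: any fixed-size entry format can be batched. */
struct hyp_inv_batch {
	uint64_t entries_uptr; /* user VA of an array of entries */
	uint32_t entry_len;    /* size of one entry as userspace laid it out */
	uint32_t entry_num;    /* in: entries supplied; out: entries handled */
};

typedef int (*hyp_inv_fn)(const struct hyp_inv_entry *e, void *ctx);

static int hyp_process_batch(struct hyp_inv_batch *b, hyp_inv_fn fn, void *ctx)
{
	const uint8_t *p = (const uint8_t *)(uintptr_t)b->entries_uptr;
	size_t copy;
	uint32_t i;
	int ret = 0;

	assert(fn != NULL);
	if (b->entry_len == 0)
		return -EINVAL;
	/* Copy the smaller of the user's and the kernel's entry sizes. */
	copy = b->entry_len < sizeof(struct hyp_inv_entry) ?
	       b->entry_len : sizeof(struct hyp_inv_entry);

	for (i = 0; i < b->entry_num; i++) {
		struct hyp_inv_entry e;

		memset(&e, 0, sizeof(e)); /* fields userspace did not supply read as 0 */
		memcpy(&e, p + (size_t)i * b->entry_len, copy);
		ret = fn(&e, ctx);
		if (ret)
			break;
	}
	b->entry_num = i; /* report how many entries completed */
	return ret;
}

/* Example handler: accumulates the invalidated sizes. */
static int hyp_sum_size(const struct hyp_inv_entry *e, void *ctx)
{
	*(uint64_t *)ctx += e->size;
	return 0;
}
```

A real implementation would also reject a larger entry_len whose trailing bytes are nonzero; that check is left out to keep the sketch short.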

Btw, probably a dumb question. The current invalidation IOCTL is
per hwpt. If we pick a native format, does that suggest making the IOCTL
per iommufd, given that the native format is per IOMMU and could carry
a scope bigger than a hwpt?