[PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user

Mon Mar 20 11:00:58 PDT 2023

On Mon, Mar 20, 2023 at 09:12:06AM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 09:59:23AM -0300, Jason Gunthorpe wrote:
> > On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg at nvidia.com>
> > > > Sent: Saturday, March 11, 2023 12:20 AM
> > > > 
> > > > What I'm broadly thinking is if we have to make the infrastructure for
> > > > VCMDQ HW accelerated invalidation then it is not a big step to also
> > > > have the kernel SW path use the same infrastructure just with a CPU
> > > > wake up instead of a MMIO poke.
> > > > 
> > > > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > > > support.
> > > > 
> > > 
> > > I thought about this in the VT-d context. It looks like there are some difficulties.
> > > 
> > > The most prominent one is that head/tail of the VT-d invalidation queue
> > > are in MMIO registers. Handling it in kernel iommu driver suggests
> > > reading the virtual tail register and updating the virtual head register.
> > > That kind of moves some vIOMMU awareness into the kernel, which, iirc,
> > > is not a welcome model.
> > 
> > qemu would trap the MMIO and generate an IOCTL with the written head
> > pointer. It isn't as efficient as having the kernel do the trap, but
> > does give batching.
> 
> Rephrasing that to put into a design: the IOCTL would pass a
> user pointer to the queue, the size of the queue, then a head
> pointer and a tail pointer? Then the kernel reads out all the
> commands between the head and the tail and handles all those
> invalidation commands only?

Yes, that is one possible design.
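
To make the shape concrete, here is a rough userspace sketch of that
head/tail walk. Every name in it (struct inv_cmd, struct inv_queue_req,
inv_queue_consume, inv_sum_handler) is hypothetical and not from this
series, and it assumes a power-of-two queue of two-word commands that is
empty when head == tail. A real kernel path would have to copy the
entries from the user pointer rather than dereference it directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one invalidation command is two 64-bit words. */
struct inv_cmd {
	uint64_t data[2];
};

/* Hypothetical ioctl payload; none of these names come from the series. */
struct inv_queue_req {
	const struct inv_cmd *queue;	/* base of the guest command queue */
	uint32_t nents;			/* number of entries, power of two */
	uint32_t head;			/* first unconsumed entry */
	uint32_t tail;			/* one past the last entry written */
};

typedef int (*inv_handler_t)(const struct inv_cmd *cmd, void *ctx);

/*
 * Handle every command in [head, tail), wrapping at nents.
 * Returns the number of commands handled, or -1 for a malformed request.
 * req->head is advanced past each handled command, so the caller can
 * report back where processing stopped if a handler fails.
 */
static int inv_queue_consume(struct inv_queue_req *req,
			     inv_handler_t handle, void *ctx)
{
	uint32_t mask;
	int handled = 0;

	if (!req || !req->queue || !handle)
		return -1;
	if (req->nents == 0 || (req->nents & (req->nents - 1)) != 0)
		return -1;

	mask = req->nents - 1;
	if (req->head > mask || req->tail > mask)
		return -1;

	while (req->head != req->tail) {
		/* Kernel code would copy_from_user() this entry first. */
		if (handle(&req->queue[req->head], ctx) != 0)
			break;
		req->head = (req->head + 1) & mask;
		handled++;
	}
	return handled;
}

/* Example handler: just accumulates the first word of each command. */
static int inv_sum_handler(const struct inv_cmd *cmd, void *ctx)
{
	*(uint64_t *)ctx += cmd->data[0];
	return 0;
}
```

With a 4-entry queue, head = 3 and tail = 1, the loop handles entry 3,
wraps, handles entry 0, and leaves head equal to tail. That single call
covering several commands is where the batching comes from.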

Jason