[PATCH v3 7/7] iommu/riscv: Paging domain support
Jason Gunthorpe
jgg at ziepe.ca
Tue May 7 09:51:56 PDT 2024
On Mon, May 06, 2024 at 07:22:07PM -0700, Tomasz Jeznach wrote:
> On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg at ziepe.ca> wrote:
> >
> > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > > For detach I think yes:
> > > >
> > > >    Inv CPU                        Detach CPU
> > > >
> > > >    write io_pte                   Update device descriptor
> > > >    rcu_read_lock
> > > >    list_for_each
> > > >    <make invalidation command>    <make descriptor inv cmd>
> > > >    dma_wmb()                      dma_wmb()
> > > >    <doorbell>                     <cmd doorbell>
> > > >    rcu_read_unlock
> > > >                                   list_del_rcu()
> > > >                                   <wipe ASID>
> > > >
> > > > In this case I think we never miss an invalidation, the list_del is
> > > > always after the HW has been fully fenced, so I don't think we can
> > > > have any issue. Maybe a spurious invalidation if the ASID gets
> > > > re-used, but who cares.
> > > >
> > > > Attach is different..
> > > >
> > > >    Inv CPU                   Attach CPU
> > > >
> > > >    write io_pte
> > > >    rcu_read_lock
> > > >    list_for_each // empty
> > > >                              list_add_rcu()
> > > >                              Update device descriptor
> > > >                              <make descriptor inv cmd>
> > > >                              dma_wmb()
> > > >                              <cmd doorbell>
> > > >    rcu_read_unlock
> > > >
> > > > As above shows we can "miss" an invalidation. The issue is narrow, the
> > > > io_pte could still be sitting in write buffers in "Inv CPU" and not
> > > > yet globally visible. "Attach CPU" could get the device descriptor
> > > > installed in the IOMMU and the IOMMU could walk an io_pte that is in
> > > > the old state. Effectively this is because there is no release/acquire
> > > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the
> > > > IOMMU.
> > > >
> > > > It seems like it should be solvable somehow:
> > > > 1) Inv CPU releases all the io ptes
> > > > 2) Attach CPU acquires the io ptes before updating the DDT
> > > > 3) Inv CPU acquires the RCU list in such a way that either attach
> > > > CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > > > 4) Either invalidation works or we release the new iopte to the IOMMU
> > > > and don't need it.
> > > >
> > > > But #3 is a really weird statement. smp_mb() on both sides may do the
> > > > job??
> > > >
> > >
> > > Actual attach sequence is slightly different.
> > >
> > >    Inv CPU                   Attach CPU
> > >
> > >    write io_pte
> > >    rcu_read_lock
> > >    list_for_each // empty
> > >                              list_add_rcu()
> > >                              IOTLB.INVAL(PSCID)
> > >                              <make descriptor inv cmd>
> > >                              dma_wmb()
> > >                              <cmd doorbell>
> > >    rcu_read_unlock
> > >
> > > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > > before the attached domain is visible to the device.
> >
> > That invalidation shouldn't do anything. If this is the first attach
> > of a PSCID then the PSCID had better already be empty, it won't become
> > non-empty until the DDT entry is installed.
> >
> > And if it is the second attach then the Inv CPU is already taking care
> > of things, no need to invalidate at all.
> >
> > Regardless, there is still a theoretical race that the IOPTEs haven't
> > been made visible yet because there is still no synchronization with
> > the CPU writing them.
> >
> > So, I don't think this solves any problem. I believe you need the
> > appropriate kind of CPU barrier here instead of an invalidation.
> >
>
> Yes. There was a small, but still plausible, race with IOPTE visibility
> to the IOMMU.
> For v5 I'm adding two barriers to the inval/attach flow, which I believe
> should cover it.
>
> 1) In riscv_iommu_iotlb_inval() an unconditional dma_wmb() to make any
> pending writes to PTEs visible to the IOMMU device. This should cover
> the case where the list_add_rcu() update is not yet visible in the
> _iotlb_inval() sequence, the first time the domain is attached to
> the IOMMU.
>
>    Inv CPU                   Attach CPU
>    write io_pte
>    dma_wmb (1)
>    rcu_read_lock
>    list_for_each // empty    list_add_rcu()
>                              smp_wmb (2)
>                              Update device descriptor
>                              <make descriptor inv cmd>
>                              // PTEs are visible to the HW (*1)
>                              dma_wmb()
>                              <cmd doorbell>
>    rcu_read_unlock
>
> 2) In riscv_iommu_bond_link() a write memory barrier to ensure the list
> update is visible before the IOMMU descriptor update. If stale data has
> been fetched by the HW, the inval CPU will run the IOTLB invalidation
> sequence. There is a possibility that the IOMMU will fetch correct PTEs
> and receive an unnecessary IOTLB inval, but I don't think anyone
> would care.
>
>    Inv CPU                   Attach CPU
>    write io_pte              list_add_rcu()
>                              smp_wmb (2)
>                              Update device descriptor
>                              <make descriptor inv cmd>
>                              // HW might fetch stale PTEs
>                              dma_wmb()
>                              <cmd doorbell>
>    dma_wmb (1)
>    rcu_read_lock
>    list_for_each // non-empty (*2)
>    <make iotlb inval cmd>
>    dma_wmb()
>    <cmd doorbell>
>    rcu_read_unlock
>
> 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
> on the last domain unlink from the IOMMU.
>
> Thank you for pointing this out. Let me know if that makes sense.
I'm not an expert in barriers, but I think you need the more expensive
"mb" in both cases.
The inv side is both releasing the write and acquiring the list
read. IIRC READ_ONCE is not a full acquire?
The Attach side is both releasing the list_add_rcu() and acquiring the
iopte.
RCU is still a benefit: there is no cache line sharing, and there is
only one full barrier, not two as with a spinlock.
And a big fat comment in both sides explaining this :)
Jason