[PATCH v3 7/7] iommu/riscv: Paging domain support

Tomasz Jeznach tjeznach at rivosinc.com
Mon May 6 19:22:07 PDT 2024


On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg at ziepe.ca> wrote:
>
> On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > For detach I think yes:
> > >
> > >    Inv CPU                                   Detach CPU
> > >
> > >   write io_pte                               Update device descriptor
> > >   rcu_read_lock
> > >     list_for_each
> > >       <make invalidation command>            <make description inv cmd>
> > >       dma_wmb()                              dma_wmb()
> > >       <doorbell>                             <cmd doorbell>
> > >   rcu_read_unlock
> > >                                              list_del_rcu()
> > >                                              <wipe ASID>
> > >
> > > In this case I think we never miss an invalidation, the list_del is
> > > always after the HW has been fully fenced, so I don't think we can
> > > have any issue. Maybe a suprious invalidation if the ASID gets
> > > re-used, but who cares.
> > >
> > > Attach is different..
> > >
> > >    Inv CPU                                   Attach CPU
> > >
> > >   write io_pte
> > >   rcu_read_lock
> > >     list_for_each // empty
> > >                                              list_add_rcu()
> > >                                              Update device descriptor
> > >                                              <make description inv cmd>
> > >                                              dma_wmb()
> > >                                              <cmd doorbell>
> > >   rcu_read_unlock
> > >
> > > As above shows we can "miss" an invalidation. The issue is narrow, the
> > > io_pte could still be sitting in write buffers in "Inv CPU" and not
> > > yet globally visiable. "Attach CPU" could get the device descriptor
> > > installed in the IOMMU and the IOMMU could walk an io_pte that is in
> > > the old state. Effectively this is because there is no release/acquire
> > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the
> > > IOMMU.
> > >
> > > It seems like it should be solvable somehow:
> > >  1) Inv CPU releases all the io ptes
> > >  2) Attach CPU acquires the io ptes before updating the DDT
> > >  3) Inv CPU acquires the RCU list in such a way that either attach
> > >     CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > >  4) Either invalidation works or we release the new iopte to the SMMU
> > >     and don't need it.
> > >
> > > But #3 is a really weird statement. smb_mb() on both sides may do the
> > > job??
> > >
> >
> > Actual attach sequence is slightly different.
> >
> >  Inv CPU                            Attach CPU
> >
> >  write io_pte
> >   rcu_read_lock
> >     list_for_each // empty
> >                                     list_add_rcu()
> >                                     IOTLB.INVAL(PSCID)
> >                                     <make description inv cmd>
> >                                     dma_wmb()
> >                                     <cmd doorbell>
> >  rcu_read_unlock
> >
> > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > before the attached domain is visible to the device.
>
> That invalidation shouldn't do anything. If this is the first attach
> of a PSCID then the PSCID had better already be empty, it won't become
> non-empty until the DDT entry is installed.
>
> And if it is the second attach then the Inv CPU is already taking care
> of things, no need to invalidate at all.
>
> Regardless, there is still a theortical race that the IOPTEs haven't
> been made visible yet because there is still no synchronization with
> the CPU writing them.
>
> So, I don't think this solves any problem. I belive you need the
> appropriate kind of CPU barrier here instead of an invalidation.
>

Yes. There was a small, but still plausible race w/ IOPTEs visibility
to the IOMMU.
For v5 I'm adding two barriers to the inval/detach flow, I believe
should cover it.

1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any
pending writes to PTEs visible to the IOMMU device. This should cover
the case when list_add_rcu() update is not yet visible in the
_iotlb_inval() sequence, for the first time the domain is attached to
the IOMMU.

  Inv CPU                                    Attach CPU
  write io_pte
  dma_wmb (1)
  rcu_read_lock
    list_for_each // empty                   list_add_rcu()
                                             smp_wmb (2)
                                             Update device descriptor
                                             <make description inv cmd>
                                             // PTEs are visible to the HW (*1)
                                             dma_wmb()
                                             <cmd doorbell>
  rcu_read_unlock

2) In riscv_iommu_bond_link() write memory barrier to ensure list
update is visible before IOMMU descriptor update. If stale data has
been fetched by the HW, inval CPU will run iotlb-invalidation
sequence. There is a possibility that IOMMU will fetch correct PTEs
and will receive unnecessary IOTLB inval, but I don't think anyone
would care.

  Inv CPU                                    Attach CPU
  write io_pte                               list_add_rcu()
                                             smp_wmb (2)
                                             Update device descriptor
                                             <make description inv cmd>
                                             // HW might fetch stale PTEs
                                             dma_wmb()
                                             <cmd doorbell>
  dma_wmb (1)
  rcu_read_lock
    list_for_each // non-empty (*2)
      <make iotlb inval cmd>
      dma_wmb()
      <cmd doorbell>
  rcu_read_unlock

3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
on the last domain unlink from the IOMMU.

Thank you for pointing this out. Let me know if that makes sense.

Best,
- Tomasz

> Jason



More information about the linux-riscv mailing list