[PATCH v3 7/7] iommu/riscv: Paging domain support

Tomasz Jeznach tjeznach at rivosinc.com
Fri May 3 10:44:14 PDT 2024


On Wed, May 1, 2024 at 7:56 AM Jason Gunthorpe <jgg at ziepe.ca> wrote:
>
> On Tue, Apr 30, 2024 at 01:01:57PM -0700, Tomasz Jeznach wrote:
>
> > +#define iommu_domain_to_riscv(iommu_domain) \
> > +     container_of(iommu_domain, struct riscv_iommu_domain, domain)
> > +
> > +#define dev_to_domain(dev) \
> > +     iommu_domain_to_riscv(dev_iommu_priv_get(dev))
>
> Please use the priv properly and put a struct around it, you'll
> certainly need this eventually to do the rest of the advanced
> features.
>

Done. Yes, indeed, I was going to introduce proper struct in follow up
patches anyway. Pulled this change sooner.

> > +static void riscv_iommu_bond_unlink(struct riscv_iommu_domain *domain, struct device *dev)
> > +{
> > +     struct riscv_iommu_bond *bond, *found = NULL;
> > +     unsigned long flags;
> > +
> > +     if (!domain)
> > +             return;
> > +
> > +     spin_lock_irqsave(&domain->lock, flags);
>
> This is never locked from an irq, you don't need to use the irqsave
> variations.
>

Good point. done in v4.

> > +     list_for_each_entry_rcu(bond, &domain->bonds, list) {
> > +             if (bond->dev == dev) {
> > +                     list_del_rcu(&bond->list);
> > +                     found = bond;
> > +             }
> > +     }
> > +     spin_unlock_irqrestore(&domain->lock, flags);
> > +
> > +     /* Release and wait for all read-rcu critical sections have completed. */
> > +     kfree_rcu(found, rcu);
> > +     synchronize_rcu();
>
> Please no, synchronize_rcu() on a path like this is not so
> reasonable.. Also you don't need kfree_rcu() if you write it like this.
>
> This still looks better to do what I said before, put the iommu not
> the dev in the bond struct.
>
>

I was trying not to duplicate data in bond struct and use whatever is
available to be referenced from dev pointer (eg iommu / ids / private
iommu dev data). If I'm reading core iommu code correctly, device
pointer and iommu pointers should be valid between _probe_device and
_release_device calls. I've moved synchronize_rcu out of the domain
attach path to _release_device, LMK if that would be ok for now.
I'll have a second another to rework other patches to avoid storing
dev pointers at all.


> > +static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
> > +{
> > +     struct riscv_iommu_domain *domain;
> > +     struct riscv_iommu_device *iommu;
> > +
> > +     iommu = dev ? dev_to_iommu(dev) : NULL;
> > +     domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> > +     if (!domain)
> > +             return ERR_PTR(-ENOMEM);
> > +
> > +     INIT_LIST_HEAD_RCU(&domain->bonds);
> > +     spin_lock_init(&domain->lock);
> > +     domain->numa_node = NUMA_NO_NODE;
> > +
> > +     /*
> > +      * Follow system address translation mode.
> > +      * RISC-V IOMMU ATP mode values match RISC-V CPU SATP mode values.
> > +      */
> > +     domain->pgd_mode = satp_mode >> SATP_MODE_SHIFT;
>
> This seems really strange, the iommu paging domains should be
> unrelated to what the CPU is doing. There is no connection between
> these two concepts.
>
> Just pick a size that the iommu supports.
>
> The number of radix levels is a tunable alot of iommus have that we
> haven't really exposed to anything else yet.
>

Makes sense. I've left an option to pick mode from MMU for cases where
dev/iommu is not known at allocation time (with iommu_domain_alloc()).
I'd guess it's reasonable to assume IOMMU supported page modes will
match MMU.

> > +     /*
> > +      * Note: RISC-V Privilege spec mandates that virtual addresses
> > +      * need to be sign-extended, so if (VA_BITS - 1) is set, all
> > +      * bits >= VA_BITS need to also be set or else we'll get a
> > +      * page fault. However the code that creates the mappings
> > +      * above us (e.g. iommu_dma_alloc_iova()) won't do that for us
> > +      * for now, so we'll end up with invalid virtual addresses
> > +      * to map. As a workaround until we get this sorted out
> > +      * limit the available virtual addresses to VA_BITS - 1.
> > +      */
> > +     domain->domain.geometry.aperture_start = 0;
> > +     domain->domain.geometry.aperture_end = DMA_BIT_MASK(VA_BITS - 1);
> > +     domain->domain.geometry.force_aperture = true;
>
> Yikes.. This is probably the best solution long term anyhow, unless
> you need to use the last page in VFIO for some reason.
>
> >  static int riscv_iommu_device_domain_type(struct device *dev)
> >  {
> > -     return IOMMU_DOMAIN_IDENTITY;
> > +     struct riscv_iommu_device *iommu = dev_to_iommu(dev);
> > +
> > +     if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE)
> > +             return IOMMU_DOMAIN_IDENTITY;
> > +
>
> Is there even a point of binding an iommu driver if the HW can't
> support a DDT table? Just return -ENODEV from probe_device?
>
> Logically a iommu block that can't decode the RID has no association
> at all with a Linux struct device :)
>

Done. Good point ;)

Thanks for review,
- Tomasz


> Jason



More information about the linux-riscv mailing list