[PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user

Thu Mar 9 19:51:21 PST 2023

On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2023-03-09 10:53, Nicolin Chen wrote:
> > Add arm_smmu_cache_invalidate_user() function for user space to invalidate
> > TLB entries and Context Descriptors, since either an IO page table entrie
> > or a Context Descriptor in the user space is still cached by the hardware.
> > 
> > The input user_data is defined in "struct iommu_hwpt_invalidate_arm_smmuv3"
> > that contains the essential data for corresponding invalidation commands.
> > 
> > Co-developed-by: Eric Auger <eric.auger at redhat.com>
> > Signed-off-by: Eric Auger <eric.auger at redhat.com>
> > Signed-off-by: Nicolin Chen <nicolinc at nvidia.com>
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 56 +++++++++++++++++++++
> >   1 file changed, 56 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index ac63185ae268..7d73eab5e7f4 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2880,9 +2880,65 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> >       arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
> >   }
> > 
> > +static void arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
> > +                                        void *user_data)
> > +{
> > +     struct iommu_hwpt_invalidate_arm_smmuv3 *inv_info = user_data;
> > +     struct arm_smmu_cmdq_ent cmd = { .opcode = inv_info->opcode };
> > +     struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +     size_t granule_size = inv_info->granule_size;
> > +     unsigned long iova = 0;
> > +     size_t size = 0;
> > +     int ssid = 0;
> > +
> > +     if (!smmu || !smmu_domain->s2 || domain->type != IOMMU_DOMAIN_NESTED)
> > +             return;
> > +
> > +     switch (inv_info->opcode) {
> > +     case CMDQ_OP_CFGI_CD:
> > +     case CMDQ_OP_CFGI_CD_ALL:
> > +             return arm_smmu_sync_cd(smmu_domain, inv_info->ssid, true);
> 
> Since we let the guest choose its own S1Fmt (and S1CDMax, yet not
> S1DSS?), how can we assume leaf = true here?

The s1dss is forwarded in the user_data structure too. So, the
driver should have set that too down to a nested STE. Will add
this missing pathway.

And you are right that the guest OS can use a 2-level table, so
we should set leaf = false to cover all cases, I think.

> > +     case CMDQ_OP_TLBI_NH_VA:
> > +             cmd.tlbi.asid = inv_info->asid;
> > +             fallthrough;
> > +     case CMDQ_OP_TLBI_NH_VAA:
> > +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
> 
> Non-range invalidations with TG=0 are perfectly legal, and should not be
> ignored.

I assume that you are talking about the pgsize_bitmap check.

QEMU embeds a !tg case into the granule_size [1]. So it might
not be straightforward to cover that case. Let me see how to
untangle different cases and handle them accordingly.

[1] https://patchew.org/QEMU/20200824094811.15439-1-peter.maydell@linaro.org/20200824094811.15439-9-peter.maydell@linaro.org/

> > +                 granule_size & ~(1ULL << __ffs(granule_size)))
> 
> If that's intended to mean is_power_of_2(), please just use is_power_of_2().
> 
> > +                     return;
> > +
> > +             iova = inv_info->range.start;
> > +             size = inv_info->range.last - inv_info->range.start + 1;
> 
> If the design here is that user_data is so deeply driver-specific and
> special to the point that it can't possibly be passed as a type-checked
> union of the known and publicly-visible UAPI types that it is, wouldn't
> it make sense to just encode the whole thing in the expected format and
> not have to make these kinds of niggling little conversions at both ends?

Hmm, that makes sense to me.

I just tracked back the history of Eric's previous work. There
was a mismatch between guest and host that RIL isn't supported
by the hardware. Now, guest can have whatever information it'd
need from the host to send supported instructions.

> > +             if (!size)
> > +                     return;
> > +
> > +             cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
> > +             cmd.tlbi.leaf = inv_info->flags & IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF;
> > +             __arm_smmu_tlb_inv_range(&cmd, iova, size, granule_size, smmu_domain);
> > +             break;
> > +     case CMDQ_OP_TLBI_NH_ASID:
> > +             cmd.tlbi.asid = inv_info->asid;
> > +             fallthrough;
> > +     case CMDQ_OP_TLBI_NH_ALL:
> > +             cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
> > +             arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
> > +             break;
> > +     case CMDQ_OP_ATC_INV:
> > +             ssid = inv_info->ssid;
> > +             iova = inv_info->range.start;
> > +             size = inv_info->range.last - inv_info->range.start + 1;
> > +             break;
> 
> Can we do any better than multiplying every single ATC_INV command, even
> for random bogus StreamIDs, into multiple commands across every physical
> device? In fact, I'm not entirely confident this isn't problematic, if
> the guest wishes to send invalidations for one device specifically while
> it's put some other device into a state where sending it a command would
> do something bad. At the very least, it's liable to be confusing if the
> guest sends a command for one StreamID but gets an error back for a
> different one.

We'd need here an sid translation from the guest value to the
host value to specify a device, so as not to multiply the cmd
with the device list, if I understand it correctly?

> And if we expect ATS, what about PRI? Per patch #4 you're currently
> offering that to the guest as well.

Oh, I should have probably blocked PRI. The PRI and the fault
injection will be followed after the basic 2-stage translation
patches. And I don't have a supporting hardware to test PRI.

> 
> > +     default:
> > +             return;
> 
> What about NSNH_ALL? That still needs to invalidate all the S1 context
> that the guest *thinks* it's invalidating.

NSNH_ALL is translated to NH_ALL at the guest level. But maybe
it should have been done here instead.

Thanks
Nic