[PATCH] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support

Rob Herring robh at kernel.org
Thu Jan 16 08:57:52 PST 2020


On Wed, Jan 15, 2020 at 10:33 AM Auger Eric <eric.auger at redhat.com> wrote:
>
> Hi Rob,
>
> On 1/15/20 3:02 PM, Rob Herring wrote:
> > On Wed, Jan 15, 2020 at 3:21 AM Auger Eric <eric.auger at redhat.com> wrote:
> >>
> >> Hi Rob,
> >>
> >> On 1/13/20 3:39 PM, Rob Herring wrote:
> >>> Arm SMMUv3.2 adds support for TLB range invalidate operations.
> >>> Support for range invalidate is determined by the RIL bit in the IDR3
> >>> register.
> >>>
> >>> The range invalidate is in units of the leaf page size and operates on
> >>> 1-32 chunks of a power-of-2 multiple of pages. First we determine from the
> >>> size which power-of-2 multiple we can use and then adjust the granule to
> >>> 32x that size.
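
For illustration, here is a userspace sketch of the arithmetic above. It is not the driver code. It assumes that one range command covers (NUM + 1) chunks of 2^SCALE leaf pages, with NUM + 1 at most 32, and that the leaf page shift comes from the lowest set bit of pgsize_bitmap, as in the patch's __ffs() call. leaf_shift(), range_bytes() and RANGE_CHUNKS_MAX are hypothetical names. The GCC/Clang builtin __builtin_ctzl() stands in for the kernel's __ffs().

```c
#include <assert.h>

/* Hypothetical limit: one range command holds at most 32 chunks (NUM + 1). */
#define RANGE_CHUNKS_MAX 32UL

/*
 * Userspace stand-in for __ffs(pgsize_bitmap): log2 of the smallest
 * (leaf) page size. __builtin_ctzl() is a GCC/Clang builtin and, like
 * __ffs(), is undefined for a zero argument, hence the assert.
 */
unsigned long leaf_shift(unsigned long pgsize_bitmap)
{
	assert(pgsize_bitmap != 0);
	return (unsigned long)__builtin_ctzl(pgsize_bitmap);
}

/*
 * Bytes covered by one range invalidation:
 * (num + 1) chunks, each of 2^scale leaf pages of 2^tg bytes.
 */
unsigned long range_bytes(unsigned long tg, unsigned long scale,
			  unsigned long num)
{
	assert(num < RANGE_CHUNKS_MAX);
	return ((num + 1) << scale) << tg;
}
```

With 4K, 2M and 1G all set in the bitmap, leaf_shift() still returns 12. This fits the point that granule may be a 2M block size while TG must describe the leaf page. range_bytes(12, 1, 16) then covers 17 chunks of 2 pages of 4K, i.e. 34 pages or 139264 bytes.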

> >>> @@ -2022,12 +2043,39 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >>>               cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
> >>>       }
> >>>
> >>> +     if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
> >>> +             unsigned long tg, scale;
> >>> +
> >>> +             /* Get the leaf page size */
> >>> +             tg = __ffs(smmu_domain->domain.pgsize_bitmap);
> >> it is unclear to me why you can't set tg with the granule parameter.
> >
> > granule could be 2MB sections if THP is enabled, right?
>
> Ah OK I thought it was a page size and not a block size.
>
> I requested this feature a long time ago for virtual SMMUv3. With
> DPDK/VFIO, the guest was sending a page TLB invalidation for each page
> (granule=4K or 64K) of the hugepage buffer, and each one was trapped
> by the VMM. This stalled QEMU.

I did some more testing with THP confirmed enabled, but I haven't been
able to get granule to be anything other than 4K. The only setup I have
to test this with is the Fast Model with AHCI on PCI. Maybe I'm hitting
a path where THPs aren't supported yet.

> >>> +             /* Determine the power of 2 multiple number of pages */
> >>> +             scale = __ffs(size / (1UL << tg));
> >>> +             cmd.tlbi.scale = scale;
> >>> +
> >>> +             cmd.tlbi.num = CMDQ_TLBI_RANGE_NUM_MAX - 1;
> >> Also, could you explain why you use CMDQ_TLBI_RANGE_NUM_MAX?
> >
> > How's this:
> > /* The invalidation loop defaults to the maximum range */
> I would have expected num=0 directly. Don't we invalidate @size in
> one shot as 2^scale pages of granularity @tg? I fail to understand
> when NUM > 0.

NUM is > 0 any time size is not a power of 2. For example, if size is
33 pages, then it takes 2 loop iterations: one for 32 pages and then one
for 1 page. If size is 34 pages, a single command suffices, with NUM =
(17-1) and SCALE = 1.
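
That splitting can be simulated in userspace. The following is a sketch under several assumptions. SCALE is fixed once from the lowest set bit of the page count, as in the scale = __ffs(size / (1UL << tg)) hunk. Each command carries at most 32 chunks, and the last command gets NUM = remaining chunks - 1. split_range(), struct range_cmd and RANGE_CHUNKS_MAX are hypothetical names, and __builtin_ctzl() (GCC/Clang) stands in for __ffs().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit: NUM is a 5-bit field, so a command holds 1..32 chunks. */
#define RANGE_CHUNKS_MAX 32UL

struct range_cmd {
	unsigned long scale;	/* each chunk is 2^scale leaf pages */
	unsigned long num;	/* chunks in this command, minus one */
};

/*
 * Split an invalidation of @pages leaf pages into range commands.
 * SCALE comes from the lowest set bit of @pages. Full commands of 32
 * chunks are emitted first, and the last command takes whatever is left.
 * Returns the number of commands written to @cmds (at most @max).
 */
size_t split_range(unsigned long pages, struct range_cmd *cmds, size_t max)
{
	unsigned long scale, chunks;
	size_t n = 0;

	if (pages == 0)
		return 0;

	scale = (unsigned long)__builtin_ctzl(pages);
	chunks = pages >> scale;
	/* Shifting out all trailing zero bits leaves an odd chunk count. */
	assert(chunks & 1);

	while (chunks && n < max) {
		unsigned long batch = chunks < RANGE_CHUNKS_MAX ?
				      chunks : RANGE_CHUNKS_MAX;

		cmds[n].scale = scale;
		cmds[n].num = batch - 1;
		chunks -= batch;
		n++;
	}
	return n;
}
```

For 33 pages this yields (SCALE 0, NUM 31) followed by (SCALE 0, NUM 0). For 34 pages it yields a single (SCALE 1, NUM 16). Only a power-of-2 page count, such as 32 pages, fits in one command with NUM = 0.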

Rob



More information about the linux-arm-kernel mailing list