[PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
Jason Gunthorpe
jgg at nvidia.com
Tue Jul 9 12:52:10 PDT 2024
On Tue, Jul 02, 2024 at 07:46:07PM +0100, Will Deacon wrote:
> On Mon, Jun 10, 2024 at 09:31:17PM -0300, Jason Gunthorpe wrote:
> > The top of the 2 level CD table is (at most) 1024 entries big, and two
> > high order allocations are required. One of __le64 which is programmed
> > into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
> > CPU pointer (16k).
> >
> > There are two copies of the l2ptr_dma, one is stored in the struct
> > arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
> > use. Instead of storing two copies just decode the value from the __le64.
> >
> > Signed-off-by: Jason Gunthorpe <jgg at nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++------------
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
> > 2 files changed, 17 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 6245e2558e6a6a..dd65e27aebafd4 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1167,31 +1167,19 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
> > arm_smmu_cmdq_batch_submit(smmu, &cmds);
> > }
> >
> > -static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> > - struct arm_smmu_l1_ctx_desc *l1_desc)
> > +static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
> > {
> > - size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> > -
> > - l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
> > - &l1_desc->l2ptr_dma, GFP_KERNEL);
> > - if (!l1_desc->l2ptr) {
> > - dev_warn(smmu->dev,
> > - "failed to allocate context descriptor table\n");
> > - return -ENOMEM;
> > - }
> > - return 0;
> > -}
> > -
> > -static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> > - struct arm_smmu_l1_ctx_desc *l1_desc)
> > -{
> > - u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> > - CTXDESC_L1_DESC_V;
> > + u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
> >
> > /* The HW has 64 bit atomicity with stores to the L2 CD table */
> > WRITE_ONCE(*dst, cpu_to_le64(val));
> > }
> >
> > +static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
> > +{
> > + return le64_to_cpu(src) & CTXDESC_L1_DESC_L2PTR_MASK;
> > +}
>
> I'm assuming this is supposed to be *src,
Uh, wow, surprised that compiles, yes.
> in which case this could be
> accessing non-cacheable memory if the SMMU isn't coherent. That's why we
> shadow everything.
That is a confusing statement. I know it is a DMA coherent allocation,
but the driver does not avoid reading for DMA coherent memory in all
cases, and DMA coherent allocations are well defined to be readable
anyhow.
For instance the driver has always read back from coherent allocations
when working with the STE or CD:
strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
GFP_KERNEL);
cfg->strtab.linear = strtab;
static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
__le64 *dst)
{
u64 val = le64_to_cpu(dst[0]);
^^^^ dst points into strtab above
That's the same thing being done here, with no shadowing, no issue.
I know no reason why the first level cd table would be special that it
needs shadowing? It is just wasting memory. Let's fix it.
Jason
More information about the linux-arm-kernel
mailing list