[PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers

Michael Shavit mshavit at google.com
Wed Feb 21 08:19:06 PST 2024


On Wed, Feb 21, 2024 at 10:08 PM Jason Gunthorpe <jgg at nvidia.com> wrote:
>
> On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
>
> > Very roughly, yes, although I'd go further and just return a bitmap of
> > used qwords instead of tracking these bits. Basically, we could have some
> > #defines saying which qwords are used by which configs,
>
> I don't think this will work well for CD's EPD0 case..
>
> static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> {
>         used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
>         if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
>                 return;
>         memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
>
>         /* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
>         if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
>                 used_bits[0] &= ~cpu_to_le64(
>                         CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
>                         CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
>                         CTXDESC_CD_0_TCR_SH0);
>                 used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
>         }
> }
>
> > and then we can
> > simplify the algorithm while retaining the ability to reject updates
> > to qwords which we're not expecting.
>
> It is not much simplification. arm_smmu_entry_qword_diff() gets a bit
> shorter (not that it is complex anyhow) and other stuff gets worse.

I think the simplification here is in the first if branch of
arm_smmu_write_ste. With Will's proposal, we only perform a hitless
update if there's a single used qword that needs updating. There's no
longer a case where we first set unused bits in qwords whose other
bits are in use. I'd argue that setting unused bits of a q word was
very clever and removing the logic does conceptually simplify things,
although yes it's not much fewer lines of code. I also don't think
this throws away the entire logic of the current design, the idea of
counting the number of qwords that differ and writing qwords that are
unused first is still there.

But, it does mean that hitless updates are only possible under
narrower circumstances...We now have to figure out if there are
transitions where this is problematic where we could previously assume
that we'd always get the best behavior possible. Both in the present
(i.e this SHCFG discussion and EPD0 case) and in the future if new
parts of the configs start getting used. IMO not having to think about
this is a meaningful advantage of the current solution.

> > > We'd have to really start doing really hacky things like remove the
> > > SHCFG as a used field entirely - but I think if you do that you break
> > > the entire logic of the design and also go backwards to having
> > > programming that only works if STEs are constructed in certain ways.
> >
> > I would actually like to remove SHCFG as a used field. If the encoding
> > was less whacky (i.e. if 0b00 always meant "use incoming"), then it would
> > be easy, but it shouldn't be too hard to work around that.
>

What do you mean by removing SHCFG as a used field? Are we changing
the driver so that it only ever sets SHCFG to a single possible value?
Or are we talking about fudging things and pretending it's not used
when it is and might have different values?



More information about the linux-arm-kernel mailing list