[PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers

Michael Shavit mshavit at google.com
Wed Feb 21 08:52:21 PST 2024


On Thu, Feb 22, 2024 at 12:19 AM Michael Shavit <mshavit at google.com> wrote:
>
> On Wed, Feb 21, 2024 at 10:08 PM Jason Gunthorpe <jgg at nvidia.com> wrote:
> >
> > On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> >
> > > Very roughly, yes, although I'd go further and just return a bitmap of
> > > used qwords instead of tracking these bits. Basically, we could have some
> > > #defines saying which qwords are used by which configs,
> >
> > I don't think this will work well for CD's EPD0 case..
> >
> > static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > {
> >         used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> >         if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> >                 return;
> >         memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> >
> >         /* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> >         if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> >                 used_bits[0] &= ~cpu_to_le64(
> >                         CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> >                         CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> >                         CTXDESC_CD_0_TCR_SH0);
> >                 used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> >         }
> > }
> >
> > > and then we can
> > > simplify the algorithm while retaining the ability to reject updates
> > > to qwords which we're not expecting.
> >
> > It is not much simplification. arm_smmu_entry_qword_diff() gets a bit
> > shorter (not that it is complex anyhow) and other stuff gets worse.
>
> I think the simplification here is in the first if branch of
> arm_smmu_write_ste. With Will's proposal, we only perform a hitless
> update if there's a single used qword that needs updating. There's no
> longer a case where we first set unused bits in qwords whose other
> bits are in use. I'd argue that setting unused bits of a q word was
> very clever and removing the logic does conceptually simplify things,
> although yes it's not much fewer lines of code. I also don't think
> this throws away the entire logic of the current design, the idea of
> counting the number of qwords that differ and writing qwords that are
> unused first is still there.
>
> But, it does mean that hitless updates are only possible under
> narrower circumstances...We now have to figure out if there are
> transitions where this is problematic where we could previously assume
> that we'd always get the best behavior possible. Both in the present
> (i.e this SHCFG discussion and EPD0 case) and in the future if new
> parts of the configs start getting used. IMO not having to think about
> this is a meaningful advantage of the current solution.

To be more explicit, I hope we can keep the current solution. The
tests we added mitigates the extra complexity, while there's no
certainty that the 1-bit-per-qword proposal will always be
satisfactory in the future (nor have we even reached consensus that it
is satisfactory in the present with the part 2 CD series)



More information about the linux-arm-kernel mailing list