[PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers
Michael Shavit
mshavit at google.com
Tue Jan 2 00:08:41 PST 2024
On Wed, Dec 27, 2023 at 11:46 PM Jason Gunthorpe <jgg at nvidia.com> wrote:
>
> On Tue, Dec 19, 2023 at 09:42:27PM +0800, Michael Shavit wrote:
>
> > +static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
> > + __le64 *cur, const __le64 *target,
> > + __le64 *staging_entry)
> > +{
> > + bool cleanup_sync_required = false;
> > + u8 entry_qwords_used_diff = 0;
> > + int i = 0;
> > +
> > + entry_qwords_used_diff =
> > + writer->ops.get_used_qword_diff_indexes(cur, target);
> > + if (WARN_ON_ONCE(entry_qwords_used_diff == 0))
> > + return;
>
> A no change update is actually API legal, eg we can set the same
> domain twice in a row. It should just do nothing.
>
> If the goal is to improve readability I'd split this into smaller
> functions and have the main function look like this:
>
> compute_used(..)
> if (hweight8(entry_qwords_used_diff) > 1) {
>         set_v_0(..);
>         set(qword_start=1,qword_end=N);
>         set(qword_start=0,qword_end=1); // V=1
This branch is probably a bit more complicated than that. It's a bit
more like:

if (hweight8(entry_qwords_used_diff) > 1) {
        compute_staging_entry(...);
        compute_used_diffs(...staging_entry...);
        if (hweight8(entry_qwords_used_diff) > 1) {
                set_v_0();
                set(qword_start=1,qword_end=N);
                set(qword_start=0,qword_end=1); // V=1
        } else {
                set(qword_start=0, qword_end=N, staging_entry, entry);
                critical = ffs(..);
                set(qword_start=critical,qword_end=critical+1);
                set(qword_start=0,qword_end=N);
        }
}
> } else if (hweight8(entry_qwords_used_diff) == 1) {
>         set_unused(..);
>         critical = ffs(..);
>         set(qword_start=critical,qword_end=critical+1);
>         set(qword_start=0,qword_end=N);
And then this branch is the case where you can directly switch to the
entry without first setting unused bits.
> } else { // hweight8 == 0
>         set(qword_start=0,qword_end=N);
> }
>
> Then the three different programming algorithms are entirely clear in
> code. Make the generic set() function skip the sync if nothing
> changed.
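The three-path structure described above can be sketched as a
self-contained userspace model. The qword count, the V bit position,
and the get_used() behaviour below are simplified stand-ins for
illustration, not the actual SMMUv3 definitions or the kernel's
interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_ENTRY_QWORDS 8	/* stand-in; an STE is 8 qwords */
#define V_BIT (1ULL << 0)	/* assume the valid bit is qword 0, bit 0 */

/*
 * Toy stand-in for writer->ops.get_used(): report which bits the HW
 * currently inspects. Here: everything in a valid entry, only V in an
 * invalid one.
 */
static void get_used(const uint64_t *ent, uint64_t *used)
{
	for (int i = 0; i != NUM_ENTRY_QWORDS; i++)
		used[i] = (ent[0] & V_BIT) ? ~0ULL : 0;
	used[0] |= V_BIT;
}

/* Bitmap of qwords whose used bits differ between entry and target. */
static uint8_t used_qword_diff(const uint64_t *entry, const uint64_t *target)
{
	uint64_t used[NUM_ENTRY_QWORDS];
	uint8_t diff = 0;

	get_used(target, used);
	for (int i = 0; i != NUM_ENTRY_QWORDS; i++)
		if ((entry[i] ^ target[i]) & used[i])
			diff |= 1 << i;
	return diff;
}

/* Meld the target's values into bits the HW currently ignores. */
static void set_unused_bits(uint64_t *entry, const uint64_t *target)
{
	uint64_t used[NUM_ENTRY_QWORDS];

	get_used(entry, used);
	for (int i = 0; i != NUM_ENTRY_QWORDS; i++)
		entry[i] = (entry[i] & used[i]) | (target[i] & ~used[i]);
}

/* Generic set(): copy [start, end) and sync only if something changed. */
static void entry_set(uint64_t *entry, const uint64_t *target,
		      int start, int end)
{
	bool changed = false;

	for (int i = start; i != end; i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i];
			changed = true;
		}
	}
	if (changed) {
		/* the kernel would issue a CMD_SYNC here */
	}
}

static void write_entry(uint64_t *entry, const uint64_t *target)
{
	uint8_t diff = used_qword_diff(entry, target);

	if (__builtin_popcount(diff) > 1) {
		/* No hitless route: tear down via V=0 first. */
		uint64_t v0 = entry[0] & ~V_BIT;

		entry_set(entry, &v0, 0, 1);
		entry_set(entry, target, 1, NUM_ENTRY_QWORDS);
		entry_set(entry, target, 0, 1); /* V=1 last */
	} else if (diff) {
		/* Hitless: flip the single critical qword on its own. */
		int critical = __builtin_ffs(diff) - 1;

		set_unused_bits(entry, target);
		entry_set(entry, target, critical, critical + 1);
		entry_set(entry, target, 0, NUM_ENTRY_QWORDS);
	} else {
		/* Used bits already match; plain update, possibly a no-op. */
		entry_set(entry, target, 0, NUM_ENTRY_QWORDS);
	}
}
```

With entry_set() skipping the sync when nothing changed, the
hweight8 == 0 path also covers the "set the same domain twice in a
row" no-change case without a WARN.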
>
> > + if (hweight8(entry_qwords_used_diff) > 1) {
> > + /*
> > + * If transitioning to the target entry with a single qword
> > + * write isn't possible, then we must first transition to an
> > + * intermediate entry. The intermediate entry may either be an
> > + * entry that melds bits of the target entry into the current
> > + * entry without disrupting the hardware, or a breaking entry if
> > + * a hitless transition to the target is impossible.
> > + */
> > +
> > + /*
> > + * Compute a staging entry that has all the bits currently
> > + * unused by HW set to their target values, such that committing
> > + * it to the entry table wouldn't disrupt the hardware.
> > + */
> > + memcpy(staging_entry, cur, writer->entry_length);
> > + writer->ops.set_unused_bits(staging_entry, target);
> > +
> > + entry_qwords_used_diff =
> > + writer->ops.get_used_qword_diff_indexes(staging_entry,
> > + target);
>
> Put the number qwords directly in the ops struct and don't make this
> an op. Above will need N=number of qwords as well.
The reason I made get_used_qword_diff_indexes an op is that the
algorithm needs to compute the used bits for several entries (the
current entry, the target entry, as well as the melded staging entry),
and that computation needs a buffer to hold the output of the
used_bits op. We're already passing such a buffer to
arm_smmu_write_entry for the staging_entry, and I wasn't a fan of
adding a second one.
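For concreteness, the alternative being suggested would look something
like the sketch below: the qword count sits in the ops struct, only
the used-bits computation stays an op, and the diff helper becomes
shared code. All names here are illustrative, not the kernel's actual
interfaces, and note the helper still needs a scratch buffer for the
used-bit masks, which is the buffer in question:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRY_QWORDS 8	/* illustrative upper bound */

struct entry_writer_ops {
	unsigned int num_entry_qwords;
	/* Fill 'used' with the bits the HW inspects in 'entry'. */
	void (*get_used)(const uint64_t *entry, uint64_t *used);
};

/*
 * Shared helper replacing the get_used_qword_diff_indexes op: return a
 * bitmap of qwords whose used bits differ. The on-stack scratch
 * buffer for the used-bit masks is the buffer the discussion is about.
 */
static uint8_t used_qword_diff(const struct entry_writer_ops *ops,
			       const uint64_t *entry, const uint64_t *target)
{
	uint64_t used[MAX_ENTRY_QWORDS];
	uint8_t diff = 0;

	ops->get_used(target, used);
	for (unsigned int i = 0; i != ops->num_entry_qwords; i++)
		if ((entry[i] ^ target[i]) & used[i])
			diff |= 1 << i;
	return diff;
}

/* Toy used-bits callback for demonstration: every bit is used. */
static void all_bits_used(const uint64_t *entry, uint64_t *used)
{
	(void)entry;
	for (unsigned int i = 0; i != MAX_ENTRY_QWORDS; i++)
		used[i] = ~0ULL;
}
```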