[PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers

Jason Gunthorpe jgg at nvidia.com
Tue Jan 2 06:48:45 PST 2024


On Tue, Jan 02, 2024 at 04:13:28PM +0800, Michael Shavit wrote:
> On Tue, Dec 19, 2023 at 9:42 PM Michael Shavit <mshavit at google.com> wrote:
> ...
> > +       if (hweight8(entry_qwords_used_diff) > 1) {
> > +               /*
> > +                * If transitioning to the target entry with a single qword
> > +                * write isn't possible, then we must first transition to an
> > +                * intermediate entry. The intermediate entry may either be an
> > +                * entry that melds bits of the target entry into the current
> > +                * entry without disrupting the hardware, or a breaking entry if
> > +                * a hitless transition to the target is impossible.
> > +                */
> > +
> > +               /*
> > +                * Compute a staging entry that has all the bits currently
> > +                * unused by HW set to their target values, such that committing
> > +                * it to the entry table wouldn't disrupt the hardware.
> > +                */
> > +               memcpy(staging_entry, cur, writer->entry_length);
> > +               writer->ops.set_unused_bits(staging_entry, target);
> > +
> > +               entry_qwords_used_diff =
> > +                       writer->ops.get_used_qword_diff_indexes(staging_entry,
> > +                                                               target);
> > +               if (hweight8(entry_qwords_used_diff) > 1) {
> > +                       /*
> > +                        * More than 1 qword is mismatched between the staging
> > +                        * and target entry. A hitless transition to the target
> > +                        * entry is not possible. Set the staging entry to be
> > +                        * equal to the target entry, apart from the V bit's
> > +                        * qword. As long as the V bit is cleared first,
> > +                        * writes to the subsequent qwords will not further
> > +                        * disrupt the hardware.
> > +                        */
> > +                       memcpy(staging_entry, target, writer->entry_length);
> > +                       staging_entry[0] &= ~writer->v_bit;
> > +                       /*
> > +                        * After committing the staging entry, only the 0th qword
> > +                        * will differ from the target.
> > +                        */
> > +                       entry_qwords_used_diff = 1;
> > +               }
> > +
> > +               /*
> > +                * Commit the staging entry. Note that the iteration order
> > +                * matters, as we may be committing a breaking entry in the
> > +                * non-hitless case. The 0th qword, which holds the valid bit,
> > +                * must be written first in that case.
> > +                */
> > +               for (i = 0; i != writer->entry_length; i++)
> > +                       WRITE_ONCE(cur[i], staging_entry[i]);
> > +               writer->ops.sync_entry(writer);
> 
> Realized while replying to your latest email that this is wrong (and
> the unit-test as well!). It's not enough to just write the 0th qword
> first if it's a breaking entry; we must also sync after that 0th qword
> write.

Right.
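
For the breaking case the order has to be: write qword 0 with V cleared,
sync, write the remaining qwords, then sync again. Roughly like this
(just a sketch; sync() and num_qwords stand in for whatever the writer
ops end up looking like, they are not the driver's actual interfaces):

	/*
	 * Sketch of the breaking (non-hitless) commit order. sync() stands
	 * in for whatever issues the CFGI + CMD_SYNC for this entry, and
	 * num_qwords for the entry size in qwords (8 for an STE).
	 */
	static void commit_breaking_entry(__le64 *cur, const __le64 *staging,
					  unsigned int num_qwords,
					  void (*sync)(void))
	{
		unsigned int i;

		/* staging[0] has V clear, so this write breaks the entry... */
		WRITE_ONCE(cur[0], staging[0]);
		/* ...and the sync must land before any other qword changes. */
		sync();

		for (i = 1; i != num_qwords; i++)
			WRITE_ONCE(cur[i], staging[i]);
		sync();
	}

After that the remaining step can write the final qword 0 (now with V
set) and sync once more, which is the single-qword update the hitless
path already relies on.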

Jason


