[PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers
Michael Shavit
mshavit at google.com
Tue Jan 2 00:08:41 PST 2024
On Wed, Dec 27, 2023 at 11:46 PM Jason Gunthorpe <jgg at nvidia.com> wrote:
>
> On Tue, Dec 19, 2023 at 09:42:27PM +0800, Michael Shavit wrote:
>
> > +static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
> > + __le64 *cur, const __le64 *target,
> > + __le64 *staging_entry)
> > +{
> > + bool cleanup_sync_required = false;
> > + u8 entry_qwords_used_diff = 0;
> > + int i = 0;
> > +
> > + entry_qwords_used_diff =
> > + writer->ops.get_used_qword_diff_indexes(cur, target);
> > + if (WARN_ON_ONCE(entry_qwords_used_diff == 0))
> > + return;
>
> A no change update is actually API legal, eg we can set the same
> domain twice in a row. It should just do nothing.
>
> If the goal is to improve readability I'd split this into smaller
> functions and have the main function look like this:
>
> compute_used(..)
> if (hweight8(entry_qwords_used_diff) > 1) {
>         set_v_0(..);
>         set(qword_start=1,qword_end=N);
>         set(qword_start=0,qword_end=1); // V=1
This branch is probably a bit more complicated than that. It's a bit
more like:

if (hweight8(entry_qwords_used_diff) > 1) {
        compute_staging_entry(...);
        compute_used_diffs(...staging_entry...);
        if (hweight8(entry_qwords_used_diff) > 1) {
                set_v_0();
                set(qword_start=1,qword_end=N);
                set(qword_start=0,qword_end=1); // V=1
        } else {
                set(qword_start=0, qword_end=N, staging_entry, entry);
                critical = ffs(..);
                set(qword_start=critical,qword_end=critical+1);
                set(qword_start=0,qword_end=N);
        }
}
> } else if (hweight8(entry_qwords_used_diff) == 1) {
>         set_unused(..);
>         critical = ffs(..);
>         set(qword_start=critical,qword_end=critical+1);
>         set(qword_start=0,qword_end=N);
And then this branch is the case where you can directly switch to the
entry without first setting unused bits.
> } else { // hweight8 == 0
>         set(qword_start=0,qword_end=N);
> }
>
> Then the three different programming algorithms are entirely clear in
> code. Make the generic set() function skip the sync if nothing
> changed.
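The three-path structure described above can be sketched as a
self-contained userspace model. The qword count, the V bit position,
and the get_used() behaviour below are simplified stand-ins for
illustration, not the actual SMMUv3 definitions or the kernel's
interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_ENTRY_QWORDS 8	/* stand-in; an STE is 8 qwords */
#define V_BIT (1ULL << 0)	/* assume the valid bit is qword 0, bit 0 */

/*
 * Toy stand-in for writer->ops.get_used(): report which bits the HW
 * currently inspects. Here: everything in a valid entry, only V in an
 * invalid one.
 */
static void get_used(const uint64_t *ent, uint64_t *used)
{
	for (int i = 0; i != NUM_ENTRY_QWORDS; i++)
		used[i] = (ent[0] & V_BIT) ? ~0ULL : 0;
	used[0] |= V_BIT;
}

/* Bitmap of qwords whose used bits differ between entry and target. */
static uint8_t used_qword_diff(const uint64_t *entry, const uint64_t *target)
{
	uint64_t used[NUM_ENTRY_QWORDS];
	uint8_t diff = 0;

	get_used(target, used);
	for (int i = 0; i != NUM_ENTRY_QWORDS; i++)
		if ((entry[i] ^ target[i]) & used[i])
			diff |= 1 << i;
	return diff;
}

/* Meld the target's values into bits the HW currently ignores. */
static void set_unused_bits(uint64_t *entry, const uint64_t *target)
{
	uint64_t used[NUM_ENTRY_QWORDS];

	get_used(entry, used);
	for (int i = 0; i != NUM_ENTRY_QWORDS; i++)
		entry[i] = (entry[i] & used[i]) | (target[i] & ~used[i]);
}

/* Generic set(): copy [start, end) and sync only if something changed. */
static void entry_set(uint64_t *entry, const uint64_t *target,
		      int start, int end)
{
	bool changed = false;

	for (int i = start; i != end; i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i];
			changed = true;
		}
	}
	if (changed) {
		/* the kernel would issue a CMD_SYNC here */
	}
}

static void write_entry(uint64_t *entry, const uint64_t *target)
{
	uint8_t diff = used_qword_diff(entry, target);

	if (__builtin_popcount(diff) > 1) {
		/* No hitless route: tear down via V=0 first. */
		uint64_t v0 = entry[0] & ~V_BIT;

		entry_set(entry, &v0, 0, 1);
		entry_set(entry, target, 1, NUM_ENTRY_QWORDS);
		entry_set(entry, target, 0, 1); /* V=1 last */
	} else if (diff) {
		/* Hitless: flip the single critical qword on its own. */
		int critical = __builtin_ffs(diff) - 1;

		set_unused_bits(entry, target);
		entry_set(entry, target, critical, critical + 1);
		entry_set(entry, target, 0, NUM_ENTRY_QWORDS);
	} else {
		/* Used bits already match; plain update, possibly a no-op. */
		entry_set(entry, target, 0, NUM_ENTRY_QWORDS);
	}
}
```

With entry_set() skipping the sync when nothing changed, the
hweight8 == 0 path also covers the "set the same domain twice in a
row" no-change case without a WARN.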
>
> > + if (hweight8(entry_qwords_used_diff) > 1) {
> > + /*
> > + * If transitioning to the target entry with a single qword
> > + * write isn't possible, then we must first transition to an
> > + * intermediate entry. The intermediate entry may either be an
> > + * entry that melds bits of the target entry into the current
> > + * entry without disrupting the hardware, or a breaking entry if
> > + * a hitless transition to the target is impossible.
> > + */
> > +
> > + /*
> > + * Compute a staging entry that has all the bits currently
> > + * unused by HW set to their target values, such that committing
> > + * it to the entry table wouldn't disrupt the hardware.
> > + */
> > + memcpy(staging_entry, cur, writer->entry_length);
> > + writer->ops.set_unused_bits(staging_entry, target);
> > +
> > + entry_qwords_used_diff =
> > + writer->ops.get_used_qword_diff_indexes(staging_entry,
> > + target);
>
> Put the number qwords directly in the ops struct and don't make this
> an op. Above will need N=number of qwords as well.
The reason I made get_used_qword_diff_indexes an op is that the
algorithm needs to compute the used bits for several entries (the
current entry, the target entry, as well as the melded staging entry),
and that computation needs a buffer to hold the output of the
used_bits op. We're already passing such a buffer to
arm_smmu_write_entry for the staging_entry, and I wasn't a fan of
adding a second one.
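For concreteness, the alternative being suggested would look something
like the sketch below: the qword count sits in the ops struct, only
the used-bits computation stays an op, and the diff helper becomes
shared code. All names here are illustrative, not the kernel's actual
interfaces, and note the helper still needs a scratch buffer for the
used-bit masks, which is the buffer in question:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRY_QWORDS 8	/* illustrative upper bound */

struct entry_writer_ops {
	unsigned int num_entry_qwords;
	/* Fill 'used' with the bits the HW inspects in 'entry'. */
	void (*get_used)(const uint64_t *entry, uint64_t *used);
};

/*
 * Shared helper replacing the get_used_qword_diff_indexes op: return a
 * bitmap of qwords whose used bits differ. The on-stack scratch
 * buffer for the used-bit masks is the buffer the discussion is about.
 */
static uint8_t used_qword_diff(const struct entry_writer_ops *ops,
			       const uint64_t *entry, const uint64_t *target)
{
	uint64_t used[MAX_ENTRY_QWORDS];
	uint8_t diff = 0;

	ops->get_used(target, used);
	for (unsigned int i = 0; i != ops->num_entry_qwords; i++)
		if ((entry[i] ^ target[i]) & used[i])
			diff |= 1 << i;
	return diff;
}

/* Toy used-bits callback for demonstration: every bit is used. */
static void all_bits_used(const uint64_t *entry, uint64_t *used)
{
	(void)entry;
	for (unsigned int i = 0; i != MAX_ENTRY_QWORDS; i++)
		used[i] = ~0ULL;
}
```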