[PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers

Jason Gunthorpe jgg at nvidia.com
Wed Oct 18 06:04:55 PDT 2023


On Wed, Oct 18, 2023 at 07:05:49PM +0800, Michael Shavit wrote:

> > FWIW, I found the most difficult part the used bit calculation, not
> > the update algorithm. Difficult because it is hard to read and find in
> > the spec when things are IGNORED, but it is a "straightforward" job of
> > finding IGNORED cases and making the used bits 0.
> 
> The update algorithm is the part I'm finding much harder to read and
> review :) . arm_smmu_write_entry_step in particular is hard to read
> through; on top of which there's some subtle dependencies between loop
> iterations that weren't obvious to grok:

Yes, you have it right, it is basically a classic greedy
algorithm. Let's improve the comment.

> * Relying on the used_bits to be recomputed after the first iteration
> where V=0 was set to 0 so that more bits can now be set.
> * The STE having to be synced between iterations to prevent broken STE
> reads by the SMMU (there's a comment somewhere else in arm-smmu-v3.c
> that would fit nicely here instead). But the caller is responsible for
> calling this between iterations for some reason (supposedly to support
> CD entries as well in the next series)

Yes, for CD entry support.

How about:

/*
 * This algorithm updates any STE/CD to any value without creating a situation
 * where the HW can perceive a corrupted entry. HW is only required to have
 * 64-bit atomicity with stores from the CPU, while entries are several 64-bit
 * values in size.
 *
 * The algorithm works by evolving the entry toward the target in a series of
 * steps. Each step synchronizes with the HW so that the HW can not see an entry
 * torn across two steps. Upon each call cur/cur_used reflect the current
 * synchronized value seen by the HW.
 *
 * During each step the HW can observe a torn entry that has any combination of
 * the step's old/new 64 bit words. The algorithm objective is for the HW
 * behavior to always be one of current behavior, V=0, or new behavior, during
 * each step, and across all steps.
 *
 * At each step one of three actions is chosen to evolve cur to target:
 *  - Update all unused bits with their target values.
 *    This relies on the IGNORED behavior described in the specification
 *  - Update a single 64-bit value
 *  - Update all unused bits and set V=0
 *
 * The last two actions will cause cur_used to change, which will then allow the
 * first action on the next step.
 *
 * In the most general case we can make any update in three steps:
 *  - Disrupting the entry (V=0)
 *  - Fill now unused bits, all bits except V
 *  - Make valid (V=1), single 64 bit store
 *
 * However, this disrupts the HW while it is happening. There are several
 * interesting cases where a STE/CD can be updated without disturbing the HW
 * because only a small number of bits are changing (S1DSS, CONFIG, etc) or
 * because the used bits don't intersect. We can detect this by calculating how
 * many 64-bit values need updating after adjusting the unused bits, and skip
 * the V=0 process.
 */
static bool arm_smmu_write_entry_step(__le64 *cur, const __le64 *cur_used,

Jason


