[PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers

Fri Feb 23 07:18:41 PST 2024

On Thu, Feb 22, 2024 at 05:43:46PM +0000, Will Deacon wrote:
> On Wed, Feb 21, 2024 at 10:08:18AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> > 
> > > Very roughly, yes, although I'd go further and just return a bitmap of
> > > used qwords instead of tracking these bits. Basically, we could have some
> > > #defines saying which qwords are used by which configs, 
> > 
> > I don't think this will work well for CD's EPD0 case..
> > 
> > static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > {
> > 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> > 	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> > 		return;
> > 	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> > 
> > 	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> > 	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> > 		used_bits[0] &= ~cpu_to_le64(
> > 			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> > 			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> > 			CTXDESC_CD_0_TCR_SH0);
> > 		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> > 	}
> > }
> 
> Please can you explain more about the issue here? I know what EPDx are,
> but I'm not understanding why they're problematic. This presumably
> involves a hitless transition to/from an aborting CD?

When a process using SVA exits uncleanly the MM is released so the
SMMU HW must stop chasing the page table pointers since all that
memory will be freed.

However, in an unclean exit we can't control the order of shutdown so
something like uacce or RDMA may not have quieted the DMA device yet.

So there is a period during shutdown where the mm has been released
and the device is doing DMA, the desire is that the DMA continue to be
handled as a PRI and the SW will return failure for all PRI requests.

Specifically we do not want to trigger any dmesg log events during
this condition.

Jean-Philippe came up with this solution where we hitlessly use EPD0
in release to allow the mm to release the page table while continuing
to use the PRI flow.

So it is going from a "SVA domain with a page table" to a "SVA domain
without a page table but EPD0 set", hitlessly.

> I'm just trying to avoid introducing dynamic behaviours to the driver
> which aren't actually used, and per-qword tracking feels like an easier
> way to maintain the hitless updates for the cases you care about. It's
> really not about throwing away the entire logic of the design -- as I
> said, I think this is looking pretty good. I'm also absolutely open to
> being convinced that per-field makes more sense and per-qword is terrible,
> so I'd really like to understand the E0PD case more.

It is not more sense/terrible, it is more that we have to make some
trade offs. I outlined what I think would be needed to make per-qw
work in the other email:

 - get_used becomes harder to explain but shorter (we ignore the used
   qw 1 for bypass/abort)
 - arm_smmu_entry_qword_diff becomes a bit simpler, less bitwise logic,
   no unused_update
 - arm_smmu_write_entry() has the same logic but unused_update is
   replaced by target
 - We have to hack something to make SHCFG=1 - change the make
   functions or have arm_smmu_write_ste() force SHCFG=1.
 - We have to write a seperate programming logic for CD -
   always do V=0/1 for normal updates, and a special EPD0 flow.

I think it is worse over all because none of those trade offs really
make the code clearer, and I dislike the idea of open coding
CD. Especially now that we have a test suite that requires the ops
anyhow.

It is a minor decision, trust Michael and I make this choice, we both
agree now and have spent alot of time studying this.

> As an aside: is this per-field/per-qword discussion the only thing holding
> up a v6?

As far as I know, yes. I have not typed in every feedback yet, but I
hope to get that done today. I will try to post it by Monday so we can
see what it looks like with Robin's suggestion but without per-qw.

> With the rest of the feedback addressed and a version of Michael's
> selftest that exercises stage-2 translating domains, I'd like to
> think we could get it queued up soon.

I would really like this, we have so many more patches to work on, you
probably saw the HTTU stuff was re posted again, we have a clean full
BTM enablement now on the list for the first time, nesting patches,
and more. Including this, I'm tracking a work list of about 100-150
patches for SMMUv3 in the next little bit.

This is not unique to SMMUv3, AMD is on part 6 of work for their
driver, and Intel has been pushing ~10-20 patches/cycle pretty
reliably. iommufd has opened the door to actually solving alot of the
stuck problems and everyone is rushing to complete their previously
stalled HW enablement. I have to review and help design all of this
work too! :)

BTW Michael's self test won't be in part 1 because it needs the ops to
be restored in order to work (now done in part 2), and has a few other
more minor dependencies on part 2 and 3.

Thanks,
Jason