[PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper

Robin Murphy robin.murphy at arm.com
Thu Mar 9 11:04:19 PST 2023


On 2023-03-09 14:19, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:51:20PM +0000, Robin Murphy wrote:
> 
>> Please note I stress "valid" since I'm not buying arbitrarily made-up
>> conceptual purity arguments. A nested domain cannot be the "one true domain"
>> that is an opaque combination of S1+S2; the IOMMU API view has to be more
>> like the device is attached to both the nested domain and the parent stage 2
>> domain somewhat in parallel.
> 
> I strongly disagree with this.
> 
> The design we have from the core perspective is an opaque domain that
> is a hidden combination of S1/S2 inside the driver. We do not want to
> change the basic design of the iommu core: there is only one domain
> attached to a device/group at a time.
> 
> This patch should be seen as a temporary hack to allow the ARM ITS
> stuff to hobble on a little longer. We already know that iommufd use
> cases are incompatible with the design and we need to fix it. The
> fixed solution must have iommufd install the ITS pages at domain
> allocation time and so it will not need these APIs at all. This
> tempoary code should not dictate the overall design of the iommu core.
> 
> If we belive exposing the S1/S2 relationships through the iommu core
> is necessary for another legitimate purpose I would like to hear about
> it. In my opinion using APIs to peek into these details is far more
> likely to be buggy and long term I prefer to block the ability to
> learn the S2 externally from iommufd completely.
> 
> Thus the overall design of the iommu core APIs is not being changed.
> The core API design follows this logic with and without nesting:
>     iommu_attach_device(domin);
>     WARN_ON(domain != iommu_get_domain_for_dev());

That is indeed one of many conditions that are true today, but the facts 
are that nothing makes that specific assertion, nothing should ever 
*need* to make that specific assertion, and any driver sufficiently dumb 
to not bother keeping track of its own domain and relying on 
iommu_get_domain_for_dev() to retrieve it should most definitely not be 
allowed anywhere near nesting.

The overall design of the iommu core APIs *is* being changed, because 
the current design is also that iommu_get_domain_for_dev() always 
returns the correct domain that iommu_dma_prepare_msi() needs, which 
breaks with nesting. You are literally insisting on changing this core 
API, to work around intentionally breaking an existing behaviour which 
could easily be preserved (far less invasively), for the sake of 
preserving some other theoretical behaviour that IS NOT USEFUL.

The overall design of the iommu core APIs *is* being changed, because 
the core API design also follows this logic for any domain type:

	domain = iommu_get_domain_for_dev();
	phys = iommu_iova_to_phys(domain, iova);
	//phys meaningfully represents whether iova was valid

Yes, even blocking and identity domains, because drivers ACTUALLY DO 
THIS. I'm not sure there even is a single correct thing that nesting 
domains could do to satisfy all the possible expectations that callers 
of iommu_iova_to_phys() may have. However if the grand design says it's 
not OK for iommu_get_domain_for_dev() to return what 
iommu_dma_prepare_msi() needs even though nobody else should ever be 
passing a nesting domain to it, then it's also not OK for 
iommu_iova_to_phys() to crash or lie and return 0 when a valid 
translation (by some notion) exists, even though nobody should ever pass 
a nesting domain in there either.

Forgive me for getting wound up, but I'm a pragmatist and my tolerance 
for ignoring reality is low.

> The hack in this patch gets its own special single-use APIs so we can
> purge them once they are not needed and do not confusingly contaminate
> the whole design. For this reason the ops call back should only be
> implemented by SMMUv3.
> 
>> Even when nesting is active, the S2 domain still exists as a domain
>> in its own right, and still needs to be visible and operated on as
>> such, for instance if memory is hotplugged in or out of the VM.
> 
> It exists in iommufd and iommufd will operate it. This is not a
> problem.
> 
> iommufd is not using a dual attach model.
> 
> The S2 is provided to the S1's domain allocation function as creation
> data. The returned S1 domain opaquely embeds the S2. The embedded S2
> cannot be changed once the S1 domain is allocated.
> 
> Attaching the S1 always causes the embedded S2 to be used too - they
> are not separable so we don't have APIs talking about about
> "attaching" the S2.

Yes, that is one way of viewing things, but it's far from the only way. 
The typical lifecycle will involve starting the VM with S2 alone, then 
enabling nesting later - we can view that as allocating a nested domain 
based on S2, then "replacing" S2 with nested, but we could equally view 
it as just attaching the nested domain on top of the existing S2, like 
pushing to a stack (I do agree that trying to model it as two completely 
independent and orthogonal attachments would not be sensible). It's 
really just semantics of how we prefer to describe things, and whether 
the s2_domain pointer is passed up-front to domain_alloc or later to 
attach_dev.

The opaque nested domain looks clean in some ways, but on the other hand 
it also heavily obfuscates how translations may be shared between one 
nested domain and other nested and non-nested domains, such that 
changing mappings in one may affect others. This is a major and 
potentially surprising paradigm shift away from the current notion that 
all domains represent individual isolated address spaces, so abstracting 
it more openly within the core API internals would be clearer about 
what's really going on. And putting common concepts at common levels of 
abstraction makes things simpler and is why we have core API code at all.

Frankly it's not like we don't already have various API-internal stuff 
in struct iommu_domain that nobody external should ever be looking at, 
but if that angst remains overwhelming then I can't imagine it being all 
that much work to move the definition to iommu-priv.h - AFAICS it 
shouldn't need much more than some helpers for the handful of 
iommu_get_domain_for_dev() users currently inspecting the type, 
pgsize_bitmap, or geometry - which would ultimately be a good deal 
neater and more productive than adding yet more special-case ops that 
every driver is just going to implement identically.

And to even imagine the future possibility of things like S2 pagetable 
sharing with KVM, or unpinned S2 with ATS/PRI or SMMU stalls, by 
tunnelling more private nesting interfaces directly between IOMMUFD and 
individual drivers without some degree of common IOMMU API abstraction 
in between... no. Just... no.

Thanks,
Robin.



More information about the linux-arm-kernel mailing list