[RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps

Christoph Hellwig hch at lst.de
Sun Mar 24 16:22:15 PDT 2024


On Fri, Mar 22, 2024 at 03:43:30PM -0300, Jason Gunthorpe wrote:
> If we are going to make caller provided uniformity a requirement, lets
> imagine a formal memory type idea to help keep this a little
> abstracted?
> 
>  DMA_MEMORY_TYPE_NORMAL
>  DMA_MEMORY_TYPE_P2P_NOT_ACS
>  DMA_MEMORY_TYPE_ENCRYPTED
>  DMA_MEMORY_TYPE_BOUNCE_BUFFER  // ??
> 
> Then maybe the driver flow looks like:
> 
> 	if (transaction.memory_type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev)) {

Add a nice helper to make this somewhat readable (sketched below), but yes.

> 	} else if (transaction.memory_type == DMA_MEMORY_TYPE_P2P_NOT_ACS) {
> 		num_hwsgls = transaction.num_sgls;
> 		for_each_range(transaction, range) {
> 			hwsgl[i].addr = dma_api_p2p_not_acs_map(range.start_physical, range.length, p2p_memory_provider);
> 			hwsgl[i].len = range.length;
> 		}
> 	} else {
> 		/* Must be DMA_MEMORY_TYPE_NORMAL, DMA_MEMORY_TYPE_ENCRYPTED, DMA_MEMORY_TYPE_BOUNCE_BUFFER? */
> 		num_hwsgls = transaction.num_sgls;
> 		for_each_range(transaction, range) {
> 			hwsgl[i].addr = dma_api_map_cpu_page(range.start_page, range.length);
> 			hwsgl[i].len = range.length;
> 		}
> 	}

And these two are really the same except that we call a different map
helper underneath.  So I think as far as the driver is concerned they
should be the same; the DMA API just needs to key off the memory type.
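
Something like the following, purely as a sketch reusing the
hypothetical names from above (dma_api_has_iommu(),
dma_api_p2p_not_acs_map(), dma_api_map_cpu_page() and the
DMA_MEMORY_TYPE_* values are proposals, not existing API):

	/* readability helper for the IOMMU merging fast path check */
	static inline bool dma_can_use_iommu_merging(struct device *dev,
			enum dma_memory_type type)
	{
		return type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev);
	}

	/* one per-range map helper, keyed off the memory type */
	static inline dma_addr_t dma_api_map_range(struct device *dev,
			struct dma_range *range, enum dma_memory_type type,
			struct p2p_memory_provider *provider)
	{
		if (type == DMA_MEMORY_TYPE_P2P_NOT_ACS)
			return dma_api_p2p_not_acs_map(range->start_physical,
					range->length, provider);
		/*
		 * NORMAL, ENCRYPTED and BOUNCE_BUFFER all go through the
		 * CPU page path; the DMA core picks the implementation.
		 */
		return dma_api_map_cpu_page(range->start_page, range->length);
	}

With that the driver flow above collapses to a single branch plus one
loop that is identical for all the non-merging cases:

	if (dma_can_use_iommu_merging(dev, transaction.memory_type)) {
		/* single contiguous IOVA, one hwsgl entry */
	} else {
		num_hwsgls = transaction.num_sgls;
		for_each_range(transaction, range) {
			hwsgl[i].addr = dma_api_map_range(dev, &range,
					transaction.memory_type,
					p2p_memory_provider);
			hwsgl[i].len = range.length;
		}
	}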

> And the hmm_range_fault case is sort of like:
> 
> 		struct dma_api_iommu_state state;
> 		dma_api_iommu_start(&state, mr.num_pages);
> 
> 		[..]
> 		hmm_range_fault(...)
> 		if (present)
> 			dma_link_page(&state, faulting_address_offset, page);
> 		else
> 			dma_unlink_page(&state, faulting_address_offset, page);
> 
> Is this looking closer?

Yes.
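
To spell the lifetime out a bit more (dma_api_iommu_start(),
dma_link_page() and dma_unlink_page() are the hypothetical names from
above; dma_api_iommu_destroy() is an equally hypothetical name for the
missing teardown step):

	struct dma_api_iommu_state state;
	int ret;

	/* reserve IOVA space for the whole MR up front */
	ret = dma_api_iommu_start(&state, mr.num_pages);
	if (ret)
		return ret;

	/* on each (re)fault: */
	hmm_range_fault(...);
	if (present)
		dma_link_page(&state, faulting_address_offset, page);
	else
		dma_unlink_page(&state, faulting_address_offset);

	/* and on MR destruction, undo all remaining mappings */
	dma_api_iommu_destroy(&state);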

> > > So I take it as a requirement that RDMA MUST make single MRs out of a
> > > hodgepodge of page types. RDMA MRs cannot be split. Multiple MRs are
> > > not a functional replacement for a single MR.
> > 
> > But MRs consolidate multiple dma addresses anyway.
> 
> I'm not sure I understand this?

The RDMA MRs take a list of PFN-ish addresses (or SGLs, with the
enhanced MRs from Mellanox) and give you back a single rkey/lkey.
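
For reference, that consolidation with today's kernel verbs interface
looks roughly like this (error handling omitted; ib_alloc_mr() and
ib_map_mr_sg() are the existing in-kernel API):

	struct ib_mr *mr;
	int mapped;

	mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, sg_nents);

	/*
	 * Collapse a whole scatterlist of DMA addresses into the MR;
	 * the device then reaches all of it through the single
	 * mr->lkey / mr->rkey.
	 */
	mapped = ib_map_mr_sg(mr, sgl, sg_nents, NULL, PAGE_SIZE);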

> To go back to my main thesis - I would like a high performance low
> level DMA API that is capable enough that it could implement
> scatterlist dma_map_sg() and thus also implement any future
> scatterlist_v2, bio, hmm_range_fault or any other thing we come up
> with on top of it. This is broadly what I thought we agreed to at LSF
> last year.

I think the biggest underlying problem of the scatterlist based
DMA implementation for IOMMUs is that it's trying to handle too much,
that is, magic coalescing even when the segment boundaries don't align
with the IOMMU page size.  If we can get rid of that misfeature I
think we'd greatly simplify the API and implementation.
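
To make the misfeature concrete: the IOMMU dma_map_sg() path today
concatenates every segment into one big IOVA allocation and then
rewrites the per-segment addresses afterwards, roughly (heavily
simplified from the dma-iommu code):

	size_t iova_len = 0;

	/* concatenate all segments into one contiguous IOVA allocation */
	for_each_sg(sg, s, nents, i)
		iova_len += s->length;
	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);

	/*
	 * ... map the whole thing, then walk the list again to fix up
	 * each segment's dma_address/dma_length.  Segments whose
	 * boundaries aren't IOMMU page aligned need extra offset and
	 * padding fixups, which is where most of the complexity hides.
	 */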


