[RFC 0/7] Introduce swiotlb throttling

Michael Kelley mhklinux at outlook.com
Mon Aug 26 08:27:30 PDT 2024


From: Christoph Hellwig <hch at lst.de> Sent: Saturday, August 24, 2024 1:16 AM
> 
> On Thu, Aug 22, 2024 at 11:37:11AM -0700, mhkelley58 at gmail.com wrote:
> > Because it's not possible to detect at runtime whether a DMA map call
> > is made in a context that can block, the calls in key device drivers
> > must be updated with a MAY_BLOCK attribute, if appropriate. When this
> > attribute is set and swiotlb memory usage is above a threshold, the
> > swiotlb allocation code can serialize swiotlb memory usage to help
> > ensure that it is not exhausted.
> 
> One thing I've been meaning to do for a while but haven't gotten to,
> due to my lack of semantic patching skills, is to split the few flags
> useful for dma_map* from DMA_ATTR_*, which largely only apply to
> dma_alloc.
> 
> Only DMA_ATTR_WEAK_ORDERING (if we can't just kill it entirely)
> and for now DMA_ATTR_NO_WARN are used for both.
> 
> DMA_ATTR_SKIP_CPU_SYNC and your new SLEEP/BLOCK attribute are only
> useful for mapping, and the rest are for allocation only.
> 
> So I'd love to move to a DMA_MAP_* namespace for the mapping flags
> before adding more on potentially widely used ones.

OK, this makes sense to me. The DMA_ATTR_* symbols are currently
defined as just values that are not part of an enum or any other higher
level abstraction, and the "attrs" parameter to the dma_* functions is
just "unsigned long". Are you thinking that the separate namespace is
based only on the symbolic name (i.e., DMA_MAP_* vs DMA_ATTR_*),
with the values being disjoint? That seems straightforward to me.
Changing the "attrs" parameter to an enum is a much bigger change.

For a transition period we can have both DMA_ATTR_SKIP_CPU_SYNC
and DMA_MAP_SKIP_CPU_SYNC, and then work to change all
occurrences of the former to the latter.
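
As a rough illustration of the name-only split with disjoint values,
here is a userspace sketch, not a proposed patch. The bit positions,
the DMA_MAP_MAY_BLOCK name, and the two helper functions are all
assumptions made up for the example; only the DMA_ATTR_SKIP_CPU_SYNC
and DMA_MAP_SKIP_CPU_SYNC names come from the discussion above.

```c
/* Userspace sketch only; bit values are illustrative, not the kernel's. */
#include <stdbool.h>

/* Existing-style attribute: a bare bit in an "unsigned long attrs". */
#define DMA_ATTR_SKIP_CPU_SYNC	(1UL << 5)

/* Hypothetical mapping-only namespace, bits disjoint from DMA_ATTR_*. */
#define DMA_MAP_SKIP_CPU_SYNC	(1UL << 16)
#define DMA_MAP_MAY_BLOCK	(1UL << 17)	/* hypothetical name */

/* During the transition, the map path honors either spelling. */
static inline bool map_skips_cpu_sync(unsigned long attrs)
{
	return (attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_MAP_SKIP_CPU_SYNC)) != 0;
}

/* Hypothetical check a throttling path could make before blocking. */
static inline bool map_may_block(unsigned long attrs)
{
	return (attrs & DMA_MAP_MAY_BLOCK) != 0;
}
```

With disjoint values, old and new flags can be OR'ed into the same
"attrs" argument without ambiguity, and once all callers are converted
the DMA_ATTR_SKIP_CPU_SYNC half of the first test can simply be dropped.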

I'll have to look more closely at WEAK_ORDERING and NO_WARN.

There are also a couple of places where DMA_ATTR_NO_KERNEL_MAPPING
is used for dma_map_* calls, but those are clearly bogus since that
attribute is never tested in the map path.

> 
> With a little grace period we can then also phase out DMA_ATTR_NO_WARN
> for allocations, as the gfp_t can control that much better.
> 
> > In general, storage device drivers can take advantage of the MAY_BLOCK
> > option, while network device drivers cannot. The Linux block layer
> > already allows storage requests to block when the BLK_MQ_F_BLOCKING
> > flag is present on the request queue.
> 
> Note that this also in general involves changes to the block drivers
> to set that flag, which is a bit annoying, but I guess there is no
> easy way around it without paying the price for the BLK_MQ_F_BLOCKING
> overhead everywhere.

Agreed. I assumed there was some cost to BLK_MQ_F_BLOCKING since
the default is !BLK_MQ_F_BLOCKING, but I don't really know what
that cost is. Do you have a short summary, just for my education?

Michael




More information about the Linux-nvme mailing list