[RFC 0/7] Introduce swiotlb throttling

Michael Kelley mhklinux at outlook.com
Mon Aug 26 17:26:36 PDT 2024


From: Petr Tesařík <petr at tesarici.cz> Sent: Monday, August 26, 2024 12:28 PM
> 
> On Mon, 26 Aug 2024 16:24:53 +0000
> Michael Kelley <mhklinux at outlook.com> wrote:
> 
> > From: Petr Tesařík <petr at tesarici.cz> Sent: Saturday, August 24, 2024 1:06 PM
> > >
> > > On Fri, 23 Aug 2024 20:40:16 +0000
> > > Michael Kelley <mhklinux at outlook.com> wrote:
> > >
> > > > From: Petr Tesařík <petr at tesarici.cz> Sent: Thursday, August 22, 2024 11:45 PM
> > > >[...]
> > > > > > Discussion
> > > > > > ==========
> > > > > > * Since swiotlb isn't visible to device drivers, I've specifically
> > > > > > named the DMA attribute as MAY_BLOCK instead of MAY_THROTTLE or
> > > > > > something swiotlb specific. While this patch set consumes MAY_BLOCK
> > > > > > only on the DMA direct path to do throttling in the swiotlb code,
> > > > > > there might be other uses in the future outside of CoCo VMs, or
> > > > > > perhaps on the IOMMU path.
> > > > >
> > > > > I once introduced a similar flag and called it MAY_SLEEP. I chose
> > > > > MAY_SLEEP, because there is already a might_sleep() annotation, but I
> > > > > don't have a strong opinion unless your semantics is supposed to be
> > > > > different from might_sleep(). If it is, then I strongly prefer
> > > > > MAY_BLOCK to prevent confusing the two.
> > > >
> > > > My intent is that the semantics are the same as might_sleep(). I
> > > > vacillated between MAY_SLEEP and MAY_BLOCK. The kernel seems
> > > > to treat "sleep" and "block" as equivalent, because blk-mq has
> > > > the BLK_MQ_F_BLOCKING flag, and SCSI has the
> > > > queuecommand_may_block flag that is translated to
> > > > BLK_MQ_F_BLOCKING. So I settled on MAY_BLOCK, but as you
> > > > point out, that's inconsistent with might_sleep(). Either way will
> > > > be inconsistent somewhere, and I don't have a preference.
> > >
> > > Fair enough. Let's stay with MAY_BLOCK then, so you don't have to
> > > change it everywhere.
> > >
> > > >[...]
> > > > > > Open Topics
> > > > > > ===========
> > > > > > 1. swiotlb allocations from Xen and the IOMMU code don't make use
> > > > > > of throttling. This could be added if beneficial.
> > > > > >
> > > > > > 2. The throttling values are currently exposed and adjustable in
> > > > > > /sys/kernel/debug/swiotlb. Should any of this be moved so it is
> > > > > > visible even without CONFIG_DEBUG_FS?
> > > > >
> > > > > Yes. It should be possible to control the thresholds through
> > > > > sysctl.
> > > >
> > > > Good point.  I was thinking about creating /sys/kernel/swiotlb, but
> > > > sysctl is better.
> > >
> > > That still leaves the question where it should go.
> > >
> > > Under /proc/sys/kernel? Or should we make a /proc/sys/kernel/dma
> > > subdirectory to make room for more dma-related controls?
> >
> > I would be good with /proc/sys/kernel/swiotlb (or "dma"). There
> > are only two entries (high_throttle and low_throttle), but just
> > dumping everything directly in /proc/sys/kernel doesn't seem like
> > a good long-term approach.  Even though there are currently a lot
> > of direct entries in /proc/sys/kernel, that may be historical, and not
> > changeable due to backwards compatibility requirements.
> 
> I think SWIOTLB is a bit too narrow. How many controls would we add
> under /proc/sys/kernel/swiotlb? The chances seem higher if we call it
> /proc/sys/kernel/dma/swiotlb_{low,high}_throttle, and it follows the
> paths in source code (which are subject to change any time, however).
> Anyway, I don't want to get into bikeshedding; I'm fine with whatever
> you send in the end. :-)
> 
> BTW those entries directly under /proc/sys/kernel are not all
> historical. The io_uring_* controls were added just last year, see
> commit 76d3ccecfa18.
> 

Note that there could be multiple instances of the throttle values, since
a DMA restricted pool has its own struct io_tlb_mem that is separate
from the default. I wrote the code so that throttling is independently
applied to a restricted pool as well, though I haven't tested it.

So the typical case is that we'll have high and low throttle values for the
default swiotlb pool, but we could also have high and low throttle
values for any restricted pools.

Maybe the /proc pathnames would need to be:

   /proc/sys/kernel/dma/swiotlb_default/high_throttle
   /proc/sys/kernel/dma/swiotlb_default/low_throttle
   /proc/sys/kernel/dma/swiotlb_<rpoolname>/high_throttle
   /proc/sys/kernel/dma/swiotlb_<rpoolname>/low_throttle

Or we could throw all the throttles directly into the "dma" directory,
though that makes for fairly long names in lieu of a deeper directory
structure:

   /proc/sys/kernel/dma/default_swiotlb_high_throttle
   /proc/sys/kernel/dma/default_swiotlb_low_throttle
   /proc/sys/kernel/dma/<rpoolname>_swiotlb_high_throttle
   /proc/sys/kernel/dma/<rpoolname_>swiotlb_low_throttle

Thoughts?

Michael


More information about the Linux-nvme mailing list