[PATCH] nvme-pci: 512 byte dma pool segment quirk

Robert Beckett bob.beckett at collabora.com
Thu Nov 7 09:35:42 PST 2024



 ---- On Thu, 07 Nov 2024 17:19:30 +0000  Keith Busch  wrote --- 
 > On Thu, Nov 07, 2024 at 04:50:46PM +0000, Bob Beckett wrote:
 > > @@ -611,7 +612,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 > >      }
 > >  
 > >      nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
 > > -    if (nprps <= (256 / 8)) {
 > > +    if (nprps <= (dev->small_dmapool_seg_size / 8)) {
 > >          pool = dev->prp_small_pool;
 > >          iod->nr_allocations = 0;
 > >      } else {
 > 
 > We have a constant expression currently, and this is changing it to a full
 > division in the IO path. :(

Yeah, that's fair. Is the throughput here high enough for this to be a significant issue? (I have little intuition for this driver.)
How about pre-computing the nprps threshold during pool creation, where we detect the quirk? It would then be a variable comparison instead of a constant comparison, but with no divide.
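
A minimal sketch of that pre-computed threshold idea, not the actual patch. The struct and function names (dev_sketch, setup_small_pool, use_small_pool) are hypothetical stand-ins for the relevant parts of struct nvme_dev and the driver's setup and I/O paths; only the 8-byte PRP entry size and the 256/512 segment sizes come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PRP_ENTRY_SIZE 8u	/* each PRP entry is a 64-bit address */

/* Hypothetical stand-in for the relevant fields of struct nvme_dev. */
struct dev_sketch {
	size_t small_seg_size;		/* 256 normally, 512 with the quirk */
	unsigned int small_nprps;	/* pre-computed: small_seg_size / 8 */
};

/* Setup path: runs once per device, so the divide is not paid per I/O. */
static void setup_small_pool(struct dev_sketch *dev, bool quirk_512)
{
	dev->small_seg_size = quirk_512 ? 512 : 256;
	assert(dev->small_seg_size % PRP_ENTRY_SIZE == 0);
	dev->small_nprps = (unsigned int)(dev->small_seg_size / PRP_ENTRY_SIZE);
}

/* I/O path: a load and a compare against the stored threshold, no divide. */
static bool use_small_pool(const struct dev_sketch *dev, unsigned int nprps)
{
	return nprps <= dev->small_nprps;
}
```

The trade-off is the one noted above: the I/O path compares against a value loaded from the device structure rather than against a compile-time constant.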

 > 
 > Could we leave the pool selection check size as-is and just say the cost
 > of the quirk is additional memory overhead?
 > 
 > > @@ -2700,8 +2701,9 @@ static int nvme_setup_prp_pools(struct nvme_dev *dev)
 > >          return -ENOMEM;
 > >  
 > >      /* Optimisation for I/Os between 4k and 128k */
 > > -    dev->prp_small_pool = dma_pool_create("prp list 256", dev->dev,
 > > -                        256, 256, 0);
 > > +    dev->prp_small_pool = dma_pool_create("prp list small", dev->dev,
 > > +                        dev->small_dmapool_seg_size,
 > > +                        dev->small_dmapool_seg_size, 0);
 > 
 > I think it should work if we only change the alignment property of the
 > pool. Something like this:
 > 
 >     if (dev->ctrl.quirks & NVME_QUIRK_SMALL_DMAPOOL_512)
 >         dev->prp_small_pool = dma_pool_create("prp list 256", dev->dev,
 >                               256, 512, 0);

I actually already tested a change to 512, 512 while keeping the 256 division above (i.e. wasting half of each segment). I'll confirm by testing again against latest and send a v2, assuming it tests fine.
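
For reference, the "waste half of the segment" arithmetic can be sketched as below. This is only an illustration under the assumption that the pool selection check stays at 256 / 8 entries; the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

enum {
	PRP_ENTRY_BYTES = 8,		/* one 64-bit PRP entry */
	SMALL_POOL_NPRPS = 256 / 8	/* unchanged selection threshold: 32 entries */
};

/* Bytes of a small-pool segment that a full PRP list can occupy. */
static size_t small_pool_used_bytes(void)
{
	return (size_t)SMALL_POOL_NPRPS * PRP_ENTRY_BYTES;
}

/* Bytes left unused per segment for a given segment size. */
static size_t small_pool_wasted_bytes(size_t seg_size)
{
	assert(seg_size >= small_pool_used_bytes());
	return seg_size - small_pool_used_bytes();
}
```

With 512-byte segments and the threshold left at 32 entries, at most 256 bytes of each segment are used, so the other 256 bytes are the memory overhead of the quirk.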

 >     else
 >         dev->prp_small_pool = dma_pool_create("prp list 256", dev->dev,
 >                               256, 256, 0);
 > 



