[PATCH] nvme: don't set a virt_boundary unless needed
Christoph Hellwig
hch at lst.de
Thu Dec 21 04:17:46 PST 2023
On Thu, Dec 21, 2023 at 11:30:38AM +0200, Sagi Grimberg wrote:
>
>> NVMe PRPs are a pain and force the expensive virt_boundary checking on
>> the block layer, prevent secure passthrough and require scatter/gather
>> I/O to be split into multiple commands, which is problematic for the
>> upcoming atomic write support.
>
> But is the threshold still correct? Meaning, for small enough I/Os will
> the device have lower performance? I'm not advocating that we keep it,
> but we should at least mention the tradeoff in the change log.
Chaitanya benchmarked it on the first generation of devices that
supported SGLs. On the only SGL-enabled device I have there is no
performance penalty for using SGLs on small transfers, but I'd love
to see numbers from other setups.
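For reference, the threshold being discussed is the PCIe driver's
sgl_threshold module parameter.  Roughly, the decision looks like the
sketch below (written from memory, not part of this patch; the exact
helper body in the current tree may differ slightly):

--
/*
 * Rough sketch of the PRP-vs-SGL decision in the PCIe driver: use SGLs
 * only when the controller supports them and the average segment size
 * of the request reaches the sgl_threshold module parameter.
 */
static bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
			      int nseg)
{
	unsigned int avg_seg_size;

	if (!nvme_ctrl_sgl_supported(&dev->ctrl))
		return false;
	avg_seg_size = DIV_ROUND_UP(blk_rq_payload_bytes(req), nseg);
	if (!sgl_threshold || avg_seg_size < sgl_threshold)
		return false;
	return true;
}
--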
>> For nvme-rdma the virt boundary is always required, as RDMA MRs are just
>> as dumb as NVMe PRPs.
>
> That is actually device dependent. The driver can ask for a pool of
> MRs with type IB_MR_TYPE_SG_GAPS if the device supports IBK_SG_GAPS_REG.
>
> See from ib_srp.c:
> --
> 	if (device->attrs.kernel_cap_flags & IBK_SG_GAPS_REG)
> 		mr_type = IB_MR_TYPE_SG_GAPS;
> 	else
> 		mr_type = IB_MR_TYPE_MEM_REG;
> --
For that we'd need to support IB_MR_TYPE_SG_GAPS first, which can
be done as an incremental improvement.
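Something along the lines of the ib_srp.c check above, applied on the
nvme-rdma side (purely illustrative, not part of this patch; the
"ibdev" variable and the surrounding code are assumptions):

--
/*
 * Illustrative only: pick the MR type from the device capability, so
 * the virt boundary would only have to be enforced when the device
 * cannot register SG lists with gaps.
 */
enum ib_mr_type mr_type;

if (ibdev->attrs.kernel_cap_flags & IBK_SG_GAPS_REG)
	mr_type = IB_MR_TYPE_SG_GAPS;	/* gaps OK, no virt boundary needed */
else
	mr_type = IB_MR_TYPE_MEM_REG;	/* PRP-like, keep the virt boundary */
--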
>> +	/*
>> +	 * nvme-apple always uses PRPs and thus needs to set a virt boundary.
>> +	 */
>> +	set_bit(NVME_CTRL_VIRT_BOUNDARY_IO, &anv->ctrl.flags);
>> +	set_bit(NVME_CTRL_VIRT_BOUNDARY_ADMIN, &anv->ctrl.flags);
>> +
>
> Why two flags? Why can't the core just always set the blk virt boundary
> on the admin request queue?
It could, and given that the admin queue isn't performance critical it
probably won't hurt in reality. But why enforce a really weird limit
on the queue if there is no reason for it? The only transport that
treats the admin queue differently is PCIe, and that's just a spec
oddity.
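To make the two-flag point concrete, a minimal sketch of how the core
could consume them per queue type (the helper name and its placement
are my own; only the flag names come from the quoted hunk):

--
/*
 * Minimal sketch, not from the patch: apply the virt boundary to a
 * queue based on whether it is the admin or an I/O queue.  The use of
 * NVME_CTRL_PAGE_SIZE here is an assumption as well.
 */
static void nvme_set_virt_boundary(struct nvme_ctrl *ctrl,
		struct request_queue *q, bool admin)
{
	unsigned long bit = admin ? NVME_CTRL_VIRT_BOUNDARY_ADMIN :
				    NVME_CTRL_VIRT_BOUNDARY_IO;

	if (test_bit(bit, &ctrl->flags))
		blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
}
--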