[PATCH] nvme: don't set a virt_boundary unless needed
Christoph Hellwig
hch at lst.de
Thu Dec 21 04:17:46 PST 2023
On Thu, Dec 21, 2023 at 11:30:38AM +0200, Sagi Grimberg wrote:
>
>> NVMe PRPs are a pain and force the expensive virt_boundary checking on
>> the block layer, prevent secure passthrough and require scatter/gather
>> I/O to be split into multiple commands, which is problematic for the
>> upcoming atomic write support.
>
> But is the threshold still correct? Meaning, for small enough I/Os will
> the device have lower performance? I'm not advocating that we keep it,
> but we should at least mention the tradeoff in the change log.
Chaitanya benchmarked it on the first generation of devices that
supported SGLs. On the only SGL-enabled device I have there is no
performance penalty for using SGLs on small transfers, but I'd love
to see numbers from other setups.
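For reference, the threshold being discussed is the PCIe driver's
sgl_threshold module parameter.  Roughly, the decision looks like the
sketch below (written from memory, not part of this patch; the exact
helper body in the current tree may differ slightly):

--
/*
 * Rough sketch of the PRP-vs-SGL decision in the PCIe driver: use SGLs
 * only when the controller supports them and the average segment size
 * of the request reaches the sgl_threshold module parameter.
 */
static bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
			      int nseg)
{
	unsigned int avg_seg_size;

	if (!nvme_ctrl_sgl_supported(&dev->ctrl))
		return false;
	avg_seg_size = DIV_ROUND_UP(blk_rq_payload_bytes(req), nseg);
	if (!sgl_threshold || avg_seg_size < sgl_threshold)
		return false;
	return true;
}
--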
>> For nvme-rdma the virt boundary is always required, as RDMA MRs are just
>> as dumb as NVMe PRPs.
>
> That is actually device dependent. The driver can ask for a pool of
> MRs with type IB_MR_TYPE_SG_GAPS if the device supports IBK_SG_GAPS_REG.
>
> See from ib_srp.c:
> --
> 	if (device->attrs.kernel_cap_flags & IBK_SG_GAPS_REG)
> 		mr_type = IB_MR_TYPE_SG_GAPS;
> 	else
> 		mr_type = IB_MR_TYPE_MEM_REG;
> --
For that we'd need to support IB_MR_TYPE_SG_GAPS first, which can
be done as an incremental improvement.
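Something along the lines of the ib_srp.c check above, applied on the
nvme-rdma side (purely illustrative, not part of this patch; the
"ibdev" variable and the surrounding code are assumptions):

--
/*
 * Illustrative only: pick the MR type from the device capability, so
 * the virt boundary would only have to be enforced when the device
 * cannot register SG lists with gaps.
 */
enum ib_mr_type mr_type;

if (ibdev->attrs.kernel_cap_flags & IBK_SG_GAPS_REG)
	mr_type = IB_MR_TYPE_SG_GAPS;	/* gaps OK, no virt boundary needed */
else
	mr_type = IB_MR_TYPE_MEM_REG;	/* PRP-like, keep the virt boundary */
--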
>> +	/*
>> +	 * nvme-apple always uses PRPs and thus needs to set a virt boundary.
>> +	 */
>> +	set_bit(NVME_CTRL_VIRT_BOUNDARY_IO, &anv->ctrl.flags);
>> +	set_bit(NVME_CTRL_VIRT_BOUNDARY_ADMIN, &anv->ctrl.flags);
>> +
>
> Why two flags? Why can't the core just always set the blk virt boundary
> on the admin request queue?
It could, and given that the admin queue isn't performance critical it
probably won't hurt in reality. But why enforce a really weird limit
on the queue if there is no reason for it? The only transport that
treats the admin queue differently is PCIe, and that's just a spec
oddity.
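To make the two-flag point concrete, a minimal sketch of how the core
could consume them per queue type (the helper name and its placement
are my own; only the flag names come from the quoted hunk):

--
/*
 * Minimal sketch, not from the patch: apply the virt boundary to a
 * queue based on whether it is the admin or an I/O queue.  The use of
 * NVME_CTRL_PAGE_SIZE here is an assumption as well.
 */
static void nvme_set_virt_boundary(struct nvme_ctrl *ctrl,
		struct request_queue *q, bool admin)
{
	unsigned long bit = admin ? NVME_CTRL_VIRT_BOUNDARY_ADMIN :
				    NVME_CTRL_VIRT_BOUNDARY_IO;

	if (test_bit(bit, &ctrl->flags))
		blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
}
--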