[PATCH] nvme-pci: calculate IO timeout

Ming Lei ming.lei at redhat.com
Wed Oct 13 08:34:33 PDT 2021


On Tue, Oct 12, 2021 at 07:27:44PM -0700, Keith Busch wrote:
> Existing host and NVMe device combinations are increasingly capable of
> queueing more outstanding data than the driver's default timeout can
> tolerate, given the available device throughput.
> 
> Consider a mid-range server with 128 CPUs and an NVMe controller with
> no MDTS limit (the driver will throttle to 4MiB).
> 
> With the driver's default 1k depth per queue, this allows up to 128k
> outstanding IO submission queue entries.
> 
> If every SQ entry is transferring the 4MiB maximum request, 512GiB can
> be outstanding at once, all expected to complete within the default
> 30 second timeout.
> 
> For a modern PCIe Gen4 x4 NVMe device, that amount of data will take
> ~70 seconds to transfer over the PCIe link alone, not counting
> device-side internal latency: timeouts and IO failures are therefore
> inevitable.
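
For reference, the arithmetic quoted above works out roughly as in this
standalone userspace sketch (not part of the patch; the ~7.5 GB/s usable
Gen4 x4 bandwidth figure is an assumption):

/*
 * Back-of-envelope check of the quoted numbers. The ~7.5 GB/s usable
 * PCIe Gen4 x4 bandwidth is an assumed figure, not measured.
 */
#include <stdio.h>

int main(void)
{
	unsigned long long nr_queues   = 128;         /* one IO queue per CPU */
	unsigned long long queue_depth = 1024;        /* driver default ~1k SQ entries */
	unsigned long long max_io      = 4ULL << 20;  /* 4MiB per request */
	double link_bw = 7.5e9;                       /* assumed usable Gen4 x4 bytes/s */
	unsigned long long outstanding = nr_queues * queue_depth * max_io;

	printf("outstanding: %llu GiB\n", outstanding >> 30);    /* 512 GiB */
	printf("drain time:  %.0f s\n", outstanding / link_bw);  /* ~73s vs 30s timeout */
	return 0;
}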

The PCIe link is supposed to be much quicker than the device-side IO
handling, so the NVMe device should already be saturated before the
PCIe link is used up. Is there any event or feedback from the NVMe
side (host or device) about the saturation status?

SCSI has such a mechanism, so the queue depth can be adjusted according
to that feedback; Martin is familiar with this area.
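
For context, a minimal sketch (hypothetical helper, not lifted from any
in-tree driver) of how a SCSI low-level driver feeds that saturation
signal back: on TASK SET FULL it calls scsi_track_queue_full() and the
midlayer ramps the per-device queue depth down, then back up once the
device stops reporting queue-full.

#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_proto.h>

/* Hypothetical status handler illustrating the feedback path. */
static void example_handle_status(struct scsi_cmnd *scmd, int status)
{
	if (status == SAM_STAT_TASK_SET_FULL)
		/* Device reports saturation: shrink the queue depth by one. */
		scsi_track_queue_full(scmd->device,
				      scmd->device->queue_depth - 1);
}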


Thanks, 
Ming



