[LSF/MM/BPF BOF] Userspace command aborts

Sagi Grimberg sagi at grimberg.me
Mon Feb 27 08:33:57 PST 2023


>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>>> I do think that we should work on CDL for NVMe as it will solve some of
>>> the timeout-related problems more effectively than using aborts or any other
>>> mechanism.
>>
>> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
>> The last I heard, one point of contention was where the duration limit property
>> exists: within the command, or the queue. From my perspective, if it's not at
>> the queue level, the limit becomes meaningless, but hey, it's not up to me.
> 
> Limit attached to the command makes things more flexible and easier for the
> host, so personally, I prefer that. But this has an impact on the controller:
> the device needs to pull in *all* commands to be able to know the limits and do
> scheduling/aborts appropriately. That is not something that the device designers
> like, for obvious reasons (device internal resources...).
> 
> On the other hand, limits attached to queues could lead to either a serious
> increase in the number of queues (PCI space & number of IRQ vectors limits), or,
> loss of performance as a particular queue with the desired limit would be
> accessed from multiple CPUs on the host (lock contention). Tricky problem I
> think with lots of compromises.

I'm not up to speed on how CDL is defined, so it's unclear to me why CDL
at the queue level would cause the host to open more queues.

Another question: does CDL have any relationship to NVMe "Time Limited
Error Recovery", where the host sets a timeout through a feature and
indicates per command whether the controller should respect it?

While this is not a full-blown scheme where every queue/command has its
own timeout, it could address the original use case given by Hannes. And
it's already there.
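
To make the TLER mechanism concrete, here is a minimal sketch of the two
encodings involved. It assumes the layout as recalled from the NVMe base
spec: Error Recovery is Feature ID 05h, the TLER field occupies bits 15:0
of Set Features CDW11 in 100 ms units (0 meaning no timeout), and Limited
Retry (LR) is bit 31 of CDW12 in Read/Write commands. The helper names
tler_cdw11() and rw_cdw12() are hypothetical, and the constants should be
checked against the spec before use.

```c
#include <stdint.h>

/* Assumed values, recalled from the NVMe base spec; verify before use. */
#define NVME_FEAT_ERR_RECOVERY 0x05u        /* Feature ID: Error Recovery */
#define NVME_RW_LR             (1u << 31)   /* Limited Retry, Read/Write CDW12 */
#define TLER_MAX_UNITS         0xFFFFu      /* TLER is a 16-bit field */

/*
 * Hypothetical helper: encode a timeout in milliseconds into the TLER
 * field of Set Features CDW11. Rounds up to the next 100 ms unit and
 * clamps to the 16-bit maximum. 0 ms maps to 0, i.e. no timeout.
 */
static uint32_t tler_cdw11(uint32_t timeout_ms)
{
    if (timeout_ms > TLER_MAX_UNITS * 100u)
        return TLER_MAX_UNITS;
    return (timeout_ms + 99u) / 100u;
}

/*
 * Hypothetical helper: build CDW12 of a Read/Write command from the
 * 0's-based number of logical blocks, setting LR when this command
 * should be subject to the limited retry timeout.
 */
static uint32_t rw_cdw12(uint16_t nlb_zero_based, int limited_retry)
{
    uint32_t cdw12 = nlb_zero_based;

    if (limited_retry)
        cdw12 |= NVME_RW_LR;
    return cdw12;
}
```

With these, a 250 ms limit would be programmed once via Set Features as
tler_cdw11(250), and only the commands that carry the LR bit would opt
in, which is the "one timeout, per-command opt-in" shape described above.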


