[PATCH 1/3] nvme-tcp: spurious I/O timeout under high load
Hannes Reinecke
hare at suse.de
Tue May 24 01:08:50 PDT 2022
On 5/24/22 09:57, Sagi Grimberg wrote:
>
>>>>>> I'm open to discussing what we should be doing when the request is
>>>>>> in the process of being sent. But when it didn't have a chance to
>>>>>> be sent and we just overloaded our internal queuing, we shouldn't
>>>>>> be sending timeouts.
>>>>>
>>>>> As mentioned above, what happens if that same reporter opens
>>>>> another bug saying that the same phenomenon happens with soft-iwarp?
>>>>> What would you tell him/her?
>>>>
>>>> Nope. It's a HW appliance. Not a chance to change that.
>>>
>>> It was just a theoretical question.
>>>
>>> Do note that I'm not against solving a problem for anyone; I'm just
>>> questioning whether making the io_timeout effectively unbounded when
>>> the network is congested is the right solution for everyone, rather
>>> than a particular case that can easily be solved with a udev rule
>>> setting the io_timeout as high as needed.
>>>
>>> One can argue that this patchset is making nvme-tcp basically
>>> ignore the device io_timeout in certain cases.
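
For reference, the udev route mentioned above would be a rule raising
the per-namespace io_timeout via sysfs, something along these lines
(value and match pattern purely illustrative):

  # raise the block-layer I/O timeout (milliseconds) for NVMe namespaces
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="nvme*n*", \
    ATTR{queue/io_timeout}="600000"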
>>
>> Oh, yes, sure, that will happen.
>> What I'm actually arguing about is the imprecise distinction between
>> returning BLK_STS_AGAIN / BLK_STS_RESOURCE from ->queue_rq() and
>> letting commands time out when the driver implementing ->queue_rq()
>> hits a resource constraint.
>>
>> If there is a resource constraint, the driver is free to return
>> BLK_STS_RESOURCE (in which case you wouldn't see a timeout) or accept
>> the request (in which case there will be a timeout).
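
To make that distinction concrete, roughly what I mean (a sketch, not
the actual nvme-tcp code; nvme_tcp_has_room() and
nvme_tcp_queue_request_sketch() are made-up names):

static blk_status_t nvme_tcp_queue_rq_sketch(struct blk_mq_hw_ctx *hctx,
					     const struct blk_mq_queue_data *bd)
{
	struct nvme_tcp_queue *queue = hctx->driver_data;
	struct request *rq = bd->rq;

	/*
	 * Option 1: push back on the block layer. The request gets
	 * requeued and no timeout is armed, so congestion never shows
	 * up as an I/O timeout.
	 */
	if (!nvme_tcp_has_room(queue))			/* hypothetical */
		return BLK_STS_RESOURCE;

	/*
	 * Option 2: accept the request. blk_mq_start_request() arms the
	 * timeout timer now, so if the internal send list drains slower
	 * than io_timeout the command times out even though it never
	 * reached the wire.
	 */
	blk_mq_start_request(rq);
	nvme_tcp_queue_request_sketch(queue, rq);	/* hypothetical */
	return BLK_STS_OK;
}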
>
> There is no resource constraint. The driver sizes up the resources
> to be able to queue all the requests it is getting.
>
>> I could live with a timeout if that would just result in the command
>> being retried. But in the case of nvme it results in a connection
>> reset to boot, making customers really nervous that their system is
>> broken.
>
> But how does the driver know that it is running in an environment that
> is completely congested? What I'm saying is that this is a specific use
> case, and the solution can have negative side-effects for other common
> use-cases, because the condition is beyond the scope of the driver to
> handle.
>
> We can also trigger this condition with nvme-rdma.
>
> We could stay with this patch, but I'd argue that this might be the
> wrong thing to do in certain use-cases.
>
Right, okay.
Arguably this is a workload corner case, and we might not want to fix
this in the driver.
_However_: do we need to do a controller reset in this case?
Shouldn't it be sufficient to just complete the command w/ timeout error
and be done with it?
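
Something like this is what I have in mind (sketch only, not a patch;
nvme_tcp_request_queued() is a made-up helper meaning "still sitting
on our internal send list, never touched the wire"):

static enum blk_eh_timer_return
nvme_tcp_timeout_sketch(struct request *rq, bool reserved)
{
	/*
	 * If the command never made it onto the wire, fail just this
	 * command with a host-side abort status instead of tearing
	 * down the whole association.
	 */
	if (nvme_tcp_request_queued(rq)) {		/* hypothetical */
		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
		blk_mq_complete_request(rq);
		return BLK_EH_DONE;
	}

	/*
	 * Otherwise the command may already be (partially) sent; leave
	 * it to the existing error recovery path (elided here) and
	 * re-arm the timer for now.
	 */
	return BLK_EH_RESET_TIMER;
}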
Cheers,
Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                     +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman