[PATCH 1/3] nvme-tcp: spurious I/O timeout under high load

Hannes Reinecke hare at suse.de
Mon May 23 09:07:36 PDT 2022


On 5/23/22 17:05, Sagi Grimberg wrote:
> 
[ .. ]
>>>> I'm open to discussion what we should be doing when the request is 
>>>> in the process of being sent. But when it didn't have a chance to be 
>>>> sent and we just overloaded our internal queuing we shouldn't be 
>>>> sending timeouts.
>>>
>>> As mentioned above, what happens if that same reporter opens another bug
>>> that the same phenomenon happens with soft-iwarp? What would you tell
>>> him/her?
>>
>> Nope. It's a HW appliance. Not a chance to change that.
> 
> It was just a theoretical question.
> 
> Do note that I'm not against solving a problem for anyone, I'm just
> questioning if increasing the io_timeout to be unbound in case the
> network is congested, is the right solution for everyone instead of
> a particular case that can easily be solved with udev to make the
> io_timeout to be as high as needed.
> 
> One can argue that this patchset is making nvme-tcp to basically
> ignore the device io_timeout in certain cases.

Oh, yes, sure, that will happen.
What I'm actually arguing about is the imprecise distinction between
BLK_STS_AGAIN / BLK_STS_RESOURCE as a return value from ->queue_rq()
and command timeouts in case of resource constraints on the driver
implementing ->queue_rq().

If there is a resource constraint, the driver is free to either return
BLK_STS_RESOURCE (in which case you wouldn't see a timeout) or accept
the request (in which case there will be a timeout).
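
To make the two options concrete, here is a minimal userspace sketch. It
is not kernel code and not what the patch does; every sketch_* name, the
bounded internal queue, and its capacity are made up for illustration.
It only models the choice a ->queue_rq() implementation has once its
internal queueing is full: push back, or accept and let the request sit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for blk_status_t values; not the kernel definitions. */
enum sketch_status {
	SKETCH_STS_OK,       /* request accepted by the driver */
	SKETCH_STS_RESOURCE, /* driver is out of resources, request handed back */
};

/* Hypothetical per-queue state: a bounded internal send list. */
struct sketch_queue {
	size_t depth;    /* requests currently held by the driver */
	size_t capacity; /* how many the driver is willing to hold */
};

/*
 * Option 1: push back. A full queue returns the request to the caller,
 * so the driver never holds more than 'capacity' requests.
 */
enum sketch_status sketch_queue_rq_pushback(struct sketch_queue *q)
{
	assert(q->capacity > 0);
	if (q->depth >= q->capacity)
		return SKETCH_STS_RESOURCE;
	q->depth++;
	return SKETCH_STS_OK;
}

/*
 * Option 2: always accept. The request is queued internally no matter
 * how long the backlog is, so it can wait there until it times out.
 */
enum sketch_status sketch_queue_rq_accept(struct sketch_queue *q)
{
	q->depth++;
	return SKETCH_STS_OK;
}
```

With a capacity of 2, the third call to sketch_queue_rq_pushback() hands
the request back, whereas sketch_queue_rq_accept() keeps growing the
backlog, which is the case where a request can time out without ever
having been sent.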

I could live with a timeout if that would just result in the command 
being retried. But in the case of nvme it results in a connection reset 
to boot, making customers really nervous that their system is broken.

And having a workload which can generate connection resets feels like a 
DoS attack to me; applications shouldn't be able to do that.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


