[PATCH 2/3] nvme-tcp: Check for write space before queueing requests

Sagi Grimberg sagi at grimberg.me
Sat May 21 13:01:38 PDT 2022


>>> The current model of always queueing incoming requests leads to
>>> write stalls, as we easily overload the network device under
>>> high I/O load.
>>> To avoid unlimited queueing we should rather check whether write
>>> space is available before accepting new requests.
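
Just so we're talking about the same thing, the shape of the check being
proposed is roughly the following. This is my own sketch of the idea, not
the patch itself; the placement in nvme_tcp_queue_rq() and the use of
sk_stream_is_writeable() are assumptions on my side, and it presumes the
usual drivers/nvme/host/tcp.c context:

/*
 * Sketch only: bail out with BLK_STS_RESOURCE while the socket has no
 * write space, so the block layer requeues the request before
 * blk_mq_start_request() ever arms the I/O timer.
 */
static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
		const struct blk_mq_queue_data *bd)
{
	struct nvme_tcp_queue *queue = hctx->driver_data;
	struct request *rq = bd->rq;

	/* Advisory only: the socket state can change right after we look. */
	if (!sk_stream_is_writeable(queue->sock->sk))
		return BLK_STS_RESOURCE;

	/* ... existing request setup ... */
	blk_mq_start_request(rq);	/* the I/O timer starts here */
	/* ... hand the request over to the send path ... */
	return BLK_STS_OK;
}

The point of returning BLK_STS_RESOURCE before blk_mq_start_request() is
that the block layer re-runs the queue later without the request timer
ever having been armed.
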
>>
>> I'm somewhat on the fence with this one... On the one hand, we
>> check the sock write space, but we don't check the already-queued
>> requests. And this is purely advisory, not really a check
>> we rely on.
>>
>> The merit of doing something like this is that we don't start
>> the request timer, but we can just as easily accept the request
>> and have it sit queued for a long time because the sock is overloaded.
>>
>> Can you explain your thoughts on why this is a good solution?
>>
> Request timeouts.
> As soon as we call 'blk_mq_start_request()' the I/O timer starts, and
> given that we (currently) queue _every_ request irrespective of the
> underlying device status, we might end up queueing for a _loooong_ time.
> 
> Timeouts while the request is still in the queue are handled by the
> first patch, but the underlying network might also be busy with
> retries and whatnot.
> So again, queueing requests when we _know_ there'll be congestion is
> just asking for trouble (or, rather, spurious I/O timeouts).
> 
> If one is worried about performance, one can always increase the wmem
> size :-), but really it means that either your test case or your
> network is misdesigned.
> And I'm perfectly fine with increasing latency in these cases.
> What I don't like is timeouts, as those will show up to the user and
> we get all the support calls telling us that the kernel is broken.
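
On the wmem point: as far as I can tell the check under discussion boils
down to a comparison against the socket send buffer, so bumping wmem
directly widens the window in which we keep accepting requests.
Something along these lines, as an illustration only; it paraphrases the
generic sk_stream helpers as I read them and is not part of this patch:

#include <linux/printk.h>
#include <net/sock.h>

/*
 * Illustration: sk_stream_wspace() is sk_sndbuf minus the bytes already
 * queued for transmission, and sk_stream_is_writeable() compares that
 * against half of the queued bytes, so the socket stops looking
 * writeable once roughly two thirds of the send buffer is consumed.
 * Raising the send buffer (SO_SNDBUF or the tcp wmem sysctls) therefore
 * only pushes out the point where the proposed check would start
 * returning BLK_STS_RESOURCE.
 */
static bool example_queue_has_wspace(struct socket *sock)
{
	struct sock *sk = sock->sk;

	pr_debug("write space: %d bytes free, writeable=%d\n",
		 sk_stream_wspace(sk), sk_stream_is_writeable(sk));

	return sk_stream_is_writeable(sk);
}
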

Can you run some perf sanity tests to see whether anything unexpected
comes up from this?


