[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Sagi Grimberg sagi at grimberg.me
Sun Apr 27 03:35:58 PDT 2025



On 26/04/2025 2:47, Kamaljit Singh wrote:
> Sagi,
>
>>> Right after the sk_stream_is_writeable() check, new room could have
>>> opened up in the socket send buffer, state_change() could have fired
>>> and queued io_work, all before we get to the pending check. This is
>>> the "lost event" problem.
>> Actually, this is incorrect. io_work will be re-queued by
>> write_space() -> queue_work(io_work) if it is currently running.
>> Hence I'm not sure that this flag is needed.
>>
>> Kamaljit, can you run with the below and report if it fixes your problem?
> With the last set of fixes that included list_del_init() you suggested, all
> tests have been stable. I'm not seeing any more issues. Are you asking to
> test this patch regardless? If so, does this need to be in addition to
> previous patches?

Instead of my patch that introduces WAKE_SENDER, apply just the change
below, which only looks at sk_stream_is_writeable(). I no longer think
that the other changes in that patch solve anything.
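
The re-queue behaviour mentioned in the quoted text (a queue_work() issued
while io_work is running leads to another run) can be sketched in user
space. This is a toy model under stated assumptions, not kernel code:
struct toy_work, toy_queue_work(), toy_run(), toy_io_work() and toy_demo()
are hypothetical names. The only property it tries to mirror is that the
pending bit is cleared before the handler is invoked, so a queue request
made during execution is accepted rather than dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a work item; not the kernel's work_struct. */
struct toy_work {
	bool pending;                    /* models the "pending" bit */
	int runs;                        /* how many times the handler ran */
	void (*fn)(struct toy_work *w);  /* handler, models io_work */
};

/* Models queue_work(): refuses only if the item is already pending. */
static bool toy_queue_work(struct toy_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models the worker: pending is cleared BEFORE the handler is called. */
static void toy_run(struct toy_work *w)
{
	while (w->pending) {
		w->pending = false;
		w->runs++;
		w->fn(w);
	}
}

/* Handler that simulates write_space() firing once, during the first run. */
static void toy_io_work(struct toy_work *w)
{
	if (w->runs == 1)
		toy_queue_work(w);
}

/* Queue once, let the worker drain, and report the number of runs. */
int toy_demo(void)
{
	struct toy_work w = { .pending = false, .runs = 0, .fn = toy_io_work };
	bool queued = toy_queue_work(&w);

	assert(queued);
	toy_run(&w);
	return w.runs;
}
```

In this model toy_demo() returns 2: the request made while the handler was
executing is not lost, which is the argument against needing an extra flag.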
>
>
>> --
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 4d20fcc0a230..835e29014841 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -1390,6 +1390,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
>>                   else if (unlikely(result < 0))
>>                           return;
>>
>> +               /* did we get some space after spending time in recv? */
>> +               if (nvme_tcp_queue_has_pending(queue) &&
>> +                   sk_stream_is_writeable(queue->sock->sk))
>> +                       pending = true;
>> +
>>                   if (!pending || !queue->rd_enabled)
>>                           return;
>> --
> Thanks,
> Kamaljit
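
The decision made by the hunk above can be modelled outside the kernel as
a small predicate. This is a minimal sketch, not the driver code: struct
toy_queue and toy_should_continue() are hypothetical, and the three
booleans merely stand in for nvme_tcp_queue_has_pending(),
sk_stream_is_writeable() and queue->rd_enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the hunk inspects. */
struct toy_queue {
	bool has_pending;   /* stands in for nvme_tcp_queue_has_pending() */
	bool sk_writeable;  /* stands in for sk_stream_is_writeable() */
	bool rd_enabled;    /* stands in for queue->rd_enabled */
};

/*
 * Returns true if the io_work loop should go around again.
 * 'pending' is the value accumulated by the send/recv steps so far.
 */
bool toy_should_continue(const struct toy_queue *q, bool pending)
{
	/* did we get some space after spending time in recv? */
	if (q->has_pending && q->sk_writeable)
		pending = true;

	if (!pending || !q->rd_enabled)
		return false;
	return true;
}

/* A few cases spelled out as assertions. */
void toy_selfcheck(void)
{
	struct toy_queue room   = { true,  true,  true };
	struct toy_queue full   = { true,  false, true };
	struct toy_queue rd_off = { true,  true,  false };

	assert(toy_should_continue(&room, false));    /* re-check forces another pass */
	assert(!toy_should_continue(&full, false));   /* no room, nothing else pending */
	assert(!toy_should_continue(&rd_off, false)); /* rd_enabled still gates the loop */
}
```

Without the re-check, the first case would return false and the queued
requests would wait for a later write_space() callback.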



