[PATCH V4 1/2] nvme-tcp: Prevent infinite loop if socket closes during CONNECTING state

Mon Apr 14 14:35:11 PDT 2025

On 14/04/2025 10:25, Maurizio Lombardi wrote:
> On Mon Apr 14, 2025 at 12:44 AM CEST, Sagi Grimberg wrote:
>>
>> On 04/04/2025 11:28, Maurizio Lombardi wrote:
>>> There is a potential race condition that can occur if
>>> the target closes the socket while the host is in the CONNECTING state.
>>>
>>> If the socket's state changes to TCP_CLOSE, the nvme_tcp_state_change()
>>> function is invoked. However, nvme_tcp_error_recovery() is unable
>>> to transition the controller state to NVME_CTRL_RESETTING because
>>> the controller is still in the CONNECTING state. As a result, error
>>> recovery is bypassed, and the controller incorrectly transitions
>>> to the LIVE state with closed sockets.
>> I think that the issue is that the controller moves to LIVE state - it
>> shouldn't.
>> However, it's not clear where this happens.
>>
>>> Subsequent attempts by the host to communicate with the target
>>> will result in an infinite loop.
>>>
>>> Fix the bug by initiating the error recovery process to correctly
>>> handle the disconnection in case we missed this event
>>> while transitioning from CONNECTING to LIVE.
>> The problem is in the initial connect - here there is no error recovery
>> and we want to propagate the error to the user.
> Maybe there are other problems that I've not found, but this race
> condition definitely exists.
> This can be reproduced by deleting the port, target-side, with nvmetcli
> just after the connection has been established, before the controller
> goes LIVE.
> Yes, it's quite hard to hit it in practice, but if you want to see it
> yourself, a small sleep in the right place in the host driver will
> help you.

I see the issue, but we need to make sure that if the connection closes
before the controller has finished establishing, it cleans up correctly,
because at some point in the past that wasn't the case. Things have
changed in that path, so it might be OK now; we just need to check. I'd
trigger the race while the admin queue is being established, as well as
in the middle of the sequence in which the IO queues are established.