[PATCH V4 1/2] nvme-tcp: Prevent infinite loop if socket closes during CONNECTING state
Maurizio Lombardi
mlombard at bsdbackstore.eu
Mon Apr 14 00:25:25 PDT 2025
On Mon Apr 14, 2025 at 12:44 AM CEST, Sagi Grimberg wrote:
>
>
> On 04/04/2025 11:28, Maurizio Lombardi wrote:
>> There is a potential race condition that can occur if
>> the target closes the socket while the host is in the CONNECTING state.
>>
>> If the socket's state changes to TCP_CLOSE, the nvme_tcp_state_change()
>> function is invoked. However, nvme_tcp_error_recovery() is unable
>> to transition the controller state to NVME_CTRL_RESETTING because
>> the controller is still in the CONNECTING state. As a result, error
>> recovery is bypassed, and the controller incorrectly transitions
>> to the LIVE state with closed sockets.
>
> I think that the issue is that the controller moves to LIVE state - it
> shouldn't.
> However, it's not clear where this happens.
>
>>
>> Subsequent attempts by the host to communicate with the target
>> will result in an infinite loop.
>>
>> Fix the bug by initiating the error recovery process to correctly
>> handle the disconnection in case we missed this event
>> while transitioning from CONNECTING to LIVE.
>
> The problem is in the initial connect - here there is no error recovery
> and we want to propagate the error to the user.

There may be other problems that I have not found, but this race
condition definitely exists.

It can be reproduced by deleting the port on the target side with nvmetcli
just after the connection has been established, but before the controller
goes LIVE.

It is quite hard to hit in practice, but if you want to see it
yourself, a small sleep in the right place in the host driver will
help.
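
To make the sequence from the commit message easier to follow, here is a
toy userspace model of it. This is only a sketch, not driver code: every
name (toy_ctrl, toy_change_state, toy_socket_closed, toy_replay_race) is
made up for illustration, and the transition table is a simplified
assumption that encodes just the two facts stated above: RESETTING cannot
be entered from CONNECTING, while LIVE can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: hypothetical names, not the kernel's types. */
enum toy_state { TOY_NEW, TOY_LIVE, TOY_RESETTING, TOY_CONNECTING };

struct toy_ctrl {
	enum toy_state state;
	bool socket_open;
};

/*
 * Assumed, simplified transition rules. The only ones that matter for
 * the race are: CONNECTING -> RESETTING is refused, CONNECTING -> LIVE
 * is allowed.
 */
static bool toy_change_state(struct toy_ctrl *c, enum toy_state new_state)
{
	bool ok = false;

	switch (new_state) {
	case TOY_LIVE:
		ok = c->state == TOY_NEW || c->state == TOY_RESETTING ||
		     c->state == TOY_CONNECTING;
		break;
	case TOY_RESETTING:
		ok = c->state == TOY_NEW || c->state == TOY_LIVE;
		break;
	case TOY_CONNECTING:
		ok = c->state == TOY_NEW || c->state == TOY_RESETTING;
		break;
	default:
		break;
	}

	if (ok)
		c->state = new_state;
	return ok;
}

/*
 * Models the socket going to TCP_CLOSE: recovery only starts if the
 * move to RESETTING succeeds. Returns true if recovery was started.
 */
static bool toy_socket_closed(struct toy_ctrl *c)
{
	c->socket_open = false;
	return toy_change_state(c, TOY_RESETTING);
}

/* Replays the racy ordering and returns the final controller state. */
static struct toy_ctrl toy_replay_race(void)
{
	struct toy_ctrl c = { .state = TOY_NEW, .socket_open = true };
	bool recovery_started;

	toy_change_state(&c, TOY_CONNECTING);
	assert(c.state == TOY_CONNECTING);

	/* Target closes the socket while the host is still CONNECTING. */
	recovery_started = toy_socket_closed(&c);

	/* The connect path finishes, unaware of the close. */
	toy_change_state(&c, TOY_LIVE);

	printf("recovery started: %d, LIVE: %d, socket open: %d\n",
	       recovery_started, c.state == TOY_LIVE, c.socket_open);
	return c;
}
```

Calling toy_replay_race() leaves the model in the LIVE state with its
socket marked closed, which is the situation the patch tries to detect.
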
Maurizio