[PATCH V4 1/2] nvme-tcp: Prevent infinite loop if socket closes during CONNECTING state

Tue Jun 10 05:50:32 PDT 2025

On Fri Apr 18, 2025 at 1:14 PM CEST, Sagi Grimberg wrote:
>
>
> On 4/17/25 16:04, Maurizio Lombardi wrote:
>> On Mon Apr 14, 2025 at 11:35 PM CEST, Sagi Grimberg wrote:
>>> I see the issue, but we need to make sure that if the connection closes
>>> before the controller finished establishing, then it cleans up correctly.
>>> Because at some point in the past - it wasn't the case. Things have
>>> changed in that path so it might be ok now... Just need to check. I'd
>>> trigger the race while the admin queue is establishing, as well as in
>>> the middle of the sequence of IO queues are establishing.
>> I believe my earlier testing for this patch already covered this scenario,
>> but I can rerun the tests to confirm and report back.
>
> So you indeed made sure that the failure starts sporadically in the
> controller establishment sequence and there is no use-after-free issue?

Sorry for the long wait; I am now back at it.
I repeated the tests using a debug kernel, and no problems were detected.

>
>>
>> Either way, any fixes needed should be unrelated to this patch in my opinion,
>> as this one covers the case where the controller
>> has already finished establishing the admin and I/O queues.
>
> Well, you are changing code that was added to prevent double free issues
> when the error recovery and the initial connect sequence ran together.

Note that even with this patch, the error recovery and the initial
connect sequence cannot run together. The reset is initiated when
a send operation fails, after the controller has switched to the LIVE state.
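
To illustrate the ordering described above, here is a minimal userspace
sketch. It is a toy model and not the driver code: the toy_* names, the
state enum and the resets_started counter are all made up for illustration,
and the real nvme-tcp state handling is more involved. It only shows the
idea that a send failure can start a reset once the controller is LIVE,
and not while the initial connect is still in progress.

```c
#include <stdbool.h>

/* Hypothetical states, loosely named after the ones mentioned in this thread. */
enum toy_ctrl_state { TOY_NEW, TOY_CONNECTING, TOY_LIVE, TOY_RESETTING };

struct toy_ctrl {
	enum toy_ctrl_state state;
	int resets_started;	/* how many times a reset was initiated */
};

/* Initial connect sequence begins: the controller enters CONNECTING. */
static void toy_start_connect(struct toy_ctrl *c)
{
	c->state = TOY_CONNECTING;
}

/* Admin and I/O queues are established: CONNECTING -> LIVE. */
static void toy_finish_connect(struct toy_ctrl *c)
{
	if (c->state == TOY_CONNECTING)
		c->state = TOY_LIVE;
}

/*
 * A send operation failed. In this toy model (an assumption, not the
 * kernel logic) a reset is initiated only if the controller is already
 * LIVE. Returns true if a reset was started.
 */
static bool toy_send_failed(struct toy_ctrl *c)
{
	if (c->state != TOY_LIVE)
		return false;

	c->state = TOY_RESETTING;
	c->resets_started++;
	return true;
}
```

In this model, calling toy_send_failed() between toy_start_connect() and
toy_finish_connect() returns false and leaves the state untouched, so the
reset path and the initial connect never overlap.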

Maurizio