nvme/tcp: infinite loop, livelock, and OOPS issues on disconnect

Sagi Grimberg sagi at grimberg.me
Tue Sep 27 23:24:55 PDT 2022


>> Thanks for moving to a recent kernel.
>>
>> Based on the stack trace, the reason for the hang is that
>> disconnect attempts to remove all the namespaces before
>> starting to tear down the controller; before that it flushes
>> any ns scanning so it does not compete with the teardown
>> running concurrently.
>>
>> But ns scanning is stuck while discovering a namespace, because
>> adding it as a disk triggers some I/O, most likely for things
>> like partition scanning.
>>
>> The ns scan is stuck only because its inflight I/O never completes,
>> which is likely related to the errors you are injecting from the
>> controller side every now and then...
>>
>> I do see a possible race here: right when a transport error is
>> observed, error recovery is scheduled (err_work), but before it
>> executes, delete_ctrl starts tearing down the controller,
>> in particular calling nvme_stop_ctrl() -> nvme_tcp_stop_ctrl(),
>> which cancels the err_work.
>>
>> If the error recovery never runs, nothing will fail inflight
>> commands unless the controller fails them explicitly.
>>
>> What would be helpful is to know whether, once the issue
>> reproduces, ctrl->err_work was canceled without executing.
>>
>> So if the theory is correct, this fix is warranted:
>> --
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index d5871fd6f769..2524b5304bfb 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -2237,7 +2237,7 @@ static void nvme_reset_ctrl_work(struct
>> work_struct *work)
>>
>>    static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl)
>>    {
>> -       cancel_work_sync(&to_tcp_ctrl(ctrl)->err_work);
>> +       flush_work(&to_tcp_ctrl(ctrl)->err_work);
>>           cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
>>    }
>> --
>>
>> This will cause err_work to run, and then it is guaranteed that all
>> inflight requests are completed/cancelled.
> 
> Good news. I ran with the proposed fix continuously for three days
> without reproducing the issue.

Great, thanks for testing!

> Is there any specific instrumentation
> you want me to run to get direct confirmation of the bug/fix? If not,
> I think we're all set.

I don't think that is necessary; the fix makes sense, and it does
seem to address the issue.
