[PATCH 3/6] nvme: tcp: unquiesce queues after ctrl state is changed

Sagi Grimberg sagi at grimberg.me
Tue Jul 11 02:05:34 PDT 2023



On 7/11/23 10:53, Ming Lei wrote:
> On Tue, Jul 11, 2023 at 10:41:46AM +0300, Sagi Grimberg wrote:
>>
>>> Unquiesce queues until controller state is changed successfully.
>>>
>>> Prepare for moving start freeze into reconnect work for preventing new
>>> request from entering queue in LIVE state.
>>>
>>> Before removing namespace, unquiescing queues is done unconditionally, so
>>> it is safe to do this way in case of state change failure.
>>>
>>> Signed-off-by: Ming Lei <ming.lei at redhat.com>
>>> ---
>>>    drivers/nvme/host/tcp.c | 7 ++++---
>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index e414cfb3433f..e68bf21af348 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -2109,10 +2109,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
>>>    	nvme_stop_keep_alive(ctrl);
>>>    	flush_work(&ctrl->async_event_work);
>>>    	nvme_tcp_teardown_io_queues(ctrl);
>>> -	/* unquiesce to fail fast pending requests */
>>> -	nvme_unquiesce_io_queues(ctrl);
>>>    	nvme_tcp_teardown_admin_queue(ctrl);
>>> -	nvme_unquiesce_admin_queue(ctrl);
>>>    	nvme_auth_stop(ctrl);
>>
>> This is potentially a problem. It is possible that auth work is
>> running in the bg here, and without unquiescing the admin queue
>> the the auth_stop may take unnecessarily long until its I/O times
>> out.
> 
> There isn't any IO queued to nvme before unquiesce since nvme_cancel_*tagset
> cancels all inflight IOs.

There is a window after cancel and before unquiesce that requests can
come in.

>>
>> I think it is fine to keep it as is...
> 
> If ctrl state is LIVE, nvme_check_ready() return true, then IO is
> sent to socket before changing ctrl state to NVME_CTRL_CONNECTING.

a) the controller is not live because it was changed to RESETTING before
b) nvme_check_ready() will also fail because the queue cleared bit
NVME_TCP_Q_LIVE when the io queues stopped in
nvme_tcp_teardown_io_queues.

> So unquisce has to be done after state is changed to NVME_CTRL_CONNECTING
> if we want to move start_freeze into re-connection work.

See my response. It is perfectly safe to keep it as is.



More information about the Linux-nvme mailing list