[RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting
Sagi Grimberg
sagi at grimberg.me
Sun Jun 11 23:36:53 PDT 2023
>>>> Hi Ming:
>>>>
>>>> Ming Lei <ming.lei at redhat.com> wrote on Tue, Jun 6, 2023 at 23:15:
>>>>>
>>>>> Hello Chunguang,
>>>>>
>>>>> On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
>>>>>> From: Chunguang Xu <chunguang.xu at shopee.com>
>>>>>>
>>>>>> We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
>>>>>> while removing ctrl. The root cause may be that the ctrl state changes to
>>>>>> NVME_CTRL_DELETING while removing ctrl, which interrupts nvme_tcp_error_recovery_work()/
>>>>>> nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
>>>>>
>>>>> I didn't dig into ctrl state check in these error handler yet, but error
>>>>> handling is supposed to provide forward progress for any controller state.
>>>>>
>>>>> Can you explain a bit how switching to DELETING interrupts the above
>>>>> error handling and breaks the forward progress guarantee?
>>>>
>>>> Here we have frozen the ctrl. If the ctrl state has changed to DELETING or
>>>> DELETING_NOIO (by nvme disconnect), we bail out and leave the ctrl
>>>> frozen, so nvme_remove_namespaces() hangs.
>>>>
>>>> static void nvme_tcp_error_recovery_work(struct work_struct *work)
>>>> {
>>>> 	...
>>>> 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
>>>> 		/* state change failure is ok if we started ctrl delete */
>>>> 		WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
>>>> 			     ctrl->state != NVME_CTRL_DELETING_NOIO);
>>>> 		return;
>>>> 	}
>>>>
>>>> 	nvme_tcp_reconnect_or_remove(ctrl);
>>>> }
>>>>
>>>>
>>>> On another path, we check the ctrl state while reconnecting. If it has changed to
>>>> DELETING or DELETING_NOIO, we bail out and leave the ctrl frozen and the
>>>> queues quiesced (through the reset path); as a result, a hang occurs.
>>>>
>>>> static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
>>>> {
>>>> 	/* If we are resetting/deleting then do nothing */
>>>> 	if (ctrl->state != NVME_CTRL_CONNECTING) {
>>>> 		WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
>>>> 			     ctrl->state == NVME_CTRL_LIVE);
>>>> 		return;
>>>> 	}
>>>> 	...
>>>> }
>>>>
>>>>>> frozen and its queues are quiesced. Since scan_work may continue to issue IOs to
>>>>>> load the partition table, it gets blocked, which leads nvme_tcp_error_recovery_work()
>>>>>> to hang in flush_work(&ctrl->scan_work).
>>>>>>
>>>>>> After analysis, we found that there are mainly two cases:
>>>>>> 1. Since ctrl is frozen, scan_work hangs in __bio_queue_enter() while it issues
>>>>>> new IO to load the partition table.
>>>>>
>>>>> Yeah, nvme freeze usage is fragile, and I suggested moving
>>>>> nvme_start_freeze() from nvme_tcp_teardown_io_queues() to
>>>>> nvme_tcp_configure_io_queues(), as in the change posted for rdma:
>>>>>
>>>>> https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
>>>>
>>>> While reconnecting, I think we should freeze the ctrl or quiesce the queues,
>>>> otherwise nvme_fail_nonready_command() may return BLK_STS_RESOURCE,
>>>> and the IOs may be retried frequently. So I think we had better freeze the ctrl
>>>> while entering error_recovery/reconnect, but we need to unfreeze it on exit.
>>>
>>> quiescing is always done in error handling, and freeze is actually
>>> not a must, and it is easier to cause race by calling freeze & unfreeze
>>> from different contexts.
>>>
>>> But yes, unquiesce should have been done after exiting error handling, or
>>> simply do it in nvme_unquiesce_io_queues().
>>>
>>> And the following patch should cover all these hangs:
>>>
>>
>> Ming, are you sending a formal patchset for this?
>
> Not yet, will do it.
Would like it to get to the next pull request going out this week...
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 3ec38e2b9173..83d3818fc60b 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>> */
>>> nvme_mpath_clear_ctrl_paths(ctrl);
>>> + /* unquiesce io queues so scan work won't hang */
>>> + nvme_unquiesce_io_queues(ctrl);
>>
>> What guarantees that the queues won't be quiesced right after this
>> by the transport?
>
> Please see nvme_change_ctrl_state(): if the controller state is
> DELETING, new NVME_CTRL_RESETTING/NVME_CTRL_CONNECTING can't be entered
> any more.
Yes, this relies on the fact that nvme_remove_namespaces is only called
after DELETING state was set. ok.
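
To make the guarantee above concrete, here is a minimal user-space sketch
of a guarded state change. It is not the kernel's nvme_change_ctrl_state();
the model_* names are hypothetical and the transition table is a simplified
assumption. The only property taken from this thread is that RESETTING and
CONNECTING are refused once the controller is in DELETING.

```c
#include <stdbool.h>

/* Simplified stand-ins for the controller states; not the kernel's enum. */
enum model_state {
	MODEL_NEW,
	MODEL_LIVE,
	MODEL_RESETTING,
	MODEL_CONNECTING,
	MODEL_DELETING,
	MODEL_DELETING_NOIO,
	MODEL_DEAD,
};

struct model_ctrl {
	enum model_state state;
};

/*
 * Hypothetical guarded transition: apply new_state only if it is allowed
 * from the current state, and report whether the change happened.
 * The allowed sets below are an assumption made for illustration.
 */
static bool model_change_state(struct model_ctrl *ctrl,
			       enum model_state new_state)
{
	enum model_state old = ctrl->state;
	bool allowed = false;

	switch (new_state) {
	case MODEL_LIVE:
		allowed = old == MODEL_NEW || old == MODEL_RESETTING ||
			  old == MODEL_CONNECTING;
		break;
	case MODEL_RESETTING:
		allowed = old == MODEL_NEW || old == MODEL_LIVE;
		break;
	case MODEL_CONNECTING:
		allowed = old == MODEL_NEW || old == MODEL_RESETTING;
		break;
	case MODEL_DELETING:
		allowed = old == MODEL_LIVE || old == MODEL_RESETTING ||
			  old == MODEL_CONNECTING;
		break;
	case MODEL_DELETING_NOIO:
		allowed = old == MODEL_DELETING || old == MODEL_DEAD;
		break;
	case MODEL_DEAD:
		allowed = old == MODEL_DELETING;
		break;
	default:
		/* Nothing transitions back to MODEL_NEW in this model. */
		break;
	}

	if (allowed)
		ctrl->state = new_state;
	return allowed;
}
```

In this model, once the state is MODEL_DELETING, neither MODEL_RESETTING
nor MODEL_CONNECTING is in any allowed set, so no new reset or reconnect
can start and quiesce the queues again after the removal path has
unquiesced them.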
>> I'm still unclear why this affects the scan_work?
>
> As Chunguang mentioned, if error recovery is terminated by nvme deletion,
> the controller can be left in the quiesced state, so in-queue IOs can't
> move on. Meanwhile, new error recovery can't be started successfully because
> the controller state is NVME_CTRL_DELETING, so any pending IOs (including those
> from the scan context) can't be completed.
Yes. Please separate into individual patches when submitting, though.
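
The sequence described above can be sketched as a toy user-space model.
Everything here is hypothetical (the toy_* names, the single queue, the
boolean flags); it only mirrors the ordering discussed in this thread:
error recovery quiesces, bails out because deletion has started, and the
removal path then either does or does not unquiesce before scan IO has to
complete.

```c
#include <stdbool.h>

/* Toy queue: a quiesce flag and a count of queued requests. */
struct toy_queue {
	bool quiesced;
	int pending;
};

struct toy_ctrl {
	bool deleting;		/* stands in for the DELETING state */
	struct toy_queue q;
};

/* Dispatch queued requests; a quiesced queue dispatches nothing. */
static int toy_run_queue(struct toy_queue *q)
{
	int done;

	if (q->quiesced)
		return 0;
	done = q->pending;
	q->pending = 0;
	return done;
}

/*
 * Error recovery in this model: quiesce first, then give up early if
 * deletion has started, which leaves the queue quiesced.
 */
static bool toy_error_recovery(struct toy_ctrl *ctrl)
{
	ctrl->q.quiesced = true;
	if (ctrl->deleting)
		return false;		/* early return: nobody unquiesces */
	ctrl->q.quiesced = false;	/* reconnect succeeded in this model */
	return true;
}

/*
 * Removal path. unquiesce_first models the change proposed in the diff.
 * Returns true if all pending (scan) IO completed, i.e. a flush of the
 * scan work would not block in this model.
 */
static bool toy_remove_namespaces(struct toy_ctrl *ctrl, bool unquiesce_first)
{
	if (unquiesce_first)
		ctrl->q.quiesced = false;
	toy_run_queue(&ctrl->q);
	return ctrl->q.pending == 0;
}
```

With deleting set and one request pending, toy_error_recovery() leaves the
queue quiesced, toy_remove_namespaces(ctrl, false) reports that IO is still
stuck, and toy_remove_namespaces(ctrl, true) drains it.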