[PATCH] Revert "nvme: remove the .stop_ctrl callout"
Sagi Grimberg
sagi at grimberg.me
Wed Jun 22 04:26:41 PDT 2022
> We encountered a problem where the disconnect command hangs.
> After analyzing the log and stack, we found that the triggering
> process is as follows:
> CPU0                                CPU1
>                                     nvme_rdma_error_recovery_work
>                                     nvme_rdma_teardown_io_queues
> nvme_do_delete_ctrl                   nvme_stop_queues
> nvme_remove_namespaces
> --clear ctrl->namespaces
>                                     nvme_start_queues
>                                     --no ns in ctrl->namespaces
> nvme_ns_remove                      return (because ctrl is deleting)
> blk_freeze_queue
> blk_mq_freeze_queue_wait
> --wait for ns to unquiesce to clean inflight IO, hang forever
>
> This problem was not seen in older kernels because nvme_stop_ctrl
> flushed the err work before nvme_remove_namespaces ran. The callout
> does not seem to have been removed for functional reasons, so the
> commit can be reverted to solve the problem.
>
> Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Indeed this analysis is correct.
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
I couldn't find the discussion that led to the original patch, or its
submission, but it does introduce a regression.
Perhaps the title should reflect that this fixes a regression, so that
stable kernels can pick it up.
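A toy, single-threaded C model of the interleaving quoted above may help make the ordering dependency concrete. It is not kernel code: `struct model`, the helper functions and `simulate_delete()` are hypothetical, a namespace queue is reduced to three fields, and the only property modeled for the restored .stop_ctrl callout is that it waits for the already-running err work to finish before nvme_remove_namespaces empties ctrl->namespaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one namespace queue; not kernel code. */
struct model {
    bool ns_in_ctrl_list; /* namespace still linked on ctrl->namespaces */
    bool quiesced;        /* request queue quiesced */
    int inflight;         /* requests that must drain before a freeze completes */
};

/* CPU1, teardown: nvme_stop_queues quiesces every queue on ctrl->namespaces. */
static void err_work_stop_queues(struct model *m)
{
    if (m->ns_in_ctrl_list)
        m->quiesced = true;
}

/* CPU1, later: nvme_start_queues only unquiesces namespaces still listed. */
static void err_work_start_queues(struct model *m)
{
    if (m->ns_in_ctrl_list)
        m->quiesced = false;
}

/* CPU0: nvme_remove_namespaces empties ctrl->namespaces (per the analysis). */
static void remove_namespaces_clear_list(struct model *m)
{
    m->ns_in_ctrl_list = false;
}

/* CPU0: in this model inflight IO can only drain while the queue is not
 * quiesced. Returns false in place of blk_mq_freeze_queue_wait() waiting
 * forever. */
static bool freeze_queue_wait(struct model *m)
{
    if (!m->quiesced)
        m->inflight = 0; /* unquiesced queue lets the request finish */
    return m->inflight == 0;
}

/* Returns true if the delete path's queue freeze completes. */
bool simulate_delete(bool flush_err_work_first)
{
    struct model m = { .ns_in_ctrl_list = true, .quiesced = false, .inflight = 1 };

    err_work_stop_queues(&m);             /* CPU1 */
    if (flush_err_work_first) {
        /* stop_ctrl callout: CPU0 waits for the running err work */
        err_work_start_queues(&m);        /* CPU1 finishes first */
        remove_namespaces_clear_list(&m); /* then CPU0 proceeds */
    } else {
        remove_namespaces_clear_list(&m); /* CPU0 races ahead */
        err_work_start_queues(&m);        /* CPU1 finds no namespaces */
    }
    return freeze_queue_wait(&m);         /* CPU0: nvme_ns_remove path */
}
```

With flush_err_work_first set to false the model reproduces the hang (simulate_delete() returns false); with true, the err work unquiesces the queue while the namespace is still listed, so the freeze completes.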
More information about the Linux-nvme mailing list