[PATCH v2 6/8] nvme-rdma: serialize controller teardown sequences
Sagi Grimberg
sagi at grimberg.me
Tue Aug 18 20:35:30 EDT 2020
On 8/14/20 2:12 PM, James Smart wrote:
>
>
> On 8/6/2020 12:11 PM, Sagi Grimberg wrote:
>> In the timeout handler we may need to complete a request because the
>> request that timed out may be an I/O that is a part of a serial sequence
>> of controller teardown or initialization. In order to complete the
>> request, we need to fence any other context that may compete with us
>> and complete the request that is timing out.
>>
>> In this case, we could have a potential double completion in case
>> a hard-irq or a different competing context triggered error recovery
>> and is running inflight request cancellation concurrently with the
>> timeout handler.
>>
>> Protect using a ctrl teardown_lock to serialize contexts that may
>> complete a cancelled request due to error recovery or a reset.
>>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> ---
>> drivers/nvme/host/rdma.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 44c76ffbb264..abc318737f35 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -122,6 +122,7 @@ struct nvme_rdma_ctrl {
>> struct sockaddr_storage src_addr;
>> struct nvme_ctrl ctrl;
>> + struct mutex teardown_lock;
>> bool use_inline_data;
>> u32 io_queues[HCTX_MAX_TYPES];
>> };
>> @@ -997,6 +998,7 @@ static int nvme_rdma_configure_io_queues(struct
>> nvme_rdma_ctrl *ctrl, bool new)
>> static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
>> bool remove)
>> {
>> + mutex_lock(&ctrl->teardown_lock);
>> blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
>> nvme_rdma_stop_queue(&ctrl->queues[0]);
>> if (ctrl->ctrl.admin_tagset) {
>> @@ -1007,11 +1009,13 @@ static void
>> nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
>> if (remove)
>> blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
>> nvme_rdma_destroy_admin_queue(ctrl, remove);
>> + mutex_unlock(&ctrl->teardown_lock);
>> }
>> static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
>> bool remove)
>> {
>> + mutex_lock(&ctrl->teardown_lock);
>> if (ctrl->ctrl.queue_count > 1) {
>> nvme_start_freeze(&ctrl->ctrl);
>> nvme_stop_queues(&ctrl->ctrl);
>> @@ -1025,6 +1029,7 @@ static void nvme_rdma_teardown_io_queues(struct
>> nvme_rdma_ctrl *ctrl,
>> nvme_start_queues(&ctrl->ctrl);
>> nvme_rdma_destroy_io_queues(ctrl, remove);
>> }
>> + mutex_unlock(&ctrl->teardown_lock);
>> }
>> static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
>> @@ -2278,6 +2283,7 @@ static struct nvme_ctrl
>> *nvme_rdma_create_ctrl(struct device *dev,
>> return ERR_PTR(-ENOMEM);
>> ctrl->ctrl.opts = opts;
>> INIT_LIST_HEAD(&ctrl->list);
>> + mutex_init(&ctrl->teardown_lock);
>> if (!(opts->mask & NVMF_OPT_TRSVCID)) {
>> opts->trsvcid =
>
> Looks good - but....
>
> I hit this same issue on FC - I will need to post a similar patch. My
> problem was that the reset/teardown path due to the timeout then raced
> with the error that the connect path saw for its io that dropped into
> the partial-teardown steps as connect backed-out. So I recommend
> looking at nvme_rdma_setup_ctrl() and any of its teardown paths that
> don't have the mutex and may race with cases that are taking the mutex.
Good point.
The synchronization is not really required for the entire teardown path,
because of the delete_work and the flushing of the connect_work, and
because the state machine doesn't allow reset and reconnect to compete.
So this synchronization is really just against the timeout handler.
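
To make the race concrete, here is a rough user-space sketch (pthreads,
not kernel code) of the idea: two contexts that may both try to complete
the same cancelled request take one lock, and whoever gets there second
finds nothing left to complete. All of the demo_* names are made up for
illustration and are not part of the patch or of the driver.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a controller; NOT struct nvme_rdma_ctrl. */
struct demo_ctrl {
	pthread_mutex_t teardown_lock;	/* plays the role of ctrl->teardown_lock */
	bool req_inflight;		/* a single outstanding request */
	int completions;		/* how many times it got completed */
};

/* Complete the request if it is still in flight. Caller holds teardown_lock. */
static void demo_complete_locked(struct demo_ctrl *c)
{
	if (c->req_inflight) {
		c->req_inflight = false;
		c->completions++;
	}
}

/* Error recovery context: cancels the inflight request under the lock. */
static void *demo_error_recovery(void *arg)
{
	struct demo_ctrl *c = arg;

	pthread_mutex_lock(&c->teardown_lock);
	demo_complete_locked(c);
	pthread_mutex_unlock(&c->teardown_lock);
	return NULL;
}

/* Timeout handler context: same request, same lock. */
static void *demo_timeout(void *arg)
{
	struct demo_ctrl *c = arg;

	pthread_mutex_lock(&c->teardown_lock);
	demo_complete_locked(c);
	pthread_mutex_unlock(&c->teardown_lock);
	return NULL;
}

/* Run both contexts concurrently; return the number of completions seen. */
static int demo_run(void)
{
	struct demo_ctrl c = { .req_inflight = true, .completions = 0 };
	pthread_t a, b;

	pthread_mutex_init(&c.teardown_lock, NULL);
	pthread_create(&a, NULL, demo_error_recovery, &c);
	pthread_create(&b, NULL, demo_timeout, &c);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&c.teardown_lock);
	return c.completions;
}
```

With the lock held around the check and the completion, demo_run() sees
exactly one completion regardless of which thread runs first. Without it,
both threads could observe req_inflight as true and complete twice.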