[PATCH 5/7] nvme/nvme-fabrics: introduce nvmf_error_recovery_work API
Hannes Reinecke
hare at suse.de
Tue Oct 19 06:34:37 PDT 2021
On 10/18/21 3:40 PM, Max Gurtovoy wrote:
> Error recovery work is duplicated in RDMA and TCP transports. Move this
> logic to common code. For that, introduce 2 new ctrl ops to teardown IO
> and admin queue.
>
> Also update the RDMA/TCP transport drivers to use this API and remove
> the duplicated code.
>
> Reviewed-by: Israel Rukshin <israelr at nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy at nvidia.com>
> ---
> drivers/nvme/host/fabrics.c | 23 +++++++++++++++
> drivers/nvme/host/fabrics.h | 1 +
> drivers/nvme/host/nvme.h | 4 +++
> drivers/nvme/host/rdma.c | 56 ++++++++++++++++---------------------
> drivers/nvme/host/tcp.c | 56 +++++++++++++++----------------------
> 5 files changed, 75 insertions(+), 65 deletions(-)
>
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 2edd086fa922..544195369c97 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -493,6 +493,29 @@ void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl)
> }
> EXPORT_SYMBOL_GPL(nvmf_reconnect_or_remove);
>
> +void nvmf_error_recovery_work(struct work_struct *work)
> +{
> + struct nvme_ctrl *ctrl = container_of(work,
> + struct nvme_ctrl, err_work);
> +
> + nvme_stop_keep_alive(ctrl);
> + ctrl->ops->teardown_ctrl_io_queues(ctrl);
> + /* unquiesce to fail fast pending requests */
> + nvme_start_queues(ctrl);
> + ctrl->ops->teardown_ctrl_admin_queue(ctrl);
> + blk_mq_unquiesce_queue(ctrl->admin_q);
> +
> + if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> + /* state change failure is ok if we started ctrl delete */
> + WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> + ctrl->state != NVME_CTRL_DELETING_NOIO);
> + return;
> + }
> +
> + nvmf_reconnect_or_remove(ctrl);
> +}
> +EXPORT_SYMBOL_GPL(nvmf_error_recovery_work);
> +
> void nvmf_error_recovery(struct nvme_ctrl *ctrl)
> {
> if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index 3d8ec7133fc8..8655eff74ed0 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -190,6 +190,7 @@ int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
> bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
> void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl);
> void nvmf_error_recovery(struct nvme_ctrl *ctrl);
> +void nvmf_error_recovery_work(struct work_struct *work);
> bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
> struct nvmf_ctrl_options *opts);
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index f9e1ce93d61d..1573edf6e97f 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -493,6 +493,10 @@ struct nvme_ctrl_ops {
> void (*submit_async_event)(struct nvme_ctrl *ctrl);
> void (*delete_ctrl)(struct nvme_ctrl *ctrl);
> int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
> +
> + /* Fabrics only */
> + void (*teardown_ctrl_io_queues)(struct nvme_ctrl *ctrl);
> + void (*teardown_ctrl_admin_queue)(struct nvme_ctrl *ctrl);
> };
>
> /*
As noted by Sagi, this abstraction really is awkward.
Why not have
void (*teardown_ctrl_queues)(struct nvme_ctrl *ctrl, int qid);
and let the callback figure out whether it's for the admin queue
(ie qid == 0) or a normal I/O queue?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer