[RFC] nvme-mpath: delete disk after last connection
Hannes Reinecke
hare at suse.de
Sat Sep 26 07:16:51 EDT 2020
On 9/25/20 11:38 PM, Keith Busch wrote:
> I have this tagged "RFC" because I'm not sure if there's a reason why the code
> is done the way that it is today.
>
> The multipath code currently deletes the disk only after all references
> to it are dropped rather than when the last path to that disk is lost.
> This has been reported to cause problems with some usage, like MD RAID.
>
> Delete the disk when the last path is gone. This is the same behavior we
> currently have with non-multipathed nvme devices.
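To make the distinction concrete, here is a deliberately simplified userspace analogy (not kernel code; all names below are made up for illustration). An extra opener such as MD holds a reference to the shared disk, so tying deletion to the last reference leaves a pathless disk behind, while tying it to the last path removes it as soon as no controller can reach the namespace:

	/* Toy model of the two lifetime rules discussed above. */
	#include <stdio.h>

	struct toy_head {
		int refs;	/* references: paths plus openers such as MD */
		int paths;	/* live paths to the shared namespace */
		int disk_alive;
	};

	/* current rule: the disk is deleted only with the final reference */
	static void put_ref(struct toy_head *h)
	{
		if (--h->refs == 0)
			h->disk_alive = 0;
	}

	/* proposed rule: the disk is deleted with the last path */
	static void drop_path(struct toy_head *h)
	{
		if (--h->paths == 0)
			h->disk_alive = 0;
	}

	int main(void)
	{
		/* two paths, plus one extra reference held by an opener */
		struct toy_head cur  = { .refs = 3, .paths = 2, .disk_alive = 1 };
		struct toy_head prop = { .refs = 3, .paths = 2, .disk_alive = 1 };

		/* tear down both paths, as in the example below */
		put_ref(&cur); drop_path(&prop);
		put_ref(&cur); drop_path(&prop);

		/*
		 * The opener still holds its reference, so the current rule
		 * keeps a pathless disk around; the proposed rule has already
		 * removed it.
		 */
		printf("current: disk_alive=%d, proposed: disk_alive=%d\n",
		       cur.disk_alive, prop.disk_alive);
		return 0;
	}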
>
> The following is a simple example that demonstrates what is currently
> observed, using an nvme loopback target (loop setup file not shown):
>
> # nvmetcli restore loop.json
> [ 31.156452] nvmet: adding nsid 1 to subsystem testnqn1
> [ 31.159140] nvmet: adding nsid 1 to subsystem testnqn2
>
> # nvme connect -t loop -n testnqn1 -q hostnqn
> [ 36.866302] nvmet: creating controller 1 for subsystem testnqn1 for NQN hostnqn.
> [ 36.872926] nvme nvme3: new ctrl: "testnqn1"
>
> # nvme connect -t loop -n testnqn1 -q hostnqn
> [ 38.227186] nvmet: creating controller 2 for subsystem testnqn1 for NQN hostnqn.
> [ 38.234450] nvme nvme4: new ctrl: "testnqn1"
>
> # nvme connect -t loop -n testnqn2 -q hostnqn
> [ 43.902761] nvmet: creating controller 3 for subsystem testnqn2 for NQN hostnqn.
> [ 43.907401] nvme nvme5: new ctrl: "testnqn2"
>
> # nvme connect -t loop -n testnqn2 -q hostnqn
> [ 44.627689] nvmet: creating controller 4 for subsystem testnqn2 for NQN hostnqn.
> [ 44.641773] nvme nvme6: new ctrl: "testnqn2"
>
> # mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/nvme3n1 /dev/nvme5n1
> [ 53.497038] md/raid1:md0: active with 2 out of 2 mirrors
> [ 53.501717] md0: detected capacity change from 0 to 66060288
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 nvme5n1[1] nvme3n1[0]
> 64512 blocks super 1.2 [2/2] [UU]
>
> Now delete all paths to one of the namespaces:
>
> # echo 1 > /sys/class/nvme/nvme3/delete_controller
> # echo 1 > /sys/class/nvme/nvme4/delete_controller
>
> We have no path, but mdstat says:
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active (auto-read-only) raid1 nvme5n1[1] nvme3n1[0]
> 64512 blocks super 1.2 [2/2] [UU]
>
> And this is reported to cause a problem.
>
> With the proposed patch, the following messages appear:
>
> [ 227.516807] md/raid1:md0: Disk failure on nvme3n1, disabling device.
> [ 227.516807] md/raid1:md0: Operation continuing on 1 devices.
>
> And mdstat shows only the viable members:
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md0 : active (auto-read-only) raid1 nvme5n1[1]
> 64512 blocks super 1.2 [2/1] [_U]
>
> Reported-by: Hannes Reinecke <hare at suse.de>
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
>  drivers/nvme/host/core.c      | 3 ++-
>  drivers/nvme/host/multipath.c | 1 -
>  drivers/nvme/host/nvme.h      | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 4857168f71f2..a2faa3625e39 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -475,7 +475,8 @@ static void nvme_free_ns_head(struct kref *ref)
>  	struct nvme_ns_head *head =
>  		container_of(ref, struct nvme_ns_head, ref);
>  
> -	nvme_mpath_remove_disk(head);
> +	if (head->disk)
> +		put_disk(head->disk);
>  	ida_simple_remove(&head->subsys->ns_ida, head->instance);
>  	cleanup_srcu_struct(&head->srcu);
>  	nvme_put_subsystem(head->subsys);
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 74896be40c17..55045291b4de 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -697,7 +697,6 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
>  		 */
>  		head->disk->queue = NULL;
>  	}
> -	put_disk(head->disk);
>  }
>  
>  int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a42b75869213..745cda1a63fd 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -670,7 +670,7 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
>  	struct nvme_ns_head *head = ns->head;
>  
>  	if (head->disk && list_empty(&head->list))
> -		kblockd_schedule_work(&head->requeue_work);
> +		nvme_mpath_remove_disk(head);
>  }
>  
>  static inline void nvme_trace_bio_complete(struct request *req,
>
I'm okay with that in general, but then again we might run into
situations where an 'all paths down' scenario is actually expected
(think of a temporary network outage on nvme-tcp).
So I guess we need to introduce an additional setting
(queue_if_no_path?) to be specified during the initial connection.
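To make the idea concrete, here is a minimal sketch of how such a gate could sit on top of the patch above; the queue_if_no_path field and whatever sets it from a connect-time option are hypothetical, everything else is taken from the quoted diff:

	/*
	 * Hypothetical sketch only: head->queue_if_no_path does not exist
	 * today and would need to be wired up to a new connect option.
	 */
	static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
	{
		struct nvme_ns_head *head = ns->head;

		if (!head->disk || !list_empty(&head->list))
			return;

		if (head->queue_if_no_path)
			/* keep the disk; requeue I/O until a path returns */
			kblockd_schedule_work(&head->requeue_work);
		else
			nvme_mpath_remove_disk(head);
	}

With something along those lines, nvme-tcp setups that expect transient all-paths-down periods could opt in to keeping the disk, while MD-style consumers get the immediate removal.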
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer