[PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
Hannes Reinecke
hare at suse.de
Thu Sep 29 23:29:54 PDT 2022
On 9/28/22 15:58, Sagi Grimberg wrote:
> When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),
> it is possible that during the path revalidation, the only paths that is IO
> capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet
> informed to the host, which will cause inflight requests to be requeued (as we
> have available paths but none are IO capable). These requests on the requeue
> list are waiting for someone to resubmit them at some point.
>
> The IO capable paths will eventually notify the ns resize change to the
> host, but there is nothing that will kick the requeue list to resubmit the
> queued requests.
>
> Fix this by always kicking the requeue list, and if no IO capable path exists,
> these requests will be queued again.
>
> A typical log that indicates that IOs are requeued:
> --
> nvme nvme1: creating 4 I/O queues.
> nvme nvme1: new ctrl: "testnqn1"
> nvme nvme2: creating 4 I/O queues.
> nvme nvme2: mapped 4/0/0 default/read/poll queues.
> nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
> nvme nvme1: rescanning namespaces.
> nvme1n1: detected capacity change from 2097152 to 4194304
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> nvme nvme2: rescanning namespaces.
> --
>
> Reported-by: Yogev Cohen <yogev at lightbitslabs.com>
> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> I was easily capable to reproduce this regression with a small debug patch to nvmet
> to have a 1 second delay between controller AENs:
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> #include "trace.h"
>
> #include "nvmet.h"
> +#include <linux/delay.h>
>
> struct workqueue_struct *buffered_io_wq;
> struct workqueue_struct *zbd_wq;
> @@ -248,6 +249,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid)
> nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
> NVME_AER_NOTICE_NS_CHANGED,
> NVME_LOG_CHANGED_NS);
> + msleep(1000);
> }
> }
>
> And expose a subsystem via two ports, one ANA 'optimized', and one ANA 'inaccessible'.
>
> Then test file-backed ns resize duing I/O:
> truncate --size=2G /tmp/f && echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn1/namespaces/1/revalidate_size
>
> drivers/nvme/host/multipath.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 6ef497c75a16..d532a78f24a2 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -172,16 +172,18 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
> void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
> {
> struct nvme_ns_head *head = ns->head;
> + struct nvme_ns *n;
> sector_t capacity = get_capacity(head->disk);
> int node;
>
> - list_for_each_entry_rcu(ns, &head->list, siblings) {
> - if (capacity != get_capacity(ns->disk))
> - clear_bit(NVME_NS_READY, &ns->flags);
> + list_for_each_entry_rcu(n, &head->list, siblings) {
> + if (capacity != get_capacity(n->disk))
> + clear_bit(NVME_NS_READY, &n->flags);
> }
>
> for_each_node(node)
> rcu_assign_pointer(head->current_path[node], NULL);
> + nvme_kick_requeue_lists(ns->ctrl);
> }
>
> static bool nvme_path_is_disabled(struct nvme_ns *ns)
Hmm. I guess the real issue is that we use 'ns' as both function
argument _and_ iterator, so that after 'list_for_each_rcu()' 'ns' is
pointing to a _different_ namespace, such that we end up kicking that
namespace and not the one used as argument.
But in either case, patch is fine.
Reviewed-by: Hannes Reinecke <hare at suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
More information about the Linux-nvme
mailing list