[PATCH 1/3] nvme-core: improve avoiding false remove namespace
Sagi Grimberg
sagi at grimberg.me
Thu Aug 20 00:33:22 EDT 2020
> nvme_revalidate_disk translates the return error to 0 if it is not a
> fatal error, thus avoiding false namespace removal. For errors less
> than 0, only ENOMEM is currently translated to 0, but other errors
> except ENODEV, such as EAGAIN or EBUSY, also need to be translated to 0.
> Another reason for improving the error translation: if a request times
> out during connect, __nvme_submit_sync_cmd returns
> NVME_SC_HOST_ABORTED_CMD (> 0). At this point the connect process
> should be terminated, but it falsely continues, which may cause a
> deadlock. Many functions that call __nvme_submit_sync_cmd treat an
> error code (> 0) as "target does not support this" and continue, but
> NVME_SC_HOST_ABORTED_CMD and NVME_SC_HOST_PATH_ERROR both mean the I/O
> was canceled by the host. To fix this bug, we need to set the
> NVME_REQ_CANCELLED flag, so that __nvme_submit_sync_cmd translates the
> return error to EINTR. This conflicts with the error translation in
> nvme_revalidate_disk and may cause false namespace removal.
>
> Signed-off-by: Chao Leng <lengchao at huawei.com>
> ---
> drivers/nvme/host/core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 88cff309d8e4..43ac8a1ad65d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2130,10 +2130,10 @@ static int _nvme_revalidate_disk(struct gendisk *disk)
>  	 * Only fail the function if we got a fatal error back from the
>  	 * device, otherwise ignore the error and just move on.
>  	 */
> -	if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR)))
> -		ret = 0;
> -	else if (ret > 0)
> +	if (ret > 0 && (ret & NVME_SC_DNR))
>  		ret = blk_status_to_errno(nvme_error_status(ret));
> +	else if (ret != -ENODEV)
> +		ret = 0;
>  	return ret;
We really need to take a step back here. I really don't like how
we are growing implicit assumptions about how statuses are interpreted.
Why don't we stop propagating the -ENODEV error back and instead
handle it in the specific call sites where we want to ignore
errors, with proper quirks?
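
To make the status interpretation under discussion concrete, here is a
minimal userspace sketch of the two translation rules from the hunk above.
It is an illustration only, not the driver code: the constant values are
stand-ins for the kernel definitions, and fatal_status_to_errno() is a
hypothetical placeholder for blk_status_to_errno(nvme_error_status(ret)).

```c
#include <errno.h>

/* Stand-in values for illustration; the real definitions are in the kernel headers. */
#define NVME_SC_DNR              0x4000
#define NVME_SC_HOST_ABORTED_CMD 0x371

/*
 * Hypothetical placeholder for blk_status_to_errno(nvme_error_status(ret)).
 * Assumption: every fatal NVMe status collapses to -EIO in this sketch.
 */
static int fatal_status_to_errno(int status)
{
	(void)status;
	return -EIO;
}

/* Rule before the patch: only -ENOMEM and retryable NVMe statuses become 0. */
static int translate_old(int ret)
{
	if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR)))
		ret = 0;
	else if (ret > 0)
		ret = fatal_status_to_errno(ret);
	return ret;
}

/* Rule after the patch: everything except -ENODEV and DNR statuses becomes 0. */
static int translate_new(int ret)
{
	if (ret > 0 && (ret & NVME_SC_DNR))
		ret = fatal_status_to_errno(ret);
	else if (ret != -ENODEV)
		ret = 0;
	return ret;
}
```

Under these assumptions the two rules differ only for negative errnos
other than -ENOMEM and -ENODEV: translate_old(-EAGAIN) and
translate_old(-EINTR) pass the error through, while translate_new()
turns both into 0. Both rules keep -ENODEV and both map a status with
NVME_SC_DNR set to a negative errno.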