[PATCH 1/3] nvme-core: improve avoiding false remove namespace

Chao Leng lengchao at huawei.com
Thu Aug 20 21:36:02 EDT 2020



On 2020/8/20 23:44, Sagi Grimberg wrote:
> 
>>> We really need to take a step back here, I really don't like how
>>> we are growing implicit assumptions on how statuses are interpreted.
>>>
>>> Why don't we remove the -ENODEV error propagation back and instead
>>> take care of it in the specific call-sites where we want to ignore
>>> errors with proper quirks?
>>
>> So the one thing I'm not even sure about is if just ignoring the
>> errors was a good idea to start with.  They obviously are if we just
>> did a rescan and did run into an error while rescanning a namespace
>> that didn't change.  But what if it actually did change?
> 
> Right, we don't know, so if we failed without DNR, we assume that
> we will retry again and ignore the error. The assumption is that
> we will retry when we will reconnect as we don't have a retry mechanism
> for these requests.
Except DNR or ENODEV, we can not know namespace change or not. This is
a low-probability event. In accordance with the principle of minor
influence and no suspicion, we assume namespace not change, maybe
a good choice. If the namespace is changed, the corresponding processing
is performed during the access, which does not cause any problem.
> 
>> So I think a logic like in this patch kinda makes sense, but I think
>> we also need to retry and scan again on these kinds of errors.
> 
> So you are OK with keeping nvme_submit_sync_cmd returning -ENODEV for
> cancelled requests and have the scan flow assume that these are
> cancelled requests?
> 
> At the very least we need a good comment to say what is going on there.
> 
>    Btw,
>> did you ever actually see -ENOMEM in practice?  With the small
>> allocations that we do it really should not happen normally, so
>> special casing for it always felt a little strange.
Agree.
Not only ENOMEM, If we do not know namespace change or not, assume
namespace not change maybe a good choice.
> 
> Never seen it, it's there just because we have allocations in the path.
> 
>> FYI, I've started rebasing various bits of work I've done to start
>> untangling the mess.  Here is my current WIP, which in this form
>> is completely untested:
>>
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/nvme-scanning-cleanup
> 
> This does not yet contain sorting out what is discussed here correct?
> .



More information about the Linux-nvme mailing list