Should NVME_SC_INVALID_NS be translated to BLK_STS_IOERR instead of BLK_STS_NOTSUPP so that multipath (both native and dm) can fail over on the failure?

Sagi Grimberg sagi at grimberg.me
Mon Apr 22 02:47:36 PDT 2024



On 12/04/2024 11:57, Sagi Grimberg wrote:
>
>
> On 12/04/2024 10:52, Jirong Feng wrote:
>>> So essentially there is no need for the host side patch? 
>>> interesting. Are you sure?
>>
>> At least no failure has been observed so far on a newer kernel 
>> version (6.6.0). All I can say is that I've tested it hundreds of times.
>>
>> In addition, I have some scripts that enable/disable it continually, 
>> so we can keep observing it for a few more days.
>>
>>
>>> Can you please also try with mpath iopolicy=round-robin?
>> All my previous tests were done with round-robin. I retested today 
>> with both round-robin and numa; the results are still the same.
>>
>>
>>> I'm asking because I cannot understand what is preventing this path 
>>> from being selected again and
>>> again for I/O....
>>
>> Perhaps we need to dive into the code of the old 
>> version (4.18.0-147.3.1.el8_1) and see what's different?
>>
>> Or should I try applying the host side patch to the old version and 
>> test again?
>
> What I think you want is to trace whether the path on which you 
> disabled the namespace is actually being selected over and over 
> again, with the I/O failing over each time...
>
> Can you please activate tracing and see where your mpath commands are 
> actually going?
>
> I'd trace nvme_setup_cmd and check that, once you disable one nvmet ns, 
> it is no longer selected by the mpath namespace as a valid ns.

Any update on this?
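
In case it helps, here is a minimal sketch of how the captured trace could be summarized per controller. It assumes the event has been enabled through tracefs (typically under /sys/kernel/tracing/events/nvme/nvme_setup_cmd/enable; the mount point may differ on your system) and that each event line looks roughly like "nvme_setup_cmd: nvme0: disk=nvme0c0n1, qid=3, ...". The helper name and the sample lines are hand-written for illustration only, not real logs, and the exact field layout may vary between kernel versions.

```python
import re
from collections import Counter

# Assumed shape of an nvme_setup_cmd trace line (may differ per kernel version):
#   ... nvme_setup_cmd: nvme0: disk=nvme0c0n1, qid=3, cmdid=10, nsid=1, ...
# Admin commands carry qid=0 and usually no disk= field.
SETUP_RE = re.compile(
    r"nvme_setup_cmd:\s+(?P<ctrl>nvme\d+):\s+(?:disk=[^,]+,\s+)?qid=(?P<qid>\d+)"
)


def count_io_cmds_per_ctrl(lines):
    """Count I/O (qid != 0) nvme_setup_cmd events per controller name."""
    counts = Counter()
    for line in lines:
        m = SETUP_RE.search(line)
        if m and m.group("qid") != "0":  # skip admin queue commands
            counts[m.group("ctrl")] += 1
    return counts


# Hand-written illustrative lines, NOT captured from a real system.
SAMPLE = """\
 fio-1234 [002] .... 100.000001: nvme_setup_cmd: nvme0: disk=nvme0c0n1, qid=3, cmdid=10, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=7, ctrl=0x0, dsmgmt=0, reftag=0)
 fio-1234 [002] .... 100.000002: nvme_setup_cmd: nvme1: disk=nvme0c1n1, qid=3, cmdid=11, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=8, len=7, ctrl=0x0, dsmgmt=0, reftag=0)
 kworker-55 [000] .... 100.000003: nvme_setup_cmd: nvme0: qid=0, cmdid=4, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_keep_alive cdw10=00 00 00 00)
 fio-1234 [002] .... 100.000004: nvme_setup_cmd: nvme0: disk=nvme0c0n1, qid=3, cmdid=12, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=16, len=7, ctrl=0x0, dsmgmt=0, reftag=0)
"""

if __name__ == "__main__":
    for ctrl, n in sorted(count_io_cmds_per_ctrl(SAMPLE.splitlines()).items()):
        print(f"{ctrl}: {n} I/O commands")
```

If the count for the controller whose nvmet ns was disabled keeps growing after the disable, that path is still being selected; if it stops growing, the path has been dropped from selection.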


