Should NVME_SC_INVALID_NS be translated to BLK_STS_IOERR instead of BLK_STS_NOTSUPP so that multipath(both native and dm) can failover on the failure?

Sagi Grimberg sagi at grimberg.me
Fri Apr 12 01:57:47 PDT 2024



On 12/04/2024 10:52, Jirong Feng wrote:
>> So essentially there is no need for the host side patch? interesting. 
>> Are you sure?
>
> At least no failure has been observed on a newer kernel version
> (6.6.0) so far. I can only say that I've tested it hundreds of times.
>
> In addition, I've got some scripts that enable/disable it continually,
> so we can keep observing it for a few more days.
>
>
>> Can you please also try with mpath iopolicy=round-robin?
> All my previous tests were done with round-robin. I retested today
> with both round-robin and numa; the results are still the same.
>
>
>> I'm asking because I cannot understand what is preventing this path
>> from being selected again and again for I/O....
>
> Perhaps we need to dive into the code of the old version
> (4.18.0-147.3.1.el8_1) and see what's different?
>
> Or should I try applying the host side patch to the old version and
> test again?

What I think you want is to trace whether the path on which you disabled
the namespace is actually being selected over and over again, and failed
over each time...

Can you please activate tracing and see where your mpath commands are
actually going?

I'd trace nvme_setup_cmd, and check that once you disable one nvmet ns,
that path is no longer selected by the mpath namespace as a valid ns.
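
As a rough sketch of how the trace output could be summarized: the
tracepoint is normally enabled through tracefs, e.g. by writing 1 to
/sys/kernel/tracing/events/nvme/nvme_setup_cmd/enable, and the resulting
lines can then be counted per path device. The line layout matched below
("nvmeX: disk=..., qid=...") is an assumption about how the event is
printed and may differ between kernel versions. The helper name
count_io_per_path and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of an nvme_setup_cmd trace line (may vary by kernel):
#   ... nvme_setup_cmd: nvme0: disk=nvme0c0n1, qid=1, cmdid=5, nsid=1, ...
# Admin commands carry no disk name, so the "disk=" part is optional.
SETUP_RE = re.compile(
    r"nvme_setup_cmd: nvme(?P<ctrl>\d+): "
    r"(?:disk=(?P<disk>[^,]+), )?qid=(?P<qid>\d+)"
)

def count_io_per_path(lines):
    """Count I/O-queue nvme_setup_cmd events per path device.

    Hypothetical helper: lines without a disk name are counted under
    the controller name instead, and qid=0 (admin queue) is skipped.
    """
    counts = Counter()
    for line in lines:
        m = SETUP_RE.search(line)
        if m is None:
            continue  # not an nvme_setup_cmd event
        if m.group("qid") == "0":
            continue  # admin queue, not multipath I/O
        key = m.group("disk") or "nvme" + m.group("ctrl")
        counts[key] += 1
    return counts

# Made-up sample lines, only to show the intended use.
SAMPLE = [
    "fio-1 [000] .... 1.0: nvme_setup_cmd: nvme0: disk=nvme0c0n1, qid=1, "
    "cmdid=1, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=7)",
    "fio-1 [001] .... 1.1: nvme_setup_cmd: nvme1: disk=nvme0c1n1, qid=2, "
    "cmdid=2, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=8, len=7)",
    "kworker-2 [000] .... 1.2: nvme_setup_cmd: nvme0: qid=0, cmdid=3, "
    "nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_keep_alive cdw10=00)",
]

for path, n in sorted(count_io_per_path(SAMPLE).items()):
    print(path, n)
```

If the count for the path whose nvmet ns was disabled keeps growing
after the disable, that path is still being picked for I/O; if it stops
while the other path keeps counting, the mpath namespace has dropped it.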


