nvme-pci timeout issue

Chaitanya Kulkarni chaitanyak at nvidia.com
Tue Jan 23 14:19:28 PST 2024


Sagi/Christoph/Jens/Keith,

On 1/10/24 19:54, Chaitanya Kulkarni wrote:
> Hi all,
>
> After running the test that triggers the nvme timeout for nvme-pci, the
> device under test lingers in an inconsistent state. Here are the steps:
>
> 1. Load the driver.
> 2. Trigger nvme_timeout.
> 3. After the timeout handler is triggered, it fails the application
>      with an I/O error.
> 4. lsblk and nvme list don't show the device anymore.
> 5. ls still shows the device.
> 6. Any write to it fails (e.g. dd) since the device has 0 capacity.
>
> Is this accepted behavior? If it is, then a malfunctioning device is
> lingering in the system and applications end up accessing it as if
> it were functioning properly. Can we avoid this scenario?
>
> How about we remove the device from the system? If you're all okay with
> it, I'll send a patch for the nvme-pci timeout to remove the device that
> has zero capacity. Otherwise please suggest how to deal with this scenario.
>
> With this confusing behavior, I'm not entirely sure what the expected
> outcome is for the timeout testcase to pass.
>
> Please have a look at the detailed test log below.
>
> -ck
>

Can someone please provide some insight into this behavior so we can merge
the testcase into blktests? Please note that Shinichiro also observed the
same behavior.
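
For reference, here is a minimal sketch of how the state from steps 4-6
could be detected from user space. It assumes the namespace still has an
entry under /sys/block after the timeout and that its "size" attribute
(in 512-byte sectors) reads 0; I have not confirmed that this holds on
every kernel. The function name zero_capacity_nvme and its sys_block
parameter are made up for illustration and are not part of blktests or
nvme-cli.

```python
from pathlib import Path


def zero_capacity_nvme(sys_block="/sys/block"):
    """Return names of nvme block devices whose sysfs size is 0 sectors.

    Assumption: a namespace left behind after a timeout still appears
    under sys_block and reports "0" in its size attribute.
    """
    root = Path(sys_block)
    stale = []
    if not root.is_dir():
        return stale
    for dev in sorted(root.iterdir()):
        # Only look at nvme namespaces, e.g. nvme0n1.
        if not dev.name.startswith("nvme"):
            continue
        try:
            sectors = int((dev / "size").read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable size attribute: skip, do not guess.
            continue
        if sectors == 0:
            stale.append(dev.name)
    return stale


if __name__ == "__main__":
    for name in zero_capacity_nvme():
        print(f"{name}: capacity is 0 but the block device is still present")
```

A testcase could call this after the timeout fires and decide to pass or
fail based on whichever behavior we agree is expected.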

-ck



