blktests nvme/039 failure

alan.adamson at oracle.com
Mon Apr 10 16:06:50 PDT 2023


On 4/10/23 4:49 AM, Shin'ichiro Kawasaki wrote:
> Hello Alan,
>
> I noticed that recently nvme/039 fails on my system occasionally (around 40%).
> The failure messages are as follows:
>
> nvme/039 => nvme0n1 (test error logging)                     [failed]
>      runtime  0.176s  ...  0.167s
>      --- tests/nvme/039.out      2023-04-06 10:11:07.925670528 +0900
>      +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad   2023-04-10 20:15:07.679538017 +0900
>      @@ -1,5 +1,2 @@
>       Running nvme/039
>      - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
>      - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
>      - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
>       Test complete
>
> nvme/039 => nvme0n1 (test error logging)                     [failed]
>      runtime  0.167s  ...  0.199s
>      --- tests/nvme/039.out      2023-04-06 10:11:07.925670528 +0900
>      +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad   2023-04-10 20:15:09.114539650 +0900
>      @@ -1,5 +1,4 @@
>       Running nvme/039
>      - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
>        Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
>        Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
>       Test complete
>
> It looks like the expected error messages were not reported.
>
> I suspect that the time between enabling error injection and issuing the
> I/O that triggers the error is too short. With the one-line change below,
> which adds a wait after error injection is enabled, the failures
> disappear. Do you think such a wait is a valid fix?
>
>   tests/nvme/rc | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/tests/nvme/rc b/tests/nvme/rc
> index 210a82a..7043c23 100644
> --- a/tests/nvme/rc
> +++ b/tests/nvme/rc
> @@ -652,6 +652,7 @@ _nvme_enable_err_inject()
>           echo "$4" > /sys/kernel/debug/"$1"/fault_inject/dont_retry
>           echo "$5" > /sys/kernel/debug/"$1"/fault_inject/status
>           echo "$6" > /sys/kernel/debug/"$1"/fault_inject/times
> +	sleep 0.1
>   }
>   
>   _nvme_disable_err_inject()

I've been able to reproduce it.  The sleep 0.1 helps but doesn't
eliminate the failures.  I did notice that whenever the test failed,
the kernel log also contained a "blk_print_req_error: 2 callbacks
suppressed" message, meaning some of the error messages were rate
limited.  That breaks the log parsing the test relies on.
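
For illustration, here is a minimal sketch of how a run could be checked
for that condition.  It is not part of blktests; the helper names
log_has_suppressed and strip_suppressed are made up for this example, and
it only assumes the notice contains the text "callbacks suppressed" as
seen above.

```shell
# Hypothetical helper (not a blktests function): read kernel log text on
# stdin and succeed if the rate limiter reported dropped messages.
log_has_suppressed() {
	grep -q "callbacks suppressed"
}

# Hypothetical helper: pass the log through, dropping only the rate-limit
# notices. It cannot restore the messages that were suppressed.
strip_suppressed() {
	grep -v "callbacks suppressed"
}
```

Running something like "dmesg | log_has_suppressed" after a failed run
would show whether rate limiting was involved.  Filtering the notice out
only tidies the log; the suppressed error lines are still missing, so the
expected output would still not match.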


Alan




