[PATCH] nvme: add nvme pci timeout testcase

Chaitanya Kulkarni chaitanyak at nvidia.com
Wed Jan 10 16:00:22 PST 2024


On 1/10/2024 4:23 AM, Shinichiro Kawasaki wrote:
> Thanks again for this test case. Please find my review comments. Tomorrow, I
> will try to run this test case.
> 
> On Jan 09, 2024 / 19:57, Chaitanya Kulkarni wrote:
>> Trigger and test nvme-pci timeout with concurrent fio jobs.
> 
> Is there any related kernel commit which motivated this test case? If so,
> can we add a kernel commit or e-mail discussion link as a reference?
> 

no, this part if never tested on the regular basis as it has some
complicated logic I believe it is much needed test ..

>>
>> Signed-off-by: Chaitanya Kulkarni <kch at nvidia.com>
>> ---
>>   tests/nvme/050     | 43 +++++++++++++++++++++++++++++++++++++++++++
>>   tests/nvme/050.out |  2 ++
>>   2 files changed, 45 insertions(+)
>>   create mode 100755 tests/nvme/050
>>   create mode 100644 tests/nvme/050.out
>>
>> diff --git a/tests/nvme/050 b/tests/nvme/050
>> new file mode 100755
>> index 0000000..ba54bcd
>> --- /dev/null
>> +++ b/tests/nvme/050
>> @@ -0,0 +1,43 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-3.0+
>> +# Copyright (C) 2024 Chaitanya Kulkarni.
> 
> Nit: you may want to remove the last '.'

okay ...

> 
>> +#
>> +# Test NVMe-PCI timeout with FIO jobs by triggering the nvme_timeout function.
>> +#
>> +
>> +. tests/nvme/rc
>> +
>> +DESCRIPTION="test nvme-pci timeout with fio jobs."
> 
> To be consitent with other test cases, let's remove the last '.'.

okay ...

> 
>> +
>> +requires() {
>> +	_require_nvme_trtype pci
>> +	_have_fio
>> +	_nvme_requires
> 
> This test case depends on the kernel config FAIL_IO_TIMEOUT. Let's add
> 
>      _have_kernel_option FAIL_IO_TIMEOUT

indeed, I'll add that ...

> 
>> +}
>> +
>> +test() {
>> +	local dev="/dev/nvme0n1"
>> +
>> +	echo "Running ${TEST_NAME}"
>> +
>> +	echo 1 > /sys/block/nvme0n1/io-timeout-fail
>> +
>> +	echo 99 > /sys/kernel/debug/fail_io_timeout/probability
>> +	echo 10 > /sys/kernel/debug/fail_io_timeout/interval
>> +	echo -1 > /sys/kernel/debug/fail_io_timeout/times
>> +	echo  0 > /sys/kernel/debug/fail_io_timeout/space
>> +	echo  1 > /sys/kernel/debug/fail_io_timeout/verbose
> 
> To avoid impact on following test cases, I suggest to restore the original
> values of the fail_io_timeout/* sysfs attributes above at the end of this
> test case. _nvme_err_inject_setup() and _nvme_err_inject_cleanup() in
> test/nvme/rc do similar thing. We can add new helper functions in same manner.

I can add the code for config and restore, but at this point there are
no other testcases using this ?(correct me if I'm wrong), so instead of
bloating the code in nvme/rc let's keep it for this testcase only ?
and add common helpers code later once we have more users for this
case ?

> 
>> +
>> +	# shellcheck disable=SC2046
>> +	fio --bs=4k --rw=randread --norandommap --numjobs=$(nproc) \
> 
> Double quotes for "$(nproc)" will allow to remove the shellcheck exception,
> probably.

not sure I understand this comment, can you please elaborate ?

> 
>> +	    --name=reads --direct=1 --filename=${dev} --group_reporting \
>> +	    --time_based --runtime=1m 2>&1 | grep -q "Input/output error"
>> +
>> +	# shellcheck disable=SC2181
>> +	if [ $? -eq 0 ]; then
>> +		echo "Test complete"
>> +	else
>> +		echo "Test failed"
>> +	fi
>> +	modprobe -r nvme
> 
> If nvme driver is built-in, this unload command does not work.
> Do we really need to unload nvme driver here?
> 

Yes, post timeout handler execution we need to make sure that device
can be removed at the time of module unload, perhaps there is a way to
avoid this when nvme is a built-in module so that test will not execute
above ? any suggestions ?

-ck




More information about the Linux-nvme mailing list