[PATCH blktests V2 1/1] nvme: add nvme pci timeout testcase
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Fri Jan 19 03:11:00 PST 2024
Hi Chaitanya, thanks for this v2 patch.
I ran this new test case with several devices and observed these two:
1) The test case fails with my QEMU NMVE devices. It looks all of the inject
failures get recovered by error handler, then no I/O error is reported to the
fio process. I modified two fail_io_timeout parameters as follows to inject
more failures, and saw the test case fails with the device. I suggest these
parameters.
@@ -49,8 +50,8 @@ test_device() {
save_fi_settings
echo 1 > /sys/block/"${nvme_ns}"/io-timeout-fail
- echo 99 > /sys/kernel/debug/fail_io_timeout/probability
- echo 10 > /sys/kernel/debug/fail_io_timeout/interval
+ echo 100 > /sys/kernel/debug/fail_io_timeout/probability
+ echo 1 > /sys/kernel/debug/fail_io_timeout/interval
echo -1 > /sys/kernel/debug/fail_io_timeout/times
echo 0 > /sys/kernel/debug/fail_io_timeout/space
echo 1 > /sys/kernel/debug/fail_io_timeout/verbose
2) After I ran the test case, I see the test target device left with zero
size. I think this is consistent as your report [*]. I still think device
remove and rescan at the test case end is required to avoid impacts on
following test cases. It will depend on the discussion for your report,
though.
[*] https://lore.kernel.org/linux-nvme/287b9ed9-6eb3-46d0-a6f0-a9d283245b90@nvidia.com/
Please find my some more comments in line:
On Jan 16, 2024 / 20:22, Chaitanya Kulkarni wrote:
> Trigger and test nvme-pci timeout with concurrent fio jobs.
>
> Signed-off-by: Chaitanya Kulkarni <kch at nvidia.com>
> ---
> tests/nvme/050 | 74 ++++++++++++++++++++++++++++++++++++++++++++++
> tests/nvme/050.out | 2 ++
> 2 files changed, 76 insertions(+)
> create mode 100755 tests/nvme/050
> create mode 100644 tests/nvme/050.out
>
> diff --git a/tests/nvme/050 b/tests/nvme/050
> new file mode 100755
> index 0000000..6c44b43
> --- /dev/null
> +++ b/tests/nvme/050
> @@ -0,0 +1,74 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2024 Chaitanya Kulkarni
> +#
> +# Test NVMe-PCI timeout with FIO jobs by triggering the nvme_timeout function.
> +#
> +
> +. tests/nvme/rc
> +
> +DESCRIPTION="test nvme-pci timeout with fio jobs"
I ran this test case a ZNS drive today, and it looks working. I suggest to add:
CAN_BE_ZONED=1
> +
> +sysfs_path="/sys/kernel/debug/fail_io_timeout/"
> +#restrict test to nvme-pci only
> +nvme_trtype=pci
> +
> +# fault injection config array
> +declare -A fi_array
> +
> +requires() {
> + _require_nvme_trtype pci
We can remove this line, since we added "nvme_trtype=pci" above.
> + _have_fio
> + _nvme_requires
> + _have_kernel_option FAIL_IO_TIMEOUT
Today, I found that this test case depends on another kernel option. Let's add,
_have_kernel_option FAULT_INJECTION_DEBUG_FS
> +}
> +
> +device_requires() {
> + _require_test_dev_is_nvme
> +}
Actually, this check is not required here, since it is already done in
group_device_requires() in tests/nvme/rc. It is a small left work to remove
the same device_requires() in nvme/032.
> +
> +save_fi_settings() {
> + for fi_attr in probability interval times space verbose
> + do
> + fi_array["${fi_attr}"]=$(cat "${sysfs_path}/${fi_attr}")
> + done
> +}
> +
> +restore_fi_settings() {
> + for fi_attr in probability interval times space verbose
> + do
> + echo "${fi_array["${fi_attr}"]}" > "${sysfs_path}/${fi_attr}"
> + done
> +}
> +
> +test_device() {
> + local nvme_ns
> + local io_fimeout_fail
> +
> + echo "Running ${TEST_NAME}"
> +
> + nvme_ns="$(basename "${TEST_DEV}")"
> + io_fimeout_fail="$(cat /sys/block/"${nvme_ns}"/io-timeout-fail)"
> + save_fi_settings
> + echo 1 > /sys/block/"${nvme_ns}"/io-timeout-fail
> +
> + echo 99 > /sys/kernel/debug/fail_io_timeout/probability
> + echo 10 > /sys/kernel/debug/fail_io_timeout/interval
> + echo -1 > /sys/kernel/debug/fail_io_timeout/times
> + echo 0 > /sys/kernel/debug/fail_io_timeout/space
> + echo 1 > /sys/kernel/debug/fail_io_timeout/verbose
> +
> + fio --bs=4k --rw=randread --norandommap --numjobs="$(nproc)" \
> + --name=reads --direct=1 --filename="${TEST_DEV}" --group_reporting \
> + --time_based --runtime=1m 2>&1 | grep -q "Input/output error"
When I investigated the failure cause of this test case, I found fio output
is useful. So, I suggest to keep the output in $FULL.
--time_based --runtime=1m >&"$FULL"
> +
> + # shellcheck disable=SC2181
> + if [ $? -eq 0 ]; then
With that $FULL file, the if statement can be as follows. And we don't need to
disable SC2181 either.
if grep -q "Input/output error" "$FULL"; then
> + echo "Test complete"
> + else
> + echo "Test failed"
> + fi
> +
> + restore_fi_settings
> + echo "${io_fimeout_fail}" > /sys/block/"${nvme_ns}"/io-timeout-fail
> +}
> diff --git a/tests/nvme/050.out b/tests/nvme/050.out
> new file mode 100644
> index 0000000..b78b05f
> --- /dev/null
> +++ b/tests/nvme/050.out
> @@ -0,0 +1,2 @@
> +Running nvme/050
> +Test complete
> --
> 2.40.0
>
More information about the Linux-nvme
mailing list