[PATCH] nvme: test nvmet-wq sysfs interface
Chaitanya Kulkarni
chaitanyak at nvidia.com
Mon Nov 11 16:58:58 PST 2024
On 11/7/24 21:21, Shinichiro Kawasaki wrote:
> On Nov 04, 2024 / 11:29, Chaitanya Kulkarni wrote:
>> Add a test that randomly sets the cpumask from available CPUs for
>> the nvmet-wq while running the fio workload. This patch has been
>> tested on nvme-loop and nvme-tcp transport.
>>
>> Signed-off-by: Chaitanya Kulkarni <kch at nvidia.com>
>
> Thanks. As I noted in another mail, this test case generates a kernel
> INFO message, and it looks like catching a new bug. So I think this
> test case worth adding. Please find my review comments in line.
>
>> ---
>> tests/nvme/055 | 99 ++++++++++++++++++++++++++++++++++++++++++++++
>> tests/nvme/055.out | 3 ++
>> 2 files changed, 102 insertions(+)
>> create mode 100755 tests/nvme/055
>> create mode 100644 tests/nvme/055.out
>>
>> diff --git a/tests/nvme/055 b/tests/nvme/055
>> new file mode 100755
>> index 0000000..9fe27a3
>> --- /dev/null
>> +++ b/tests/nvme/055
>> @@ -0,0 +1,99 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0+
>> +# Copyright (C) 2024 Chaitanya Kulkarni
>> +#
>> +# Test nvmet-wq cpumask sysfs attribute with NVMe-oF and fio workload
>> +#
>> +
>> +. tests/nvme/rc
>> +
>> +DESCRIPTION="Test nvmet-wq cpumask sysfs attribute with fio on NVMe-oF device"
>> +TIMED=1
>> +
>> +requires() {
>> + _nvme_requires
>> + _have_fio && _have_loop
>
> Nit, a unneccesaary space.
>
>> + _require_nvme_trtype_is_fabrics
>> +}
>
done.
> I suggest to add set_conditions() here as below. Without it, the test case is
> skipped when users do not se NVMET_TRTYPES variable. If we add set_conditions(),
> _have_loop call in requires() can be removed.
>
> set_conditions() {
> _set_nvme_trtype "$@"
> }
>
done.
>> +
>> +cleanup_setup() {
>> + _nvme_disconnect_subsys
>> + _nvmet_target_cleanup
>> +}
>> +
>> +test() {
>> + local cpumask_path="/sys/devices/virtual/workqueue/nvmet-wq/cpumask"
>> + local original_cpumask
>> + local min_cpus
>> + local max_cpus
>> + local numbers
>> + local idx
>> + local ns
>> +
>> + echo "Running ${TEST_NAME}"
>> +
>> + _setup_nvmet
>> + _nvmet_target_setup
>> + _nvme_connect_subsys
>> +
>> + if [ ! -f "$cpumask_path" ]; then
>> + SKIP_REASONS+=("nvmet_wq cpumask sysfs attribute not found.")
>> + cleanup_setup
>> + return 1
>> + fi
>> +
>> + ns=$(_find_nvme_ns "${def_subsys_uuid}")
>> +
>> + original_cpumask=$(cat "$cpumask_path")
>> +
>> + num_cpus=$(nproc)
>> + max_cpus=$(( num_cpus < 20 ? num_cpus : 20 ))
>> + min_cpus=0
>> + #shellcheck disable=SC2207
>> + numbers=($(seq $min_cpus $max_cpus))
>
> Nit: the shellcheck error can be vaoided with this:
>
> read -a numbers -d '' < <(seq $min_cpus $max_cpus)
>
done.
>> +
>> + _run_fio_rand_io --filename="/dev/${ns}" --time_based --runtime=130s \
>
> The --time_based and --runtime=130s options are not required, because fio helper
> bash functions will add them.
>
> Instead, let's add this line before the fio command.
>
> : ${TIMEOUT:=60}
>
> The fio helper functions will refect this TIMEOUT value to the --runtime
> option. This will allow users to control runtime cost as TIMED=1 indicates.
>
done.
>> + --iodepth=8 --size="${NVME_IMG_SIZE}" &> "$FULL" &
>> +
>> + # Let the fio settle down else we will break in the loop for fio check
>> + sleep 1
>> + for ((i = 0; i < max_cpus; i++)); do
>> + if ! pgrep -x fio &> /dev/null ; then
>
> pgrep command is not in the GNU coreutils. I suggest to keep the fio process id
> in a variable and check with "kill -0" command. The test case block/005 does it.
>
thanks for the pointer, done and it looks much cleaner.
>> + break
>> + fi
>> +
>> + if [[ ${#numbers[@]} -eq 0 ]]; then
>> + break
>> + fi
>> +
>> + idx=$((RANDOM % ${#numbers[@]}))
>> +
>> + #shellcheck disable=SC2004
>> + cpu_mask=$(printf "%X" $((1 << ${numbers[idx]})))
>> + echo "$cpu_mask" > "$cpumask_path"
>
> When I ran this test case, I observed an error at this line:
>
> echo: write error: Value too large for defined data type
>
> I think the cpu_mask calculation is wrong, and it should be like this:
>
> cpu_mask=0
> for ((n = 0; n < numbers[idx]; n++)); do
> cpu_mask=$((cpu_mask + (1 << n)))
> done
> cpu_mask=$(printf "%X" $((cpu_mask)))
>
>
didn't see that error in my execution done.
This version tries to only use one CPU at a time, but with your
suggestion it will now consider more than one CPUs which I think is
useful too.
I modified the calculation to keep it simple for now.
>> + if [[ $(cat "$cpumask_path") =~ ^[0,]*${cpu_mask}\n$ ]]; then
>> + echo "Test Failed: cpumask was not set correctly "
>> + echo "Expected ${cpu_mask} found $(cat "$cpumask_path")"
>> + cleanup_setup
>> + return 1
>> + fi
>> + sleep 3
>> + # Remove the selected number
>> + numbers=("${numbers[@]:0:$idx}" "${numbers[@]:$((idx + 1))}")
>> + done
>> +
>> + killall fio &> /dev/null
>
> killall is not in GNU coreutils either (blktests already uses it at other
> places though...) Does "kill -9" work instead?
>
done.
Thanks for the review comments, I'll send it V2 soon.
-ck
More information about the Linux-nvme
mailing list