[PATCH blktests] nvme/{033-037}: timeout while waiting for nvme passthru namespace device
Nilay Shroff
nilay at linux.ibm.com
Tue Sep 24 22:55:47 PDT 2024
On 9/24/24 17:44, Shinichiro Kawasaki wrote:
> On Sep 24, 2024 / 14:18, Nilay Shroff wrote:
>> Avoid waiting indefinitely for nvme passthru namespace block device
>> to appear. Wait for up to 5 seconds and during this time if namespace
>> block device doesn't appear then bail out and FAIL the test.
>>
>> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
>> ---
>> Hi,
>>
>> You may find more details about this issue here[1].
>>
>> I found that blktest nvme/033-037 hangs indefinitely when
>> kernel rejects the passthru target namespace due to the
>> duplicate IDs. This patch helps address this issue by
>> ensuring that we bail out and fail the test if for any
>> reason passthru target namspace is not created on the
>> host. The relevant kernel patchv2 to fix the issue with
>> duplicate IDs while using passthru loop target can be
>> found here[2].
>>
>> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/
>> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/
>>
>> Thanks!
>
> Thanks for the patch :) It's bad that you stumbled into the infinite while loop
> in _nvmet_passthru_target_connect(). I agree that it should be fixed.
>
> Please find my comments in line.
>
Thank you for your review comments!!
>> ---
>> tests/nvme/033 | 7 +++++--
>> tests/nvme/034 | 7 +++++--
>> tests/nvme/035 | 6 +++---
>> tests/nvme/036 | 14 ++++++++------
>> tests/nvme/037 | 6 +++++-
>> tests/nvme/rc | 12 +++++++++++-
>> 6 files changed, 37 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/nvme/033 b/tests/nvme/033
>> index 5e05175..171974e 100755
>> --- a/tests/nvme/033
>> +++ b/tests/nvme/033
>> @@ -62,8 +62,11 @@ test_device() {
>> _nvmet_passthru_target_setup
>>
>> nsdev=$(_nvmet_passthru_target_connect)
>> -
>> - compare_dev_info "${nsdev}"
>> + if [[ -z "$nsdev" ]]; then
>> + echo "FAIL"
>
> I think some more specific failure message will be useful. How about
> "Failed to connect" or so? Same comment for the other test cases 034-037.
Agreed, I will add a meaningful error message in next patch.
>
>> + else
>> + compare_dev_info "${nsdev}"
>> + fi
>>
>> _nvme_disconnect_subsys
>> _nvmet_passthru_target_cleanup
>> diff --git a/tests/nvme/034 b/tests/nvme/034
>> index 154fc91..7625204 100755
>> --- a/tests/nvme/034
>> +++ b/tests/nvme/034
>> @@ -32,8 +32,11 @@ test_device() {
>>
>> _nvmet_passthru_target_setup
>> nsdev=$(_nvmet_passthru_target_connect)
>> -
>> - _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
>> + if [[ -z "$nsdev" ]]; then
>> + echo "FAIL"
>> + else
>> + _run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
>> + fi
>>
>> _nvme_disconnect_subsys
>> _nvmet_passthru_target_cleanup
>> diff --git a/tests/nvme/035 b/tests/nvme/035
>> index ff217d6..6ad9c56 100755
>> --- a/tests/nvme/035
>> +++ b/tests/nvme/035
>> @@ -30,13 +30,13 @@ test_device() {
>>
>> _setup_nvmet
>>
>> - local ctrldev
>
> Good catch :)
>
>> local nsdev
>>
>> _nvmet_passthru_target_setup
>> nsdev=$(_nvmet_passthru_target_connect)
>> -
>> - if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>> + if [[ -z "$nsdev" ]]; then
>> + echo "FAIL"
>> + elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>> echo "FAIL: fio verify failed"
>> fi
>>
>> diff --git a/tests/nvme/036 b/tests/nvme/036
>> index 442ffe7..a67ca12 100755
>> --- a/tests/nvme/036
>> +++ b/tests/nvme/036
>> @@ -30,13 +30,15 @@ test_device() {
>
> This could be a good chance to add "local nsdev".
Yeah will do.
>
>>
>> _nvmet_passthru_target_setup
>> nsdev=$(_nvmet_passthru_target_connect)
>> -
>> - ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
>> -
>> - if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
>> - echo "ERROR: reset failed"
>> + if [[ -z "$nsdev" ]]; then
>> + echo "FAIL"
>> + else
>> + ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
>> +
>> + if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
>> + echo "ERROR: reset failed"
>> + fi
>> fi
>> -
>
> Nit: unnecessary change here.
Okay, will remove this.
>
>> _nvme_disconnect_subsys
>> _nvmet_passthru_target_cleanup
>>
>> diff --git a/tests/nvme/037 b/tests/nvme/037
>> index f7ddc2d..f0c8a77 100755
>> --- a/tests/nvme/037
>> +++ b/tests/nvme/037
>> @@ -27,7 +27,6 @@ test_device() {
>>
>> local subsys="blktests-subsystem-"
>> local iterations=10
>> - local ctrldev
>
> Good catch. And we can add "local nsdev" here.
yes will do.
>
>>
>> for ((i = 0; i < iterations; i++)); do
>> _nvmet_passthru_target_setup --subsysnqn "${subsys}${i}"
>> @@ -37,6 +36,11 @@ test_device() {
>> _nvme_disconnect_subsys \
>> --subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1
>> _nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}"
>> +
>> + if [[ -z "$nsdev" ]]; then
>> + echo "FAIL"
>> + break
>> + fi
>> done
>>
>> echo "Test complete"
>> diff --git a/tests/nvme/rc b/tests/nvme/rc
>> index a877de3..3def0d0 100644
>> --- a/tests/nvme/rc
>> +++ b/tests/nvme/rc
>> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() {
>>
>> _nvmet_passthru_target_connect() {
>> local subsysnqn="$def_subsysnqn"
>> + local timeout="5"
>>
>> while [[ $# -gt 0 ]]; do
>> case $1 in
>> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() {
>> # The following tests can race with the creation
>> # of the device so ensure the block device exists
>> # before continuing
>> - while [ ! -b "${nsdev}" ]; do sleep 1; done
>> + start_time=$(date +%s)
>> + while [ ! -b "${nsdev}" ]; do
>> + sleep .1
>
> Is there any reason to have 0.1 second wait duration? When I changed this to
> "sleep 1", runtime of the test cases did not change on my test system.
>
yes you're right there won't be any noticeable change in the runtime of
the test unless we run this test in a loop for multiple number of times.
I would say that the gain would be only a fraction of a second in this case.
So I will use sleep 1 instead of .1 as you suggested.
>> + end_time=$(date +%s)
>> + if ((end_time - start_time > timeout)); then
>> + echo ""
>
> This echo doesn't look required.
>
>> + return 1
>> + fi
>
> If 1 second wait is okay instead of 0.1 second wait, the if block above can be
> a bit simpler, like,
>
> if ((++count >= timeout)); then
> return 1
> fi
>
> where, "local count=0" should be declared before.
>
yeah with sleep 1 we can make it simple. I will update this in next patch.
>> + done
>>
>> echo "${nsdev}"
>> + return 0
>
> This "return 0" looks redundant. The previous echo succeeds always, then 0 is
> returned anyway. Also, all of the callers check the failure of this function by
> referring to the nsdev string echoed. They do not refer to the return code. So,
> it is not so useful to explicitly show that 0 is returned on success.
>
yes I will incorporate it in the next patch.
Thanks,
--Nilay
More information about the Linux-nvme
mailing list