[PATCH blktests] nvme/{033-037}: timeout while waiting for nvme passthru namespace device

Nilay Shroff nilay at linux.ibm.com
Tue Sep 24 22:55:47 PDT 2024



On 9/24/24 17:44, Shinichiro Kawasaki wrote:
> On Sep 24, 2024 / 14:18, Nilay Shroff wrote:
>> Avoid waiting indefinitely for nvme passthru namespace block device
>> to appear. Wait for up to 5 seconds and during this time if namespace
>> block device doesn't appear then bail out and FAIL the test.
>>
>> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
>> ---
>> Hi,
>>
>> You may find more details about this issue here[1]. 
>>
>> I found that blktest nvme/033-037 hangs indefinitely when 
>> kernel rejects the passthru target namespace due to the 
>> duplicate IDs. This patch helps address this issue by  
>> ensuring that we bail out and fail the test if for any 
>> reason passthru target namspace is not created on the 
>> host. The relevant kernel patchv2 to fix the issue with 
>> duplicate IDs while using passthru loop target can be
>> found here[2].
>>
>> [1]: https://lore.kernel.org/all/8b17203f-ea4b-403b-a204-4fbc00c261ca@linux.ibm.com/
>> [2]: https://lore.kernel.org/all/20240921070547.531991-1-nilay@linux.ibm.com/
>>
>> Thanks!
> 
> Thanks for the patch :) It's bad that you stumbled into the infinite while loop
> in _nvmet_passthru_target_connect(). I agree that it should be fixed.
> 
> Please find my comments in line.
> 
Thank you for your review comments!!

>> ---
>>  tests/nvme/033 |  7 +++++--
>>  tests/nvme/034 |  7 +++++--
>>  tests/nvme/035 |  6 +++---
>>  tests/nvme/036 | 14 ++++++++------
>>  tests/nvme/037 |  6 +++++-
>>  tests/nvme/rc  | 12 +++++++++++-
>>  6 files changed, 37 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/nvme/033 b/tests/nvme/033
>> index 5e05175..171974e 100755
>> --- a/tests/nvme/033
>> +++ b/tests/nvme/033
>> @@ -62,8 +62,11 @@ test_device() {
>>  	_nvmet_passthru_target_setup
>>  
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	compare_dev_info "${nsdev}"
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
> 
> I think some more specific failure message will be useful. How about
> "Failed to connect" or so? Same comment for the other test cases 034-037.
Agreed, I will add a meaningful error message in next patch.
> 
>> +	else
>> +		compare_dev_info "${nsdev}"
>> +	fi
>>  
>>  	_nvme_disconnect_subsys
>>  	_nvmet_passthru_target_cleanup
>> diff --git a/tests/nvme/034 b/tests/nvme/034
>> index 154fc91..7625204 100755
>> --- a/tests/nvme/034
>> +++ b/tests/nvme/034
>> @@ -32,8 +32,11 @@ test_device() {
>>  
>>  	_nvmet_passthru_target_setup
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
>> +	else
>> +		_run_fio_verify_io --size="${NVME_IMG_SIZE}" --filename="${nsdev}"
>> +	fi
>>  
>>  	_nvme_disconnect_subsys
>>  	_nvmet_passthru_target_cleanup
>> diff --git a/tests/nvme/035 b/tests/nvme/035
>> index ff217d6..6ad9c56 100755
>> --- a/tests/nvme/035
>> +++ b/tests/nvme/035
>> @@ -30,13 +30,13 @@ test_device() {
>>  
>>  	_setup_nvmet
>>  
>> -	local ctrldev
> 
> Good catch :)
> 
>>  	local nsdev
>>  
>>  	_nvmet_passthru_target_setup
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	if ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
>> +	elif ! _xfs_run_fio_verify_io "${nsdev}" "${NVME_IMG_SIZE}"; then
>>  		echo "FAIL: fio verify failed"
>>  	fi
>>  
>> diff --git a/tests/nvme/036 b/tests/nvme/036
>> index 442ffe7..a67ca12 100755
>> --- a/tests/nvme/036
>> +++ b/tests/nvme/036
>> @@ -30,13 +30,15 @@ test_device() {
> 
> This could be a good chance to add "local nsdev".
Yeah will do.
> 
>>  
>>  	_nvmet_passthru_target_setup
>>  	nsdev=$(_nvmet_passthru_target_connect)
>> -
>> -	ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
>> -
>> -	if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
>> -		echo "ERROR: reset failed"
>> +	if [[ -z "$nsdev" ]]; then
>> +		echo "FAIL"
>> +	else
>> +		ctrldev=$(_find_nvme_dev "${def_subsysnqn}")
>> +
>> +		if ! nvme reset "/dev/${ctrldev}" >> "$FULL" 2>&1; then
>> +			echo "ERROR: reset failed"
>> +		fi
>>  	fi
>> -
> 
> Nit: unnecessary change here.
Okay, will remove this.
> 
>>  	_nvme_disconnect_subsys
>>  	_nvmet_passthru_target_cleanup
>>  
>> diff --git a/tests/nvme/037 b/tests/nvme/037
>> index f7ddc2d..f0c8a77 100755
>> --- a/tests/nvme/037
>> +++ b/tests/nvme/037
>> @@ -27,7 +27,6 @@ test_device() {
>>  
>>  	local subsys="blktests-subsystem-"
>>  	local iterations=10
>> -	local ctrldev
> 
> Good catch. And we can add "local nsdev" here.
yes will do.
> 
>>  
>>  	for ((i = 0; i < iterations; i++)); do
>>  		_nvmet_passthru_target_setup --subsysnqn "${subsys}${i}"
>> @@ -37,6 +36,11 @@ test_device() {
>>  		_nvme_disconnect_subsys \
>>  			--subsysnqn "${subsys}${i}" >>"${FULL}" 2>&1
>>  		_nvmet_passthru_target_cleanup --subsysnqn "${subsys}${i}"
>> +
>> +		if [[ -z "$nsdev" ]]; then
>> +			echo "FAIL"
>> +			break
>> +		fi
>>  	done
>>  
>>  	echo "Test complete"
>> diff --git a/tests/nvme/rc b/tests/nvme/rc
>> index a877de3..3def0d0 100644
>> --- a/tests/nvme/rc
>> +++ b/tests/nvme/rc
>> @@ -394,6 +394,7 @@ _nvmet_passthru_target_setup() {
>>  
>>  _nvmet_passthru_target_connect() {
>>  	local subsysnqn="$def_subsysnqn"
>> +	local timeout="5"
>>  
>>  	while [[ $# -gt 0 ]]; do
>>  		case $1 in
>> @@ -414,9 +415,18 @@ _nvmet_passthru_target_connect() {
>>  	# The following tests can race with the creation
>>  	# of the device so ensure the block device exists
>>  	# before continuing
>> -	while [ ! -b "${nsdev}" ]; do sleep 1; done
>> +	start_time=$(date +%s)
>> +	while [ ! -b "${nsdev}" ]; do
>> +		sleep .1
> 
> Is there any reason to have 0.1 second wait duration? When I changed this to
> "sleep 1", runtime of the test cases did not change on my test system.
> 
yes you're right there won't be any noticeable change in the runtime of 
the test unless we run this test in a loop for multiple number of times.
I would say that the gain would be only a fraction of a second in this case.
So I will use sleep 1 instead of .1 as you suggested.

>> +		end_time=$(date +%s)
>> +		if ((end_time - start_time > timeout)); then
>> +			echo ""
> 
> This echo doesn't look required.
> 
>> +			return 1
>> +		fi
> 
> If 1 second wait is okay instead of 0.1 second wait, the if block above can be
> a bit simpler, like,
> 
>               if ((++count >= timeout)); then
>                       return 1
>               fi
> 
> where, "local count=0" should be declared before.
> 
yeah with sleep 1 we can make it simple. I will update this in next patch.

>> +	done
>>  
>>  	echo "${nsdev}"
>> +	return 0
> 
> This "return 0" looks redundant. The previous echo succeeds always, then 0 is
> returned anyway. Also, all of the callers check the failure of this function by
> referring to the nsdev string echoed. They do not refer to the return code. So,
> it is not so useful to explicitly show that 0 is returned on success.
> 
yes I will incorporate it in the next patch.

Thanks,
--Nilay



More information about the Linux-nvme mailing list