[bug report][regression] blktests nvme/029 failed on latest linux-block/for-next

Thu Nov 21 22:29:04 PST 2024


On 11/22/24 11:08, Yi Zhang wrote:
> On Thu, Nov 21, 2024 at 7:10 PM Nilay Shroff <nilay at linux.ibm.com> wrote:
>>
>>
>>
>> On 11/21/24 08:28, Yi Zhang wrote:
>>> On Wed, Nov 20, 2024 at 10:07 PM Nilay Shroff <nilay at linux.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 11/19/24 16:34, Yi Zhang wrote:
>>>>> Hello
>>>>>
>>>>> CKI recently reported that blktests nvme/029 fails[1] on
>>>>> linux-block/for-next, and bisection shows the failure was introduced by [2].
>>>>> Please take a look and let me know if you need any info or testing, thanks.
>>>>>
>>>>> [1]
>>>>> nvme/029 (tr=loop) (test userspace IO via nvme-cli read/write interface) [failed]
>>>>>     runtime    ...  1.568s
>>>>>     --- tests/nvme/029.out 2024-11-19 08:13:41.379272231 +0000
>>>>>     +++ /root/blktests/results/nodev_tr_loop/nvme/029.out.bad 2024-11-19 10:55:13.615939542 +0000
>>>>>     @@ -1,2 +1,8 @@
>>>>>      Running nvme/029
>>>>>     +FAIL
>>>>>     +FAIL
>>>>>     +FAIL
>>>>>     +FAIL
>>>>>     +FAIL
>>>>>     +FAIL
>>>>>     ...
>>>>>     (Run 'diff -u tests/nvme/029.out /root/blktests/results/nodev_tr_loop/nvme/029.out.bad' to see the entire diff)
>>>>> [2]
>>>>> 64a51080eaba (HEAD) nvmet: implement id ns for nvm command set
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>>   Yi Zhang
>>>>>
>>>>>
>>>> I couldn't reproduce it even after running nvme/029 in a loop
>>>> multiple times. Are you following any specific steps to
>>>> reproduce it?
>>>
>>> From the reproduction data[1], it seems to reproduce only on x86_64 and
>>> aarch64, and from 029.full[2] we can see that the failure comes from
>>> the "nvme write" command.
>>> [1]
>>> https://datawarehouse.cki-project.org/issue/3263
>>> [2]
>>> # cat results/nodev_tr_loop/nvme/029.full
>>> Reference tag larger than allowed by PIF
>>> NQN:blktests-subsystem-1 disconnected 1 controller(s)
>>> disconnected 1 controller(s)
>>>
>>> I also attached the kernel config file in case you want to try it, thanks.
>>>
>> Thanks for the additional information!
>> Now I understand the issue and have a probable fix. If possible, can you try
>> the patch below and check whether it helps resolve this issue?
> 
> Yes, the issue is fixed now.
> 
Thank you for trying out the patch! I will send out the formal patch later today with the fix.
>>
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index 934b401fbc2f..7a8256ae3085 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -901,12 +901,14 @@ static void nvmet_execute_identify_ctrl_nvm(struct nvmet_req *req)
>>  static void nvme_execute_identify_ns_nvm(struct nvmet_req *req)
>>  {
>>         u16 status;
>> +       void *zero_buf;
>>
>>         status = nvmet_req_find_ns(req);
>>         if (status)
>>                 goto out;
>>
>> -       status = nvmet_copy_to_sgl(req, 0, ZERO_PAGE(0),
>> +       zero_buf = __va(page_to_pfn(ZERO_PAGE(0)) << PAGE_SHIFT);
>> +       status = nvmet_copy_to_sgl(req, 0, zero_buf,
>>                                    NVME_IDENTIFY_DATA_SIZE);
>>  out:
>>         nvmet_req_complete(req, status);
>>
>> Thanks,
>> --Nilay
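
For readers following the patch above: nvmet_copy_to_sgl() takes its source
as a const void * buffer address, while ZERO_PAGE(0) evaluates to a
struct page *. Because any object pointer converts silently to const void *,
the original call compiles, but as I read it the copy then reads the bytes
of the page descriptor rather than the zero-filled page contents. The
following user-space sketch models that mistake. It is not kernel code:
struct fake_page, copy_to_dst(), DATA_SIZE and the two helper functions are
hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NVME_IDENTIFY_DATA_SIZE, kept small here. */
#define DATA_SIZE 64

/* The actual zero-filled contents (static storage is zero-initialized). */
static unsigned char zero_data[DATA_SIZE];

/*
 * Hypothetical stand-in for struct page: metadata that describes a page.
 * The meta[] member only makes the struct at least DATA_SIZE bytes long,
 * so that the "buggy" copy below stays within the object.
 */
struct fake_page {
	unsigned long flags;            /* non-zero bookkeeping state */
	unsigned char meta[DATA_SIZE];  /* padding for this illustration */
	const unsigned char *data;      /* where the contents really live */
};

static struct fake_page zero_page_desc = {
	.flags = 0xdeadbeefUL,
	.data = zero_data,
};

/* Like the kernel helper, this accepts const void *, so any pointer compiles. */
static void copy_to_dst(unsigned char *dst, const void *buf, size_t len)
{
	assert(len <= DATA_SIZE);
	memcpy(dst, buf, len);
}

static int all_zero(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

/* Models the original call: the descriptor itself is passed as the buffer. */
static int buggy_copy_is_zero(void)
{
	unsigned char dst[DATA_SIZE];

	copy_to_dst(dst, &zero_page_desc, sizeof(dst));
	return all_zero(dst, sizeof(dst));
}

/* Models the fix: resolve the descriptor to its data address first. */
static int fixed_copy_is_zero(void)
{
	unsigned char dst[DATA_SIZE];

	copy_to_dst(dst, zero_page_desc.data, sizeof(dst));
	return all_zero(dst, sizeof(dst));
}
```

Called from a main(), buggy_copy_is_zero() reports a non-zero payload because
the descriptor's flags end up in the destination, and fixed_copy_is_zero()
reports an all-zero one. A non-zero Identify payload would plausibly explain
the "Reference tag larger than allowed by PIF" message from nvme-cli, though
that link is my inference and not something stated in the thread.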