[bug report] blktests nvme/022 leads to kernel WARNING and NULL pointer
Hannes Reinecke
hare at suse.de
Sat May 22 15:59:06 BST 2021
On 5/22/21 2:12 AM, Yi Zhang wrote:
> On Sat, May 22, 2021 at 2:19 AM Sagi Grimberg <sagi at grimberg.me> wrote:
>>
>>
>>>>> What about this?
>>>
>>> Hi Hannes
>>> With this patch there is no WARNING/NULL pointer this time, but the
>>> 'keep-alive timer expired' and reset failure issues remain. Here is the
>>> full log:
>>>
>>> # ./check nvme/022
>>> nvme/022 (test NVMe reset command on NVMeOF file-backed ns) [failed]
>>> runtime 10.646s ... 11.087s
>>> --- tests/nvme/022.out 2021-05-20 20:16:31.384068807 -0400
>>> +++ /root/blktests/results/nodev/nvme/022.out.bad 2021-05-20 20:24:27.874250466 -0400
>>> @@ -1,4 +1,5 @@
>>> Running nvme/022
>>> 91fdba0d-f87b-4c25-b80f-db7be1418b9e
>>> uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
>>> +ERROR: reset failed
>>> Test complete
>>> # cat results/nodev/nvme/022.full
>>> Reset: Network dropped connection on reset
>>> NQN:blktests-subsystem-1 disconnected 1 controller(s)
>>>
>>> [37353.068448] run blktests nvme/022 at 2021-05-20 20:24:16
>>> [37353.146301] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> [37353.161765] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:6a70d220-bfde-1000-03ce-ea40b8730904.
>>> [37353.175796] nvme nvme0: creating 128 I/O queues.
>>> [37353.189734] nvme nvme0: new ctrl: "blktests-subsystem-1"
>>> [37354.216686] nvme nvme0: resetting controller
>>> [37363.270607] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>>> [37363.276521] nvmet: ctrl 1 fatal error occurred!
>>> [37363.281058] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>>>
>>> # ./check nvme/021
>>> nvme/021 (test NVMe list command on NVMeOF file-backed ns) [passed]
>>> runtime 10.958s ... 11.382s
>>> # dmesg
>>> [38142.862881] run blktests nvme/021 at 2021-05-20 20:37:26
>>> [38142.941038] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> [38142.956621] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:6a70d220-bfde-1000-03ce-ea40b8730904.
>>> [38142.970524] nvme nvme0: creating 128 I/O queues.
>>> [38142.984356] nvme nvme0: new ctrl: "blktests-subsystem-1"
>>> [38144.014601] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>>> [38153.030107] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>>> [38153.036018] nvmet: ctrl 1 fatal error occurred!
>>
>> I think the main reason is that 128 queues are being created, and
>> during that time the keep-alive timer ends up expiring because the
>> default is now shorter (it used to be 15 seconds, now it is 5).
>>
>> nvmet only stops the keep-alive timer when the controller is freed,
>> which is pretty late in the sequence. The problem is that it needs to
>> be this way: if we shut it down sooner, a host can die in the
>> middle of a teardown sequence, and we still need to detect that and
>> clean up ourselves. But maybe we can mod the keep-alive timer for
>> every queue we delete, just in case the host is not deleting
>> fast enough?
>>
>> Ming, does this solve the issue you are seeing?
>
> Hi Sagi
> The issue was fixed by this patch. :)
>
>> --
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 1853db38b682..f0715e9a4a9c 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -804,6 +804,7 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
>>  	percpu_ref_exit(&sq->ref);
>>
>>  	if (ctrl) {
>> +		ctrl->cmd_seen = true;
>>  		nvmet_ctrl_put(ctrl);
>>  		sq->ctrl = NULL; /* allows reusing the queue later */
>>  	}
>> --
>>
>> We probably need to rename cmd_seen to extend_tbkas (extend
>> traffic-based keep-alive).
>>
>
>
Thanks for the confirmation.
I'll send a proper patchset.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
More information about the Linux-nvme mailing list