[PATCH v2 0/5] avoid race for time out

Chao Leng lengchao at huawei.com
Thu Oct 29 02:13:20 EDT 2020



On 2020/10/28 19:36, Yi Zhang wrote:
> Hello
> 
> This series fixed the WARNING issue I reported [1], but now the nvme/012 [2] will be hang there and never finished, here is the log[3]
This is a different bug. Requests may hang in two scenarios:
1. With NVMe native multipath, if no path is available,
requests hang until all controllers are deleted.
2. Without multipath, if the controller is reconnecting,
requests hang until the controller is deleted.
The following patch may fix the request hang:
https://lore.kernel.org/linux-nvme/319b8b1869f34a48b26fbd902883ed71@kioxia.com/
That patch has been under discussion for a long time.
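
To make the two scenarios concrete, here is a toy model in plain C. It is
only a sketch of the behaviour described above, not kernel code: the enum
names and request_disposition() are made up for illustration and do not
correspond to identifiers in drivers/nvme/host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified controller states (not the kernel's enum). */
enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING, CTRL_DELETING };

/* What the model does with a request. */
enum disposition { DISP_SUBMIT, DISP_REQUEUE, DISP_FAIL };

/*
 * Decide the fate of a request given the state of every controller
 * (path) that could serve it. Without multipath, pass npaths == 1.
 *
 * - Any live path: the request can be submitted.
 * - No live path, but at least one controller not yet deleted:
 *   the request is requeued, i.e. it hangs from the caller's view.
 * - Every controller deleted: the request finally fails.
 */
enum disposition request_disposition(const enum ctrl_state *paths,
                                     size_t npaths)
{
    bool any_pending = false;

    assert(paths != NULL && npaths > 0);

    for (size_t i = 0; i < npaths; i++) {
        if (paths[i] == CTRL_LIVE)
            return DISP_SUBMIT;
        if (paths[i] != CTRL_DELETING)
            any_pending = true;
    }
    return any_pending ? DISP_REQUEUE : DISP_FAIL;
}
```

In this model a request stays in DISP_REQUEUE for as long as any
controller is still reconnecting, which is the hang seen in both cases.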

> [1]
> https://lore.kernel.org/linux-nvme/1934331639.3314730.1602152202454.JavaMail.zimbra@redhat.com/
> 
> [2]
> [root@hpe-xw9400-02 blktests]# nvme_trtype=tcp ./check nvme/012
> nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns)
>      runtime  1199.651s  ...
> 
> [3]
> [  120.550409] run blktests nvme/012 at 2020-10-28 06:50:11
> [  121.138234] loop: module loaded
> [  121.170869] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [  121.215930] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [  121.288229] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:ffe2b140e76a45649005853f3b871859.
> [  121.302597] nvme nvme0: creating 12 I/O queues.
> [  121.308361] nvme nvme0: mapped 12/0/0 default/read/poll queues.
> [  121.320030] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
> [  123.278903] XFS (nvme0n1): Mounting V5 Filesystem
> [  123.291608] XFS (nvme0n1): Ending clean mount
> [  123.297321] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
> [  183.872118] nvme nvme0: queue 1: timeout request 0x6c type 4
> [  183.877792] nvme nvme0: starting error recovery
> [  183.882376] nvme nvme0: queue 8: timeout request 0x11 type 4
> [  183.888149] nvme nvme0: queue 8: timeout request 0x12 type 4
> [  183.893805] nvme nvme0: queue 8: timeout request 0x13 type 4
> [  183.899469] nvme nvme0: queue 8: timeout request 0x14 type 4
> [  183.905130] nvme nvme0: queue 8: timeout request 0x15 type 4
> [  183.910792] nvme nvme0: queue 8: timeout request 0x16 type 4
> [  183.916453] nvme nvme0: queue 8: timeout request 0x17 type 4
> [  183.922114] nvme nvme0: queue 8: timeout request 0x18 type 4
> [  183.927777] nvme nvme0: queue 8: timeout request 0x19 type 4
> [  183.933450] nvme nvme0: queue 8: timeout request 0x1a type 4
> [  183.939110] nvme nvme0: queue 8: timeout request 0x1b type 4
> [  183.944771] nvme nvme0: queue 8: timeout request 0x1c type 4
> [  183.950431] nvme nvme0: queue 8: timeout request 0x1d type 4
> [  183.956095] nvme nvme0: queue 8: timeout request 0x1e type 4
> [  183.961755] nvme nvme0: queue 8: timeout request 0x1f type 4
> [  183.967414] nvme nvme0: queue 8: timeout request 0x20 type 4
> [  183.973218] block nvme0n1: no usable path - requeuing I/O
> [  183.978623] block nvme0n1: no usable path - requeuing I/O
> [  183.982492] nvme nvme0: Reconnecting in 10 seconds...
> [  183.984022] block nvme0n1: no usable path - requeuing I/O
> [  183.994476] block nvme0n1: no usable path - requeuing I/O
> [  183.999870] block nvme0n1: no usable path - requeuing I/O
> [  184.005264] block nvme0n1: no usable path - requeuing I/O
> [  184.010669] block nvme0n1: no usable path - requeuing I/O
> [  184.016080] block nvme0n1: no usable path - requeuing I/O
> [  184.021463] block nvme0n1: no usable path - requeuing I/O
> [  184.026858] block nvme0n1: no usable path - requeuing I/O
> [  209.472647] nvmet: ctrl 2 keep-alive timer (15 seconds) expired!
> [  209.478662] nvmet: ctrl 2 fatal error occurred!
> [  213.568765] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [  213.574782] nvmet: ctrl 1 fatal error occurred!
> [  238.064572] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:ffe2b140e76a45649005853f3b871859.
> [  256.577658] nvme nvme0: queue 0: timeout request 0x0 type 4
> [  256.583333] nvme nvme0: Connect command failed, error wo/DNR bit: 881
> [  256.589806] nvme nvme0: failed to connect queue: 0 ret=881
> [  256.595326] nvme nvme0: Failed reconnect attempt 1
> [  256.600119] nvme nvme0: Reconnecting in 10 seconds...
> [  266.818455] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:ffe2b140e76a45649005853f3b871859.
> [  266.832356] nvme_ns_head_submit_bio: 30 callbacks suppressed
> [  266.832362] block nvme0n1: no usable path - requeuing I/O
> [  266.843443] block nvme0n1: no usable path - requeuing I/O
> [  266.848848] block nvme0n1: no usable path - requeuing I/O
> [  266.854244] block nvme0n1: no usable path - requeuing I/O
> [  266.859663] block nvme0n1: no usable path - requeuing I/O
> [  266.865059] block nvme0n1: no usable path - requeuing I/O
> [  266.870454] block nvme0n1: no usable path - requeuing I/O
> [  266.875845] block nvme0n1: no usable path - requeuing I/O
> [  266.881234] block nvme0n1: no usable path - requeuing I/O
> [  266.886632] block nvme0n1: no usable path - requeuing I/O
> [  266.892237] nvme nvme0: creating 12 I/O queues.
> [  266.903744] nvme nvme0: mapped 12/0/0 default/read/poll queues.
> [  266.911929] nvme nvme0: Successfully reconnected (2 attempt)
> [  327.747177] nvme nvme0: queue 2: timeout request 0x1e type 4
> [  327.752883] nvme nvme0: starting error recovery
> [  327.757450] nvme nvme0: queue 4: timeout request 0x63 type 4
> [  327.763511] nvme_ns_head_submit_bio: 14 callbacks suppressed
> [  327.763520] block nvme0n1: no usable path - requeuing I/O
> [  327.774614] block nvme0n1: no usable path - requeuing I/O
> [  327.780053] block nvme0n1: no usable path - requeuing I/O
> [  327.785450] block nvme0n1: no usable path - requeuing I/O
> [  327.790876] block nvme0n1: no usable path - requeuing I/O
> [  327.796316] block nvme0n1: no usable path - requeuing I/O
> [  327.801727] block nvme0n1: no usable path - requeuing I/O
> [  327.807231] block nvme0n1: no usable path - requeuing I/O
> [  327.812627] block nvme0n1: no usable path - requeuing I/O
> [  327.818025] block nvme0n1: no usable path - requeuing I/O
> [  353.859745] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [  353.865761] nvmet: ctrl 1 fatal error occurred!
> 
> 
> On 10/22/20 10:14 AM, Chao Leng wrote:
>> First avoid race between time out and tear down for rdma and tcp.
>> Second avoid repeated request completion in time out for rdma and tcp.
>>
>> V2:
>>     - add avoiding repeated request completion in time out
>>
>> Chao Leng (3):
>>    nvme-core: introduce sync io queues
>>    nvme-rdma: avoid race between time out and tear down
>>    nvme-tcp: avoid race between time out and tear down
>>
>> Sagi Grimberg (2):
>>    nvme-rdma: avoid repeated request completion
>>    nvme-tcp: avoid repeated request completion
>>
>>   drivers/nvme/host/core.c |  8 ++++++--
>>   drivers/nvme/host/nvme.h |  1 +
>>   drivers/nvme/host/rdma.c | 14 +++-----------
>>   drivers/nvme/host/tcp.c  | 16 ++++------------
>>   4 files changed, 14 insertions(+), 25 deletions(-)
>>
> 
> .
