Request timeout seen with NVMEoF TCP

Potnuri Bharat Teja bharat at chelsio.com
Fri Dec 11 02:26:27 EST 2020


On Friday, December 11, 2020 at 03:17:11 +0530, Sagi Grimberg wrote:
> >>> Hi All,
> >>> I am seeing the following timeouts and reconnects on NVMF TCP initiator with latest v5.10-rc5
> >>> kernel.
> >>> I see the same behavior with nvme tree too (branch:nvme-5.11)
> >>> I last ran this with 5.8, where it was running fine.
> >>>
> >>> Target configuration is 1 target with a 1 GB ramdisk namespace. On the initiator,
> >>> discover, connect and run fio or Iozone. Traces are seen within a couple of minutes
> >>> after starting the test.
> >>
> >> Hey Potnuri,
> >>
> >> Can you also attach the target side logs? it seems like an I/O times out
> >> for no apparent reason..
> > 
> > I see nothing much logged on target.
> 
> Hey Potnuri,
> 
> This issue is consistent with what Baharat reported.
> 
> Can you please check if the below solves the issue?
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1ba659927442..9193b05d7bda 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1122,6 +1122,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
>                                  pending = true;
>                          else if (unlikely(result < 0))
>                                  break;
> +               } else {
> +                       /*
> +                        * submission path is sending, we need to
> +                        * continue or resched because the submission
> +                        * path direct send is not concerned with
> +                        * rescheduling...
> +                        */
> +                       pending = true;
>                  }
> 
>                  result = nvme_tcp_try_recv(queue);
> --
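
For illustration only, the effect of the new else branch can be modelled in
user space. This is a minimal sketch, assuming the enclosing if (not shown in
the hunk) is a trylock on the queue's send lock. struct queue_model,
io_work_pass() and the with_patch flag are hypothetical names used only for
this illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one queue; not the kernel structure. */
struct queue_model {
	bool send_lock_busy; /* submission path currently holds the send lock */
	int sends_queued;    /* requests waiting to be sent */
	int recvs_ready;     /* units of receive work available */
};

/*
 * One pass of the modelled work loop. Returns true when the pass wants
 * the loop to continue or be rescheduled ("pending").
 */
static bool io_work_pass(struct queue_model *q, bool with_patch)
{
	bool pending = false;

	if (!q->send_lock_busy) {
		/* Lock acquired: send one queued request, if any. */
		if (q->sends_queued > 0) {
			q->sends_queued--;
			pending = true;
		}
	} else if (with_patch) {
		/* Lock contended: the patched loop stays pending. */
		pending = true;
	}

	/* Receive side is attempted on every pass, lock or no lock. */
	if (q->recvs_ready > 0) {
		q->recvs_ready--;
		pending = true;
	}

	return pending;
}
```

With send_lock_busy set and nothing to receive, io_work_pass() returns false
without the patch and true with it, so only the patched loop keeps running or
reschedules while the submission path is sending.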

Hi Sagi,
With the above patch I still see the issue, but less frequently. Without the
patch I was able to consistently reproduce the timeouts with 4 target devices.
With the patch, I/O runs fine for 4 targets. I tried the same test with 8
target devices and see the timeout below. I've observed only one instance of
the timeout so far, so I'll let it run for some more time, or rerun, and update.

Target dmesg:
---
[ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
^^^^^^^^^^^^^^^
---

Initiator dmesg:
---
[ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
[ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
[ 2111.991835] nvme nvme5: starting error recovery
[ 2111.998796] block nvme5n1: no usable path - requeuing I/O
[ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
[ 2122.253431] block nvme5n1: no usable path - requeuing I/O
[ 2122.254732] nvme nvme5: creating 16 I/O queues.
[ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
[ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
---
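
For longer runs, timeout lines like the one above can be picked out of the
initiator dmesg automatically. A minimal sketch, assuming the message format
stays exactly as shown in the log above; struct nvme_timeout_event and
parse_nvme_timeout() are hypothetical names, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>

/* Fields taken from a "queue N: timeout request 0x.. type N" dmesg line. */
struct nvme_timeout_event {
	double timestamp;     /* seconds since boot, from the [ ... ] prefix */
	int ctrl;             /* controller instance, e.g. 5 for nvme5 */
	int queue;            /* queue number */
	unsigned int request; /* request value, printed in hex by the kernel */
	int type;             /* type value from the message */
};

/*
 * Parse one dmesg line. Returns 0 and fills *ev when the line is a
 * timeout message in the assumed format, -1 for any other line.
 */
static int parse_nvme_timeout(const char *line, struct nvme_timeout_event *ev)
{
	int n = sscanf(line,
		       "[ %lf] nvme nvme%d: queue %d: timeout request %x type %d",
		       &ev->timestamp, &ev->ctrl, &ev->queue,
		       &ev->request, &ev->type);

	/* All five conversions must succeed for a match. */
	return n == 5 ? 0 : -1;
}
```

Feeding each dmesg line through parse_nvme_timeout() and counting the lines
that return 0 gives the number of timeouts per run, and the ctrl and queue
fields show which controller and queue were affected.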


Thanks.


