nvme-tcp request timeouts

Seth Forshee sforshee at kernel.org
Wed Oct 12 09:55:18 PDT 2022


On Wed, Oct 12, 2022 at 09:33:19AM +0300, Sagi Grimberg wrote:
> Hey Seth, thanks for reporting.
> 
> > > > > Hi Seth,
> > > > > 
> > > > > On 10/11/22 08:31, Seth Forshee wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I'm seeing timeouts like the following from nvme-tcp:
> > > > > > 
> > > > > > [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
> > > > > > [ 6369.513283] nvme nvme5: starting error recovery
> > > > > > [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
> > > > > > [ 6379.761015] nvme nvme5: creating 128 I/O queues.
> > > > > > [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
> > > > > > [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
> > > > > > 
> > > > > > This is with 6.0, using nvmet-tcp on a different machine as the target.
> > > > > > I've seen this sporadically with several test cases. The fio fio-rand-RW
> > > > > > example test is a pretty good reproducer when numjobs is increased (I'm
> > > > > > setting it equal to the number of CPUs in the system).
> > > > > > 
> > > > > > Let me know what I can do to help debug this. I'm currently adding some
> > > > > > tracing to the driver to see if I can get an idea of the sequence of
> > > > > > events that leads to this problem.
> > > > > > 
> > > > > > Thanks,
> > > > > > Seth
> > > > > > 
> > > > > 
> > > > > Can you bisect it? That will help to identify the commit causing
> > > > > the issue.
> > > > 
> > > > I don't know of any "good" version right now. I started with a 5.10
> > > > kernel and saw this, and tested 6.0 and still see it. I found several
> > > > commits since 5.10 which fix various timeout issues:
> > > > 
> > > > a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
> > > > 70f437fb4395 nvme-tcp: fix io_work priority inversion
> > > > 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
> > > > 
> > > > 5.10 still has timeouts with these backported, so whatever the problem
> > > > is, it has existed at least that long. I suppose I could go back to older
> > > > kernels with these backported if that's going to be the best path
> > > > forward here.
> > > > 
> > > > Thanks,
> > > > Seth
> > > 
> > > Can you please share the fio config you are using ?
> > 
> > Sure. Note that I can reproduce it with lower numjobs values, but higher
> > values make it easier to hit, so I set it to the number of CPUs present
> > on the system I'm using for testing.
> > 
> > 
> > [global]
> > name=fio-rand-RW
> > filename=fio-rand-RW
> > rw=randrw
> > rwmixread=60
> > rwmixwrite=40
> > bs=4K
> > direct=0
> 
> Does this happen with direct=1?

So far I haven't been able to reproduce it with direct=1.

> > numjobs=128
> > time_based
> > runtime=900
> > 
> > [file1]
> > size=10G
> > ioengine=libaio
> > iodepth=16
> > 
> 
> Is it possible that the backend nvmet device is not fast enough?
> Are your backend devices nvme? Can you share nvmetcli ls output?

Currently I have a quick-and-dirty setup. The systems I have to test
with don't have a spare nvme device right now but do have a lot of RAM,
so the backend is a loop device backed by a file in tmpfs. But it should
be plenty fast. I'm working on getting a setup with an nvme backend
device.
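
For reference, the tmpfs-backed loop device was set up roughly like the
following (the mount point, file name, and size here are illustrative, not
the exact ones I used):

mount -t tmpfs tmpfs /mnt/ramdisk
truncate -s 20G /mnt/ramdisk/nvmet-backend.img
losetup /dev/loop0 /mnt/ramdisk/nvmet-backend.img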

o- / ......................................................................................................................... [...]
  o- hosts ................................................................................................................... [...]
  | o- hostnqn ............................................................................................................... [...]
  o- ports ................................................................................................................... [...]
  | o- 2 ................................................... [trtype=tcp, traddr=..., trsvcid=4420, inline_data_size=16384]
  |   o- ana_groups .......................................................................................................... [...]
  |   | o- 1 ..................................................................................................... [state=optimized]
  |   o- referrals ........................................................................................................... [...]
  |   o- subsystems .......................................................................................................... [...]
  |     o- testnqn ........................................................................................................... [...]
  o- subsystems .............................................................................................................. [...]
    o- testnqn ............................................................. [version=1.3, allow_any=1, serial=2c2e39e2a551f7febf33]
      o- allowed_hosts ....................................................................................................... [...]
      o- namespaces .......................................................................................................... [...]
        o- 1  [path=/dev/loop0, uuid=8a1561fb-82c3-4e9d-96b9-11c7b590d047, nguid=ef90689c-6c46-d44c-89c1-4067801309a8, grpid=1, enabled]
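
The target side is nvmet-tcp driven through configfs, roughly along these
lines to match the tree above (the listen address is left as a placeholder
since it's omitted in the listing, and the commands are a sketch from memory
rather than an exact transcript):

modprobe nvmet-tcp        # pulls in nvmet as a dependency
cd /sys/kernel/config/nvmet

# subsystem with a single loop-backed namespace, any host allowed
mkdir subsystems/testnqn
echo 1 > subsystems/testnqn/attr_allow_any_host
mkdir subsystems/testnqn/namespaces/1
echo /dev/loop0 > subsystems/testnqn/namespaces/1/device_path
echo 1 > subsystems/testnqn/namespaces/1/enable

# TCP port listening on 4420, with the subsystem exported through it
mkdir ports/2
echo tcp > ports/2/addr_trtype
echo ipv4 > ports/2/addr_adrfam
echo "$TRADDR" > ports/2/addr_traddr    # placeholder for the address
echo 4420 > ports/2/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/testnqn ports/2/subsystems/testnqn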

> Also, just to understand if things are slow or stuck, perhaps you
> can artificially increase the io_timeout to say 60s or 120s.

I could still reproduce it with io_timeout set to 120s.
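
(io_timeout here is the nvme_core module parameter, in seconds. Something
along these lines, set before connecting; my understanding is that the value
is picked up when the controller is created:)

# default is 30s
echo 120 > /sys/module/nvme_core/parameters/io_timeout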

Thanks,
Seth


