nvme-tcp: i/o errors and stalled host from failure to send command pdus

Jonathan Nicklin jnicklin at blockbridge.com
Thu Sep 1 06:22:31 PDT 2022


On Thu, Sep 1, 2022 at 8:37 AM Jonathan Nicklin
<jnicklin at blockbridge.com> wrote:
>
> On Wed, Aug 31, 2022 at 7:42 PM Sagi Grimberg <sagi at grimberg.me> wrote:
> >
> >
> > >
> > > Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> > >
> > > req->dbg.send = 1;
> > > ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> > > req->dbg.sendpage = ret;
> > >
> > >
> > > nvme-trace.out
> > >
> > > tar-1785    [005] .....   232.768892: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=18 op=0 data_len=20480
> > > tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=17 op=0 data_len=32768
> > > tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=14 op=0 data_len=32768
> > > kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=13 op=0 data_len=32768
> > > kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=12 op=0 data_len=12288
> > > tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
> > >   nvme1: qid=8 tag=69 op=0 data_len=20480
> > >
> > >
> > > dmesg.out (reordered to match nvme-trace.out)
> > >
> > > [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> >
> > Something is not adding up...
> > lock=1 means that we got into nvme_tcp_send_all(), but somehow we were
> > not able to send it to the wire?
> >
> > I'd look into this and add an indication that it is pulled from the
> > req_list, and in turn from the send_list. It looks like it is somehow
> > picked out and silently dropped, because as long as it is on the list(s)
> > another context should have picked it up at some point...
> >
> > > [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > > [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
> > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > > [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> > > [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 0 lock 0 last 1 more 1 send 1 sendpage 72
>
> Ugh. That was misleading because the send-state tracking was
> re-initialized after the direct-send case. Here's a better trace with
> more information. It appears that sendpage() is always called with the
> MSG_SENDPAGE_NOTLAST flag set. I suppose this could mean that there is
> some confusion or a race around nvme_tcp_queue_more()?
>
> tar-1872          [004] ..... 389.559784: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=28 op=0 data_len=16384
> tar-1872          [004] .N... 389.559786: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=27 op=0 data_len=16384
> kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=26 op=0 data_len=32768
> kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=25 op=0 data_len=32768
> kworker/4:2H-1873 [004] ..... 389.559799: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=24 op=0 data_len=32768
>
>
> [  419.790505] nvme nvme1: state: tag 0x28 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> [  419.790299] nvme nvme1: state: tag 0x27 io_cpu 4 smp_id 4 smp_id2 0 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> [  419.789809] nvme nvme1: state: tag 0x25 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> [  419.790065] nvme nvme1: state: tag 0x26 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> [  419.789663] nvme nvme1: state: tag 0x24 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040

Additional tracing suggests it could be a race or issue with
queue->more_requests.

[  675.762509] nvme nvme1: state: tag 0x5f io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
  q_send_list 0 q_req_list 0 q_more_requests 1
[  675.720866] nvme nvme1: state: tag 0x5e io_cpu 1 smp_id 1 smp_id2 5 sync 1
  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
  q_send_list 0 q_req_list 0 q_more_requests 1
[  675.679256] nvme nvme1: state: tag 0x5d io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
  q_send_list 1 q_req_list 0 q_more_requests 1
[  675.621054] nvme nvme1: state: tag 0x5c io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
 q_send_list 1 q_req_list 0 q_more_requests 1
[  675.593414] nvme nvme1: state: tag 0x5a io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
  q_send_list 0 q_req_list 0 q_more_requests 1



More information about the Linux-nvme mailing list